WO2021042331A1 - Methods and apparatus for graphics and display pipeline management

Info

Publication number
WO2021042331A1 (PCT/CN2019/104557)
Authority
WIPO (PCT)
Prior art keywords
pixels, subgroups, subgroup, release time, display
Application number
PCT/CN2019/104557
Other languages
French (fr)
Inventor
Bo Du
Yongjun XU
Nan Zhang
Original Assignee
Qualcomm Incorporated
Application filed by Qualcomm Incorporated
Priority to PCT/CN2019/104557
Publication of WO2021042331A1

Classifications

    • All classifications fall under G09G, arrangements or circuits for control of indicating devices using static means to present variable information (section G, Physics; class G09, Education; Cryptography; Display; Advertising; Seals)
    • G09G 5/363: Graphics controllers, under G09G 5/36 (control arrangements or circuits characterised by the display of a graphic pattern, e.g., using an all-points-addressable [APA] memory) and G09G 5/00 (control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators)
    • G09G 2310/08: Details of timing specific for flat panels, other than clock recovery, under G09G 2310/00 (command of the display device)
    • G09G 2360/122: Tiling, under G09G 2360/12 (frame memory handling) and G09G 2360/00 (aspects of the architecture of display systems)
    • G09G 2360/18: Use of a frame buffer in a display terminal, inclusive of the display panel

Definitions

  • the present disclosure relates generally to processing systems and, more particularly, to one or more techniques for display or graphics processing.
  • Computing devices often utilize a graphics processing unit (GPU) to accelerate the rendering of graphical data for display.
  • Such computing devices may include, for example, computer workstations, mobile phones such as so-called smartphones, embedded systems, personal computers, tablet computers, and video game consoles.
  • GPUs execute a graphics processing pipeline that includes one or more processing stages that operate together to execute graphics processing commands and output a frame.
  • a central processing unit may control the operation of the GPU by issuing one or more graphics processing commands to the GPU.
  • Modern day CPUs are typically capable of concurrently executing multiple applications, each of which may need to utilize the GPU during execution.
  • An electronic device may execute a program to present graphics content on a display.
  • an electronic device may execute a user interface application, video game application, and the like.
  • the apparatus may be a display processing unit (DPU) , a display engine, a GPU, a CPU, or some other processor for display or graphics processing.
  • the apparatus can determine display content for a group of pixels in a frame, where the group of pixels includes one or more subgroups of pixels.
  • the apparatus can also calculate at least one synchronization divider for each of the one or more subgroups of pixels. Further, the apparatus can synchronize each of the one or more subgroups of pixels based on the at least one synchronization divider for the subgroup of pixels.
  • the apparatus can determine a position of each of the one or more subgroups of pixels, where the position of each of the one or more subgroups of pixels includes one or more coordinates of the one or more subgroups of pixels. Also, the apparatus can send or receive at least one signal corresponding to each of the one or more subgroups of pixels. The apparatus can also release a buffer corresponding to each of the one or more subgroups of pixels at the release time of the subgroup of pixels.
  • FIG. 1 is a block diagram that illustrates an example content generation system in accordance with one or more techniques of this disclosure.
  • FIG. 2 illustrates an example GPU in accordance with one or more techniques of this disclosure.
  • FIG. 3 illustrates an example diagram in accordance with one or more techniques of this disclosure.
  • FIG. 4 illustrates another example diagram in accordance with one or more techniques of this disclosure.
  • FIG. 5 illustrates another example diagram in accordance with one or more techniques of this disclosure.
  • FIG. 6 illustrates an example frame and timeline, respectively, in accordance with one or more techniques of this disclosure.
  • FIG. 7 illustrates an example frame and timeline, respectively, in accordance with one or more techniques of this disclosure.
  • FIG. 8 illustrates an example flowchart of an example method in accordance with one or more techniques of this disclosure.
  • processors include microprocessors, microcontrollers, graphics processing units (GPUs) , general purpose GPUs (GPGPUs) , central processing units (CPUs) , application processors, digital signal processors (DSPs) , reduced instruction set computing (RISC) processors, systems-on-chip (SOC) , baseband processors, application specific integrated circuits (ASICs) , field programmable gate arrays (FPGAs) , programmable logic devices (PLDs) , state machines, gated logic, discrete hardware circuits, and other suitable hardware configured to perform the various functionality described throughout this disclosure.
  • One or more processors in the processing system may execute software.
  • Software can be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software components, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, etc., whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise.
  • the term application may refer to software.
  • one or more techniques may refer to an application, i.e., software, being configured to perform one or more functions.
  • the application may be stored on a memory, e.g., on-chip memory of a processor, system memory, or any other memory.
  • Hardware described herein such as a processor may be configured to execute the application.
  • the application may be described as including code that, when executed by the hardware, causes the hardware to perform one or more techniques described herein.
  • the hardware may access the code from a memory and execute the code accessed from the memory to perform one or more techniques described herein.
  • components are identified in this disclosure.
  • the components may be hardware, software, or a combination thereof.
  • the components may be separate components or sub-components of a single component.
  • the functions described may be implemented in hardware, software, or any combination thereof. If implemented in software, the functions may be stored on or encoded as one or more instructions or code on a computer-readable medium.
  • Computer-readable media includes computer storage media. Storage media may be any available media that can be accessed by a computer.
  • such computer-readable media can comprise a random access memory (RAM) , a read-only memory (ROM) , an electrically erasable programmable ROM (EEPROM) , optical disk storage, magnetic disk storage, other magnetic storage devices, combinations of the aforementioned types of computer-readable media, or any other medium that can be used to store computer executable code in the form of instructions or data structures that can be accessed by a computer.
  • this disclosure describes techniques for having a graphics processing pipeline in a single device or multiple devices, improving the rendering of graphical content, and/or reducing the load of a processing unit, i.e., any processing unit configured to perform one or more techniques described herein, such as a GPU.
  • this disclosure describes techniques for graphics processing in any device that utilizes graphics processing. Other example benefits are described throughout this disclosure.
  • instances of the term “content” may refer to “graphical content, ” “image, ” and vice versa. This is true regardless of whether the terms are being used as an adjective, noun, or other parts of speech.
  • the term “graphical content” may refer to a content produced by one or more processes of a graphics processing pipeline.
  • the term “graphical content” may refer to a content produced by a processing unit configured to perform graphics processing.
  • the term “graphical content” may refer to a content produced by a graphics processing unit.
  • the term “display content” may refer to content generated by a processing unit configured to perform displaying processing.
  • the term “display content” may refer to content generated by a display processing unit.
  • Graphical content may be processed to become display content.
  • a graphics processing unit may output graphical content, such as a frame, to a buffer (which may be referred to as a framebuffer) .
  • a display processing unit may read the graphical content, such as one or more frames from the buffer, and perform one or more display processing techniques thereon to determine display content and/or generate display content.
  • a display processing unit may be configured to perform composition on one or more rendered layers to generate a frame.
  • a display processing unit may be configured to compose, blend, or otherwise combine two or more layers together into a single frame.
  • a display processing unit may be configured to perform scaling, e.g., upscaling or downscaling, on a frame.
  • a frame may refer to a layer.
  • a frame may refer to two or more layers that have already been blended together to form the frame, i.e., the frame includes two or more layers, and the frame that includes two or more layers may subsequently be blended.
  • FIG. 1 is a block diagram that illustrates an example content generation system 100 configured to implement one or more techniques of this disclosure.
  • the content generation system 100 includes a device 104.
  • the device 104 may include one or more components or circuits for performing various functions described herein.
  • one or more components of the device 104 may be components of an SOC.
  • the device 104 may include one or more components configured to perform one or more techniques of this disclosure.
  • the device 104 may include a processing unit 120, and a system memory 124.
  • the device 104 can include a number of optional components, e.g., a communication interface 126, a transceiver 132, a receiver 128, a transmitter 130, a display processor 127, and one or more displays 131.
  • the display 131 may refer to the one or more displays 131.
  • the display 131 may include a single display or multiple displays.
  • the display 131 may include a first display and a second display.
  • the first display may be a left-eye display and the second display may be a right-eye display.
  • the first and second display may receive different frames for presentment thereon.
  • the first and second display may receive the same frames for presentment thereon.
  • the results of the graphics processing may not be displayed on the device, e.g., the first and second display may not receive any frames for presentment thereon. Instead, the frames or graphics processing results may be transferred to another device. In some aspects, this can be referred to as split-rendering.
  • the processing unit 120 may include an internal memory 121.
  • the processing unit 120 may be configured to perform graphics processing, such as in a graphics processing pipeline 107.
  • the device 104 may include a display processor, such as the display processor 127, to perform one or more display processing techniques on one or more frames generated by the processing unit 120 before presentment by the one or more displays 131.
  • the display processor 127 may be configured to perform display processing.
  • the display processor 127 may be configured to perform one or more display processing techniques on one or more frames generated by the processing unit 120.
  • the one or more displays 131 may be configured to display or otherwise present frames processed by the display processor 127.
  • the one or more displays 131 may include one or more of: a liquid crystal display (LCD) , a plasma display, an organic light emitting diode (OLED) display, a projection display device, an augmented reality display device, a virtual reality display device, a head-mounted display, or any other type of display device.
  • Memory external to the processing unit 120 may be accessible to the processing unit 120.
  • the processing unit 120 may be configured to read from and/or write to external memory, such as the system memory 124.
  • the processing unit 120 may be communicatively coupled to the system memory 124 over a bus. In some examples, the processing unit 120 and the system memory 124 may be communicatively coupled to each other over the bus or a different connection.
  • the internal memory 121 or the system memory 124 may include one or more volatile or non-volatile memories or storage devices.
  • internal memory 121 or the system memory 124 may include RAM, SRAM, DRAM, erasable programmable ROM (EPROM) , electrically erasable programmable ROM (EEPROM) , flash memory, a magnetic data media or an optical storage media, or any other type of memory.
  • the internal memory 121 or the system memory 124 may be a non-transitory storage medium according to some examples.
  • the term “non-transitory” may indicate that the storage medium is not embodied in a carrier wave or a propagated signal. However, the term “non-transitory” should not be interpreted to mean that internal memory 121 or the system memory 124 is non-movable or that its contents are static. As one example, the system memory 124 may be removed from the device 104 and moved to another device. As another example, the system memory 124 may not be removable from the device 104.
  • the processing unit 120 may be a central processing unit (CPU) , a graphics processing unit (GPU) , a general purpose GPU (GPGPU) , or any other processing unit that may be configured to perform graphics processing.
  • the processing unit 120 may be integrated into a motherboard of the device 104.
  • the processing unit 120 may be present on a graphics card that is installed in a port in a motherboard of the device 104, or may be otherwise incorporated within a peripheral device configured to interoperate with the device 104.
  • the processing unit 120 may include one or more processors, such as one or more microprocessors, GPUs, application specific integrated circuits (ASICs) , field programmable gate arrays (FPGAs) , arithmetic logic units (ALUs) , digital signal processors (DSPs) , discrete logic, software, hardware, firmware, other equivalent integrated or discrete logic circuitry, or any combinations thereof. If the techniques are implemented partially in software, the processing unit 120 may store instructions for the software in a suitable, non-transitory computer-readable storage medium, e.g., internal memory 121, and may execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Any of the foregoing, including hardware, software, a combination of hardware and software, etc., may be considered to be one or more processors.
  • the content generation system 100 can include an optional communication interface 126.
  • the communication interface 126 may include a receiver 128 and a transmitter 130.
  • the receiver 128 may be configured to perform any receiving function described herein with respect to the device 104. Additionally, the receiver 128 may be configured to receive information, e.g., eye or head position information, rendering commands, or location information, from another device.
  • the transmitter 130 may be configured to perform any transmitting function described herein with respect to the device 104. For example, the transmitter 130 may be configured to transmit information to another device, which may include a request for content.
  • the receiver 128 and the transmitter 130 may be combined into a transceiver 132. In such examples, the transceiver 132 may be configured to perform any receiving function and/or transmitting function described herein with respect to the device 104.
  • the graphics processing pipeline 107 may include a determination component 198 configured to determine display content for a group of pixels in a frame, where the group of pixels includes one or more subgroups of pixels.
  • the determination component 198 can also be configured to calculate at least one synchronization divider for each of the one or more subgroups of pixels. Additionally, the determination component 198 can be configured to synchronize each of the one or more subgroups of pixels based on the at least one synchronization divider for the subgroup of pixels.
  • the determination component 198 can also be configured to determine a position of each of the one or more subgroups of pixels, where the position of each of the one or more subgroups of pixels includes one or more coordinates of the one or more subgroups of pixels.
  • the determination component 198 can also be configured to send or receive at least one signal corresponding to each of the one or more subgroups of pixels. Moreover, the determination component 198 can be configured to release a buffer corresponding to each of the one or more subgroups of pixels at the release time of the subgroup of pixels.
  • a device such as the device 104, may refer to any device, apparatus, or system configured to perform one or more techniques described herein.
  • a device may be a server, a base station, user equipment, a client device, a station, an access point, a computer, e.g., a personal computer, a desktop computer, a laptop computer, a tablet computer, a computer workstation, or a mainframe computer, an end product, an apparatus, a phone, a smart phone, a server, a video game platform or console, a handheld device, e.g., a portable video game device or a personal digital assistant (PDA), a wearable computing device, e.g., a smart watch, an augmented reality device, or a virtual reality device, a non-wearable device, a display or display device, a television, a television set-top box, an intermediate network device, a digital media player, a video streaming device, a content streaming device, an in-car computer, or any other device configured to perform one or more techniques described herein.
  • GPUs can process multiple types of data or data packets in a GPU pipeline.
  • a GPU can process two types of data or data packets, e.g., context register packets and draw call data.
  • a context register packet can be a set of global state information, e.g., information regarding a global register, shading program, or constant data, which can regulate how graphics context will be processed.
  • context register packets can include information regarding a color format.
  • Context states can be utilized to determine how an individual processing unit functions, e.g., a vertex fetcher (VFD) , a vertex shader (VS) , a shader processor, or a geometry processor, and/or in what mode the processing unit functions.
  • GPUs can use context registers and programming data.
  • a GPU can generate a workload, e.g., a vertex or pixel workload, in the pipeline.
  • FIG. 2 illustrates an example GPU 200 in accordance with one or more techniques of this disclosure.
  • GPU 200 includes command processor (CP) 210, draw call data packets 212, VFD 220, VS 222, vertex cache (VPC) 224, triangle setup engine (TSE) 226, rasterizer (RAS) 228, Z process engine (ZPE) 230, pixel interpolator (PI) 232, fragment shader (FS) 234, render backend (RB) 236, L2 cache (UCHE) 238, and system memory 240.
  • Although FIG. 2 displays that GPU 200 includes processing units 220-238, GPU 200 can include a number of additional processing units. Additionally, processing units 220-238 are merely an example and any combination or order of processing units can be used by GPUs according to the present disclosure.
  • GPU 200 also includes command buffer 250, context register packets 260, and context states 261.
  • a GPU can utilize a CP, e.g., CP 210, or hardware accelerator to parse a command buffer into context register packets, e.g., context register packets 260, and/or draw call data packets, e.g., draw call data packets 212.
  • the CP 210 can then send the context register packets 260 or draw call data packets 212 through separate paths to the processing units or blocks in the GPU.
  • the command buffer 250 can alternate different states of context registers and draw calls.
  • a command buffer can be structured as follows: context register of context N, draw call (s) of context N, context register of context N+1, and draw call (s) of context N+1.
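  • As a rough illustration of this ordering, the following C sketch models a command buffer as an array of packets that alternates context register packets and draw call packets for successive contexts; the packet structure and field names are assumptions made for illustration only, not an actual GPU packet format.

```c
/* Minimal sketch (not the patent's implementation) of a command buffer that
 * alternates context register packets and draw calls. The packet type and
 * fields are illustrative assumptions. */
#include <stdio.h>

enum packet_type { CONTEXT_REGISTER, DRAW_CALL };

struct packet {
    enum packet_type type;
    int context_id;   /* which graphics context the packet belongs to */
};

int main(void) {
    /* Context register of context N, draw call(s) of context N,
     * context register of context N+1, draw call(s) of context N+1. */
    struct packet command_buffer[] = {
        { CONTEXT_REGISTER, 0 },
        { DRAW_CALL,        0 },
        { CONTEXT_REGISTER, 1 },
        { DRAW_CALL,        1 },
    };

    /* A command processor (CP) would parse the buffer and send each packet
     * type down a separate path to the GPU processing blocks. */
    for (size_t i = 0; i < sizeof(command_buffer) / sizeof(command_buffer[0]); i++) {
        const struct packet *p = &command_buffer[i];
        printf("%s for context %d\n",
               p->type == CONTEXT_REGISTER ? "context register packet"
                                           : "draw call packet",
               p->context_id);
    }
    return 0;
}
```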
  • aspects of mobile devices or smart phones can utilize buffer mechanisms to distribute or coordinate a buffer between an application rendering side of the device, e.g., a GPU or CPU, and a display or composition side of the device, e.g., a display engine.
  • some mobile devices can utilize a buffer queue mechanism to distribute or coordinate a buffer between an application rendering side and a display or composition side, which can include a buffer compositor, e.g., a surface flinger (SF) or hardware composer (HWC) .
  • the application rendering side can be referred to as a producer, while the display or composition side can be referred to as a consumer.
  • a synchronization divider or fence can be used to synchronize content between the application rendering side and the display or composition side. Accordingly, a fence can be referred to as a synchronization divider, and vice versa.
  • FIG. 3 illustrates diagram 300 in accordance with one or more techniques of this disclosure.
  • diagram 300 includes producer 310, buffer queue mechanism 320 and consumer 330. More specifically, FIG. 3 illustrates how the buffer queue mechanism 320 helps to distribute or coordinate the buffers between the producer 310 and the consumer 330.
  • the producer 310 is the application side, which produces or renders the content for display.
  • the consumer 330 is the display or composition side, which displays the content on the user device.
  • the buffer queue mechanism 320 is used to coordinate the buffers between the producer and consumer sides.
  • the buffers can include multiple states at the producer 310, buffer queue mechanism 320, and the consumer 330.
  • the buffer state can be “dequeued” at the producer 310.
  • the producer 310 can obtain a buffer from the user device to produce content.
  • Prior to rendering content, the buffer is referred to as a dequeueBuffer.
  • the dequeueBuffer can transfer a free buffer from the buffer queue with a synchronization divider or fence.
  • the synchronization divider or fence can be the same as the releaseBuffer on the consumer 330 side.
  • the GPU driver or kernel graphics support layer (kgsl) driver may wait for the synchronization divider or fence to be signaled before it can access the buffer for rendering.
  • the queued buffer can be sent to the buffer queue mechanism 320, which changes the buffer state to “queued” at the buffer queue mechanism 320.
  • the buffer can be referred to as a queueBuffer.
  • a new rendered buffer with a synchronization divider or fence can be sent to the buffer queue.
  • the synchronization divider or fence can be generated by the GPU driver or kgsl driver at the producer 310. Additionally, the synchronization divider or fence can be signaled by the GPU or kgsl driver when the frame rendering commands are completed.
  • when a new frame is displayed, the buffer is in an “acquired” state. For instance, the buffer is acquiring new content, so it is in an acquired state.
  • the buffer can be referred to as an acquireBuffer.
  • the consumer 330 or display engine can utilize the new buffer for composition to display content.
  • the display content can be determined and/or generated.
  • the acquireBuffer state can send a queued buffer with a synchronization divider or fence to the buffer queue. In some aspects, this synchronization divider or fence can be the same as the queueBuffer at the producer 310 side.
  • the display driver may wait for the synchronization divider or fence to be signaled before it accesses the buffer for a new composition.
  • the consumer 330 can release the buffer, which changes the buffer state to “free. ”
  • the buffer can be referred to as a releaseBuffer.
  • a new released buffer with a synchronization divider or fence can be sent to the buffer queue mechanism 320.
  • the synchronization divider or fence can be generated by the display driver at the consumer 330.
  • the display driver can signal when the buffer completes composition and/or display at the consumer 330. As illustrated in FIG. 3, the aforementioned steps manifest how a buffer is recycled and utilized in different states at the producer 310, the buffer queue mechanism 320, and the consumer 330.
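  • The buffer recycling described above can be viewed as a small state machine. The following C sketch is a simplified model of the dequeue, queue, acquire, and release states for a single buffer, with plain boolean flags standing in for real synchronization fences; the names and structure are illustrative assumptions, not an actual buffer queue or driver implementation.

```c
/* Simplified model of the dequeue -> queue -> acquire -> release cycle for a
 * single buffer. Fences are modeled as plain flags; real drivers use kernel
 * sync objects. All names are illustrative assumptions. */
#include <stdbool.h>
#include <stdio.h>

enum buffer_state { BUF_FREE, BUF_DEQUEUED, BUF_QUEUED, BUF_ACQUIRED };

struct buffer {
    enum buffer_state state;
    bool acquire_fence_signaled;  /* signaled by the producer (GPU) when rendering completes */
    bool release_fence_signaled;  /* signaled by the consumer (display) when composition completes */
};

int main(void) {
    struct buffer b = { BUF_FREE, false, true };

    /* Producer: dequeueBuffer - take a free buffer once its release fence is signaled. */
    if (b.state == BUF_FREE && b.release_fence_signaled) {
        b.state = BUF_DEQUEUED;
        b.acquire_fence_signaled = false;
        printf("producer: buffer dequeued, rendering\n");
    }

    /* Producer: queueBuffer - rendering submitted; the GPU/kgsl driver signals
     * the fence when the frame rendering commands complete. */
    b.state = BUF_QUEUED;
    b.acquire_fence_signaled = true;

    /* Consumer: acquireBuffer - wait for the acquire fence, then compose. */
    if (b.state == BUF_QUEUED && b.acquire_fence_signaled) {
        b.state = BUF_ACQUIRED;
        printf("consumer: buffer acquired, composing\n");
    }

    /* Consumer: releaseBuffer - composition and display done; the display
     * driver signals the release fence and the buffer returns to the pool. */
    b.state = BUF_FREE;
    b.release_fence_signaled = true;
    printf("consumer: buffer released back to the queue\n");
    return 0;
}
```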
  • a buffer can include a synchronization divider or fence.
  • a fence is a synchronization event or method, such as an alignment point or divider utilized for synchronizing content between two different applications, e.g., a producer and a consumer.
  • a fence can synchronize the producer or application side with the consumer or display side.
  • a fence can inform two different components or applications regarding when to synchronize. For example, if there are two different execution components in an application, they may be synchronized at a synchronization divider or fence.
  • the synchronization divider or fence can be a synchronization method in software or hardware. As such, at the synchronization divider or fence, a component can receive a signal from another component, or send a signal to another component.
  • FIG. 4 illustrates diagram 400 in accordance with one or more techniques of this disclosure.
  • diagram 400 includes queueBuffer 401, layerBuffer 411, fence 421, queueBuffer 402, layerBuffer 412, fence 422, queueBuffer N, layerBuffer N, and fence N.
  • FIG. 4 illustrates a buffer and corresponding synchronization divider or fence that is generated and managed by a producer or application side, e.g., a GPU or CPU.
  • the synchronization divider or fence can be used to synchronize with the consumer or hardware composer side, e.g., a display engine.
  • FIG. 4 shows the buffer or fence usage between a GPU or rendering side and a display engine or composition side.
  • a layerBuffer update, e.g., layerBuffer 411, layerBuffer 412, or layerBuffer N, can be sent with a corresponding fence, e.g., fence 421, fence 422, or fence N.
  • layerBuffer 411 can be sent with fence 421, layerBuffer 412 can be sent with fence 422, and layerBuffer N can be sent with fence N.
  • the fences e.g., fence 421, fence 422, fence N, may need to wait in the display engine driver.
  • these fences can be managed by the producer or GPU side, so the GPU driver may need to signal the fences before the hardware composer or display engine can begin composition at the display side.
  • FIG. 5 illustrates diagram 500 in accordance with one or more techniques of this disclosure. More specifically, FIG. 5 illustrates a buffer and corresponding synchronization divider or fence that is generated and managed by a consumer or hardware composer side, e.g., a display engine.
  • the synchronization divider or fence can be used to synchronize with the producer or application side, e.g., a GPU or CPU.
  • the hardware composer or display engine can generate the releaseBuffer.
  • the hardware composer or display engine can generate the frameBuffer fence.
  • the fence can be stored in the GPU or kgsl driver.
  • the GPU can render or utilize the buffer and corresponding fence.
  • the buffer and corresponding fence in FIG. 5 can be generated and managed by the hardware composer or display engine side and be used to synchronize with the GPU or kgsl driver side. As such, the hardware composer or display engine may need to signal the fence before the GPU or CPU can begin rendering content. As shown in FIGs. 4 and 5, one buffer and fence can be generated and managed by the producer or application side, and be utilized at the consumer or display side for the synchronization. Additionally, another fence can be generated and managed by the consumer or display side, and be utilized at the producer or application side.
  • the synchronization divider or fence that is controlled and managed by the producer or application side and used by display side can be referred to as an acquired fence.
  • the synchronization divider or fence that is controlled and managed by the display side and used by the producer or application side can be referred to as a release fence.
  • these synchronization dividers or fences can be utilized by an application side and a display side at a user device.
  • the synchronization dividers or fences can also be utilized at a server.
  • each application process may have its own GPU context or kgsl fence timeline.
  • a GPU or kgsl driver may immediately signal a synchronization divider or fence when the associated frame rendering is completed at the GPU or CPU.
  • the display engine can be a standalone hardware component for the application process, such that it may have a display engine fence timeline. As such, the display engine driver may signal the fence when the frame composition and display processes are completed at the display side.
  • each subgroup of pixels or layers in a frame within a larger group of pixels may be displayed on a screen, e.g., at the display side of a user device.
  • Each layer or subgroup of pixels may cover a portion of the frame or screen.
  • one layer or subgroup of pixels may cover half the frame or screen.
  • each of the subgroups of pixels or layers may release its corresponding fence or synchronization divider at the same time, e.g., at a release time.
  • the buffer corresponding to the layer or subgroup of pixels may be released or cleared.
  • the layers or subgroups of pixels can signal their release at the same time, e.g., when the whole frame composition is completed.
  • the timing in the display driver code can correspond to when the display engine kernel driver receives a frame.
  • the display side can include the following code for signaling the release of a synchronization divider or fence:
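  • The referenced snippet is not reproduced here; the following C sketch is a hedged reconstruction of the behavior the passage describes, in which every layer fence in a frame is signaled together once the complete frame is composed (release time 670 at t3 in FIG. 6). The structure and function names are assumptions, not the actual display driver interface.

```c
/* Hedged sketch of the legacy per-frame release: all layer fences are
 * signaled at the same time, once the whole frame composition completes.
 * Names are illustrative, not the real display driver interface. */
#include <stdio.h>

struct layer {
    int id;
    int release_fence;   /* placeholder for a sync-fence handle */
};

/* Stand-in for the kernel's fence-signal primitive. */
static void signal_release_fence(int fence)
{
    printf("release fence %d signaled\n", fence);
}

/* Called when the display engine kernel driver has received and composed the
 * complete frame: every layer fence is released at the same release time. */
static void on_frame_done(struct layer *layers, int num_layers)
{
    for (int i = 0; i < num_layers; i++)
        signal_release_fence(layers[i].release_fence);
}

int main(void)
{
    struct layer layers[] = { { 602, 622 }, { 604, 624 }, { 606, 626 } };
    on_frame_done(layers, 3);   /* fences 622, 624, and 626 released together */
    return 0;
}
```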
  • FIG. 6 illustrates frame or screen 600 and timeline 650.
  • frame or screen 600 includes pixel subgroup or layer 602, pixel subgroup or layer 604, pixel subgroup or layer 606, length 612, length 614, length 616, synchronization divider or fence 622 for layer 602, synchronization divider or fence 624 for layer 604, and synchronization divider or fence 626 for layer 606.
  • timeline 650 includes t0 660, t1 661, t2 662, t3 663, and release time 670. As illustrated in FIG. 6, t0 660 corresponds to the start of the composition or rendering of pixel subgroup 602, t1 661 corresponds to synchronization divider 622, t2 662 corresponds to synchronization divider 624, and t3 663 corresponds to synchronization divider 626 and release time 670.
  • FIG. 6 shows frame 600 of a device operating in horizontal mode with three different subgroups or layers, e.g., pixel subgroup 602, pixel subgroup 604, and pixel subgroup 606.
  • pixel subgroup 602 covers one half of frame 600
  • pixel subgroup 604 covers the other half of frame 600
  • pixel subgroup 606 is the status bar of frame 600.
  • the composition or rendering of pixel subgroup 602 is between t0 660 and t1 661, e.g., 0 ms to 7 ms.
  • synchronization divider 622 corresponds to t1 661, e.g., 7 ms.
  • the composition or rendering of pixel subgroup 604 is between t1 661 and t2 662, e.g., 7 ms to 14 ms.
  • synchronization divider 624 corresponds to t2 662, e.g., 14 ms.
  • the composition or rendering of pixel subgroup 606 is between t2 662 and t3 663, e.g., 14 ms to 14.4 ms.
  • synchronization divider 626 corresponds to t3 663, e.g., 14.4 ms.
  • release time 670 can occur at t3 663.
  • Release time 670 can signal the release of each of the synchronization dividers, e.g., synchronization dividers 622, 624, 626. Accordingly, release time 670 can signal the release of all the fences at the same time.
  • the display engine can inform the application side that the particular subgroup of pixels is no longer needed. Accordingly, the corresponding buffer can be released or cleared. Once the buffer is released or cleared, the GPU or CPU at the application side can reuse this buffer for another task.
  • when a fence is generated, it can be in an active state. After the rendering or composition task is completed, the GPU or display driver can signal the release of the fence. In some aspects, if one side is using a buffer, then it may not be used by the other side. For example, if a GPU is using a buffer, then the display may not use the buffer. Also, if a display is using a buffer, then the GPU may not use the buffer.
  • this can cause a backup in processing for the device. In turn, this can result in wasting both processing time and energy, as well as power utilized. For instance, this can waste GPU or CPU processing cycles and may cause unexpected wait times at the GPU or CPU before a new frame is rendered. This can introduce a task overload or processing issues for some applications, e.g., janks or interruptions in processing.
  • if each subgroup of pixels or layers in a frame is released individually, rather than all at once, then processing time can be saved. This can result in more time for synchronization between the display and the GPU or CPU.
  • each subgroup of pixels or layers can include its own synchronization divider or fence with its own release time. Therefore, each subgroup or layer can use a separate fence and release time once it completes composition or rendering. By doing so, the corresponding buffer for each pixel subgroup or layer can be released or cleared individually.
  • aspects of the present disclosure can include individual display engine or application side fence release signal times, which can reduce latency in the graphics or display pipeline. So each release time for each layer can be signaled when the composition or rendering is completed for that layer. As such, each layer or pixel subgroup can signal the release of its corresponding fence.
  • each layer or subgroup of pixels can have its own synchronization event and/or release time. This can occur at the display side, e.g., when synchronizing with the application side, as well as at the application side, e.g., when synchronizing with the display side.
  • the subgroup of pixels or layers can overlap or blend with other subgroups or layers, such that a fence for one layer may overlap with a portion of another subgroup or layer.
  • FIG. 7 illustrates frame 700 and timeline 750.
  • frame or screen 700 includes pixel subgroup or layer 702, pixel subgroup or layer 704, pixel subgroup or layer 706, length 712, length 714, length 716, synchronization divider or fence 722 for layer 702, synchronization divider or fence 724 for layer 704, and synchronization divider or fence 726 for layer 706.
  • timeline 750 includes t0 760, t1 761, t2 762, t3 763, release time 772, release time 774, and release time 776. As illustrated in FIG. 7, t0 760 corresponds to the start of the composition or rendering of pixel subgroup 702, t1 761 corresponds to synchronization divider 722 and release time 772, t2 762 corresponds to synchronization divider 724 and release time 774, and t3 763 corresponds to synchronization divider 726 and release time 776.
  • FIG. 7 shows frame 700 of a device operating in horizontal mode with three different subgroups or layers, e.g., pixel subgroup 702, pixel subgroup 704, and pixel subgroup 706.
  • pixel subgroup 702 covers one half of frame 700
  • pixel subgroup 704 covers the other half of frame 700
  • pixel subgroup 706 can be the status bar of frame 700.
  • the composition or rendering of pixel subgroup 702 is between t0 760 and t1 761, e.g., 0 ms to 7 ms.
  • synchronization divider 722 can correspond to t1 761, e.g., 7 ms.
  • release time 772 can occur at t1 761, e.g., 7 ms.
  • the composition or rendering of pixel subgroup 704 is between t1 761 and t2 762, e.g., 7 ms to 14 ms.
  • synchronization divider 724 can correspond to t2 762, e.g., 14 ms.
  • Release time 774 can also occur at t2 762, e.g., 14 ms.
  • the composition or rendering of pixel subgroup 706 is between t2 762 and t3 763, e.g., 14 ms to 14.4 ms.
  • synchronization divider 726 can correspond to t3 763, e.g., 14.4 ms.
  • Release time 776 can occur at t3 763, e.g., 14.4 ms.
  • pixel subgroups 702, 704, and 706 can each include their own release time, e.g., release time 772, 774, and 776, respectively. Accordingly, the synchronization dividers 722, 724, and 726 can be released individually, rather than all at the same time. Therefore, the buffers corresponding to pixel subgroups 702, 704, and 706 can each be released or cleared individually and at release times, 772, 774, and 776, respectively. As mentioned above, by doing so, aspects of the present disclosure can reduce the graphics or display pipeline latency.
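  • By contrast with the per-frame release sketched earlier, the following C sketch illustrates the per-layer release described above, in which each layer's fence is signaled as soon as that layer finishes composition (release times 772, 774, and 776 in FIG. 7). The callback and structure names are illustrative assumptions.

```c
/* Hedged sketch of per-layer release: each layer's fence is signaled when
 * that layer's composition completes, rather than at the end of the frame.
 * Names and timestamps are illustrative assumptions. */
#include <stdio.h>

struct layer {
    int    id;
    int    release_fence;   /* placeholder for a sync-fence handle */
    double completion_ms;   /* time at which this layer's composition completes */
};

static void signal_release_fence(int fence, double t_ms)
{
    printf("release fence %d signaled at %.1f ms\n", fence, t_ms);
}

/* Called once per layer when the display engine finishes composing it. */
static void on_layer_done(const struct layer *l)
{
    signal_release_fence(l->release_fence, l->completion_ms);
}

int main(void)
{
    struct layer layers[] = {
        { 702, 722, 7.0 },    /* half-screen layer, done at t1 */
        { 704, 724, 14.0 },   /* other half, done at t2 */
        { 706, 726, 14.4 },   /* status bar, done at t3 */
    };
    for (int i = 0; i < 3; i++)
        on_layer_done(&layers[i]);   /* fence released per layer, not per frame */
    return 0;
}
```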
  • composition or rendering time can be better utilized by the display engine or GPU/CPU. For instance, both the performance and power of composition or rendering can be improved. Indeed, by releasing the synchronization dividers or fences for each pixel subgroup individually, this allows the display engine or GPU/CPU to perform tasks individually, as opposed to having to perform multiple tasks at the same time. In turn, this can reduce the display or graphics processing pipeline latency.
  • the rendering or composition task may have a longer time to process, which can reduce pipeline latency.
  • at the GPU side, before a rendering command can be executed, the GPU may need to wait for the buffer to be released. So if the synchronization divider or fence is released individually, then the rendering command can have more time to execute. This can correspond to an improvement in GPU frame rate or frames per second (FPS).
  • the composition command can have more time to execute if the fence is released according to individual pixel subgroups.
  • in the example of FIG. 7, because the first fence can be released at 7 ms rather than at the end of the frame at 14.4 ms, the rendering task will have extra time to execute, e.g., an extra 7.4 ms. Accordingly, at the producer or GPU/CPU side, this extra time can be very useful, as it provides more time for the GPU or CPU to execute the rendering commands. Additionally, at the consumer or display side, this extra time provides more time to execute the composition commands.
  • aspects of the present disclosure can determine display content for a group of pixels in a frame, e.g., frame 700, where the group of pixels includes one or more subgroups of pixels, e.g., pixel subgroups 702, 704, 706.
  • aspects of the present disclosure can determine a position of each of the one or more subgroups of pixels, e.g., pixel subgroups 702, 704, 706, where the position of each of the one or more subgroups of pixels includes one or more coordinates of the one or more subgroups of pixels, e.g., length 712, 714, 716.
  • the present disclosure can also calculate at least one synchronization divider, e.g., synchronization divider 722, 724, 726, for each of the one or more subgroups of pixels, e.g., pixel subgroups 702, 704, 706.
  • aspects of the present disclosure can also synchronize each of the one or more subgroups of pixels, e.g., pixel subgroups 702, 704, 706, based on the at least one synchronization divider for the subgroup of pixels e.g., synchronization divider 722, 724, 726.
  • At least one synchronization divider for a first subgroup of the one or more subgroups of pixels can correspond to at least one synchronization divider for a second subgroup of the one or more subgroups of pixels e.g., synchronization divider 724 for pixel subgroup 704, when the difference between the position of the first subgroup and the position of the second subgroup is less than a position threshold.
  • each of the one or more subgroups of pixels e.g., pixel subgroups 702, 704, 706, can correspond to a buffer.
  • aspects of the present disclosure can also send or receive at least one signal corresponding to each of the one or more subgroups of pixels, e.g., pixel subgroups 702, 704, 706. Further, each of the one or more subgroups of pixels can include a release time, e.g., release time 772, 774, 776. Aspects of the present disclosure can also release or clear a buffer corresponding to each of the one or more subgroups of pixels, e.g., pixel subgroups 702, 704, 706, at the release time of the subgroup of pixels, e.g., release time 772, 774, 776.
  • a buffer corresponding to a first subgroup of the one or more subgroups of pixels e.g., pixel subgroup 704, and a buffer corresponding to a second subgroup of the one or more subgroups of pixels, e.g., pixel subgroup 706, can be released or cleared simultaneously when the difference between the release time of the first subgroup, e.g., release time 774, and the release time of the second subgroup, e.g., release time 776, is less than a time threshold.
  • the buffer corresponding to each subgroup of pixels e.g., pixel subgroups 702, 704, 706, can be reassigned when the buffer is released or cleared.
  • the release time of each of the one or more subgroups of pixels can correspond to a display engine release time, where the display engine release time is a time when a display engine finishes composing the subgroup of pixels, e.g., pixel subgroups 702, 704, 706.
  • the release time of each of the one or more subgroups of pixels e.g., release times 772, 774, 776, can also correspond to a GPU release time or a CPU release time, where the GPU release time or the CPU release time is a time when a GPU or CPU finishes rendering the subgroup of pixels, e.g., pixel subgroups 702, 704, 706.
  • each of the one or more subgroups of pixels e.g., pixel subgroups 702, 704, 706, can be synchronized between a display engine and a GPU or CPU.
  • aspects of the present disclosure can correspond to a number of different applications.
  • aspects of the present disclosure can apply to foldable displays or screens in mobile devices.
  • a layer may cover half a screen or frame.
  • the layer can be signaled in half the time compared to when the entire screen is utilized.
  • the other half or the back surface of the display may be off or display a static logo without refreshing.
  • the display scan direction and display content direction may have a 90-degree rotation.
  • aspects of the present disclosure are applicable to split screens or split screen mode.
  • one application may cover half of the screen. So one application layer may cover one half of the screen.
  • the fence can be released faster if the application covers half the screen, so the GPU can have more time to render the next frame.
  • different application layers can be displayed in different regions of the frame.
  • aspects of the present disclosure are applicable to multiple screen displays.
  • the different displays, e.g., a mobile display and an external display, may run at different FPS rates. For instance, an external display may run at different FPS rates for different applications, e.g., 144 Hz for gaming applications and 60 Hz for video applications, and a mobile device may run at yet another FPS rate, e.g., 90 Hz.
  • one display may release pixel subgroups or layers earlier than the other display.
  • the buffers can be shared between the multiple displays. By releasing the layer and releasing or clearing the corresponding buffer as soon as they are rendered or composed, this can provide more flexibility when displaying different frames on different displays.
  • the present disclosure can use a display engine tear check module pointer or interrupt request (IRQ) .
  • the present disclosure can configure the IRQ using a layer position, size, or length.
  • the IRQ can be initiated, the corresponding layer release can be signaled, and the next layer position, size, or length can be configured, as in the sketch below.
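  • The pointer/IRQ sequence just described can be sketched in C as follows; the register and IRQ accessors stand in for real display engine hardware programming, and the layer line positions are arbitrary example values, so everything here is an illustrative assumption.

```c
/* Hedged sketch of per-layer release driven by a tear-check pointer/IRQ: the
 * IRQ is armed at the current layer's end position; when it fires, that
 * layer's fence is released and the pointer is reprogrammed for the next
 * layer. Accessors and line positions are illustrative assumptions. */
#include <stdio.h>

struct layer {
    int id;
    int end_line;   /* layer position/length in display lines (example values) */
};

static void program_tear_check_irq(int line) { printf("IRQ armed at line %d\n", line); }
static void signal_layer_release(int id)     { printf("layer %d release signaled\n", id); }

/* Assumed IRQ handler, called when the display write pointer passes the
 * programmed line. */
static void tear_check_irq_handler(struct layer *layers, int num_layers, int *current)
{
    signal_layer_release(layers[*current].id);
    (*current)++;
    if (*current < num_layers)
        program_tear_check_irq(layers[*current].end_line);
}

int main(void)
{
    struct layer layers[] = { { 0, 1170 }, { 1, 2300 }, { 2, 2340 } };
    int current = 0;

    program_tear_check_irq(layers[current].end_line);
    for (int i = 0; i < 3; i++)               /* simulate the IRQ firing three times */
        tear_check_irq_handler(layers, 3, &current);
    return 0;
}
```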
  • aspects of the present disclosure can apply de-bounce logic to layers with a similar position. For instance, de-bounce logic can calculate the completion time for each layer, and if the completion time is less than a time threshold, e.g., less than 1 ms, then the layers can be released at the same time.
  • De-bounce logic can simplify the aforementioned processes and reduce the time needed to program the register. As such, if two layers are synchronized too closely together, or if the layers include release times that are less than a time threshold, then they can be synchronized or released at the same time.
  • aspects of the present disclosure can obtain layer positions, sizes, or lengths by configuring a pointer.
  • the present disclosure can also add de-bounce logic to layers with a similar position, and combine them into one release time. As such, some aspects can configure the fence release time with the layer length value. Accordingly, aspects of the present disclosure obtain the position or size of the layers or subgroups of pixels and then coordinate these positions or sizes for the release of the layer fences. Also, the present disclosure may utilize de-bounce logic to combine similar layer position values to reduce the impact on the response load.
  • a frame may have layers 0, 1, 2, 3, a frame length of 2340, and a frame transmission time of 14 ms.
  • Layer 0 may have a length of 200 and a transmission completion time of 1.197 ms
  • layer 1 may have a length of 260 and a transmission completion time of 1.555 ms
  • layer 2 may have a length of 2100 and a transmission completion time of 12.56 ms
  • layer 3 may have a length of 2340 and a transmission completion time of 14 ms.
  • aspects of the present disclosure can utilize de-bounce logic if the length or completion time difference between layers is less than a threshold, and release layers at the same time when the length or completion time difference is within the threshold. For instance, if layers 0 and 1 have similar lengths and transmission completion times, e.g., layer 0 has a length of 200 and a transmission completion time of 1.197 ms and layer 1 has a length of 260 and a transmission completion time of 1.555 ms, the present disclosure can use de-bounce logic and combine the release times for these layers. Additionally, the present disclosure can write 260 to the pointer or IRQ. When the pointer or IRQ is received, layers 0 and 1 can be released. Also, the present disclosure can write 2100 to the pointer or IRQ. When the pointer or IRQ is received, layer 2 can be released. In some aspects, the legacy frame IRQ can release layer 3.
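  • The de-bounce grouping in this example can be sketched in C as follows, assuming the completion time of a layer is proportional to its length (length / 2340 * 14 ms) and using a 1 ms threshold; the helper names and the single pointer/IRQ write per group are illustrative assumptions.

```c
/* Hedged sketch of de-bounce grouping: layers whose completion times differ
 * by less than a threshold share one pointer/IRQ write and are released
 * together. Completion time is approximated as length / 2340 * 14 ms. */
#include <stdio.h>

#define NUM_LAYERS    4
#define FRAME_LINES   2340
#define FRAME_TIME_MS 14.0
#define DEBOUNCE_MS   1.0

static double completion_ms(int length)
{
    return (double)length / FRAME_LINES * FRAME_TIME_MS;
}

int main(void)
{
    int lengths[NUM_LAYERS] = { 200, 260, 2100, 2340 };   /* layers 0..3 */
    int i = 0;

    while (i < NUM_LAYERS) {
        int last = i;
        /* Extend the group while the next layer completes within the threshold
         * of the group's first layer. */
        while (last + 1 < NUM_LAYERS &&
               completion_ms(lengths[last + 1]) - completion_ms(lengths[i]) < DEBOUNCE_MS)
            last++;

        /* One pointer/IRQ write per group, at the largest length in the group;
         * when the IRQ fires, every layer in the group is released together.
         * In practice the final, full-frame layer can simply use the legacy
         * frame IRQ instead of a dedicated write. */
        printf("write %d to pointer/IRQ -> release layers %d..%d at ~%.3f ms\n",
               lengths[last], i, last, completion_ms(lengths[last]));
        i = last + 1;
    }
    return 0;
}
```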
  • the present disclosure can add a kernel thread to check the current write pointer line counter in the display engine tear check hardware block or in another hardware block.
  • This kernel thread can signal the corresponding layer release if the write pointer line counter exceeds the layer position or size limit.
  • de-bounce logic for similarly positioned layers can be applied.
  • This implementation can use kernel software thread polling on the write pointer line counters, which can be computed using layer position coordinates. This can reduce the IRQ, but may increase the CPU loading.
  • aspects of the present disclosure can also utilize de-bounce logic to combine similar layer position values to reduce the impact on CPU loading.
  • a frame may have layers 0, 1, 2, 3, a frame length of 2340, and a frame transmission time of 14 ms.
  • Layer 0 may have a length of 200 and a transmission completion time of 1.197 ms
  • layer 1 may have a length of 260 and a transmission completion time of 1.555 ms
  • layer 2 may have a length of 2100 and a transmission completion time of 12.56 ms
  • layer 3 may have a length of 2340 and a transmission completion time of 14 ms.
  • the present disclosure can apply de-bounce logic with a time threshold, e.g., 1 ms, and release layers whose completion times fall within the time threshold at the same time. As layers 0 and 1 have lengths with a small difference, the present disclosure can use de-bounce logic and combine the release times for these layers. Additionally, the present disclosure can wake up at a certain time, e.g., 1.555 ms + 0.2 ms, and check the line counter register. If the line counter is greater than a threshold, e.g., 260, then layers 0 and 1 can be released. Also, the present disclosure can wake up at another time, e.g., 12.56 ms + 0.2 ms, and check the line counter register. If the line counter is greater than a threshold, e.g., 2100, then layer 2 can be released. The legacy frame IRQ can then release layer 3.
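  • The polling variant in this example can be sketched in C as follows, where a worker thread wakes shortly after each group's expected completion time (plus a 0.2 ms margin) and checks the line counter before signaling the release; the line-counter read and timing values are illustrative assumptions, not real driver calls.

```c
/* Hedged sketch of the polling approach: wake shortly after each group's
 * expected completion time, read the write-pointer line counter, and release
 * the group once the counter has passed the group's line threshold. */
#include <stdio.h>

struct release_group {
    const char *layers;       /* which layers the group covers */
    int         line_limit;   /* line counter value that must be passed */
    double      wake_ms;      /* expected completion time + 0.2 ms margin */
};

/* Stand-in for reading the display engine write-pointer line counter; here it
 * assumes the scanout has advanced proportionally to elapsed time. */
static int read_line_counter(double now_ms, int frame_lines, double frame_ms)
{
    return (int)(now_ms / frame_ms * frame_lines);
}

int main(void)
{
    struct release_group groups[] = {
        { "layers 0 and 1", 260,  1.555 + 0.2 },   /* de-bounced pair */
        { "layer 2",        2100, 12.56 + 0.2 },
    };

    for (int i = 0; i < 2; i++) {
        double now = groups[i].wake_ms;            /* the kernel thread wakes here */
        int counter = read_line_counter(now, 2340, 14.0);
        if (counter >= groups[i].line_limit)
            printf("%.3f ms: line counter %d >= %d, release %s\n",
                   now, counter, groups[i].line_limit, groups[i].layers);
    }
    /* Layer 3 (the full frame) is released by the legacy frame-done IRQ. */
    return 0;
}
```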
  • FIG. 8 illustrates flowchart 800 of an example method in accordance with one or more techniques of this disclosure.
  • the method may be performed by a DPU, a display engine, GPU, CPU, or apparatus for display or graphics processing.
  • the apparatus can determine display content for a group of pixels in a frame, where the group of pixels includes one or more subgroups of pixels, as described in connection with the examples in FIGs. 3, 4, 5, 6, and 7.
  • the apparatus can determine a position of each of the one or more subgroups of pixels, where the position of each of the one or more subgroups of pixels includes one or more coordinates of the one or more subgroups of pixels, as described in connection with the examples in FIGs. 3, 4, 5, 6, and 7.
  • the apparatus can calculate at least one synchronization divider for each of the one or more subgroups of pixels, as described in connection with the examples in FIGs. 3, 4, 5, 6, and 7.
  • the apparatus can synchronize each of the one or more subgroups of pixels based on the at least one synchronization divider for the subgroup of pixels, as described in connection with the examples in FIGs. 3, 4, 5, 6, and 7.
  • At least one synchronization divider for a first subgroup of the one or more subgroups of pixels can correspond to at least one synchronization divider for a second subgroup of the one or more subgroups of pixels when the difference between the position of the first subgroup and the position of the second subgroup is less than a position threshold, as described in connection with the examples in FIGs. 3, 4, 5, 6, and 7.
  • each of the one or more subgroups of pixels can correspond to a buffer, as described in connection with the examples in FIGs. 3, 4, 5, 6, and 7.
  • the apparatus can send or receive at least one signal corresponding to each of the one or more subgroups of pixels, as described in connection with the examples in FIGs. 3, 4, 5, 6, and 7. Further, each of the one or more subgroups of pixels can include a release time, as described in connection with the examples in FIGs. 3, 4, 5, 6, and 7. At 812, the apparatus can release a buffer corresponding to each of the one or more subgroups of pixels at the release time of the subgroup of pixels, as described in connection with the examples in FIGs. 3, 4, 5, 6, and 7.
  • a buffer corresponding to a first subgroup of the one or more subgroups of pixels and a buffer corresponding to a second subgroup of the one or more subgroups of pixels can be released simultaneously when the difference between the release time of the first subgroup and the release time of the second subgroup is less than a time threshold, as described in connection with the examples in FIGs. 3, 4, 5, 6, and 7.
  • the buffer corresponding to each subgroup of pixels can be reassigned when the buffer is released, as described in connection with the examples in FIGs. 3, 4, 5, 6, and 7.
  • the release time of each of the one or more subgroups of pixels can correspond to a display engine release time, where the display engine release time is a time when a display engine finishes composing the subgroup of pixels, as described in connection with the examples in FIGs. 3, 4, 5, 6, and 7.
  • the release time of each of the one or more subgroups of pixels can also correspond to a GPU release time or a CPU release time, where the GPU release time or the CPU release time is a time when a GPU or CPU finishes rendering the subgroup of pixels, as described in connection with the examples in FIGs. 3, 4, 5, 6, and 7.
  • each of the one or more subgroups of pixels can be synchronized between a display engine and a GPU or CPU, as described in connection with the examples in FIGs. 3, 4, 5, 6, and 7.
  • a method or apparatus for display or graphics processing may be a DPU, a display engine, a GPU, a CPU, or some other processor that can perform display or graphics processing.
  • the apparatus may be the processing unit 120 within the device 104, or may be some other hardware within device 104 or another device.
  • the apparatus may include means for determining display content for a group of pixels in a frame, where the group of pixels includes one or more subgroups of pixels.
  • the apparatus may also include means for calculating at least one synchronization divider for each of the one or more subgroups of pixels.
  • the apparatus may also include means for synchronizing each of the one or more subgroups of pixels based on the at least one synchronization divider for the subgroup of pixels.
  • the apparatus may also include means for determining a position of each of the one or more subgroups of pixels, where the position of each of the one or more subgroups of pixels includes one or more coordinates of the one or more subgroups of pixels.
  • the apparatus may also include means for sending or receiving at least one signal corresponding to each of the one or more subgroups of pixels.
  • the apparatus may also include means for releasing a buffer corresponding to each of the one or more subgroups of pixels at the release time of the subgroup of pixels.
  • the described display or graphics processing techniques can be used by display engines, GPUs, or CPUs to reduce the processing time and/or power used. This can also be accomplished at a low cost compared to other display or graphics processing techniques.
  • the display or graphics processing techniques herein can improve or speed up the processing or execution time. Further, the graphics processing techniques herein can improve the resource or data utilization and/or resource efficiency.
  • the term “or” may be interpreted as “and/or” where context does not dictate otherwise. Additionally, while phrases such as “one or more” or “at least one” or the like may have been used for some features disclosed herein but not others, the features for which such language was not used may be interpreted to have such a meaning implied where context does not dictate otherwise.
  • the functions described herein may be implemented in hardware, software, firmware, or any combination thereof.
  • although the term “processing unit” has been used throughout this disclosure, such processing units may be implemented in hardware, software, firmware, or any combination thereof. If any function, processing unit, technique described herein, or other module is implemented in software, the function, processing unit, technique described herein, or other module may be stored on, or transmitted over, as one or more instructions or code on a computer-readable medium.
  • Computer-readable media may include computer data storage media or communication media including any medium that facilitates transfer of a computer program from one place to another. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media, which is non-transitory or (2) a communication medium such as a signal or carrier wave.
  • Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this disclosure.
  • such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store computer executable code in the form of instructions or data structures that can be accessed by a computer.
  • Disk and disc, as used herein, include compact disc (CD) , laser disc, optical disc, digital versatile disc (DVD) , floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
  • a computer program product may include a computer-readable medium.
  • the code may be executed by one or more processors, such as one or more digital signal processors (DSPs) , general purpose microprocessors, application specific integrated circuits (ASICs) , arithmetic logic units (ALUs) , field programmable logic arrays (FPGAs) , or other equivalent integrated or discrete logic circuitry.
  • the techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC) or a set of ICs, e.g., a chip set.
  • Various components, modules or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily need realization by different hardware units. Rather, as described above, various units may be combined in any hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.

Abstract

The present disclosure relates to methods and apparatus for display or graphics processing. In some aspects, the apparatus can determine display content for a group of pixels in a frame, where the group of pixels includes one or more subgroups of pixels. The apparatus can also calculate at least one synchronization divider for each of the one or more subgroups of pixels. Further, the apparatus can synchronize each of the one or more subgroups of pixels based on the at least one synchronization divider for the subgroup of pixels. In some aspects, the apparatus can determine a position of each of the one or more subgroups of pixels, wherein the position of each of the one or more subgroups of pixels includes one or more coordinates of the one or more subgroups of pixels.

Description

METHODS AND APPARATUS FOR GRAPHICS AND DISPLAY PIPELINE MANAGEMENT
TECHNICAL FIELD
The present disclosure relates generally to processing systems and, more particularly, to one or more techniques for display or graphics processing.
INTRODUCTION
Computing devices often utilize a graphics processing unit (GPU) to accelerate the rendering of graphical data for display. Such computing devices may include, for example, computer workstations, mobile phones such as so-called smartphones, embedded systems, personal computers, tablet computers, and video game consoles. GPUs execute a graphics processing pipeline that includes one or more processing stages that operate together to execute graphics processing commands and output a frame. A central processing unit (CPU) may control the operation of the GPU by issuing one or more graphics processing commands to the GPU. Modern day CPUs are typically capable of concurrently executing multiple applications, each of which may need to utilize the GPU during execution.
An electronic device may execute a program to present graphics content on a display. For example, an electronic device may execute a user interface application, video game application, and the like.
SUMMARY
The following presents a simplified summary of one or more aspects in order to provide a basic understanding of such aspects. This summary is not an extensive overview of all contemplated aspects, and is intended to neither identify key elements of all aspects nor delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more aspects in a simplified form as a prelude to the more detailed description that is presented later.
In an aspect of the disclosure, a method, a computer-readable medium, and an apparatus are provided. The apparatus may be a display processing unit (DPU) , a display engine, a GPU, a CPU, or some other processor for display or graphics processing. In some aspects, the apparatus can determine display content for a group  of pixels in a frame, where the group of pixels includes one or more subgroups of pixels. The apparatus can also calculate at least one synchronization divider for each of the one or more subgroups of pixels. Further, the apparatus can synchronize each of the one or more subgroups of pixels based on the at least one synchronization divider for the subgroup of pixels. In some aspects, the apparatus can determine a position of each of the one or more subgroups of pixels, where the position of each of the one or more subgroups of pixels includes one or more coordinates of the one or more subgroups of pixels. Also, the apparatus can send or receive at least one signal corresponding to each of the one or more subgroups of pixels. The apparatus can also release a buffer corresponding to each of the one or more subgroups of pixels at the release time of the subgroup of pixels.
The details of one or more examples of the disclosure are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the disclosure will be apparent from the description and drawings, and from the claims.
BRIEF DESCRIPTION OF DRAWINGS
FIG. 1 is a block diagram that illustrates an example content generation system in accordance with one or more techniques of this disclosure.
FIG. 2 illustrates an example GPU in accordance with one or more techniques of this disclosure.
FIG. 3 illustrates an example diagram in accordance with one or more techniques of this disclosure.
FIG. 4 illustrates another example diagram in accordance with one or more techniques of this disclosure.
FIG. 5 illustrates another example diagram in accordance with one or more techniques of this disclosure.
FIG. 6 illustrates an example frame and timeline, respectively, in accordance with one or more techniques of this disclosure.
FIG. 7 illustrates an example frame and timeline, respectively, in accordance with one or more techniques of this disclosure.
FIG. 8 illustrates an example flowchart of an example method in accordance with one or more techniques of this disclosure.
DETAILED DESCRIPTION
Various aspects of systems, apparatuses, computer program products, and methods are described more fully hereinafter with reference to the accompanying drawings. This disclosure may, however, be embodied in many different forms and should not be construed as limited to any specific structure or function presented throughout this disclosure. Rather, these aspects are provided so that this disclosure will be thorough and complete, and will fully convey the scope of this disclosure to those skilled in the art. Based on the teachings herein one skilled in the art should appreciate that the scope of this disclosure is intended to cover any aspect of the systems, apparatuses, computer program products, and methods disclosed herein, whether implemented independently of, or combined with, other aspects of the disclosure. For example, an apparatus may be implemented or a method may be practiced using any number of the aspects set forth herein. In addition, the scope of the disclosure is intended to cover such an apparatus or method which is practiced using other structure, functionality, or structure and functionality in addition to or other than the various aspects of the disclosure set forth herein. Any aspect disclosed herein may be embodied by one or more elements of a claim.
Although various aspects are described herein, many variations and permutations of these aspects fall within the scope of this disclosure. Although some potential benefits and advantages of aspects of this disclosure are mentioned, the scope of this disclosure is not intended to be limited to particular benefits, uses, or objectives. Rather, aspects of this disclosure are intended to be broadly applicable to different wireless technologies, system configurations, networks, and transmission protocols, some of which are illustrated by way of example in the figures and in the following description. The detailed description and drawings are merely illustrative of this disclosure rather than limiting, the scope of this disclosure being defined by the appended claims and equivalents thereof.
Several aspects are presented with reference to various apparatus and methods. These apparatus and methods are described in the following detailed description and illustrated in the accompanying drawings by various blocks, components, circuits, processes, algorithms, and the like (collectively referred to as “elements” ) . These elements may be implemented using electronic hardware, computer software, or any combination thereof. Whether such elements are implemented as hardware or  software depends upon the particular application and design constraints imposed on the overall system.
By way of example, an element, or any portion of an element, or any combination of elements may be implemented as a “processing system” that includes one or more processors (which may also be referred to as processing units) . Examples of processors include microprocessors, microcontrollers, graphics processing units (GPUs) , general purpose GPUs (GPGPUs) , central processing units (CPUs) , application processors, digital signal processors (DSPs) , reduced instruction set computing (RISC) processors, systems-on-chip (SOC) , baseband processors, application specific integrated circuits (ASICs) , field programmable gate arrays (FPGAs) , programmable logic devices (PLDs) , state machines, gated logic, discrete hardware circuits, and other suitable hardware configured to perform the various functionality described throughout this disclosure. One or more processors in the processing system may execute software. Software can be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software components, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, etc., whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise. The term application may refer to software. As described herein, one or more techniques may refer to an application, i.e., software, being configured to perform one or more functions. In such examples, the application may be stored on a memory, e.g., on-chip memory of a processor, system memory, or any other memory. Hardware described herein, such as a processor may be configured to execute the application. For example, the application may be described as including code that, when executed by the hardware, causes the hardware to perform one or more techniques described herein. As an example, the hardware may access the code from a memory and execute the code accessed from the memory to perform one or more techniques described herein. In some examples, components are identified in this disclosure. In such examples, the components may be hardware, software, or a combination thereof. The components may be separate components or sub-components of a single component.
Accordingly, in one or more examples described herein, the functions described may be implemented in hardware, software, or any combination thereof. If implemented in software, the functions may be stored on or encoded as one or more instructions or  code on a computer-readable medium. Computer-readable media includes computer storage media. Storage media may be any available media that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise a random access memory (RAM) , a read-only memory (ROM) , an electrically erasable programmable ROM (EEPROM) , optical disk storage, magnetic disk storage, other magnetic storage devices, combinations of the aforementioned types of computer-readable media, or any other medium that can be used to store computer executable code in the form of instructions or data structures that can be accessed by a computer.
In general, this disclosure describes techniques for having a graphics processing pipeline in a single device or multiple devices, improving the rendering of graphical content, and/or reducing the load of a processing unit, i.e., any processing unit configured to perform one or more techniques described herein, such as a GPU. For example, this disclosure describes techniques for graphics processing in any device that utilizes graphics processing. Other example benefits are described throughout this disclosure.
As used herein, instances of the term “content” may refer to “graphical content, ” “image, ” and vice versa. This is true regardless of whether the terms are being used as an adjective, noun, or other parts of speech. In some examples, as used herein, the term “graphical content” may refer to a content produced by one or more processes of a graphics processing pipeline. In some examples, as used herein, the term “graphical content” may refer to a content produced by a processing unit configured to perform graphics processing. In some examples, as used herein, the term “graphical content” may refer to a content produced by a graphics processing unit.
In some examples, as used herein, the term “display content” may refer to content generated by a processing unit configured to perform displaying processing. In some examples, as used herein, the term “display content” may refer to content generated by a display processing unit. Graphical content may be processed to become display content. For example, a graphics processing unit may output graphical content, such as a frame, to a buffer (which may be referred to as a framebuffer) . A display processing unit may read the graphical content, such as one or more frames from the buffer, and perform one or more display processing techniques thereon to determine display content and/or generate display content. For example, a display processing unit may be configured to perform composition on one or more rendered layers to  generate a frame. As another example, a display processing unit may be configured to compose, blend, or otherwise combine two or more layers together into a single frame. A display processing unit may be configured to perform scaling, e.g., upscaling or downscaling, on a frame. In some examples, a frame may refer to a layer. In other examples, a frame may refer to two or more layers that have already been blended together to form the frame, i.e., the frame includes two or more layers, and the frame that includes two or more layers may subsequently be blended.
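To make the composition step concrete, the following sketch blends one layer over another with a standard source-over alpha blend to produce a single output frame. This is only an illustrative assumption; the disclosure does not specify a particular blend mode, and the pixel format and function names below are hypothetical.

```c
#include <stdint.h>

/* Hypothetical straight-alpha pixel used only for this illustration. */
typedef struct { uint8_t r, g, b, a; } rgba8;

/* Source-over blend of one source pixel onto one destination pixel
 * (integer approximation of: out = src*alpha + dst*(1 - alpha)). */
static rgba8 blend_over(rgba8 dst, rgba8 src) {
    rgba8 out;
    uint32_t a = src.a, ia = 255 - src.a;
    out.r = (uint8_t)((src.r * a + dst.r * ia + 127) / 255);
    out.g = (uint8_t)((src.g * a + dst.g * ia + 127) / 255);
    out.b = (uint8_t)((src.b * a + dst.b * ia + 127) / 255);
    out.a = 255;  /* the composed frame is treated as opaque */
    return out;
}

/* Compose a higher layer onto a base layer to produce the output frame. */
void compose(rgba8 *frame, const rgba8 *base, const rgba8 *layer, int n_pixels) {
    for (int i = 0; i < n_pixels; i++)
        frame[i] = blend_over(base[i], layer[i]);
}
```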
FIG. 1 is a block diagram that illustrates an example content generation system 100 configured to implement one or more techniques of this disclosure. The content generation system 100 includes a device 104. The device 104 may include one or more components or circuits for performing various functions described herein. In some examples, one or more components of the device 104 may be components of an SOC. The device 104 may include one or more components configured to perform one or more techniques of this disclosure. In the example shown, the device 104 may include a processing unit 120, and a system memory 124. In some aspects, the device 104 can include a number of optional components, e.g., a communication interface 126, a transceiver 132, a receiver 128, a transmitter 130, a display processor 127, and one or more displays 131. Reference to the display 131 may refer to the one or more displays 131. For example, the display 131 may include a single display or multiple displays. The display 131 may include a first display and a second display. The first display may be a left-eye display and the second display may be a right-eye display. In some examples, the first and second display may receive different frames for presentment thereon. In other examples, the first and second display may receive the same frames for presentment thereon. In further examples, the results of the graphics processing may not be displayed on the device, e.g., the first and second display may not receive any frames for presentment thereon. Instead, the frames or graphics processing results may be transferred to another device. In some aspects, this can be referred to as split-rendering.
The processing unit 120 may include an internal memory 121. The processing unit 120 may be configured to perform graphics processing, such as in a graphics processing pipeline 107. In some examples, the device 104 may include a display processor, such as the display processor 127, to perform one or more display processing techniques on one or more frames generated by the processing unit 120 before presentment by the one or more displays 131. The display processor 127 may  be configured to perform display processing. For example, the display processor 127 may be configured to perform one or more display processing techniques on one or more frames generated by the processing unit 120. The one or more displays 131 may be configured to display or otherwise present frames processed by the display processor 127. In some examples, the one or more displays 131 may include one or more of: a liquid crystal display (LCD) , a plasma display, an organic light emitting diode (OLED) display, a projection display device, an augmented reality display device, a virtual reality display device, a head-mounted display, or any other type of display device.
Memory external to the processing unit 120, such as system memory 124, may be accessible to the processing unit 120. For example, the processing unit 120 may be configured to read from and/or write to external memory, such as the system memory 124. The processing unit 120 may be communicatively coupled to the system memory 124 over a bus. In some examples, the processing unit 120 may be communicatively coupled to each other over the bus or a different connection.
The internal memory 121 or the system memory 124 may include one or more volatile or non-volatile memories or storage devices. In some examples, internal memory 121 or the system memory 124 may include RAM, SRAM, DRAM, erasable programmable ROM (EPROM) , electrically erasable programmable ROM (EEPROM) , flash memory, a magnetic data media or an optical storage media, or any other type of memory.
The internal memory 121 or the system memory 124 may be a non-transitory storage medium according to some examples. The term “non-transitory” may indicate that the storage medium is not embodied in a carrier wave or a propagated signal. However, the term “non-transitory” should not be interpreted to mean that internal memory 121 or the system memory 124 is non-movable or that its contents are static. As one example, the system memory 124 may be removed from the device 104 and moved to another device. As another example, the system memory 124 may not be removable from the device 104.
The processing unit 120 may be a central processing unit (CPU) , a graphics processing unit (GPU) , a general purpose GPU (GPGPU) , or any other processing unit that may be configured to perform graphics processing. In some examples, the processing unit 120 may be integrated into a motherboard of the device 104. In some examples, the processing unit 120 may be present on a graphics card that is installed  in a port in a motherboard of the device 104, or may be otherwise incorporated within a peripheral device configured to interoperate with the device 104. The processing unit 120 may include one or more processors, such as one or more microprocessors, GPUs, application specific integrated circuits (ASICs) , field programmable gate arrays (FPGAs) , arithmetic logic units (ALUs) , digital signal processors (DSPs) , discrete logic, software, hardware, firmware, other equivalent integrated or discrete logic circuitry, or any combinations thereof. If the techniques are implemented partially in software, the processing unit 120 may store instructions for the software in a suitable, non-transitory computer-readable storage medium, e.g., internal memory 121, and may execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Any of the foregoing, including hardware, software, a combination of hardware and software, etc., may be considered to be one or more processors.
In some aspects, the content generation system 100 can include an optional communication interface 126. The communication interface 126 may include a receiver 128 and a transmitter 130. The receiver 128 may be configured to perform any receiving function described herein with respect to the device 104. Additionally, the receiver 128 may be configured to receive information, e.g., eye or head position information, rendering commands, or location information, from another device. The transmitter 130 may be configured to perform any transmitting function described herein with respect to the device 104. For example, the transmitter 130 may be configured to transmit information to another device, which may include a request for content. The receiver 128 and the transmitter 130 may be combined into a transceiver 132. In such examples, the transceiver 132 may be configured to perform any receiving function and/or transmitting function described herein with respect to the device 104.
Referring again to FIG. 1, in certain aspects, the graphics processing pipeline 107 may include a determination component 198 configured to determine display content for a group of pixels in a frame, where the group of pixels includes one or more subgroups of pixels. The determination component 198 can also be configured to calculate at least one synchronization divider for each of the one or more subgroups of pixels. Additionally, the determination component 198 can be configured to synchronize each of the one or more subgroups of pixels based on the at least one synchronization divider for the subgroup of pixels. The determination component 198 can also be  configured to determine a position of each of the one or more subgroups of pixels, where the position of each of the one or more subgroups of pixels includes one or more coordinates of the one or more subgroups of pixels. The determination component 198 can also be configured to send or receive at least one signal corresponding to each of the one or more subgroups of pixels. Moreover, the determination component 198 can be configured to release a buffer corresponding to each of the one or more subgroups of pixels at the release time of the subgroup of pixels.
As described herein, a device, such as the device 104, may refer to any device, apparatus, or system configured to perform one or more techniques described herein. For example, a device may be a server, a base station, user equipment, a client device, a station, an access point, a computer, e.g., a personal computer, a desktop computer, a laptop computer, a tablet computer, a computer workstation, or a mainframe computer, an end product, an apparatus, a phone, a smart phone, a server, a video game platform or console, a handheld device, e.g., a portable video game device or a personal digital assistant (PDA) , a wearable computing device, e.g., a smart watch, an augmented reality device, or a virtual reality device, a non-wearable device, a display or display device, a television, a television set-top box, an intermediate network device, a digital media player, a video streaming device, a content streaming device, an in-car computer, any mobile device, any device configured to generate graphical content, or any device configured to perform one or more techniques described herein. Processes herein may be described as performed by a particular component (e.g., a GPU) , but, in further embodiments, can be performed using other components (e.g., a CPU) , consistent with disclosed embodiments.
GPUs can process multiple types of data or data packets in a GPU pipeline. For instance, in some aspects, a GPU can process two types of data or data packets, e.g., context register packets and draw call data. A context register packet can be a set of global state information, e.g., information regarding a global register, shading program, or constant data, which can regulate how graphics context will be processed. For example, context register packets can include information regarding a color format. In some aspects of context register packets, there can be a bit that indicates which workload belongs to a context register. Also, there can be multiple functions or programming running at the same time and/or in parallel. For example, functions  or programming can describe a certain operation, e.g., the color mode or color format. Accordingly, a context register can define multiple states of a GPU.
Context states can be utilized to determine how an individual processing unit functions, e.g., a vertex fetcher (VFD) , a vertex shader (VS) , a shader processor, or a geometry processor, and/or in what mode the processing unit functions. In order to do so, GPUs can use context registers and programming data. In some aspects, a GPU can generate a workload, e.g., a vertex or pixel workload, in the pipeline.
FIG. 2 illustrates an example GPU 200 in accordance with one or more techniques of this disclosure. As shown in FIG. 2, GPU 200 includes command processor (CP) 210, draw call data packets 212, VFD 220, VS 222, vertex cache (VPC) 224, triangle setup engine (TSE) 226, rasterizer (RAS) 228, Z process engine (ZPE) 230, pixel interpolator (PI) 232, fragment shader (FS) 234, render backend (RB) 236, L2 cache (UCHE) 238, and system memory 240. Although FIG. 2 displays that GPU 200 includes processing units 220-238, GPU 200 can include a number of additional processing units. Additionally, processing units 220-238 are merely an example and any combination or order of processing units can be used by GPUs according to the present disclosure. GPU 200 also includes command buffer 250, context register packets 260, and context states 261.
As shown in FIG. 2, a GPU can utilize a CP, e.g., CP 210, or hardware accelerator to parse a command buffer into context register packets, e.g., context register packets 260, and/or draw call data packets, e.g., draw call data packets 212. The CP 210 can then send the context register packets 260 or draw call data packets 212 through separate paths to the processing units or blocks in the GPU. Further, the command buffer 250 can alternate different states of context registers and draw calls. For example, a command buffer can be structured as follows: context register of context N, draw call (s) of context N, context register of context N+1, and draw call (s) of context N+1.
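The alternating command buffer layout described above can be pictured with a short sketch. The packet types and fields are hypothetical and shown only to illustrate the ordering of context register packets and draw call packets; this is not actual command processor code.

```c
#include <stdio.h>

/* Hypothetical packet types illustrating how a command buffer can alternate
 * context register packets and draw call packets. */
typedef enum { PKT_CONTEXT_REGISTER, PKT_DRAW_CALL } packet_type;

typedef struct {
    packet_type type;
    int context_id;   /* which context (N, N+1, ...) the packet belongs to */
} packet;

int main(void) {
    /* Command buffer: context register of context N, draw call(s) of context N,
     * context register of context N+1, draw call(s) of context N+1. */
    packet command_buffer[] = {
        { PKT_CONTEXT_REGISTER, 0 },
        { PKT_DRAW_CALL,        0 },
        { PKT_DRAW_CALL,        0 },
        { PKT_CONTEXT_REGISTER, 1 },
        { PKT_DRAW_CALL,        1 },
    };

    /* A command processor (CP) would parse the buffer and route each packet
     * type down a separate path, as in FIG. 2. */
    for (size_t i = 0; i < sizeof(command_buffer) / sizeof(command_buffer[0]); i++) {
        const packet *p = &command_buffer[i];
        printf("%s for context %d\n",
               p->type == PKT_CONTEXT_REGISTER ? "context register" : "draw call",
               p->context_id);
    }
    return 0;
}
```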
Aspects of mobile devices or smart phones can utilize buffer mechanisms to distribute or coordinate a buffer between an application rendering side of the device, e.g., a GPU or CPU, and a display or composition side of the device, e.g., a display engine. For instance, some mobile devices can utilize a buffer queue mechanism to distribute or coordinate a buffer between an application rendering side and a display or composition side, which can include a buffer compositor, e.g., a surface flinger (SF) or hardware composer (HWC) . In some aspects, the application rendering side can  be referred to as a producer, while the display or composition side can be referred to as a consumer. Additionally, a synchronization divider or fence can be used to synchronize content between the application rendering side and the display or composition side. Accordingly, a fence can be referred to as a synchronization divider, and vice versa.
FIG. 3 illustrates diagram 300 in accordance with one or more techniques of this disclosure. As shown in FIG. 3, diagram 300 includes producer 310, buffer queue mechanism 320 and consumer 330. More specifically, FIG. 3 illustrates how the buffer queue mechanism 320 helps to distribute or coordinate the buffers between the producer 310 and the consumer 330. As mentioned above, the producer 310 is the application side, which produces or renders the content for display. The consumer 330 is the display or composition side, which displays the content on the user device. The buffer queue mechanism 320 is used to coordinate the buffers between the producer and consumer sides.
As shown in FIG. 3, the buffers can include multiple states at the producer 310, buffer queue mechanism 320, and the consumer 330. For example, the buffer state can be “dequeued” at the producer 310. In this buffer state, the producer 310 can obtain a buffer from the user device to produce content. Prior to rendering content, the buffer is referred to as a dequeueBuffer. The dequeueBuffer can transfer a free buffer from the buffer queue with a synchronization divider or fence. In some aspects, the synchronization divider or fence can be the same as the releaseBuffer on the consumer 330 side. Also, in some aspects, the GPU driver or kernel graphics support layer (kgsl) driver may wait for the synchronization divider or fence to be signaled before it can access the buffer for rendering.
When the content is produced or rendered, the queued buffer can be sent to the buffer queue mechanism 320, which changes the buffer state to “queued” at the buffer queue mechanism 320. When being transferred to the buffer queue mechanism 320, the buffer can be referred to as a queueBuffer. During the queueBuffer state, a new rendered buffer with a synchronization divider or fence can be sent to the buffer queue. The synchronization divider or fence can be generated by the GPU driver or kgsl driver at the producer 310. Additionally, the synchronization divider or fence can be signaled by the GPU or kgsl driver when the frame rendering commands are completed.
At the consumer 330, when a new frame is displayed, the buffer is in an “acquired” state. For instance, the buffer is acquiring new content, so it is in an acquired state. When transferred to the consumer 330, the buffer can be referred to as an acquireBuffer. The consumer 330 or display engine can utilize the new buffer for composition to display content. In some aspects, the display content can be determined and/or generated. The acquireBuffer state can send a queued buffer with a synchronization divider or fence to buffer queue. In some aspects, this synchronization divider or fence can be the same as the queueBuffer at the producer 310 side. Also, the display driver may wait for the synchronization divider or fence to be signaled before it accesses the buffer for a new composition.
After composition is complete, the consumer 330 can release the buffer, which changes the buffer state to “free. ” When being transferred to the buffer queue mechanism 320, the buffer can be referred to as a releaseBuffer. During the releaseBuffer state, a new released buffer with a synchronization divider or fence can be sent to the buffer queue mechanism 320. The synchronization divider or fence can be generated by the display driver at the consumer 330. Also, the display driver can signal when the buffer completes composition and/or display at the consumer 330. As illustrated in FIG. 3, the aforementioned steps manifest how a buffer is recycled and utilized in different states at the producer 310, the buffer queue mechanism 320, and the consumer 330.
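The recycling of a buffer through the dequeued, queued, acquired, and free states can be modeled as a small state machine. The sketch below is a simplified, hypothetical model of the buffer queue mechanism in FIG. 3; the type and function names are illustrative and do not come from any particular implementation.

```c
#include <assert.h>

/* Simplified, hypothetical model of the buffer states in FIG. 3. */
typedef enum {
    BUF_FREE,      /* available in the buffer queue                 */
    BUF_DEQUEUED,  /* held by the producer (GPU/CPU) for rendering  */
    BUF_QUEUED,    /* rendered, waiting in the queue with its fence */
    BUF_ACQUIRED   /* held by the consumer (display engine)         */
} buffer_state;

typedef struct {
    buffer_state state;
    int fence_signaled;  /* 1 once the current owner's work is done */
} buffer;

/* Producer side: dequeueBuffer takes a free buffer for rendering. */
void dequeue_buffer(buffer *b)  { assert(b->state == BUF_FREE);     b->state = BUF_DEQUEUED; b->fence_signaled = 0; }
/* Producer side: queueBuffer returns a rendered buffer with its fence. */
void queue_buffer(buffer *b)    { assert(b->state == BUF_DEQUEUED); b->state = BUF_QUEUED; }
/* Consumer side: acquireBuffer takes a queued buffer for composition. */
void acquire_buffer(buffer *b)  { assert(b->state == BUF_QUEUED);   b->state = BUF_ACQUIRED; }
/* Consumer side: releaseBuffer frees the buffer once composition is done. */
void release_buffer(buffer *b)  { assert(b->state == BUF_ACQUIRED); b->state = BUF_FREE; b->fence_signaled = 1; }

int main(void) {
    buffer b = { BUF_FREE, 1 };
    dequeue_buffer(&b);   /* producer renders into the buffer   */
    queue_buffer(&b);     /* rendered content enters the queue  */
    acquire_buffer(&b);   /* display engine composes from it    */
    release_buffer(&b);   /* buffer is recycled back to "free"  */
    return 0;
}
```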
As mentioned above, a buffer can include a synchronization divider or fence. A fence is a synchronization event or method, such as an alignment point or divider utilized for synchronizing content between two different applications, e.g., a producer and a consumer. For instance, a fence can synchronize the producer or application side with the consumer or display side. Accordingly, a fence can inform two different components or applications regarding when to synchronize. For example, if there are two different execution components in an application, they may be synchronized at a synchronization divider or fence. In some aspects, the synchronization divider or fence can be a synchronization method in software or hardware. As such, at the synchronization divider or fence, a component can receive a signal from another component, or send a signal to another component.
FIG. 4 illustrates diagram 400 in accordance with one or more techniques of this disclosure. As shown in FIG. 4, diagram 400 includes queueBuffer 401, layerBuffer 411, fence 421, queueBuffer 402, layerBuffer 412, fence 422, queueBuffer N,  layerBuffer N, and fence N. More specifically, FIG. 4 illustrates a buffer and corresponding synchronization divider or fence that is generated and managed by a producer or application side, e.g., a GPU or CPU. The synchronization divider or fence can be used to synchronize with the consumer or hardware composer side, e.g., a display engine. FIG. 4 shows the buffer or fence usage between a GPU or rendering side and a display engine or composition side.
As further shown in FIG. 4, a layerBuffer update, e.g., layerBuffer 411, layerBuffer 412, or layerBuffer N, can be sent with each fence, e.g., fence 421, fence 422, fence N. For example, layerBuffer 411 can be sent with fence 421, layerBuffer 412 can be sent with fence 422, and layerBuffer N can be sent with fence N. In some aspects, before the layerBuffer can be accessed for composition at the consumer or display side, the fences, e.g., fence 421, fence 422, fence N, may need to wait in the display engine driver. As mentioned above, these fences can be managed by the producer or GPU side, so the GPU driver may need to signal the fences before the hardware composer or display engine can begin composition at the display side.
FIG. 5 illustrates diagram 500 in accordance with one or more techniques of this disclosure. More specifically, FIG. 5 illustrates a buffer and corresponding synchronization divider or fence that is generated and managed by a consumer or hardware composer side, e.g., a display engine. The synchronization divider or fence can be used to synchronize with the producer or application side, e.g., a GPU or CPU. As shown in FIG. 5, at 502, the hardware composer or display engine can generate the releaseBuffer. At 504, the hardware composer or display engine can generate the frameBuffer fence. At 506, the fence can be stored in the GPU or kgsl driver. At 508, the GPU can render or utilize the buffer and corresponding fence.
As indicated above, the buffer and corresponding fence in FIG. 5 can be generated and managed by the hardware composer or display engine side and be used to synchronize with the GPU or kgsl driver side. As such, the hardware composer or display engine may need to signal the fence before the GPU or CPU can begin rendering content. As shown in FIGs. 4 and 5, one buffer and fence can be generated and managed by the producer or application side, and be utilized at the consumer or display side for the synchronization. Additionally, another fence can be generated and managed by the consumer or display side, and be utilized at the producer or application side.
In some aspects of mobile devices or smart phones, the synchronization divider or fence that is controlled and managed by the producer or application side and used by display side can be referred to as an acquired fence. Also, the synchronization divider or fence that is controlled and managed by the display side and used by the producer or application side can be referred to as a release fence. As mentioned above, these synchronization dividers or fences can be utilized by an application side and a display side at a user device. However, the synchronization dividers or fences can also be utilized at a server.
In some instances, because a consumer or application side may support multi-context concurrent processing, each application process may have its own GPU context or kgsl fence timeline. Additionally, a GPU or kgsl driver may immediately signal a synchronization divider or fence when the associated frame rendering is completed at the GPU or CPU. In some aspects, the display engine can be a standalone hardware component for the application process, such that it may have a display engine fence timeline. As such, the display engine driver may signal the fence when the frame composition and display processes are completed at the display side.
In some aspects, there may be more than one subgroup of pixels or layers in a frame within a larger group of pixels. This frame may be displayed on a screen, e.g., at the display side of a user device. Each layer or subgroup of pixels may cover a portion of the frame or screen. For example, in split display mode, one layer or subgroup of pixels may cover half the frame or screen. However, in some instances, each of the subgroup of pixels or layers may release their corresponding fence or synchronization divider at the same time, e.g., at a release time. When the fence or synchronization divider is released at the release time, the buffer corresponding to the layer or subgroup of pixels may be released or cleared. As such, in some aspects, the layers or subgroups of pixels can signal their release at the same time, e.g., when the whole frame composition is completed.
Additionally, the timing in the display driver code can correspond to when the display engine kernel driver receives a frame. In some aspects, the display side can include the following code for signaling the release of a synchronization divider or fence:
[The code listing is reproduced as images in the original filing (Figures PCTCN2019104557-appb-000001 and PCTCN2019104557-appb-000002) and is not available as text.]
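Because the listing survives only as images in the original filing, the following is a hypothetical sketch of the behavior described in the surrounding text: when the display engine driver determines that the whole frame has been composed (or, per the timing noted above, when it receives the next frame), it signals the release fence of every layer in that frame at once. The structure and function names are assumptions made for illustration and are not taken from any real driver.

```c
#include <stdio.h>

/* Hypothetical per-layer bookkeeping kept by a display driver. */
struct layer_ctx {
    int release_fence_fd;   /* release fence handed back to the producer */
};

struct frame_ctx {
    struct layer_ctx *layers;
    int num_layers;
};

/* Stub for illustration: in a real driver this would signal the fence so the
 * producer (GPU/CPU) waiting on it may release or reuse the buffer. */
static void signal_fence(int fence_fd) {
    printf("release fence %d signaled\n", fence_fd);
}

/* Baseline behavior (as in FIG. 6): all release fences are signaled together,
 * only once the whole frame has been composed. */
static void frame_done_handler(struct frame_ctx *frame) {
    for (int i = 0; i < frame->num_layers; i++)
        signal_fence(frame->layers[i].release_fence_fd);
}

int main(void) {
    struct layer_ctx layers[3] = { {10}, {11}, {12} };
    struct frame_ctx frame = { layers, 3 };
    frame_done_handler(&frame);   /* signals fences 10, 11, 12 at the same time */
    return 0;
}
```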
FIG. 6 illustrates frame or screen 600 and timeline 650. As shown in FIG. 6, frame or screen 600 includes pixel subgroup or layer 602, pixel subgroup or layer 604, pixel subgroup or layer 606, length 612, length 614, length 616, synchronization divider or fence 622 for layer 602, synchronization divider or fence 624 for layer 604, and synchronization divider or fence 626 for layer 606. As shown in FIG. 6, timeline 650 includes t0 660, t1 661, t2 662, t3 663, and release time 670. As illustrated in FIG. 6, t0 660 corresponds to the start of the composition or rendering of pixel subgroup 602, t1 661 corresponds to synchronization divider 622, t2 662 corresponds to synchronization divider 624, and t3 663 corresponds to synchronization divider 626 and release time 670.
FIG. 6 shows frame 600 of a device operating in horizontal mode with three different subgroups or layers, e.g., pixel subgroup 602, pixel subgroup 604, and pixel subgroup 606. For instance, pixel subgroup 602 covers one half of frame 600, pixel subgroup 604 covers the other half of frame 600, and pixel subgroup 606 is the status bar of frame 600. As shown in FIG. 6, the composition or rendering of pixel subgroup 602 is between t0 660 and t1 661, e.g., 0 ms to 7 ms. For example, synchronization divider 622 corresponds to t1 661, e.g., 7 ms. The composition or rendering of pixel subgroup 604 is between t1 661 and t2 662, e.g., 7 ms to 14 ms. For example, synchronization divider 624 corresponds to t2 662, e.g., 14 ms. Further, the composition or rendering of pixel subgroup 606 is between t2 662 and t3 663, e.g., 14 ms to 14.4 ms. For example, synchronization divider 626 corresponds to t3 663, e.g., 14.4 ms.
As shown in FIG. 6, release time 670 can occur at t3 663. Release time 670 can signal the release of each of the synchronization dividers, e.g., synchronization dividers 622, 624, 626. Accordingly, release time 670 can signal the release of all the fences at the same time. As mentioned herein, when the release of a fence is signaled, the display engine can inform the application side that the particular subgroup of pixels is no longer needed. Accordingly, the corresponding buffer can be released or cleared. Once the buffer is released or cleared, the GPU or CPU at the application side can reuse this buffer for another task.
In some aspects, when a fence is generated it can be in an active state. After the rendering or composition task is completed, the GPU or display driver can signal the release of the fence. In some aspects, if one side is using a buffer, then it may not be used by the other side. For example, if a GPU is using a buffer, then the display may not use the buffer. Also, if a display is using a buffer, then the GPU may not use the buffer.
In some aspects, if the release of all the fences in a frame is signaled simultaneously, this can cause a backup in processing for the device. In turn, this can result in wasting both processing time and energy, as well as power utilized. For instance, this can waste GPU or CPU processing cycles and may cause unexpected wait times at the GPU or CPU before a new frame is rendered. This can introduce a task overload or processing issues for some applications, e.g., janks or interruptions in processing.
However, if each subgroup of pixels or layers in a frame is released individually, rather than all at once, then processing time can be saved. This can result in more time for synchronization between the display and the GPU or CPU. For instance, each subgroup of pixels or layers can include its own synchronization divider or fence with its own release time. Therefore, each subgroup or layer can use a separate fence and release time once it completes composition or rendering. By doing so, the corresponding buffer for each pixel subgroup or layer can be released or cleared individually.
At least some advantages of separating the fence release time and/or buffer release or clearance for each pixel subgroup is saving processing time and/or power. Accordingly, aspects of the present disclosure can include individual display engine or application side fence release signal times, which can reduce latency in the graphics or display pipeline. So each release time for each layer can be signaled when the composition or rendering is completed for that layer. As such, each layer or pixel subgroup can signal the release of its corresponding fence.
By separating the fence release time for each pixel subgroup, a division can be created between the layers or subgroups of pixels, such that each layer or subgroup of pixels can have its own synchronization event and/or release time. This can occur at the display side, e.g., when synchronizing with the application side, as well as at the application side, e.g., when synchronizing with the display side. In some aspects, the subgroup of pixels or layers can overlap or blend with other subgroups or layers, such that a fence for one layer may overlap with a portion of another subgroup or layer.
FIG. 7 illustrates frame 700 and timeline 750. As shown in FIG. 7, frame or screen 700 includes pixel subgroup or layer 702, pixel subgroup or layer 704, pixel subgroup or layer 706, length 712, length 714, length 716, synchronization divider or fence 722 for layer 702, synchronization divider or fence 724 for layer 704, and synchronization divider or fence 726 for layer 706. As shown in FIG. 7, timeline 750 includes t0 760, t1 761, t2 762, t3 763, release time 772, release time 774, and release time 776. As illustrated in FIG. 7, t0 760 corresponds to the start of the composition or rendering of pixel subgroup 702, t1 761 corresponds to synchronization divider 722 and release time 772, t2 762 corresponds to synchronization divider 724 and release time 774, and t3 763 corresponds to synchronization divider 726 and release time 776.
FIG. 7 shows frame 700 of a device operating in horizontal mode with three different subgroups or layers, e.g., pixel subgroup 702, pixel subgroup 704, and pixel subgroup 706. For instance, pixel subgroup 702 covers one half of frame 700, pixel subgroup 704 covers the other half of frame 700, and pixel subgroup 706 can be the status bar of frame 700. As shown in FIG. 7, the composition or rendering of pixel subgroup 702 is between t0 760 and t1 761, e.g., 0 ms to 7 ms. For example, synchronization divider 722 can correspond to t1 761, e.g., 7 ms. Further, release time 772 can occur at t1 761, e.g., 7 ms. The composition or rendering of pixel subgroup 704 is between t1 761 and t2 762, e.g., 7 ms to 14 ms. For example, synchronization divider 724 can correspond to t2 762, e.g., 14 ms. Release time 774 can also occur at t2 762, e.g., 14 ms. Also, the composition or rendering of pixel subgroup 706 is between t2 762 and t3 763, e.g., 14 ms to 14.4 ms. For example, synchronization divider 726 can correspond to t3 763, e.g., 14.4 ms. Release time 776 can also occur at t3 763, e.g., 14.4 ms.
As shown in FIG. 7, pixel subgroups 702, 704, and 706 can each include their own release time, e.g., release times 772, 774, and 776, respectively. Accordingly, the synchronization dividers 722, 724, and 726 can be released individually, rather than all at the same time. Therefore, the buffers corresponding to pixel subgroups 702, 704, and 706 can each be released or cleared individually and at release times 772, 774, and 776, respectively. As mentioned above, by doing so, aspects of the present disclosure can reduce the graphics or display pipeline latency.
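A hypothetical counterpart to the whole-frame handler sketched earlier, restructured along the lines of FIG. 7, signals each subgroup's release fence as soon as that subgroup finishes composition rather than at the end of the frame. The names and the approximate timings in the comments are illustrative assumptions, not actual driver code.

```c
#include <stdio.h>

/* Hypothetical per-layer state kept by a display driver. */
struct layer_ctx {
    const char *name;
    int release_fence_fd;   /* release fence handed back to the producer */
};

/* Stub: in a real driver this would signal the fence so the producer can
 * release or reuse the buffer that backs this subgroup of pixels. */
static void signal_fence(int fence_fd) {
    printf("release fence %d signaled\n", fence_fd);
}

/* Per-layer behavior (as in FIG. 7): signal only the fence of the subgroup
 * that just finished composition, rather than waiting for the whole frame. */
static void layer_done(struct layer_ctx *layer) {
    printf("%s finished composition\n", layer->name);
    signal_fence(layer->release_fence_fd);
}

int main(void) {
    struct layer_ctx left  = { "left half",  30 };
    struct layer_ctx right = { "right half", 31 };
    struct layer_ctx bar   = { "status bar", 32 };

    /* Each subgroup is released at its own release time (772, 774, 776 in
     * FIG. 7) instead of all together at the end of the frame. */
    layer_done(&left);    /* roughly 7 ms into the frame    */
    layer_done(&right);   /* roughly 14 ms into the frame   */
    layer_done(&bar);     /* roughly 14.4 ms into the frame */
    return 0;
}
```

The earlier a subgroup's fence is signaled, the earlier the producer can start rendering into the corresponding buffer, which is the source of the extra execution time discussed below.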
As mentioned above, if a layer or pixel subgroup is released individually, such that the corresponding buffer is released or cleared individually, the composition or rendering time can be better utilized by the display engine or GPU/CPU. For instance, both the performance and power of composition or rendering can be improved. Indeed, by releasing the synchronization dividers or fences for each pixel subgroup individually, this allows the display engine or GPU/CPU to perform tasks individually, as opposed to having to perform multiple tasks at the same time. In turn, this can reduce the display or graphics processing pipeline latency.
When a pixel subgroup or layer is released individually, the rendering or composition task may have a longer time to process, which can reduce pipeline latency. At the GPU side, before a rendering command can be executed, it may need to wait for the buffer to be released. So if the synchronization divider or fence is released individually, then the rendering command can have more time to execute. This can correspond to an improvement in GPU frame rate or frames per second (FPS) . Likewise, at the display side, before a composition command can be executed, it may need to wait for the buffer to be released. Accordingly, the composition command can have more time to execute if the fence is released according to individual pixel subgroups.
Referring to FIG. 7, if pixel subgroup 702 is released at t1 761, e.g., 7 ms, as opposed to t3 763, e.g., 14.4 ms, the rendering task will have extra time to execute, e.g., an extra 7.4 ms. Accordingly, at the producer or GPU/CPU side, this extra time can be very useful, as it provides more time for the GPU or CPU to execute the rendering commands. Additionally, at the consumer or display side, this extra time provides more time to execute the composition commands.
As shown in FIG. 7, aspects of the present disclosure can determine display content for a group of pixels in a frame, e.g., frame 700, where the group of pixels includes one or more subgroups of pixels, e.g.,  pixel subgroups  702, 704, 706. Aspects of the present disclosure can determine a position of each of the one or more subgroups of pixels, e.g.,  pixel subgroups  702, 704, 706, where the position of each of the one or more subgroups of pixels includes one or more coordinates of the one or more subgroups of pixels, e.g.,  length  712, 714, 716. The present disclosure can also  calculate at least one synchronization divider, e.g.,  synchronization divider  722, 724, 726, for each of the one or more subgroups of pixels, e.g.,  pixel subgroups  702, 704, 706.
Aspects of the present disclosure can also synchronize each of the one or more subgroups of pixels, e.g.,  pixel subgroups  702, 704, 706, based on the at least one synchronization divider for the subgroup of pixels e.g.,  synchronization divider  722, 724, 726. In some aspects, at least one synchronization divider for a first subgroup of the one or more subgroups of pixels, e.g., synchronization divider 722 for pixel subgroup 702, can correspond to at least one synchronization divider for a second subgroup of the one or more subgroups of pixels e.g., synchronization divider 724 for pixel subgroup 704, when the difference between the position of the first subgroup and the position of the second subgroup is less than a position threshold. Additionally, each of the one or more subgroups of pixels, e.g.,  pixel subgroups  702, 704, 706, can correspond to a buffer.
Aspects of the present disclosure can also send or receive at least one signal corresponding to each of the one or more subgroups of pixels, e.g.,  pixel subgroups  702, 704, 706. Further, each of the one or more subgroups of pixels can include a release time, e.g.,  release time  772, 774, 776. Aspects of the present disclosure can also release or clear a buffer corresponding to each of the one or more subgroups of pixels, e.g.,  pixel subgroups  702, 704, 706, at the release time of the subgroup of pixels, e.g.,  release time  772, 774, 776. In some aspects, a buffer corresponding to a first subgroup of the one or more subgroups of pixels, e.g., pixel subgroup 704, and a buffer corresponding to a second subgroup of the one or more subgroups of pixels, e.g., pixel subgroup 706, can be released or cleared simultaneously when the difference between the release time of the first subgroup, e.g., release time 774, and the release time of the second subgroup, e.g., release time 776, is less than a time threshold. Also, the buffer corresponding to each subgroup of pixels, e.g.,  pixel subgroups  702, 704, 706, can be reassigned when the buffer is released or cleared.
In some aspects, the release time of each of the one or more subgroups of pixels, e.g.,  release times  772, 774, 776, can correspond to a display engine release time, where the display engine release time is a time when a display engine finishes composing the subgroup of pixels, e.g.,  pixel subgroups  702, 704, 706. The release time of each of the one or more subgroups of pixels e.g.,  release times  772, 774, 776, can also correspond to a GPU release time or a CPU release time, where the GPU release time  or the CPU release time is a time when a GPU or CPU finishes rendering the subgroup of pixels, e.g.,  pixel subgroups  702, 704, 706. Moreover, each of the one or more subgroups of pixels, e.g.,  pixel subgroups  702, 704, 706, can be synchronized between a display engine and a GPU or CPU.
Aspects of the present disclosure can correspond to a number of different applications. For instance, aspects of the present disclosure can apply to foldable displays or screens in mobile devices. In some aspects of foldable displays, a layer may cover half a screen or frame. For example, when using an application on a foldable display device, when half of the foldable screen is being utilized, the layer can be signaled in half the time compared to when the entire screen is utilized. In these instances, the other half or the back surface of the display may be off or display a static logo without refreshing. Additionally, in foldable display devices, the display scan direction and display content direction may have 90 degree rotation.
Also, aspects of the present disclosure are applicable to split screens or split screen mode. For example, in split screen mode, one application may cover half of the screen. So one application layer may cover one half of the screen. As such, the fence can be released faster if the application covers half the screen, so the GPU can have more time to render the next frame. Additionally, different application layers can be displayed in different regions of the frame.
In addition, aspects of the present disclosure are applicable to multiple screen displays. For instance, in a multiple screen display, the different displays, e.g., a mobile display and an external display, may have a different FPS rate. For example, an external display may run at different FPS rates for different applications, e.g., 144 Hz for gaming applications and 60 Hz for video applications, and a mobile device may run at yet another FPS rate, e.g., 90 Hz. Based on this, one display may release pixel subgroups or layers earlier than the other display. In some aspects, the buffers can be shared between the multiple displays. By releasing the layer and releasing or clearing the corresponding buffer as soon as they are rendered or composed, this can provide more flexibility when displaying different frames on different displays.
Aspects of the present disclosure can utilize multiple approaches to implement the aforementioned release fence signal timing. In some aspects, the present disclosure can use a display engine tear check module pointer or interrupt request (IRQ) . For instance, the present disclosure can configure the IRQ using a layer position, size, or length. In some instances, when the IRQ is initiated, the corresponding layer release  can be signaled. In turn, the next layer position, size, or length can be configured. Also, aspects of the present disclosure can apply de-bounce logic to layers with a similar position. For instance, de-bounce logic can calculate the completion time for each layer, and if the completion time is less than a time threshold, e.g., less than 1 ms, then the layers can be released at the same time. De-bounce logic can simplify the aforementioned processes and reduce the time needed to program the register. As such, if two layers are synchronized too closely together, or if the layers include release times that are less than a time threshold, then they can be synchronized or released at the same time.
As mentioned herein, aspects of the present disclosure can obtain layer positions, sizes, or lengths by configuring a pointer. The present disclosure can also add de-bounce logic to layers with a similar position, and combine them into one release time. As such, some aspects can configure the fence release time with the layer length value. Accordingly, aspects of the present disclosure obtain the position or size of the layers or subgroups of pixels and then coordinate these positions or sizes for the release of the layer fences. Also, the present disclosure may utilize de-bounce logic to combine similar layer position values to reduce the impact on the response load.
For example, a frame may have layers 0, 1, 2, 3, a frame length of 2340, and a frame transmission time of 14 ms. Layer 0 may have a length of 200 and a transmission completion time of 1.197 ms, layer 1 may have a length of 260 and a transmission completion time of 1.555 ms, layer 2 may have a length of 2100 and a transmission completion time of 12.56 ms, and layer 3 may have a length of 2340 and a transmission completion time of 14 ms.
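These completion times are consistent with scaling the frame transmission time by the fraction of the frame length covered by each layer, i.e., completion time ≈ (layer length / frame length) × frame transmission time; for instance, 200 / 2340 × 14 ms ≈ 1.197 ms for layer 0 and 2100 / 2340 × 14 ms ≈ 12.56 ms for layer 2.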
Aspects of the present disclosure can utilize de-bounce logic when the length or completion time difference between layers is less than a threshold, releasing such layers at the same time. For instance, because layers 0 and 1 have similar lengths and transmission completion times, e.g., layer 0 has a length of 200 and a transmission completion time of 1.197 ms and layer 1 has a length of 260 and a transmission completion time of 1.555 ms, the present disclosure can use de-bounce logic and combine the release times for these layers. Additionally, the present disclosure can write 260 to the pointer or IRQ. When the pointer or IRQ is received, layers 0 and 1 can be released. Also, the present disclosure can write 2100 to the pointer or IRQ. When the pointer or IRQ is received, layer 2 can be released. In some aspects, the legacy frame IRQ can release layer 3.
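By way of illustration only, the following C sketch outlines how such an IRQ-driven, de-bounced release could be organized. The stubbed driver functions (tear_check_set_irq_line, signal_release_fence), the struct layout, and the 1 ms threshold are assumptions for this sketch and do not represent any particular display driver interface.

```c
/* Sketch: IRQ-driven per-layer fence release with de-bounce grouping.
 * The "driver" functions below are stubs that print what a real display
 * driver would do; all register and function names are hypothetical. */
#include <stdio.h>
#include <stdint.h>

#define DEBOUNCE_MS 1.0f          /* example de-bounce threshold */

struct layer {
    uint32_t end_line;            /* last display line covered by the layer */
    float    complete_ms;         /* expected transmission completion time  */
    int      fence_id;            /* release fence for this layer           */
};

static void tear_check_set_irq_line(uint32_t line)
{
    printf("program tear-check IRQ at line %u\n", line);
}

static void signal_release_fence(int fence_id)
{
    printf("signal release fence %d\n", fence_id);
}

/* Find the last layer whose completion time is within DEBOUNCE_MS of the
 * group's first layer, so closely spaced layers share one IRQ. */
static int group_end(const struct layer *l, int count, int start)
{
    int end = start;
    while (end + 1 < count &&
           l[end + 1].complete_ms - l[start].complete_ms < DEBOUNCE_MS)
        end++;
    return end;
}

int main(void)
{
    /* Layers from the example: lengths 200, 260, 2100, 2340 in a 2340-line
     * frame transmitted in 14 ms.  In the text, layer 3 is instead released
     * by the legacy end-of-frame IRQ. */
    struct layer layers[] = {
        { 200,  1.197f, 0 },
        { 260,  1.555f, 1 },
        { 2100, 12.56f, 2 },
        { 2340, 14.0f,  3 },
    };
    int count = 4;

    for (int start = 0; start < count; ) {
        int end = group_end(layers, count, start);
        /* Program the IRQ for the group's last line (260, then 2100, ...);
         * when it fires, release every fence in the group. */
        tear_check_set_irq_line(layers[end].end_line);
        for (int i = start; i <= end; i++)
            signal_release_fence(layers[i].fence_id);
        start = end + 1;
    }
    return 0;
}
```

Running this sketch with the example layers groups layers 0 and 1 behind a single IRQ at line 260, gives layer 2 its own IRQ at line 2100, and leaves layer 3 to the end of the frame, consistent with the example above.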
In another aspect, the present disclosure can add a kernel thread to check the current write pointer line counter in the display engine tear check hardware block or in another hardware block. This kernel thread can signal the corresponding layer release if the write pointer line counter exceeds the layer position or size limit. Also, de-bounce logic for similarly positioned layers can be applied. This implementation can use kernel software thread polling on the write pointer line counters, which can be computed using layer position coordinates. This can reduce the number of IRQs, but may increase CPU loading. Aspects of the present disclosure can also utilize de-bounce logic to combine similar layer position values to reduce the impact on CPU loading.
Similar to the example above, a frame may have layers 0, 1, 2, 3, a frame length of 2340, and a frame transmission time of 14 ms. Layer 0 may have a length of 200 and a transmission completion time of 1.197 ms, layer 1 may have a length of 260 and a transmission completion time of 1.555 ms, layer 2 may have a length of 2100 and a transmission completion time of 12.56 ms, and layer 3 may have a length of 2340 and a transmission completion time of 14 ms.
In this aspect, the present disclosure can apply de-bounce logic based on a time threshold, e.g., 1 ms, and release layers whose completion times fall within the threshold of each other at the same time. Because layers 0 and 1 have lengths with a small difference, the present disclosure can use de-bounce logic and combine the release times for these layers. Additionally, the present disclosure can wake up at a certain time, e.g., 1.555 ms + 0.2 ms, and check the line counter register. If the line counter is greater than a threshold, e.g., 260, then layers 0 and 1 can be released. Also, the present disclosure can wake up at another time, e.g., 12.56 ms + 0.2 ms, and check the line counter register. If the line counter is greater than a threshold, e.g., 2100, then layer 2 can be released. The legacy frame IRQ can then release layer 3.
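For illustration, the following C sketch (assuming a POSIX environment) mirrors the polling flow described above. The line counter read is stubbed to model linear scan-out of a 2340-line frame over 14 ms, and the wake times, thresholds, and function names are assumptions rather than an actual kernel interface.

```c
/* Sketch: polling-based per-layer fence release using a worker thread.
 * The line-counter read and fence signaling are hypothetical stubs; a real
 * implementation would read the display engine's write pointer register. */
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>

struct release_point {
    uint32_t line_threshold;   /* e.g., 260 for layers 0+1, 2100 for layer 2 */
    float    wake_ms;          /* expected completion time plus a margin     */
    int      first_fence;      /* fences released at this point ...          */
    int      last_fence;       /* ... inclusive range after de-bounce        */
};

static uint32_t read_line_counter(float now_ms)
{
    /* Stub: pretend scan-out progresses linearly through a 2340-line frame
     * transmitted in 14 ms. */
    return (uint32_t)(now_ms / 14.0f * 2340.0f);
}

static void signal_release_fence(int fence_id)
{
    printf("signal release fence %d\n", fence_id);
}

int main(void)
{
    /* De-bounced release points from the example; layer 3 is left to the
     * legacy end-of-frame IRQ. */
    struct release_point points[] = {
        { 260,  1.555f + 0.2f, 0, 1 },
        { 2100, 12.56f + 0.2f, 2, 2 },
    };

    float now_ms = 0.0f;
    for (int p = 0; p < 2; p++) {
        /* Sleep until just after the expected completion time ... */
        float wait_ms = points[p].wake_ms - now_ms;
        usleep((useconds_t)(wait_ms * 1000.0f));
        now_ms = points[p].wake_ms;

        /* ... then confirm the write pointer has passed the layer boundary. */
        if (read_line_counter(now_ms) >= points[p].line_threshold) {
            for (int f = points[p].first_fence; f <= points[p].last_fence; f++)
                signal_release_fence(f);
        }
    }
    return 0;
}
```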
FIG. 8 illustrates flowchart 800 of an example method in accordance with one or more techniques of this disclosure. The method may be performed by a DPU, a display engine, a GPU, a CPU, or an apparatus for display or graphics processing. At 802, the apparatus can determine display content for a group of pixels in a frame, where the group of pixels includes one or more subgroups of pixels, as described in connection with the examples in FIGs. 3, 4, 5, 6, and 7. At 804, the apparatus can determine a position of each of the one or more subgroups of pixels, where the position of each of the one or more subgroups of pixels includes one or more coordinates of the one or more subgroups of pixels, as described in connection with the examples in FIGs. 3, 4, 5, 6, and 7.
At 806, the apparatus can calculate at least one synchronization divider for each of the one or more subgroups of pixels, as described in connection with the examples in FIGs. 3, 4, 5, 6, and 7. At 808, the apparatus can synchronize each of the one or more subgroups of pixels based on the at least one synchronization divider for the subgroup of pixels, as described in connection with the examples in FIGs. 3, 4, 5, 6, and 7. In some aspects, at least one synchronization divider for a first subgroup of the one or more subgroups of pixels can correspond to at least one synchronization divider for a second subgroup of the one or more subgroups of pixels when the difference between the position of the first subgroup and the position of the second subgroup is less than a position threshold, as described in connection with the examples in FIGs. 3, 4, 5, 6, and 7. Additionally, each of the one or more subgroups of pixels can correspond to a buffer, as described in connection with the examples in FIGs. 3, 4, 5, 6, and 7.
At 810, the apparatus can send or receive at least one signal corresponding to each of the one or more subgroups of pixels, as described in connection with the examples in FIGs. 3, 4, 5, 6, and 7. Further, each of the one or more subgroups of pixels can include a release time, as described in connection with the examples in FIGs. 3, 4, 5, 6, and 7. At 812, the apparatus can release a buffer corresponding to each of the one or more subgroups of pixels at the release time of the subgroup of pixels, as described in connection with the examples in FIGs. 3, 4, 5, 6, and 7.
In some aspects, a buffer corresponding to a first subgroup of the one or more subgroups of pixels and a buffer corresponding to a second subgroup of the one or more subgroups of pixels can be released simultaneously when the difference between the release time of the first subgroup and the release time of the second subgroup is less than a time threshold, as described in connection with the examples in FIGs. 3, 4, 5, 6, and 7. Also, the buffer corresponding to each subgroup of pixels can be reassigned when the buffer is released, as described in connection with the examples in FIGs. 3, 4, 5, 6, and 7.
In some aspects, the release time of each of the one or more subgroups of pixels can correspond to a display engine release time, where the display engine release time is a time when a display engine finishes composing the subgroup of pixels, as described in connection with the examples in FIGs. 3, 4, 5, 6, and 7. The release time of each of the one or more subgroups of pixels can also correspond to a GPU release time or a CPU release time, where the GPU release time or the CPU release time is a time when a GPU or CPU finishes rendering the subgroup of pixels, as described in connection with the examples in FIGs. 3, 4, 5, 6, and 7. Moreover, each of the one or more subgroups of pixels can be synchronized between a display engine and a GPU or CPU, as described in connection with the examples in FIGs. 3, 4, 5, 6, and 7.
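Purely as a sketch of the ordering of steps 802 through 812, the following C skeleton treats the synchronization divider as an opaque per-subgroup value derived from the subgroup's position; every type, function, and value here is a hypothetical placeholder rather than an actual API.

```c
/* Skeleton of flowchart 800 (steps 802-812).  Every function here is a
 * placeholder that just prints the step it represents; the synchronization
 * divider is treated as an opaque per-subgroup token. */
#include <stdio.h>

struct subgroup {
    int   id;
    int   x, y, w, h;        /* 804: position/coordinates of the subgroup */
    int   sync_divider;      /* 806: opaque synchronization divider       */
    float release_ms;        /* release time used at step 812             */
};

static int calc_sync_divider(const struct subgroup *s)
{
    return s->y + s->h;      /* placeholder: derived from the position */
}

static void synchronize(const struct subgroup *s)          /* 808 */
{
    printf("808: synchronize subgroup %d on divider %d\n", s->id, s->sync_divider);
}

static void send_release_signal(const struct subgroup *s)  /* 810 */
{
    printf("810: signal release for subgroup %d\n", s->id);
}

static void release_buffer(const struct subgroup *s)       /* 812 */
{
    printf("812: release buffer of subgroup %d at %.3f ms\n", s->id, s->release_ms);
}

int main(void)
{
    /* 802: display content for a group of pixels made of two subgroups. */
    struct subgroup subs[] = {
        { 0, 0, 0,   1080, 260,  0, 1.555f },
        { 1, 0, 260, 1080, 1840, 0, 12.56f },
    };

    for (int i = 0; i < 2; i++) {
        subs[i].sync_divider = calc_sync_divider(&subs[i]);  /* 804 + 806 */
        synchronize(&subs[i]);                               /* 808 */
        send_release_signal(&subs[i]);                       /* 810 */
        release_buffer(&subs[i]);                            /* 812 */
    }
    return 0;
}
```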
In one configuration, a method or apparatus for display or graphics processing is provided. The apparatus may be a DPU, a display engine, a GPU, a CPU, or some other processor that can perform display or graphics processing. In one aspect, the apparatus may be the processing unit 120 within the device 104, or may be some other hardware within device 104 or another device. The apparatus may include means for determining display content for a group of pixels in a frame, where the group of pixels includes one or more subgroups of pixels. The apparatus may also include means for calculating at least one synchronization divider for each of the one or more subgroups of pixels. The apparatus may also include means for synchronizing each of the one or more subgroups of pixels based on the at least one synchronization divider for the subgroup of pixels. The apparatus may also include means for determining a position of each of the one or more subgroups of pixels, where the position of each of the one or more subgroups of pixels includes one or more coordinates of the one or more subgroups of pixels. The apparatus may also include means for sending or receiving at least one signal corresponding to each of the one or more subgroups of pixels. The apparatus may also include means for releasing a buffer corresponding to each of the one or more subgroups of pixels at the release time of the subgroup of pixels.
The subject matter described herein can be implemented to realize one or more benefits or advantages. For instance, the described display or graphics processing techniques can be used by display engines, GPUs, or CPUs to reduce the processing time and/or power used. This can also be accomplished at a low cost compared to other display or graphics processing techniques. Moreover, the display or graphics processing techniques herein can improve or speed up the processing or execution time. Further, the graphics processing techniques herein can improve the resource or data utilization and/or resource efficiency.
In accordance with this disclosure, the term “or” may be interpreted as “and/or” where context does not dictate otherwise. Additionally, while phrases such as “one or more” or “at least one” or the like may have been used for some features disclosed herein but not others, the features for which such language was not used may be interpreted to have such a meaning implied where context does not dictate otherwise.
In one or more examples, the functions described herein may be implemented in hardware, software, firmware, or any combination thereof. For example, although the term “processing unit” has been used throughout this disclosure, such processing units may be implemented in hardware, software, firmware, or any combination thereof. If any function, processing unit, technique described herein, or other module is implemented in software, the function, processing unit, technique described herein, or other module may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media may include computer data storage media or communication media including any medium that facilitates transfer of a computer program from one place to another. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media, which is non-transitory, or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementation of the techniques described in this disclosure. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage, or other magnetic storage devices. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media. A computer program product may include a computer-readable medium.
The code may be executed by one or more processors, such as one or more digital signal processors (DSPs) , general purpose microprocessors, application specific integrated circuits (ASICs) , arithmetic logic units (ALUs) , field programmable logic arrays (FPGAs) , or other equivalent integrated or discrete logic circuitry. Accordingly, the term “processor, ” as used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. Also, the techniques could be fully implemented in one or more circuits or logic elements.
The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC) or a set of ICs, e.g., a chip set. Various components, modules or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily need realization by different hardware units. Rather, as described above, various units may be combined in any hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.
Various examples have been described. These and other examples are within the scope of the following claims.

Claims (27)

  1. A method of display processing, comprising:
    determining display content for a group of pixels in a frame, wherein the group of pixels includes one or more subgroups of pixels;
    calculating at least one synchronization divider for each of the one or more subgroups of pixels; and
    synchronizing each of the one or more subgroups of pixels based on the at least one synchronization divider for the subgroup of pixels.
  2. The method of claim 1, further comprising:
    determining a position of each of the one or more subgroups of pixels, wherein the position of each of the one or more subgroups of pixels includes one or more coordinates of the one or more subgroups of pixels.
  3. The method of claim 2, wherein at least one synchronization divider for a first subgroup of the one or more subgroups of pixels corresponds to at least one synchronization divider for a second subgroup of the one or more subgroups of pixels when the difference between the position of the first subgroup and the position of the second subgroup is less than a position threshold.
  4. The method of claim 1, wherein each of the one or more subgroups of pixels corresponds to a buffer.
  5. The method of claim 1, wherein synchronizing each of the one or more subgroups of pixels based on the at least one synchronization divider for the subgroup of pixels further comprises:
    sending or receiving at least one signal corresponding to each of the one or more subgroups of pixels.
  6. The method of claim 1, wherein each of the one or more subgroups of pixels includes a release time.
  7. The method of claim 6, further comprising:
    releasing a buffer corresponding to each of the one or more subgroups of pixels at the release time of the subgroup of pixels.
  8. The method of claim 7, wherein a buffer corresponding to a first subgroup of the one or more subgroups of pixels and a buffer corresponding to a second subgroup of the one or more subgroups of pixels are released simultaneously when the difference between the release time of the first subgroup and the release time of the second subgroup is less than a time threshold.
  9. The method of claim 7, wherein the buffer corresponding to each subgroup of pixels is reassigned when the buffer is released.
  10. The method of claim 6, wherein the release time of each of the one or more subgroups of pixels corresponds to a display engine release time, wherein the display engine release time is a time when a display engine finishes composing the subgroup of pixels.
  11. The method of claim 6, wherein the release time of each of the one or more subgroups of pixels corresponds to a graphics processing unit (GPU) release time or a central processing unit (CPU) release time, wherein the GPU release time or the CPU release time is a time when a GPU or CPU finishes rendering the subgroup of pixels.
  12. The method of claim 1, wherein each of the one or more subgroups of pixels are synchronized between a display engine and a graphics processing unit (GPU) or central processing unit (CPU) .
  13. An apparatus for display processing, comprising:
    a memory; and
    at least one processor coupled to the memory and configured to:
    determine display content for a group of pixels in a frame, wherein the group of pixels includes one or more subgroups of pixels;
    calculate at least one synchronization divider for each of the one or more subgroups of pixels; and
    synchronize each of the one or more subgroups of pixels based on the at least one synchronization divider for the subgroup of pixels.
  14. The apparatus of claim 13, wherein the at least one processor is further configured to:
    determine a position of each of the one or more subgroups of pixels, wherein the position of each of the one or more subgroups of pixels includes one or more coordinates of the one or more subgroups of pixels.
  15. The apparatus of claim 14, wherein at least one synchronization divider for a first subgroup of the one or more subgroups of pixels corresponds to at least one synchronization divider for a second subgroup of the one or more subgroups of pixels when the difference between the position of the first subgroup and the position of the second subgroup is less than a position threshold.
  16. The apparatus of claim 13, wherein each of the one or more subgroups of pixels corresponds to a buffer.
  17. The apparatus of claim 13, wherein to synchronize each of the one or more subgroups of pixels based on the at least one synchronization divider for the subgroup of pixels further comprises the at least one processor configured to:
    send or receive at least one signal corresponding to each of the one or more subgroups of pixels.
  18. The apparatus of claim 13, wherein each of the one or more subgroups of pixels includes a release time.
  19. The apparatus of claim 18, wherein the at least one processor is further configured to:
    release a buffer corresponding to each of the one or more subgroups of pixels at the release time of the subgroup of pixels.
  20. The apparatus of claim 19, wherein a buffer corresponding to a first subgroup of the one or more subgroups of pixels and a buffer corresponding to a second subgroup of the one or more subgroups of pixels are released simultaneously when the difference between  the release time of the first subgroup and the release time of the second subgroup is less than a time threshold.
  21. The apparatus of claim 19, wherein the buffer corresponding to each subgroup of pixels is reassigned when the buffer is released.
  22. The apparatus of claim 18, wherein the release time of each of the one or more subgroups of pixels corresponds to a display engine release time, wherein the display engine release time is a time when a display engine finishes composing the subgroup of pixels.
  23. The apparatus of claim 18, wherein the release time of each of the one or more subgroups of pixels corresponds to a graphics processing unit (GPU) release time or central processing unit (CPU) release time, wherein the GPU release time or the CPU release time is a time when a GPU or CPU finishes rendering the subgroup of pixels.
  24. The apparatus of claim 13, wherein each of the one or more subgroups of pixels are synchronized between a display engine and a GPU or CPU.
  25. The apparatus of claim 13, wherein the apparatus is a display.
  26. The apparatus of claim 13, wherein the apparatus is a wireless communication device.
  27. A computer-readable medium storing computer executable code for display processing, comprising code to:
    determine display content for a group of pixels in a frame, wherein the group of pixels includes one or more subgroups of pixels;
    calculate at least one synchronization divider for each of the one or more subgroups of pixels; and
    synchronize each of the one or more subgroups of pixels based on the at least one synchronization divider for the subgroup of pixels.