WO2021142780A1 - Methods and apparatus for reducing frame latency - Google Patents

Methods and apparatus for reducing frame latency

Info

Publication number
WO2021142780A1
Authority
WO
WIPO (PCT)
Prior art keywords
processor
rendering
duration
tasks
compositing
Prior art date
Application number
PCT/CN2020/072777
Other languages
French (fr)
Inventor
Zhibin Wang
Yanshan WEN
Bingfei LI
Original Assignee
Qualcomm Incorporated
Priority date
Filing date
Publication date
Application filed by Qualcomm Incorporated filed Critical Qualcomm Incorporated
Priority to PCT/CN2020/072777 priority Critical patent/WO2021142780A1/en
Publication of WO2021142780A1 publication Critical patent/WO2021142780A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09G ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G5/00 Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators
    • G09G5/001 Arbitration of resources in a display system, e.g. control of access to frame buffer by video controller and/or main processor
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09G ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G5/00 Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators
    • G09G5/18 Timing circuits for raster scan displays
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09G ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G5/00 Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators
    • G09G5/36 Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators characterised by the display of a graphic pattern, e.g. using an all-points-addressable [APA] memory
    • G09G5/363 Graphics controllers
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09G ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G2320/00 Control of display operating conditions
    • G09G2320/02 Improving the quality of display appearance
    • G09G2320/0252 Improving the response speed
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09G ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G2360/00 Aspects of the architecture of display systems
    • G09G2360/08 Power processing, i.e. workload management for processors involved in display operations, such as CPUs or GPUs

Definitions

  • the present disclosure relates generally to processing systems and, more particularly, to one or more techniques for graphics or display processing.
  • Such computing devices may include, for example, computer workstations, mobile phones such as so-called smartphones, embedded systems, personal computers, tablet computers, and video game consoles.
  • GPUs execute a graphics processing pipeline that includes one or more processing stages that operate together to execute graphics processing commands and output a frame.
  • An application processor or a central processing unit (CPU) may control the operation of the GPU by issuing one or more graphics processing commands to the GPU.
  • Modern day CPUs are typically capable of concurrently executing multiple applications, each of which may need to utilize the GPU during execution.
  • a user’s experience on a computing device can be affected by how smoothly the user interface (UI) animation runs on the device for any particular application.
  • an application may generate a frame rendering instruction to facilitate rendering a frame for display.
  • there may be a frame latency between when the frame rendering instruction is generated and when the corresponding rendered frame is presented. Accordingly, there has developed an increased need for reducing frame latency for presenting graphical content on displays.
  • the apparatus may be an application processor, a CPU, a graphics processor, a graphics processing unit (GPU) , a display processor, a display processing unit (DPU) , or a video processor.
  • the apparatus can perform first processor rendering tasks for rendering a frame.
  • the apparatus can perform second processor rendering tasks for rendering the frame at a first start time based on an estimated duration for performing the second processor rendering tasks.
  • the apparatus can perform compositing rendering tasks for rendering the frame at a second start time based on an estimated duration for performing the compositing rendering tasks.
  • the apparatus can perform display rendering tasks to display the frame.
  • the apparatus can synchronize completion of the performing of the second processor rendering tasks and completion of the performing of the compositing rendering tasks with an occurrence of a same VSYNC pulse. In some examples, the apparatus can start the performing of the display rendering tasks after the occurrence of the same VSYNC pulse.
  • the apparatus can select the first start time by estimating the duration for performing the second processor rendering tasks.
  • the apparatus can also determine a first processor sleep duration based on a difference between a VSYNC pulse period and the estimated second processor rendering tasks duration.
  • the apparatus can also determine the first start time for performing the second processor rendering tasks based on the first processor sleep duration and an end time of the performing of the first processor rendering tasks.
  • the apparatus can estimate the duration for performing the second processor rendering tasks by sampling second processor rendering tasks durations for a quantity of previous frames, selecting a sampled second processor rendering tasks duration, and adding a padding duration to the sampled second processor rendering tasks duration.
  • the apparatus can select the sampled second processor rendering tasks duration by identifying a maximum duration of the sampled second processor rendering tasks durations.
  • the apparatus can select the sampled second processor rendering tasks duration by calculating an average duration of the sampled second processor rendering tasks durations. In some examples, the apparatus can sample the second processor rendering tasks duration for a first previous frame by calculating a difference between a stop time of the second processor rendering tasks of the first previous frame and a start time of the second processor rendering tasks of the first previous frame.
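As a concrete illustration of the sampling approach described in the preceding bullets, the following C++ sketch (hypothetical names; not taken from the patent) estimates the second processor (e.g., graphics processor) rendering tasks duration from a window of previous frames by selecting either the maximum or the average sample and adding a padding duration:

```cpp
#include <algorithm>
#include <chrono>
#include <deque>
#include <numeric>

using Duration = std::chrono::nanoseconds;
using TimePoint = std::chrono::steady_clock::time_point;

// Hypothetical estimator for the second processor (e.g., GPU) rendering
// tasks duration, following the sampling approach described above.
class RenderDurationEstimator {
 public:
  RenderDurationEstimator(std::size_t window, Duration padding, bool useMax)
      : window_(window), padding_(padding), useMax_(useMax) {}

  // Sample one previous frame: duration = stop time - start time of the
  // second processor rendering tasks for that frame.
  void addSample(TimePoint start, TimePoint stop) {
    samples_.push_back(std::chrono::duration_cast<Duration>(stop - start));
    if (samples_.size() > window_) samples_.pop_front();  // keep N most recent
  }

  // Estimated duration = selected sample (max or average) + padding.
  Duration estimate() const {
    if (samples_.empty()) return padding_;
    Duration selected = useMax_
        ? *std::max_element(samples_.begin(), samples_.end())
        : std::accumulate(samples_.begin(), samples_.end(), Duration{0}) /
              static_cast<Duration::rep>(samples_.size());
    return selected + padding_;
  }

 private:
  std::size_t window_;   // quantity of previous frames sampled
  Duration padding_;     // safety margin added to the selected sample
  bool useMax_;          // true: maximum sample; false: average sample
  std::deque<Duration> samples_;
};
```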
  • the apparatus can select the second start time by estimating the duration for performing the compositing rendering tasks, determining a compositing sleep duration based on a difference between a VSYNC pulse period and the estimated compositing rendering tasks duration, and determining the second start time for performing the compositing rendering tasks based on the compositing sleep duration and a timestamp associated with an occurrence of a previous VSYNC pulse.
  • the apparatus can estimate the duration for performing the compositing rendering tasks by sampling compositing rendering tasks durations for a quantity of previous frames, selecting a sampled compositing rendering tasks duration, and adding a padding duration to the sampled compositing rendering tasks duration.
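Building on the preceding bullets, a minimal sketch (hypothetical names, continuing the C++ example above) of deriving the sleep durations and start times: the first processor sleep and first start time from the end of the first processor rendering tasks, and the compositing sleep and second start time from the timestamp of a previous VSYNC pulse:

```cpp
#include <algorithm>
#include <chrono>

using Duration = std::chrono::nanoseconds;
using TimePoint = std::chrono::steady_clock::time_point;

// First processor sleep duration = VSYNC pulse period - estimated second
// processor rendering tasks duration; the first start time is the end time
// of the first processor rendering tasks plus that sleep duration.
TimePoint firstStartTime(TimePoint firstTasksEnd, Duration vsyncPeriod,
                         Duration estimatedSecondTasks) {
  Duration sleep = std::max(vsyncPeriod - estimatedSecondTasks,
                            Duration::zero());
  return firstTasksEnd + sleep;
}

// Compositing sleep duration = VSYNC pulse period - estimated compositing
// rendering tasks duration; the second start time is the timestamp of the
// previous VSYNC pulse plus that sleep duration.
TimePoint secondStartTime(TimePoint previousVsync, Duration vsyncPeriod,
                          Duration estimatedCompositing) {
  Duration sleep = std::max(vsyncPeriod - estimatedCompositing,
                            Duration::zero());
  return previousVsync + sleep;
}
```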
  • the apparatus may perform the first processor rendering tasks with an application processor or a CPU. In some examples, the apparatus may perform the second processor rendering tasks with a graphics processor or a GPU.
  • FIG. 1 is an example timing diagram depicting active periods for an application processor, a graphics processor, a compositing component, and a display operating on a frame in a rendering pipeline.
  • FIG. 2 is another example timing diagram depicting active periods for an application processor, a graphics processor, a compositing component, and a display operating on a frame in a rendering pipeline.
  • FIG. 3 is a block diagram that illustrates an example device, in accordance with one or more techniques of this disclosure.
  • FIG. 4 is an example timing diagram depicting active periods for an application processor, a graphics processor, a compositing component, and a display operating on a frame in a rendering pipeline, in accordance with one or more techniques of this disclosure.
  • FIG. 5 is another example timing diagram depicting active periods for an application processor, a graphics processor, a compositing component, and a display operating on a frame in a rendering pipeline, in accordance with one or more techniques of this disclosure.
  • FIGs. 6 to 13 illustrate example flowcharts of example methods, in accordance with one or more techniques of this disclosure.
  • FIG. 14 is a block diagram that illustrates an example content generation system, in accordance with one or more techniques of this disclosure.
  • an apparatus may include a rendering pipeline to facilitate rendering a frame and for the presentment of the rendered frame.
  • an application (e.g., a game) executing via the apparatus may generate a rendering instruction to facilitate rendering a frame.
  • a first stage of the rendering pipeline may include an application rendering stage to process an application rendering workload based on the rendering instruction.
  • the application rendering workload may be split between an application processor (e.g., a CPU) and a graphics processor (e.g., a GPU) .
  • the application processor may perform an application processor rendering task (or tasks) associated with the application rendering workload and the graphics processor may perform a graphics processor rendering task (or tasks) associated with the application rendering workload.
  • the application processor rendering task may include the application processor generating one or more rendering commands for execution by the graphics processor and/or the application processor generating a rendered frame based on the rendering instruction.
  • the graphics processor rendering task may include the graphics processor executing the one or more rendering commands and/or the graphics processor performing post-processing techniques on the rendered frame generated by the application processor.
  • the example rendering pipeline may also include a second stage during which composition on the rendered frame may be performed.
  • a compositing component may perform a rendering task to facilitate performing composition on the rendered frame.
  • the compositing component may also provide information to a display to facilitate the presentment of the rendered frame.
  • the rendering task performed by the compositing component may cause the compositing component to provide information regarding a buffer for presentment via the display.
  • the compositing component may identify a particular buffer storing the rendered frame to the display for presentment.
  • the example rendering pipeline may also include a third stage during which presentment of the rendered frame may be performed.
  • the display may perform a rendering task to facilitate the presentment of the rendered frame.
  • the display may monitor the buffer identified by the compositing component to determine when the rendering of the frame is complete and the rendered frame is ready for presentment. For example, the display may wait until the graphics processor completes performing the graphics processor rendering task before the display attempts to present the corresponding frame.
  • the term “render” may refer to 3D rendering and/or 2D rendering.
  • the graphics processor may utilize OpenGL instructions to render 3D graphics surfaces, or may utilize OpenVG instructions to render 2D graphics surfaces.
  • any standards, methods, or techniques for rendering graphics may be utilized by the graphics processor.
  • FIG. 1 is an example timing diagram 100 depicting active periods for an application processor 102, a graphics processor 104, a compositing component 106, and a display 108 operating on a frame (frame A) in a rendering pipeline.
  • the timing diagram 100 illustrates example stages of an example rendering pipeline for rendering a frame.
  • an application executing on the application processor 102 may generate a rendering instruction 110 for rendering a frame (e.g., the frame A) .
  • the application processor 102 and the graphics processor 104 may split certain of the rendering tasks associated with the rendering instruction.
  • the application processor 102 may perform application processor rendering tasks 112 and the graphics processor 104 may perform graphics processor rendering tasks 114.
  • the application processor 102 may be configured to perform the application processor rendering tasks 112 by generating rendering commands for the graphics processor 104 based on the rendering instruction and/or performing some level of rendering on a frame.
  • the graphics processor 104 begins performing the graphics processor rendering tasks 114 for the frame.
  • the graphics processor 104 may be configured to execute the rendering commands generated by the application processor 102 and/or perform post-processing techniques on the frame rendered by the application processor 102.
  • the application processor rendering task 112 may include a command (e.g., a “swap buffer” command) that instructs the graphics processor 104 to begin performing the graphics processor rendering tasks 114.
  • the compositing component 106 begins performing compositing tasks 116 after the application processor 102 completes performing the application processor rendering tasks 112.
  • the compositing tasks 116 may include configuring the display 108 to display a composited frame.
  • the application processor rendering tasks 112 executed by the application processor 102 may include a command (e.g., a “queue buffer” command) that indicates to the compositing component 106 that there is a frame being prepared for display (e.g., the application processor 102 has provided rendering commands to the graphics processor 104 for generating a rendered frame) .
  • the compositing component 106 may then configure the display 108 for the displaying of the corresponding frame by passing information regarding the respective buffer to the display 108.
  • the graphics processor 104 may store the output of the graphics processor rendering tasks 114 in a first frame buffer and the performing of the compositing tasks 116 may include the compositing component 106 providing information identifying the first frame buffer to the display 108.
  • the display 108 begins performing display rendering tasks 118 after the compositing component 106 completes performing the compositing tasks 116.
  • the display rendering tasks 118 may include determining when the graphics processor 104 has completed performing the graphics processor rendering tasks 114 (e.g., by monitoring the buffer identified by the compositing component 106 (e.g., via the compositing tasks 116) ) and displaying the rendered frame (e.g., frame A) .
  • VSYNC pulses 120 are indicated by vertical lines in the figure. It should be appreciated that the VSYNC pulses 120 may be associated with a periodicity based on the refresh rate of the display 108. For example, a display with a 60 Hz refresh rate may have a VSYNC pulse period of 16.67 ms (e.g., 1/60) . That is, a duration between a first VSYNC pulse 120a and a second VSYNC pulse 120b may be 16.67 ms.
  • a “VSYNC” is a pulse within a computing system that synchronizes certain events to the refresh cycle of the display. Applications may start drawing on a VSYNC boundary, and a compositing component (hardware or software) may start compositing on VSYNC boundaries. This allows for smooth application rendering (time-based animation) synchronized by the periodicity of the VSYNC pulse.
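As a worked check of the relationship just described, the VSYNC pulse period is simply the reciprocal of the display refresh rate (a small sketch, continuing the C++ examples in this rewrite):

```cpp
#include <chrono>

// VSYNC pulse period from the display refresh rate; at 60 Hz this is
// 1/60 s, i.e. roughly 16.67 ms (16,666,666 ns with integer division).
constexpr std::chrono::nanoseconds vsyncPeriod(unsigned refreshRateHz) {
  return std::chrono::nanoseconds(1'000'000'000u / refreshRateHz);
}
static_assert(vsyncPeriod(60).count() == 16'666'666, "60 Hz -> ~16.67 ms");
```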
  • the VSYNC pulses 120 may be generated by the display 108.
  • the display 108 may generate a VSYNC pulse 120 after completing the performing of the display rendering tasks 118 (and/or as a step of the display rendering tasks 118) .
  • the VSYNC pulses 120 may instruct the application processor 102 to begin performing application processor rendering tasks 112 for a subsequent frame.
  • the performing of the application processor rendering tasks 112 is initiated when the rendering instruction 110 is received (e.g., by the application processor 102 from an application) .
  • the performing of the graphics processor rendering tasks 114 is initiated when the application processor rendering tasks 112 are completed.
  • the application processor rendering tasks 112 may include a command (e.g., a “swap buffer” command) that instructs the graphics processor 104 to begin performing the graphics processor rendering tasks 114.
  • the performing of the compositing tasks 116 and the performing of the display rendering tasks 118 are synchronized with VSYNC pulses 120.
  • the compositing component 106 may wait until the next VSYNC pulse (e.g., a second VSYNC pulse 120b) after the application processor 102 completes performing the application processor rendering tasks 112 before performing the compositing tasks 116.
  • the display rendering tasks 118 may include the presentment of the rendered frame output by the graphics processor 104.
  • the display 108 uses information provided by the compositing component 106 to determine which buffer to monitor (e.g., information provided via the compositing tasks 116) . Accordingly, the display 108 may wait until the next VSYNC pulse after the graphics processor 104 performs the graphics processor rendering tasks 114 and the compositing component 106 performs the compositing tasks 116 before performing the display rendering tasks 118. For example, in the illustrated example of FIG. 1, the display 108 waits until a third VSYNC pulse 120c before performing the display rendering tasks 118.
  • a frame latency 150 corresponds to the period between when the frame A is displayed (e.g., at a fourth VSYNC pulse 120d) and when the rendering instruction 110 was received by the application processor 102.
  • the frame latency 150 includes at least three VSYNC pulse periods and the duration between when the rendering instruction 110 is received and the first VSYNC pulse 120a.
  • FIG. 2 is another example timing diagram 200 depicting active periods for an application processor 202, a graphics processor 204, a compositing component 206, and a display 208 operating on a frame (frame A) in a rendering pipeline.
  • One or more aspects of the application processor 202 may be implemented by the application processor 102 of FIG. 1.
  • One or more aspects of the graphics processor 204 may be implemented by the graphics processor 104 of FIG. 1.
  • One or more aspects of the compositing component 206 may be implemented by the compositing component 106 of FIG. 1.
  • One or more aspects of the display 208 may be implemented by the display 108 of FIG. 1.
  • the example timing diagram 200 of FIG. 2 is similar to the example timing diagram 100 of FIG. 1 and includes application processor rendering tasks 212 executed by the application processor 202, graphics processor rendering tasks 214 executed by the graphics processor 204, compositing tasks 216 executed by the compositing component 206, and display rendering tasks 218 executed by the display 208. Furthermore, as shown in FIG. 2, the application processor rendering tasks 212 start when a rendering instruction 210 is received by the application processor 202 and the application processor rendering tasks 212 complete after a first VSYNC pulse 220a.
  • As the performing of the graphics processor rendering tasks 214 is triggered via the completion of the application processor rendering tasks 212 (e.g., via a “swap buffer” command of the application processor rendering tasks 212) , the graphics processor rendering tasks 214 start after the completion of the application processor rendering tasks 212.
  • the duration of the graphics processor rendering tasks 214 (e.g., the interval between when the graphics processor rendering tasks 214 begin and when the graphics processor rendering tasks 214 complete) causes the completion of the graphics processor rendering tasks 214 to occur after a second VSYNC pulse 220b.
  • the performing of the compositing tasks 216 may also be triggered by the performing of the application processor rendering tasks 212.
  • the application processor rendering tasks 212 executed by the application processor 202 may include a command (e.g., a “queue buffer” command) that indicates to the compositing component 206 that there is a frame being prepared for display (e.g., the application processor 202 has provided rendering commands to the graphics processor 204 for generating a rendered frame) .
  • the compositing component 206 may then perform the compositing tasks 216 to configure the display 208 for the displaying of the corresponding frame by passing information regarding the respective buffer to the display 208.
  • the performing of the compositing tasks 216 by the compositing component 206 may start at the next VSYNC pulse (e.g., the second VSYNC pulse 220b) after the application processor 202 completes performing the application processor rendering tasks 212.
  • the display rendering tasks 218 may include the presentment of the rendered frame output by the graphics processor 204. However, the display 208 uses information provided by the compositing component 206 to determine which buffer to monitor (e.g., information provided via the compositing tasks 216) . Accordingly, the display 208 may wait until the next VSYNC pulse after the graphics processor 204 performs the graphics processor rendering tasks 214 and the compositing component 206 performs the compositing tasks 216 before performing the display rendering tasks 218. For example, in the illustrated example of FIG. 2, the display 208 waits until a third VSYNC pulse 220c before performing the display rendering tasks 218.
  • the example timing diagram 200 of FIG. 2 includes a frame latency 250 corresponding to the period between when the frame A is displayed (e.g., at a fourth VSYNC pulse 220d) and when the rendering instruction 210 was received by the application processor 202.
  • the frame latency 250 includes at least three VSYNC pulse periods and the duration between when the rendering instruction 210 is received and the first VSYNC pulse 220a.
  • the start of the graphics processor rendering tasks may be triggered by the completion of the application processor rendering tasks and is not synchronized with a VSYNC pulse.
  • the start of the compositing tasks may also be triggered by the completion of the application processor rendering tasks, but is synchronized with a VSYNC pulse (e.g., the compositing component starts performing the compositing tasks at the next VSYNC pulse after the application processor rendering tasks are complete) .
  • the display rendering tasks start after the graphics processor rendering tasks and the compositing tasks are complete, but are also synchronized with a VSYNC pulse (e.g., the display starts performing the display rendering tasks at the next VSYNC pulse after the graphics processor rendering tasks are complete and the compositing tasks are complete) .
  • the performing of the compositing tasks 216 may include the compositing component 206 being configured to operate in a work mode 216a and an idle mode 216b.
  • the compositing component 206 may be configured to operate in the work mode 216a starting at the second VSYNC pulse 220b; the work mode 216a may correspond to the duration during which the compositing component 206 is performing the compositing tasks 216.
  • the compositing component 206 may be configured to operate in the idle mode 216b after the work mode 216a; the idle mode 216b may correspond to the remaining duration of the respective VSYNC pulse period (e.g., the period between the second VSYNC pulse 220b and the third VSYNC pulse 220c) during which the compositing component 206 is not performing the compositing tasks 216.
  • the example rendering pipeline depicted in the example timing diagrams 100 and 200 of FIGs. 1 and 2, respectively, may include inefficiencies resulting in a relatively large frame latency.
  • a relatively large duration corresponding to the idle mode 216b associated with the compositing tasks 216 may result in a relatively large frame latency.
  • a relatively large frame latency may be caused by a relatively large gap between when the graphics processor completes the graphics processor rendering tasks and the display is able to start the display rendering tasks.
  • relatively large frame latency between when an application generates a rendering instruction for a frame and the corresponding frame is displayed via a display may result in a decreased user experience.
  • a user may be playing a game on a computing device and the game may cause a sequence of frames to be rendered.
  • the game may generate a rendering instruction to render a first frame, but the computing device may be unable to display the corresponding first frame for three or more VSYNC pulse periods. Meanwhile, the game may continue to generate rendering instructions to render subsequent frames of the sequence of frames.
  • the frame that the user is interacting with may not be the correct frame for processing purposes.
  • the frame latency 250 associated with the rendering pipeline of FIG. 2 may be four VSYNC pulse periods.
  • the game may generate the rendering instruction 210 for the frame A during a first VSYNC pulse period, but the corresponding frame A may not be displayed by the display 208 until a fourth VSYNC pulse period.
  • the game may be processing a different subsequent frame (e.g., a frame D) . Accordingly, if the user provides a touch input during the fourth VSYNC pulse period, the game applies the touch input to the most recent frame (e.g., the frame D) , which may be a frame that the user has not been presented. Thus, the touch input is applied to an incorrect frame, which may result in a negative user experience for the user.
  • Examples disclosed herein provide techniques for reducing frame latency by improving the rendering pipeline.
  • disclosed techniques may modify the performing of the application processor rendering tasks to include an application processor sleep interval prior to the performing of the command (e.g., a “swap buffer” command) that instructs the graphics processor to begin performing the graphics processor rendering tasks.
  • the application processor sleep interval may be selected so that the completion of the graphics processor rendering tasks aligns with a VSYNC pulse.
  • disclosed techniques may estimate a duration for completing the graphics processor rendering tasks for a frame.
  • Disclosed techniques may then select the application processor sleep interval and modify the performing of the application processor rendering tasks for the frame (e.g., via the application processor sleep interval) so that the start of the graphics processor rendering tasks for the frame is delayed, which may cause the completion of the graphics processor rendering tasks for the frame to align with a VSYNC pulse.
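One way to realize the modification just described, sketched under assumed names (the callbacks stand in for the real rendering steps; only the swap-buffer call actually releases the graphics processor):

```cpp
#include <chrono>
#include <functional>
#include <thread>

using Clock = std::chrono::steady_clock;

// Hypothetical sketch: after generating the rendering commands and queueing
// the buffer, the application processor sleeps for the selected interval
// before issuing the swap-buffer command, delaying the start of the graphics
// processor rendering tasks so that their completion aligns with a VSYNC pulse.
void renderWithDelayedSwap(const std::function<void()>& generateCommands,
                           const std::function<void()>& queueBuffer,
                           const std::function<void()>& swapBuffer,
                           Clock::duration vsyncPeriod,
                           Clock::duration estimatedGpuDuration) {
  generateCommands();  // application processor rendering tasks
  queueBuffer();       // indicate to the compositor that a frame is coming
  // Application processor sleep interval: VSYNC pulse period minus the
  // estimated graphics processor rendering tasks duration (clamped at zero).
  Clock::duration sleep = vsyncPeriod - estimatedGpuDuration;
  if (sleep > Clock::duration::zero()) std::this_thread::sleep_for(sleep);
  swapBuffer();        // now the graphics processor rendering tasks begin
}
```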
  • Example techniques disclosed herein may also modify the performing of the compositing tasks by delaying the start of the work mode of the compositing component during the performing of the compositing tasks during a VSYNC pulse period. For example, disclosed techniques may modify the performing of the compositing tasks to cause the compositing component to operate first in the idle mode for a compositing sleep duration and then operate in the work mode so that the completion of the compositing tasks aligns with a VSYNC pulse.
  • disclosed techniques may estimate a duration for completing the compositing tasks for a frame.
  • Disclosed techniques may then select the compositing sleep duration and modify the performing of the compositing tasks for the frame (e.g., via the compositing sleep duration) so that the start of the work mode of the compositing component for performing the compositing tasks for the frame is delayed, which may cause the completion of the compositing tasks for the frame to align with a VSYNC pulse.
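A matching sketch for the compositing side (again with assumed names): the compositing component idles first for the compositing sleep duration and only then works, so that its completion lands near the next VSYNC pulse:

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <thread>

using Clock = std::chrono::steady_clock;

// Hypothetical sketch: rather than compositing immediately at a VSYNC pulse
// and then idling, the compositing component idles first for the compositing
// sleep duration and then performs the compositing tasks, so that their
// completion aligns with the next VSYNC pulse.
void compositeIdleFirst(Clock::time_point previousVsync,
                        Clock::duration vsyncPeriod,
                        Clock::duration estimatedCompositingDuration,
                        const std::function<void()>& doCompositing) {
  Clock::duration sleep = std::max(vsyncPeriod - estimatedCompositingDuration,
                                   Clock::duration::zero());
  std::this_thread::sleep_until(previousVsync + sleep);  // idle mode first
  doCompositing();  // work mode; ends near the next VSYNC pulse
}
```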
  • the disclosed techniques may cause the completion of the compositing tasks and the graphics processor rendering tasks to align with the same VSYNC pulse.
  • the gap between when the graphics processor completes performing the graphics processor rendering tasks and the performing of the display rendering tasks may be reduced.
  • because disclosed techniques facilitate aligning the completion of the compositing tasks and the completion of the graphics processor rendering tasks with the same VSYNC pulse, the display may begin performing the display rendering tasks, including the presentment of the corresponding rendered frame, at that VSYNC pulse. Accordingly, disclosed techniques may facilitate reducing frame latency due to the rendering pipeline from, for example, four or more VSYNC pulse periods to, for example, two or three VSYNC pulse periods.
  • processors include microprocessors, microcontrollers, graphics processors, graphics processing units (GPUs) , general purpose GPUs (GPGPUs) , central processing units (CPUs) , application processors, digital signal processors (DSPs) , reduced instruction set computing (RISC) processors, systems-on-chip (SOC) , baseband processors, application specific integrated circuits (ASICs) , field programmable gate arrays (FPGAs) , programmable logic devices (PLDs) , state machines, gated logic, discrete hardware circuits, and other suitable hardware configured to perform the various functionality described throughout this disclosure.
  • One or more processors in the processing system may execute software.
  • Software can be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software components, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, etc., whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise.
  • the term application may refer to software.
  • one or more techniques may refer to an application (e.g., software) being configured to perform one or more functions.
  • the application may be stored on a memory (e.g., on-chip memory of a processor, memory, system memory, or any other memory) .
  • Hardware described herein such as a processor may be configured to execute the application.
  • the application may be described as including code that, when executed by the hardware, causes the hardware to perform one or more techniques described herein.
  • the hardware may access the code from a memory and execute the code accessed from the memory to perform one or more techniques described herein.
  • components are identified in this disclosure.
  • the components may be hardware, software, or a combination thereof.
  • the components may be separate components or sub-components of a single component.
  • the functions described may be implemented in hardware, software, or any combination thereof. If implemented in software, the functions may be stored on or encoded as one or more instructions or code on a computer-readable medium.
  • Computer-readable media includes computer storage media. Storage media may be any available media that can be accessed by a computer.
  • such computer-readable media can comprise a random access memory (RAM) , a read-only memory (ROM) , an electrically erasable programmable ROM (EEPROM) , optical disk storage, magnetic disk storage, other magnetic storage devices, combinations of the aforementioned types of computer-readable media, or any other medium that can be used to store computer executable code in the form of instructions or data structures that can be accessed by a computer.
  • examples disclosed herein provide techniques for reducing frame latency.
  • Example techniques may improve performance and reduce power consumption by reducing the interval between a rendering instruction for a frame being generated and the corresponding frame being presented.
  • disclosed techniques may estimate a duration for performing graphics processor rendering tasks for a frame and modify the performing of the application processor rendering tasks for the frame to delay the start of the graphics processor rendering tasks so that the completion of the graphics processor rendering tasks aligns with a VSYNC pulse.
  • Disclosed techniques may also estimate a duration for performing compositing tasks for the frame and cause the compositing component to first operate in an idle mode for a duration so that the subsequent duration during which the compositing component operates in the work mode causes the compositing tasks to complete in alignment with the same VSYNC pulse as the graphics processor rendering tasks.
  • examples disclosed herein provide techniques for reducing the frame latency associated with displaying a frame.
  • this disclosure describes techniques for graphics processing in any device that utilizes a rendering pipeline. Other example benefits are described throughout this disclosure.
  • instances of the term “content” may refer to “graphical content, ” “image, ” and vice versa. This is true regardless of whether the terms are being used as an adjective, noun, or other parts of speech.
  • the term “graphical content” may refer to content produced by one or more processes of a graphics processing pipeline.
  • the term “graphical content” may refer to content produced by a processing unit configured to perform graphics processing.
  • the term “graphical content” may refer to content produced by a graphics processing unit.
  • the term “display content” may refer to content generated by a processing unit configured to perform display processing.
  • the term “display content” may refer to content generated by a display processing unit.
  • Graphical content may be processed to become display content.
  • a graphics processing unit may output graphical content, such as a frame, to a buffer (which may be referred to as a framebuffer) .
  • a display processing unit may read the graphical content, such as one or more frames from the buffer, and perform one or more display processing techniques thereon to generate display content.
  • a display processing unit may be configured to perform composition on one or more rendered layers to generate a frame.
  • a display processing unit may be configured to compose, blend, or otherwise combine two or more layers together into a single frame.
  • a display processing unit may be configured to perform scaling (e.g., upscaling or downscaling) on a frame.
  • a frame may refer to a layer.
  • a frame may refer to two or more layers that have already been blended together to form the frame (e.g., the frame includes two or more layers and the frame that includes two or more layers may subsequently be blended) .
  • FIG. 3 is a block diagram illustrating components of a device 300, in accordance with aspects of this disclosure.
  • the device 300 includes an application processor 310, a memory 320, a graphics processor 330, and a display 340.
  • the application processor 310, the memory 320, the graphics processor 330, and the display 340 may be in communication via one or more busses that may be implemented using any combination of bus structures and/or bus protocols.
  • the application processor 310 may include one or more processors that are configured to execute an application 312, an application rendering component 314, and a compositing component 316.
  • the application processor 310 may be configured to execute instructions that cause the application processor 310 to perform one or more of the example techniques disclosed herein.
  • the memory 320 may store one or more commands 322 and a rendered frame buffer 324.
  • the memory 320 may also store instructions that, when executed, cause the application processor 310, the graphics processor 330, and/or the display 340 to perform one or more of the example techniques disclosed herein.
  • the graphics processor 330 may include one or more processors that are configured to render a frame.
  • the graphics processor 330 may be configured to execute one or more rendering commands to render a frame.
  • the graphics processor 330 may be configured to execute instructions that cause the graphics processor 330 to perform one or more of the example techniques disclosed herein.
  • the display 340 may include a display panel, a display client, and/or a screen to facilitate presentment of a rendered frame.
  • the display 340 may be configured to execute instructions that cause the display 340 to perform one or more example techniques disclosed herein.
  • the application 312 may be a graphics application that may use the graphics processor 330 to render one or more graphics objects into an image or frame to be displayed (e.g., via the display 340) .
  • the application 312 may include operations that are performed via a rendering pipeline.
  • the application 312 generates a rendering instruction to cause the rendering of a frame, such as example frame A of FIGs. 1 and/or 2.
  • the rendering instruction is passed from the application 312 to the application rendering component 314.
  • the application rendering component 314 may be configured to perform the application processor rendering tasks disclosed herein.
  • the application processor rendering tasks may be based on the rendering instruction received from the application 312.
  • the application rendering component 314 may be configured to analyze the rendering instruction and generate one or more rendering commands that may be executed by the graphics processor 330.
  • the application processor rendering tasks may additionally or alternatively include rendering a frame based on the rendering instruction.
  • one or more aspects of the application rendering component 314 may be implemented by an application programming interface (API) and/or a driver.
  • an API may be a runtime service that translates the rendering instruction received from the application 312 into a format that is consumable by a driver and/or the graphics processor 330.
  • the application rendering component 314 stores the rendering commands in the commands buffer 322 of the memory 320.
  • the rendering commands may include draw call commands and/or other graphics commands to cause the graphics processor 330 to perform graphics operations to render one or more frames for presentment (e.g., via the display 340) .
  • a draw call command may instruct the graphics processor 330 to render an object defined by a group of one or more vertices stored in the memory 320 (e.g., in a vertices buffer) .
  • the geometry defined by the group of one or more vertices may, in some examples, correspond to one or more primitives (e.g., points, lines, triangles, patches, etc. ) to be rendered.
  • a draw call command may cause the graphics processor 330 to render all of the vertices stored in a section of the memory 320 (e.g., in the vertices buffer) .
  • the application rendering component 314 performing the application processor rendering tasks may include the application rendering component 314 providing a command (e.g., a “swap buffer” command) to the graphics processor 330 that instructs the graphics processor 330 to begin performing the graphics processor rendering tasks associated with a respective frame.
  • the application rendering component 314 performing the application processor rendering tasks may include the application rendering component 314 providing a command (e.g., a “queue buffer” command) to the compositing component 316 that indicates to the compositing component 316 that there is a frame being prepared for display (e.g., the application rendering component 314 provided a command to the graphics processor 330 to perform graphics processor rendering tasks to render a frame) .
  • the application rendering component 314 may be implemented via a library, such as an application processor-side-application-rendering library.
  • the application rendering component 314 may be configured to use the graphics processor 330 to provide hardware acceleration by using an application programming interface (API) , such as the OpenGLES API.
  • the application processor 310 may use the graphics processor 330 to perform hardware accelerated application rendering.
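For context, in an OpenGL ES / EGL based pipeline the “swap buffer” command discussed above typically corresponds to an eglSwapBuffers call. A minimal sketch, assuming an already-initialized EGL display, context, and window surface (the draw calls are placeholders):

```cpp
#include <EGL/egl.h>
#include <GLES2/gl2.h>

// Minimal OpenGL ES sketch. eglSwapBuffers plays the role of the
// "swap buffer" command that hands the rendered frame off to the
// graphics processor / compositing component.
void drawFrame(EGLDisplay display, EGLSurface surface) {
  glClearColor(0.0f, 0.0f, 0.0f, 1.0f);
  glClear(GL_COLOR_BUFFER_BIT);
  // ... issue draw call commands here, e.g. glDrawArrays(...) ...
  eglSwapBuffers(display, surface);  // queue the buffer for display
}
```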
  • the compositing component 316 may be configured to perform one or more compositing tasks disclosed herein.
  • the compositing component 316 may be configured to receive a command (e.g., the “queue buffer” command) from the application rendering component 314 that indicates to the compositing component 316 that there is a frame being prepared (e.g., rendered) for display.
  • the compositing component 316 may then configure the display 340 for the displaying of the corresponding frame by passing information regarding the respective buffer to the display 340.
  • the graphics processor 330 may store the output of the graphics processor rendering tasks in the rendered frame buffer 324 and the performing of the compositing tasks by the compositing component 316 may include the compositing component 316 providing information identifying the rendered frame buffer 324 (and/or a location of the rendered frame buffer corresponding to the rendered frame) to the display 340.
  • a compositing component refers to an analogue or digital circuit that programs display hardware to display rendered frame data or animation data to a display (e.g., the display 340) .
  • the compositing component may include an input for the rendered data and an output for the data and/or instructions to the display hardware (e.g., the display 340) .
  • the compositing component may reside in hardware or may be implemented in software running on the application processor 310.
  • a “Surface Flinger” (sometimes referred to as a “Surface Flinger component” or a “Surface Flinger engine” ) is a software equivalent of the compositing component running at a user-space level in the application processor (e.g., CPU) in the ANDROID operating system.
  • the Surface Flinger may additionally or alternatively reside at a kernel level of an application processor (e.g., a CPU) .
  • composition functionality and/or programming of the display hardware may be distributed between two or more of hardware components, software components, and/or firmware components.
  • the graphics processor 330 may be configured to perform the graphics processor rendering tasks disclosed herein.
  • the graphics processor 330 may be configured to execute the rendering commands stored in the commands buffer 322 and render a frame. It should be appreciated that in some examples, one or more aspects of the graphics processor rendering tasks may be implemented via a graphics processing pipeline.
  • the graphics processor 330 stores the rendered frame in the rendered frame buffer 324 of the memory 320.
  • the graphics processor 330 may monitor for a command (e.g., the “swap buffer” command) received from the application rendering component 314 before starting to perform the graphics processor rendering tasks.
  • the application processor rendering tasks performed by the application rendering component 314 may include providing the swap buffer command to the graphics processor 330 after the application rendering component 314 generates and stores the rendering commands for a frame in the commands buffer 322.
  • the graphics processor 330 may monitor commands received from the application processor 310 and/or the application rendering component 314 for the swap buffer command.
  • the graphics processor 330 may be configured to generate an indication at the completion of the performing of the graphics processor rendering tasks to indicate to the application processor 310 and/or the display 340 that the rendering of the frame is complete.
  • the indication may indicate to the display 340 that a rendered frame is stored at the rendered frame buffer 324 and that the rendered frame is available for presentment.
  • the indication generated by the graphics processor 330 may be associated with a synchronization fence that is available to the application rendering component 314.
  • the respective buffer may be associated with a synchronization fence.
  • the synchronization fence may indicate to components of the device 300 that the graphics processor 330 is operating on a buffer (e.g., the rendered frame buffer 324) and to prevent the other components of the device 300 from also operating on the buffer.
  • the application rendering component 314 may also issue a command enabling a synchronization fence for the buffer.
  • the application processor 310 and the graphics processor 330 may perform tasks concurrently (e.g., in parallel or nearly in parallel) and without concerns of performing tasks that overlap on the buffer.
  • an indication may be signaled indicating that the corresponding buffer is available and that the corresponding synchronization fence is disabled.
  • the display 340 may monitor for the indication to determine when the rendered frame stored at the buffer is ready for presentment.
  • the graphics processor 330 may signal the indication indicating that the synchronization fence is disabled.
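As one concrete illustration of such an indication, EGL exposes fence sync objects (via the EGL_KHR_fence_sync extension) that signal when the graphics processor has finished the commands issued before the fence. A hedged sketch, assuming the extension is available and its prototypes are enabled:

```cpp
#define EGL_EGLEXT_PROTOTYPES
#include <EGL/egl.h>
#include <EGL/eglext.h>

// Hedged sketch of a synchronization fence using EGL_KHR_fence_sync. The
// fence is created after the rendering commands are issued; waiting on it
// indicates that the graphics processor has finished writing the rendered
// frame to its buffer and the buffer is available for presentment.
bool waitForGpuRenderingComplete(EGLDisplay display, EGLTimeKHR timeoutNs) {
  EGLSyncKHR fence = eglCreateSyncKHR(display, EGL_SYNC_FENCE_KHR, nullptr);
  if (fence == EGL_NO_SYNC_KHR) return false;
  // Flush the pending commands and block until the fence signals (or the
  // timeout elapses).
  EGLint status = eglClientWaitSyncKHR(
      display, fence, EGL_SYNC_FLUSH_COMMANDS_BIT_KHR, timeoutNs);
  eglDestroySyncKHR(display, fence);
  return status == EGL_CONDITION_SATISFIED_KHR;
}
```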
  • the display 340 may be configured to perform one or more display rendering tasks disclosed herein.
  • the display 340 may be configured to display the rendered frame.
  • the display 340 may be configured to receive information from the compositing component 316 that identifies a rendered frame buffer (and/or a location of the rendered frame buffer corresponding to the rendered frame) .
  • the display 340 may be configured to monitor for an indication that the rendering of the frame is complete and that the rendered frame is available at the identified rendered frame buffer for presentment.
  • the display 340 may monitor for an indication that a synchronization fence associated with the identified buffer is disabled.
  • the display 340 after receiving the indication that the synchronization fence is disabled, the display 340 may be configured to display the corresponding rendered frame.
  • the display 340 may access the rendered frame at the rendered frame buffer 324 for presentment.
  • the display 340 may generate a VSYNC pulse.
  • the VSYNC pulse may indicate, for example, that the corresponding buffer is available.
  • a frame may be associated with a corresponding buffer.
  • the application rendering component 314 may designate a rendered frame buffer 324 for storing the rendered frame.
  • the graphics processor 330 may store the rendered frame in the designated rendered frame buffer 324.
  • the compositing component 316 may provide information to the display 340 that identifies the designated rendered frame buffer 324.
  • the display 340 may monitor for an indication that the designated rendered frame buffer 324 is available for presentment. In some such examples, generating the VSYNC pulse after the presentment of the rendered frame enables the application rendering component 314 to determine that the designated rendered frame buffer 324 may be designated for storing a subsequent rendered frame.
  • the generating of the VSYNC pulse may be a periodic occurrence, an aperiodic occurrence, a one-time occurrence, and/or an event-based occurrence.
  • the occurrences of the VSYNC pulses may be associated with a periodicity based on the refresh rate of the display 340.
  • a display with a 60 Hz refresh rate may have a VSYNC pulse period of 16.67 milliseconds (ms) (e.g., 1/60) . That is, a duration between a first VSYNC pulse and a second VSYNC pulse may be 16.67 ms.
  • the application 312 may generate a rendering instruction to facilitate rendering a frame.
  • the application rendering component 314 may receive the rendering instruction and start performing application processor rendering tasks (e.g., the example application processor rendering tasks 112 of FIG. 1 and/or the example application processor rendering tasks 212 of FIG. 2) .
  • the application rendering component 314 may designate a buffer for storing the rendered frame (e.g., may enable a synchronization fence for the designated buffer) , may generate rendering commands 322 for execution by the graphics processor 330, may send a queue buffer command to the compositing component 316 (e.g., to identify the designated buffer) , and may send a swap buffer command to the graphics processor 330 to indicate to the graphics processor 330 that the graphics processor 330 may start performing the graphics processor rendering tasks.
  • the graphics processor 330 may receive the swap buffer command and start performing graphics processor rendering tasks (e.g., the example graphics processor rendering tasks 114, 214) .
  • the graphics processor 330 may execute the rendering commands 322 associated with the frame, may write the rendered frame to the designated buffer, and may signal an indication when the writing to the designated buffer is complete (e.g., may disable the corresponding synchronization fence) .
  • the compositing component 316 may receive the queue buffer command and start performing compositing tasks (e.g., the example compositing tasks 116, 216) .
  • the compositing component 316 may provide information identifying the designated buffer to the display 340, and may perform compositing of the rendered frame.
  • the display 340 may receive the information identifying the designated buffer and start performing display rendering tasks (e.g., the example display rendering tasks 118, 218) .
  • the display 340 may monitor for the indication that writing the designated buffer is complete, may display the rendered frame, and may generate a VSYNC pulse.
  • the interval between when the application 312 generates the rendering instruction to render a frame and the display 340 displays the corresponding rendered frame may be referred to as frame latency.
  • the compositing component 316 may synchronize the performing of the compositing tasks based on occurrences of VSYNC pulses.
  • the display 340 may generate a VSYNC pulse that is received by the compositing component 316.
  • the compositing component 316 may begin the performing of the compositing tasks after the VSYNC pulse is received.
  • the compositing component 316 may receive a queue buffer command from the application rendering component 314 and wait for receipt of a subsequent VSYNC pulse before starting the performing of the compositing tasks.
  • the compositing component 316 may be capable of performing the compositing tasks in a duration that is less than the VSYNC pulse period, which may result in the compositing component 316 operating in a work mode and an idle mode.
  • the graphics processor 330 may begin performing graphics processor rendering tasks after receiving the swap buffer command from the application rendering component 314 and the display 340 may not begin displaying a rendered frame until the display 340 receives an indication that the performing of the graphics processor rendering tasks is complete (e.g., until the synchronization fence associated with the corresponding buffer is disabled) .
  • the display 340 may synchronize the presentment of a rendered frame based on occurrence of VSYNC pulses. For example, the display 340 may receive an indication indicating that the rendered frame is ready for presentment and wait for the next VSYNC pulse before starting the presentment of the rendered frame.
  • the performing of the graphics processor rendering tasks may not be synchronized with a VSYNC pulse, which may result in an interval after the graphics processor 330 completes performing the graphics processor rendering tasks and before the display 340 begins performing the display rendering tasks.
  • Examples disclosed herein provide techniques for reducing frame latency by improving the timing of performing tasks within the rendering pipeline. For example, disclosed techniques facilitate modifying the performing of certain tasks so that the completion of the graphics processor rendering tasks and the completion of the compositing tasks may be synchronized with the same VSYNC pulse.
  • the display rendering tasks depend on the completion of the graphics processor rendering tasks (e.g., the display waits for the rendered frame to be stored in the rendered frame buffer 324 before displaying the rendered frame) and waits for information from the compositing tasks (e.g., information identifying which buffer to monitor for completion before presentment of the rendered frame) .
  • disclosed techniques facilitate reducing wait time between the graphics processor rendering tasks completing and the display rendering tasks starting (as shown in the example timing diagrams 100, 200 of FIGs. 1 and 2, respectively) .
  • FIG. 4 is an example timing diagram 400 depicting active periods for an application processor 402, a graphics processor 404, a compositing component 406, and a display 408 operating on a frame (frame A) in a rendering pipeline, in accordance with one or more techniques of this disclosure.
  • One or more aspects of the application processor 402 may be implemented by the application rendering component 314 of FIG. 3.
  • One or more aspects of the graphics processor 404 may be implemented by the graphics processor 330 of FIG. 3.
  • One or more aspects of the compositing component 406 may be implemented by the compositing component 316 of FIG. 3.
  • One or more aspects of the display 408 may be implemented by the display 340 of FIG. 3.
  • the example timing diagram 400 of FIG. 4 is similar to the example timing diagrams 100, 200 of FIGs. 1 and 2, respectively, and includes application processor rendering tasks 412 executed by the application processor 402, graphics processor rendering tasks 414 executed by the graphics processor 404, compositing tasks 416 executed by the compositing component 406, and display rendering tasks 418 executed by the display 408. Furthermore, as shown in FIG. 4, the application processor rendering tasks 412 start when a rendering instruction 410 is received by the application processor 402 and the application processor rendering tasks 412 complete after a first VSYNC pulse 420a.
  • As the performing of the graphics processor rendering tasks 414 is triggered via the completion of the application processor rendering tasks 412 (e.g., via a “swap buffer” command of the application processor rendering tasks 412), the graphics processor rendering tasks 414 start after the completion of the application processor rendering tasks 412.
  • Based on the start time of the graphics processor rendering tasks 414 and the duration of the graphics processor rendering tasks 414, there may be an idle interval between when the graphics processor rendering tasks 414 complete and when the display rendering tasks 418 start.
  • the completion of the compositing tasks 416 aligns with the occurrence of the second VSYNC pulse 420b.
  • the performing of the display rendering tasks 418 may then begin after the occurrence of the second VSYNC pulse 420b as the display 408 receives information identifying a buffer configured to store the rendered frame A (e.g., during the performing of the compositing tasks 416) and also receives an indication that the rendered frame A is stored in the identified buffer (e.g., during the performing of the graphics processor rendering tasks 414) .
  • example techniques disclosed herein facilitate modifying the performing of the application processor rendering tasks 412.
  • disclosed techniques modify the performing of the application processor rendering tasks 412 to include an application processor sleep duration during the performing of the application processor rendering tasks.
  • the example application processor sleep duration may be configured to delay the end of the application processor rendering tasks and, thus, to delay the start of the performing of the graphics processor rendering tasks (e.g., by delaying the transmitting of the swap buffer command from the application processor to the graphics processor) .
  • Example techniques disclosed herein may also modify the performing of the compositing tasks 416 by delaying, during a VSYNC pulse period (e.g., the period between the occurrence of the first VSYNC pulse 420a and the occurrence of the second VSYNC pulse 420b), the start of the work mode of the compositing component 406 when performing the compositing tasks 416.
  • disclosed techniques may modify the performing of the compositing tasks 416 to cause the compositing component 406 to operate first in the idle mode for a compositing sleep duration and then operate in the work mode so that the completion of the compositing tasks 416 aligns with the occurrence of the second VSYNC pulse 420b.
  • FIG. 5 is an example timing diagram 500 depicting active periods for an application processor 502, a graphics processor 504, a compositing component 506, and a display 508 operating on a frame (frame A) in a rendering pipeline, in accordance with one or more techniques of this disclosure.
  • One or more aspects of the application processor 502 may be implemented by the application rendering component 314 of FIG. 3 and/or the application processor 402 of FIG. 4.
  • One or more aspects of the graphics processor 504 may be implemented by the graphics processor 330 of FIG. 3 and/or the graphics processor 404 of FIG. 4.
  • One or more aspects of the compositing component 506 may be implemented by the compositing component 316 of FIG. 3 and/or the compositing component 406 of FIG. 4.
  • One or more aspects of the display 508 may be implemented by the display 340 of FIG. 3 and/or the display 408 of FIG. 4.
  • the example timing diagram 500 of FIG. 5 is similar to the example timing diagram 400 of FIG. 4 and includes application processor rendering tasks 512 executed by the application processor 502, graphics processor rendering tasks 514 executed by the graphics processor 504, compositing tasks 516 executed by the compositing component 506, and display rendering tasks 518 executed by the display 508. Furthermore, as shown in FIG. 5, the application processor rendering tasks 512 start when a rendering instruction 510 is received by the application processor 502 and the application processor rendering tasks 512 complete after a first VSYNC pulse 520a.
  • example techniques disclosed herein facilitate modifying the performing of the application processor rendering tasks 512.
  • disclosed techniques modify the performing of the application processor rendering tasks 512 to include an application processor sleep duration 513 during the performing of the application processor rendering tasks 512.
  • the example application processor sleep duration 513 may be configured to delay the end of the application processor rendering tasks 512 and, thus, to delay the start of the performing of the graphics processor rendering tasks 514 (e.g., by delaying the transmitting of the swap buffer command from the application processor 502 to the graphics processor 504) .
  • the application processor rendering tasks 512 may be modified so that the transmitting of the swap buffer command that instructs the graphics processor 504 to begin performing the graphics processor rendering tasks 514 occurs after the application processor sleep duration 513.
  • the duration of the application processor sleep duration 513 may be selected so that the completion of the graphics processor rendering tasks 514 aligns with a VSYNC pulse (e.g., the occurrence of the second VSYNC pulse 520b).
  • the example application rendering component 314 of FIG. 3 includes an example application processor delay component 350 configured to delay the transmitting of the swap buffer command.
  • the application processor delay component 350 may determine an application processor delay duration and include the application processor delay duration in the performing of the application processor rendering tasks so that the transmitting of the swap buffer command from the application rendering component 314 to the graphics processor 330 is delayed.
  • the example application processor delay component 350 may estimate a duration of performing the graphics processor rendering tasks. For example, the application processor delay component 350 may sample the duration of performing the graphics processor rendering tasks for a quantity of previous frames and estimate a duration of performing the graphics processor rendering tasks for the current frame.
  • the quantity of previous frames may be a preconfigured quantity, such as five frames, ten frames, or any other suitable quantity of frames. In some examples, the quantity of previous frames may vary.
  • the application processor delay component 350 may estimate the duration of performing the graphics processor rendering tasks for the current frame by selecting the longest duration of the sampled durations.
  • the application processor delay component 350 may add a buffer duration (e.g., a padding duration) to the estimated duration.
  • the application processor delay component 350 may estimate the duration of performing the graphics processor rendering tasks for the current frame by sampling the duration of performing the graphics processor rendering tasks for five previous frames, selecting the duration of the five sampled durations that is the longest duration (e.g., a maximum duration) , and adding a buffer duration to the selected duration.
  • the application processor delay component 350 may estimate the duration of performing the graphics processor rendering tasks for the current frame by averaging the durations of the sampled durations.
  • the application processor delay component 350 may maintain information regarding the start time of the performing of the graphics processor rendering tasks for the previous frame and the stop time of the performing of the graphics processor rendering tasks for the previous frame. For example, when the application rendering component 314 transmits the swap buffer command to the graphics processor 330 for a frame N, the application processor delay component 350 may record a timestamp associated with the start of the graphics processor rendering tasks for the frame N. The application processor delay component 350 may also monitor for the indication from the graphics processor 330 disabling the synchronization fence associated with the buffer to which the graphics processor 330 stores the rendered frame N.
  • the application processor delay component 350 may record a timestamp associated with the completion of the graphics processor rendering tasks for the frame N.
  • the example application processor delay component 350 may then calculate the difference between the recorded completion timestamp and the start timestamp for the frame N to sample the duration of the performing of the graphics processor rendering tasks for the frame N.
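  • As a non-limiting illustration, the timestamp bookkeeping described above might be sketched in C++ as follows; the class and method names (GpuDurationSampler, OnSwapBuffer, OnFenceSignaled) and the five-sample window are hypothetical and are not part of this disclosure.

```cpp
#include <chrono>
#include <cstddef>
#include <deque>

using Clock = std::chrono::steady_clock;
using Duration = Clock::duration;

class GpuDurationSampler {
 public:
  // Record the start timestamp when the swap buffer command is transmitted
  // for frame N (start of the graphics processor rendering tasks).
  void OnSwapBuffer() { start_ = Clock::now(); }

  // Record the completion timestamp when the synchronization fence for the
  // rendered frame buffer is disabled, and sample the duration.
  void OnFenceSignaled() {
    samples_.push_back(Clock::now() - start_);
    if (samples_.size() > kMaxSamples) samples_.pop_front();  // keep last N
  }

  const std::deque<Duration>& samples() const { return samples_; }

 private:
  static constexpr std::size_t kMaxSamples = 5;  // e.g., five previous frames
  Clock::time_point start_{};
  std::deque<Duration> samples_;
};
```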
  • the application processor delay component 350 may determine the application processor sleep duration to facilitate aligning the completion of the graphics processor rendering tasks with a VSYNC pulse. For example, the application processor delay component 350 may determine the application processor sleep duration based on the VSYNC pulse period, the estimated duration of performing the graphics processor rendering tasks, and the duration of performing the unmodified application processor rendering tasks. For example, the application processor delay component 350 may calculate a start time for the performing of the graphics processor rendering tasks, relative to an occurrence of a VSYNC pulse, based on a difference between the VSYNC pulse period and the estimated duration of performing the graphics processor rendering tasks. The application processor delay component 350 may then calculate the application processor sleep duration as a difference between the start time for the performing of the graphics processor rendering tasks and the end time of the performing of the unmodified application processor rendering tasks.
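  • A minimal sketch of the estimation and sleep-duration arithmetic described above, assuming durations and timestamps expressed in nanoseconds; the helper names are illustrative assumptions, not part of this disclosure.

```cpp
#include <algorithm>
#include <vector>

using Nanos = long long;  // durations and timestamps in nanoseconds

// Estimated duration for the current frame: the maximum sampled duration plus
// a padding (buffer) duration; assumes at least one sample is available.
Nanos EstimateGpuDuration(const std::vector<Nanos>& samples, Nanos padding) {
  return *std::max_element(samples.begin(), samples.end()) + padding;
}

// Start time for the graphics processor rendering tasks, relative to a VSYNC
// pulse, chosen so that the tasks complete at the next pulse.
Nanos GpuStartOffset(Nanos vsync_period, Nanos estimated_gpu_duration) {
  return vsync_period - estimated_gpu_duration;
}

// Application processor sleep duration: the gap between the end of the
// unmodified application processor rendering tasks and the chosen start time.
Nanos AppSleepDuration(Nanos gpu_start_time, Nanos unmodified_end_time) {
  return gpu_start_time - unmodified_end_time;
}
```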
  • the example timing diagram 500 illustrates the application processor 502 (e.g., the application rendering component 314) receiving the rendering instruction 510 at a time t0 and initiating the performing of the application processor rendering tasks 512.
  • the completion of the unmodified application processor rendering tasks 512 (e.g., the application processor rendering tasks 512 without the application processor delay duration) occurs at a time t2.
  • the estimated duration of the graphics processor rendering tasks is the difference between time t5 and time t3, and the start time of the graphics processor rendering tasks is at the time t3.
  • the application processor delay component 350 may then determine that the application processor delay duration is the duration between the time t3 (e.g., the determined start time for performing the graphics processor rendering tasks 514) and the time t2 (e.g., the completion time of the unmodified application processor rendering tasks 512) . After determining the application processor delay duration 513, the example application processor delay component 350 may then modify the application processor rendering tasks so that the application processor delay duration 513 is included in the application processor rendering tasks and is performed before the transmitting of the swap buffer command to the graphics processor to trigger the start of the graphics processor rendering tasks.
  • the application processor delay component 350 may also modify the performing of the application processor rendering tasks so that the queue buffer command is transmitted from the application rendering component 314 to the compositing component 316 when the swap buffer command is transmitted from the application rendering component 314 to the graphics processor 330. That is, in some examples, the application processor delay component 350 may also delay the transmitting of the queue buffer command to facilitate delaying the start of the work mode of the compositing component 316.
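  • One possible shape for the modified end of the application processor rendering tasks is sketched below; SubmitSwapBuffer and SubmitQueueBuffer are hypothetical stand-ins for the actual buffer-submission calls, shown only to illustrate where the sleep is applied.

```cpp
#include <chrono>
#include <thread>

// Hypothetical stand-ins for the real buffer-submission calls.
void SubmitSwapBuffer() { /* triggers the graphics processor rendering tasks */ }
void SubmitQueueBuffer() { /* notifies the compositing component */ }

void FinishFrameWithDelay(std::chrono::nanoseconds app_sleep,
                          bool also_delay_queue_buffer) {
  if (app_sleep > std::chrono::nanoseconds::zero()) {
    std::this_thread::sleep_for(app_sleep);  // application processor sleep
  }
  SubmitSwapBuffer();     // start of the graphics processor rendering tasks
  if (also_delay_queue_buffer) {
    SubmitQueueBuffer();  // also delays the compositor's work mode
  }
}
```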
  • disclosed techniques may cause the compositing component 506 to wait for a compositing sleep duration 517 prior to performing the compositing tasks 516 for the current frame (e.g., frame A).
  • By waiting for the compositing sleep duration 517 prior to starting the performing of the compositing tasks 516, disclosed techniques enable the completion of the performing of the compositing tasks 516 to align with the occurrence of a VSYNC pulse (e.g., the second example VSYNC pulse 520b).
  • disclosed techniques may modify the performing of the compositing tasks to cause the compositing component to operate first in the idle mode for a compositing sleep duration and then operate in the work mode so that the completion of the compositing tasks aligns with a VSYNC pulse.
  • the example compositing component 316 of FIG. 3 includes an example compositing sleep delay component 352 configured to delay the performing of the compositing tasks within a VSYNC cycle.
  • the compositing sleep delay component 352 may determine a compositing sleep duration and include the compositing sleep duration in the performing of the compositing tasks so that the compositing component 316 operates in the idle mode for the compositing sleep duration and then operates in the work mode so that the performing of the compositing tasks completes at the occurrence of a VSYNC pulse.
  • the example compositing sleep delay component 352 may estimate a duration of performing the compositing tasks. For example, the compositing sleep delay component 352 may sample the duration of performing the compositing tasks for a quantity of previous frames and estimate a duration of performing the compositing tasks for the current frame.
  • the quantity of previous frames may be a preconfigured quantity, such as five frames, ten frames, or any other suitable quantity of frames. In some examples, the quantity of previous frames may vary.
  • the compositing sleep delay component 352 may estimate the duration of performing the compositing tasks for the current frame by selecting the longest duration of the sampled durations.
  • the compositing sleep delay component 352 may add a buffer duration (e.g., a padding duration) to the estimated duration.
  • the compositing sleep delay component 352 may estimate the duration of performing the compositing tasks for the current frame by sampling the duration of performing the compositing tasks for five previous frames, selecting the duration of the five sampled durations that is the longest duration (e.g., a maximum duration) , and adding a buffer duration to the selected duration.
  • the compositing sleep delay component 352 may estimate the duration of performing the compositing tasks for the current frame by averaging the durations of the sampled durations.
  • the compositing sleep delay component 352 may maintain information regarding the start time of the performing of the compositing tasks for the previous frame and the stop time of the performing of the compositing tasks for the previous frame. For example, when the compositing component 316 starts performing the compositing tasks for a frame N, the compositing sleep delay component 352 may record a timestamp associated with the start of the compositing tasks for the frame N. The compositing sleep delay component 352 may also monitor for an indication of when the performing of the compositing tasks for the frame N is complete.
  • the compositing sleep delay component 352 may record a timestamp associated with the completion of the compositing tasks for the frame N. The example compositing sleep delay component 352 may then calculate the difference between the recorded completion timestamp and the start timestamp for the frame N to sample the duration of the performing of the compositing tasks for the frame N.
  • the compositing sleep delay component 352 may determine the compositing sleep duration to facilitate aligning the completion of the compositing tasks with a VSYNC pulse. For example, the compositing sleep delay component 352 may determine the compositing sleep duration based on the VSYNC pulse period and the estimated duration of performing the compositing tasks. For example, the compositing sleep delay component 352 may calculate a start time for the performing of the compositing tasks, relative to an occurrence of a VSYNC pulse, based on a difference between the VSYNC pulse period and the estimated duration of performing the compositing tasks. The compositing sleep delay component 352 may then calculate the compositing sleep duration as a difference between the start time for the performing of the compositing tasks and the occurrence of a previous VSYNC pulse.
  • the example timing diagram 500 illustrates that the estimated duration of the compositing tasks is the difference between time t5 and time t4, and the start time of the compositing tasks is at the time t4.
  • the compositing sleep delay component 352 may then determine that the compositing sleep duration is the difference between the VSYNC pulse period and the estimated duration of the compositing tasks.
  • the example compositing sleep delay component 352 may then determine the start time of the compositing tasks 516 based on the compositing sleep duration and a timestamp associated with a previous VSYNC pulse (e.g., the occurrence of the first VSYNC pulse 520a at the time t1) .
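  • The compositor-side arithmetic described above might be sketched as follows; as with the earlier sketches, the function names are illustrative assumptions.

```cpp
using Nanos = long long;  // nanoseconds

// Compositing sleep duration: the idle portion of the VSYNC cycle.
Nanos CompositingSleep(Nanos vsync_period, Nanos estimated_compositing) {
  return vsync_period - estimated_compositing;
}

// Start of the work mode: the previous VSYNC pulse timestamp plus the sleep,
// so that the compositing tasks complete at the next VSYNC pulse.
Nanos CompositingStartTime(Nanos prev_vsync_timestamp, Nanos sleep) {
  return prev_vsync_timestamp + sleep;
}
```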
  • the gap between when the graphics processor completes performing the graphics processor rendering tasks and when the display begins performing the display rendering tasks may be reduced.
  • Because disclosed techniques facilitate aligning the completion of the compositing tasks and the completion of the graphics processor rendering tasks with the same VSYNC pulse, the display may begin performing the display rendering tasks, including the presentment of the corresponding rendered frame, after that VSYNC pulse. Accordingly, disclosed techniques may facilitate reducing frame latency due to the rendering pipeline from, for example, four or more VSYNC pulse periods to, for example, two or three VSYNC pulse periods.
  • the example device 300 may include the application processor 310 configured to execute the application 312, which may generate a rendering instruction to facilitate the rendering of a frame.
  • the application rendering component 314 may receive the rendering instruction.
  • the application rendering component 314 may determine whether to employ example frame latency reducing techniques disclosed herein.
  • the application rendering component 314 (and/or the application processor delay component 350) may access a data structure (e.g., a file, a list, etc. ) that indicates which applications are permitted to utilize the example frame latency reducing techniques disclosed herein and/or which applications are not permitted to utilize the example frame latency reducing techniques disclosed herein.
  • applications that may be permitted to access the example frame latency reducing techniques disclosed herein may include applications that employ frame rendering, while applications that may not be permitted to access the example frame latency reducing techniques disclosed herein may include applications that do not employ frame rendering.
  • the data structure may include a white list of applications that are permitted to use the example frame latency reducing techniques disclosed herein.
  • additional or alternative techniques for determining which applications may use the example frame latency reducing techniques disclosed herein and/or which applications may not use the example frame latency reducing techniques disclosed herein may also be used.
  • the application rendering component 314 may enable a frame latency reducing indicator that may be used by the application processor delay component 350 to determine whether to modify the performing of the application processor rendering tasks and/or may be used by the compositing sleep delay component 352 to determine whether to perform the compositing tasks after a compositing sleep duration during a VSYNC cycle.
  • the application rendering component 314 may disable the frame latency reducing indicator when an application is determined to be not permitted to employ the example frame latency reducing techniques disclosed herein.
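  • One simple realization of the permitted-application check described above is a text allow list with one application identifier per line; the file format, path handling, and helper names below are assumptions for illustration only.

```cpp
#include <fstream>
#include <string>
#include <unordered_set>

// Load the allow list (white list) of application identifiers.
std::unordered_set<std::string> LoadAllowList(const std::string& path) {
  std::unordered_set<std::string> apps;
  std::ifstream in(path);
  for (std::string line; std::getline(in, line);) apps.insert(line);
  return apps;
}

// The frame latency reducing indicator is enabled only for listed apps.
bool FrameLatencyReducingEnabled(const std::unordered_set<std::string>& allow,
                                 const std::string& app_id) {
  return allow.count(app_id) > 0;
}
```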
  • the example application rendering component 314 may start performing the application processor rendering tasks associated with rendering a frame.
  • the application processor delay component 350 may be configured to select a start time for performing graphics processor rendering tasks for the frame. For example, the application processor delay component 350 may estimate a duration for performing the graphics processor rendering tasks for the current frame and determine a start time for performing the graphics processor rendering tasks based on the estimated duration for performing the graphics processor rendering tasks. Based on the selected start time, the application processor delay component 350 may be configured to delay the transmitting of the swap buffer command to the graphics processor 330 to trigger the graphics processor 330 to start performing the graphics processor rendering tasks. In some examples, the application processor delay component 350 may also be configured to delay the transmitting of the queue buffer command to the compositing component 316 to delay the start of the work mode of the compositing component 316 when performing the compositing tasks for the frame.
  • the example graphics processor 330 may start performing the graphics processor rendering tasks.
  • the application processor delay component 350 may record a timestamp associated with the start of the graphics processor 330 performing the graphics processor rendering tasks.
  • the graphics processor 330 may store the rendered frame in the rendered frame buffer 324 and signal an indication to disable the synchronization fence associated with the rendered frame buffer 324.
  • the application processor delay component 350 may record a timestamp associated with the indication indicating that the synchronization fence associated with the frame and the rendered frame buffer 324 is disabled.
  • the example compositing component 316 may start performing the compositing tasks after receiving the queue buffer command from the application rendering component 314.
  • the compositing sleep delay component 352 may be configured to delay the start of the work mode of the compositing component 316 during the performing of the compositing tasks.
  • the compositing sleep delay component 352 may estimate a duration for performing the compositing tasks for the current frame and determine a start time for performing the compositing tasks based on the estimated duration for performing the compositing tasks. Based on the selected start time, the compositing sleep delay component 352 may be configured to cause the compositing component 316 to start the VSYNC cycle in the idle mode and then cause the compositing component 316 to transition to the work mode at the selected start time.
  • the example display 340 may be configured to start the performing of the display rendering tasks.
  • the completion of the graphics processor rendering tasks and the compositing tasks may be aligned with the occurrence of a VSYNC pulse and the start of the display rendering tasks may be aligned with the occurrence of the same VSYNC pulse.
  • FIG. 6 illustrates an example flowchart 600 of an example method in accordance with one or more techniques of this disclosure.
  • the method may be performed by an apparatus, such as the example device 300 of FIG. 3 and/or a component of the device 300, such as the example application processor 310, the example application rendering component 314, the example application processor delay component 350, the example compositing component 316, the example compositing sleep delay component 352, the example memory 320, the example graphics processor 330, and/or the example display 340.
  • the apparatus may perform application processor rendering tasks for rendering a frame, as described in connection with the examples of FIGs. 3, 4, and/or 5.
  • the application rendering component 314 may be configured to perform the application processor rendering tasks 412, 512 of FIGs. 4 and 5, respectively.
  • the apparatus may select a start time for performing graphics processor rendering tasks (e.g., performed by a GPU) for the frame, as described in connection with the examples of FIGs. 3, 4, and/or 5.
  • the application processor delay component 350 may be configured to determine the start time for performing the graphics processor rendering tasks for the frame. Example techniques for selecting the start time for performing the graphics processor rendering tasks are described in connection with FIGs. 7 and 8.
  • the apparatus may perform the graphics processor rendering tasks for the frame at the selected start time, as described in connection with the examples of FIGs. 3, 4, and/or 5.
  • the graphics processor 330 may be configured to perform the graphics processor rendering tasks 414, 514.
  • the apparatus may select a start time for performing compositing tasks for the frame, as described in connection with the examples of FIGs. 3, 4, and/or 5.
  • the compositing sleep delay component 352 may be configured to determine the start time for performing the compositing tasks for the frame. Example techniques for selecting the start time for performing the compositing tasks are described in connection with FIGs. 9 and 10.
  • the apparatus may perform the compositing tasks for the frame at the selected start time, as described in connection with the examples of FIGs. 3, 4, and/or 5.
  • the compositing component 316 may be configured to perform the compositing tasks 416, 516.
  • the apparatus may perform the display rendering tasks for the frame, as described in connection with the examples of FIGs. 3, 4, and/or 5.
  • the display 340 may be configured to perform the display rendering tasks 418, 518.
  • FIG. 7 illustrates an example flowchart 700 of an example method in accordance with one or more techniques of this disclosure.
  • the method may be performed by an apparatus, such as the example device 300 of FIG. 3 and/or a component of the device 300, such as the example application processor 310, the example application rendering component 314, and/or the example application processor delay component 350.
  • the example flowchart 700 may facilitate selecting the start time for performing graphics processor rendering tasks.
  • the apparatus may estimate a duration for performing the graphics processor rendering tasks for a current frame, as described in connection with the examples of FIGs. 3, 4, and/or 5.
  • the application processor delay component 350 may be configured to estimate the duration for performing the graphics processor rendering tasks for the current frame. Example techniques for estimating the duration for performing the graphics processor rendering tasks for the current frame are described in connection with FIG. 8.
  • the apparatus may determine an application processor sleep duration based on a difference between a VSYNC pulse period and the estimated graphics processor rendering tasks duration, as described in connection with the examples of FIGs. 3, 4, and/or 5.
  • the application processor delay component 350 may be configured to calculate a difference between the VSYNC pulse period and the estimated graphics processor rendering tasks duration to determine the application processor sleep duration.
  • the apparatus may determine the start time for performing the graphics processor rendering tasks for the current frame based on the application processor sleep duration and an end time of the unmodified application processor rendering tasks, as described in connection with the examples of FIGs. 3, 4, and/or 5.
  • the application processor delay component 350 may be configured to determine the start time for performing the graphics processor rendering tasks by adding the application processor sleep duration to a timestamp corresponding to the end of the unmodified application processor rendering tasks.
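  • As a hypothetical numeric example (not taken from the figures): for a 60 Hz display, the VSYNC pulse period is approximately 16.7 ms. If the estimated graphics processor rendering tasks duration is 10 ms, the application processor sleep duration of the flowchart 700 would be approximately 16.7 ms − 10 ms ≈ 6.7 ms, and the graphics processor rendering tasks would be scheduled to start approximately 6.7 ms after the end of the unmodified application processor rendering tasks so that they complete at the next VSYNC pulse.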
  • FIG. 8 illustrates an example flowchart 800 of an example method in accordance with one or more techniques of this disclosure.
  • the method may be performed by an apparatus, such as the example device 300 of FIG. 3 and/or a component of the device 300, such as the example application processor 310, the example application rendering component 314, and/or the example application processor delay component 350.
  • the example flowchart 800 may facilitate estimating a duration for performing the graphics processor rendering tasks for a current frame.
  • the apparatus may sample a graphics processor rendering tasks duration for a previous frame, as described in connection with the examples of FIGs. 3, 4, and/or 5.
  • the application processor delay component 350 may be configured to calculate a difference between a stop time of the graphics processor rendering tasks of the previous frame and a start time of the graphics processor rendering tasks of the previous frame.
  • the apparatus may sample graphics processor rendering tasks durations for a quantity of previous frames, as described in connection with the examples of FIGs. 3, 4, and/or 5.
  • the application processor delay component 350 may sample a preconfigured quantity of frames (e.g., five frames, ten frames, etc. ) .
  • the apparatus may select a maximum sampled graphics processor rendering tasks duration, as described in connection with the examples of FIGs. 3, 4, and/or 5.
  • the application processor delay component 350 may determine which of the sampled durations is a longest duration and select the corresponding duration.
  • the apparatus may add a padding duration to the selected sampled graphics processor rendering tasks duration to estimate the duration for performing the graphics processor rendering tasks for the current frame, as described in connection with the examples of FIGs. 3, 4, and/or 5.
  • the application processor delay component 350 may add a buffer duration to account for examples in which the duration of performing graphics processor rendering tasks for the current frame is longer than the estimated duration of performing the graphics processor rendering tasks for the current frame.
  • FIG. 9 illustrates an example flowchart 900 of an example method in accordance with one or more techniques of this disclosure.
  • the method may be performed by an apparatus, such as the example device 300 of FIG. 3 and/or a component of the device 300, such as the example application processor 310, the example compositing component 316, and/or the example compositing sleep delay component 352.
  • the example flowchart 900 may facilitate selecting the start time for performing compositing tasks.
  • the apparatus may estimate a duration for performing the compositing tasks for a current frame, as described in connection with the examples of FIGs. 3, 4, and/or 5.
  • the compositing sleep delay component 352 may be configured to estimate the duration for performing the compositing tasks for the current frame. Example techniques for estimating the duration for performing the compositing tasks for the current frame are described in connection with FIG. 10.
  • the apparatus may determine a compositing sleep duration based on a difference between a VSYNC pulse period and the estimated compositing tasks duration, as described in connection with the examples of FIGs. 3, 4, and/or 5.
  • the compositing sleep delay component 352 may be configured to calculate a difference between the VSYNC pulse period and the estimated compositing tasks duration to determine the compositing sleep duration.
  • the apparatus may determine the start time for performing the compositing tasks for the current frame based on the compositing sleep duration and a timestamp associated with a previous VSYNC pulse, as described in connection with the examples of FIGs. 3, 4, and/or 5.
  • the compositing sleep delay component 352 may be configured to determine the start time for performing the compositing tasks by adding the compositing sleep duration to a timestamp corresponding to the occurrence of the previous VSYNC pulse.
  • FIG. 10 illustrates an example flowchart 1000 of an example method in accordance with one or more techniques of this disclosure.
  • the method may be performed by an apparatus, such as the example device 300 of FIG. 3 and/or a component of the device 300, such as the example application processor 310, the example compositing component 316, and/or the example compositing sleep delay component 352.
  • the example flowchart 1000 may facilitate estimating a duration for performing the compositing tasks for a current frame.
  • the apparatus may sample a compositing tasks duration for a previous frame, as described in connection with the examples of FIGs. 3, 4, and/or 5.
  • the compositing sleep delay component 352 may be configured to calculate a difference between a stop time of the compositing tasks of the previous frame and a start time of the compositing tasks of the previous frame.
  • the apparatus may sample compositing tasks durations for a quantity of previous frames, as described in connection with the examples of FIGs. 3, 4, and/or 5.
  • the compositing sleep delay component 352 may sample a preconfigured quantity of frames (e.g., five frames, ten frames, etc. ) .
  • the apparatus may select a maximum sampled compositing tasks duration, as described in connection with the examples of FIGs. 3, 4, and/or 5.
  • the compositing sleep delay component 352 may determine which of the sampled durations is a longest duration and select the corresponding duration.
  • the apparatus may add a padding duration to the selected sampled compositing tasks duration to estimate the duration for performing the compositing tasks for the current frame, as described in connection with the examples of FIGs. 3, 4, and/or 5.
  • the compositing sleep delay component 352 may add a buffer duration to account for examples in which the duration of performing compositing tasks for the current frame is longer than the estimated duration of performing the compositing tasks for the current frame.
  • FIG. 11 illustrates an example flowchart 1100 of an example method in accordance with one or more techniques of this disclosure.
  • the method may be performed by an apparatus, such as the example device 300 of FIG. 3 and/or a component of the device 300, such as the example application processor 310, the example application 312, the example application rendering component 314, and/or the example application processor delay component 350.
  • the example flowchart 1100 may facilitate initializing the example frame latency reducing techniques disclosed herein for different applications.
  • the apparatus may initialize a display connection for an application, as described in connection with the examples in FIGs. 3, 4, and/or 5.
  • the application rendering component 314 may initialize the display connection for the application 312 after receiving a rendering instruction from the application 312.
  • the apparatus may utilize an API to initialize the display connection.
  • EGL, an interface layer between rendering APIs (e.g., OpenGL ES) and the underlying native platform window system, may provide mechanisms for initializing the display connection.
  • the apparatus may call “eglInitialize” to initialize an EGL display connection.
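  • For reference, the standard EGL calls mentioned above might be used as follows (error handling abbreviated):

```cpp
#include <EGL/egl.h>

// Initialize an EGL display connection for the default display.
bool InitDisplayConnection() {
  EGLDisplay display = eglGetDisplay(EGL_DEFAULT_DISPLAY);
  if (display == EGL_NO_DISPLAY) return false;
  EGLint major = 0, minor = 0;
  return eglInitialize(display, &major, &minor) == EGL_TRUE;
}
```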
  • other examples may use additional or alternative APIs and/or interfaces for initializing the display connection.
  • the apparatus may determine whether the application is approved for utilizing the example frame latency reducing techniques disclosed herein, as described in connection with the examples in FIGs. 3, 4, and/or 5.
  • the application rendering component 314 and/or the application processor delay component 350 may compare an identifier associated with the application 312 to applications included in a data structure to determine whether the application 312 is permitted to use the example frame latency reducing techniques disclosed herein.
  • If the application is not approved, the apparatus may disable a frame latency reducing indicator, as described in connection with the examples in FIGs. 3, 4, and/or 5.
  • the application rendering component 314 and/or the application processor delay component 350 may disable the frame latency reducing indicator.
  • If the application is approved, the apparatus may enable the frame latency reducing indicator, as described in connection with the examples in FIGs. 3, 4, and/or 5.
  • the application rendering component 314 and/or the application processor delay component 350 may enable the frame latency reducing indicator.
  • the apparatus may register the display connection to receive VSYNC pulse timestamps, as described in connection with the examples in FIGs. 3, 4, and/or 5.
  • the application rendering component 314 and/or the application processor delay component 350 may register the display connection with an operating system so that the application rendering component 314 (and/or the application processor delay component 350) may receive timestamps for VSYNC pulses.
  • the apparatus may use operating system-level APIs to allow the application to register for and receive notifications of VSYNC pulse occurrences.
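  • On Android, for example, the NDK AChoreographer API (available since API level 24) can deliver per-frame VSYNC timestamps to application code; the sketch below shows one way such registration might look. The callback must be posted from a thread with a looper, and the global variable is purely illustrative.

```cpp
#include <android/choreographer.h>

static long gLastVsyncNanos = 0;  // illustrative storage for the timestamp

static void OnVsync(long frameTimeNanos, void* /*data*/) {
  gLastVsyncNanos = frameTimeNanos;  // save the VSYNC pulse timestamp
  // Re-register: each posted callback fires only once.
  AChoreographer_postFrameCallback(AChoreographer_getInstance(), OnVsync,
                                   nullptr);
}

void RegisterForVsyncTimestamps() {
  AChoreographer_postFrameCallback(AChoreographer_getInstance(), OnVsync,
                                   nullptr);
}
```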
  • FIG. 12 illustrates an example flowchart 1200 of an example method in accordance with one or more techniques of this disclosure.
  • the method may be performed by an apparatus, such as the example device 300 of FIG. 3 and/or a component of the device 300, such as the example application processor 310, the example application rendering component 314, and/or the example application processor delay component 350.
  • the example flowchart 1200 may facilitate applying the example frame latency reducing techniques disclosed herein by synchronizing the completion of the graphics processor rendering tasks with a VSYNC pulse.
  • the apparatus may receive an indication that the performing of the application processor rendering tasks is complete, as described in connection with the examples in FIGs. 3, 4, and/or 5.
  • the application processor delay component 350 may determine that the application rendering component 314 is ready to transmit the swap buffer command to the graphics processor 330.
  • the apparatus may determine whether the frame latency reducing indicator is enabled for the application, as described in connection with the examples in FIGs. 3, 4, and/or 5.
  • If the frame latency reducing indicator is disabled, the application processor delay component 350 may not modify the performing of the application processor rendering tasks by including an application processor sleep duration.
  • calling the swap buffer command may also include calling the queue buffer command.
  • If the frame latency reducing indicator is enabled, the apparatus may schedule and apply an application processor sleep duration based on the estimated duration for performing the graphics processor rendering tasks for the current frame, as described in connection with the examples in FIGs. 3, 4, and/or 5.
  • the application processor delay component 350 may utilize the example techniques described in connection with FIGs. 7 and/or 8 to schedule and apply the application processor sleep duration.
  • the apparatus may call the swap buffer command, as described in connection with the examples in FIGs. 3, 4, and/or 5.
  • the application processor delay component 350 may cause the application rendering component 314 to transmit the swap buffer command to the graphics processor 330 after the application processor sleep duration.
  • calling the swap buffer command may also include calling the queue buffer command.
  • FIG. 13 illustrates an example flowchart 1300 of an example method in accordance with one or more techniques of this disclosure.
  • the method may be performed by an apparatus, such as the example device 300 of FIG. 3 and/or a component of the device 300, such as the example application processor 310, the example compositing component 316, and/or the example compositing sleep delay component 352.
  • the example flowchart 1300 may facilitate applying the example frame latency reducing techniques disclosed herein by synchronizing the completion of the compositing tasks with a VSYNC pulse.
  • the apparatus may save a VSYNC pulse timestamp for a received VSYNC pulse, as described in connection with the examples in FIGs. 3, 4, and/or 5.
  • the compositing component 316 may be configured to save the VSYNC pulse timestamp when a VSYNC pulse is received from the display 340.
  • the apparatus may receive a queue buffer command, as described in connection with the examples in FIGs. 3, 4, and/or 5.
  • the compositing component 316 may receive the queue buffer command from the application rendering component 314.
  • the apparatus may determine whether the frame latency reducing indicator is enabled for the application, as described in connection with the examples in FIGs. 3, 4, and/or 5.
  • If the frame latency reducing indicator is disabled, control proceeds to 1310 to save a timestamp at a start of the performing of the compositing tasks.
  • If the frame latency reducing indicator is enabled, the apparatus may schedule and apply a compositing sleep duration based on the saved VSYNC pulse timestamp and the estimated compositing tasks duration, as described in connection with the examples in FIGs. 3, 4, and/or 5.
  • the compositing sleep delay component 352 may utilize the example techniques described in connection with FIGs. 9 and/or 10 to schedule and apply the compositing sleep duration.
  • the apparatus may save a timestamp at a start of the performing of the compositing tasks, as described in connection with the examples in FIGs. 3, 4, and/or 5.
  • the compositing component 316 may be configured to save the timestamp associated with the start time of the performing of the compositing tasks.
  • the apparatus may perform the compositing tasks, as described in connection with the examples in FIGs. 3, 4, and/or 5.
  • the compositing component 316 may be configured to perform the compositing tasks 416, 516 of FIGs. 4 and 5, respectively.
  • the apparatus may save a timestamp at a completion of the performing of the compositing tasks, as described in connection with the examples in FIGs. 3, 4, and/or 5.
  • the compositing component 316 may be configured to save the timestamp associated with the completion of the performing of the compositing tasks.
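  • Putting the steps of FIG. 13 together, the compositor flow might be sketched as follows; RunCompositingTasks and RecordSample are hypothetical stand-ins for the compositor's actual work and bookkeeping.

```cpp
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-ins for the compositor's work and duration bookkeeping.
void RunCompositingTasks() { /* compose the queued buffers */ }
void RecordSample(Clock::duration d) { (void)d; /* feed the duration estimator */ }

void ComposeFrame(Clock::time_point saved_vsync,
                  Clock::duration vsync_period,
                  Clock::duration estimated_compositing) {
  auto sleep = vsync_period - estimated_compositing;   // compositing sleep
  std::this_thread::sleep_until(saved_vsync + sleep);  // idle mode
  auto start = Clock::now();                           // timestamp at start
  RunCompositingTasks();                               // work mode
  auto stop = Clock::now();                            // timestamp at completion
  RecordSample(stop - start);                          // sample for later frames
}
```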
  • FIG. 14 is a block diagram that illustrates an example content generation system 1400 configured to implement one or more techniques of this disclosure.
  • the content generation system 1400 includes a device 1404.
  • the device 1404 may include one or more components or circuits for performing various functions described herein. Aspects of the device 1404 may be implemented by the example device 300 of FIG. 3. In some examples, one or more components of the device 1404 may be components of an SOC.
  • the device 1404 may include one or more components configured to perform one or more techniques of this disclosure.
  • the device 1404 includes a processing unit 1420 and a memory 1424.
  • the device 1404 can include a number of additional or alternative components, such as a communication interface 1426, a transceiver 1432, a receiver 1428, a transmitter 1430, a display processor 1427, and a display client 1431.
  • the processing unit 1420 includes an internal memory 1421.
  • the processing unit 1420 may be configured to perform graphics processing, such as in a graphics processing pipeline 1407.
  • aspects of the rendering pipeline may be implemented by the graphics processing pipeline 1407.
  • the device 1404 may include a display processor, such as the display processor 1427, to perform one or more display processing techniques on one or more frames generated by the processing unit 1420 before presentment by the display client 1431.
  • the display processor 1427 may be configured to perform display processing.
  • the display processor 1427 may be configured to perform one or more display processing techniques on one or more frames generated by the processing unit 1420.
  • Reference to the display client 1431 may refer to one or more displays, such as the example display 340 of FIG. 3.
  • the display client 1431 may include a single display or multiple displays.
  • the display client 1431 may include a first display and a second display.
  • the results of the graphics processing may not be displayed on the device (e.g., the first and second displays may not receive any frames for presentment thereon) . Instead, the frames or graphics processing results may be transferred to another device.
  • the display client 1431 may be configured to display or otherwise present frames processed by the display processor 1427.
  • the display client 1431 may include one or more of: a liquid crystal display (LCD) , a plasma display, an organic light emitting diode (OLED) display, a projection display device, an augmented reality display device, a virtual reality display device, a head-mounted display, or any other type of display device.
  • Memory external to the processing unit 1420 may be accessible to the processing unit 1420.
  • the processing unit 1420 may be configured to read from and/or write to external memory, such as the memory 1424.
  • the processing unit 1420 may be communicatively coupled to the memory 1424 over a bus.
  • the processing unit 1420 and the memory 1424 may be communicatively coupled to each other over the bus or a different connection.
  • the device 1404 may include a content encoder/decoder configured to receive graphical and/or display content from any source, such as the memory 1424 and/or the communication interface 1426.
  • the memory 1424 may be configured to store received encoded or decoded content.
  • the content encoder/decoder may be configured to receive encoded or decoded content (e.g., from the memory 1424 and/or the communication interface 1426) in the form of encoded pixel data.
  • the content encoder/decoder may be configured to encode or decode any content.
  • the internal memory 1421 or the memory 1424 may include one or more volatile or non-volatile memories or storage devices.
  • the internal memory 1421 or the memory 1424 may include RAM, SRAM, DRAM, erasable programmable ROM (EPROM) , electrically erasable programmable ROM (EEPROM) , flash memory, a magnetic data media or an optical storage media, or any other type of memory.
  • the internal memory 1421 or the memory 1424 may be a non-transitory storage medium according to some examples.
  • the term “non-transitory” may indicate that the storage medium is not embodied in a carrier wave or a propagated signal.
  • the term “non-transitory” should not be interpreted to mean that the internal memory 1421 or the memory 1424 is non-movable or that its contents are static.
  • the memory 1424 may be removed from the device 1404 and moved to another device.
  • the memory 1424 may not be removable from the device 1404.
  • the processing unit 1420 may be an application processor, a central processing unit (CPU) , a graphics processor, a graphics processing unit (GPU) , a general purpose GPU (GPGPU) , or any other processing unit that may be configured to perform system processing, such as graphics processing, compute processing, etc.
  • aspects of the application processor 310 and/or the graphics processor 330 may be implemented by the processing unit 1420.
  • the processing unit 1420 may be integrated into a motherboard of the device 1404.
  • the processing unit 1420 may be present on a graphics card that is installed in a port in a motherboard of the device 1404, or may be otherwise incorporated within a peripheral device configured to interoperate with the device 1404.
  • the processing unit 1420 may include one or more processors, such as one or more microprocessors, GPUs, application specific integrated circuits (ASICs) , field programmable gate arrays (FPGAs) , arithmetic logic units (ALUs) , digital signal processors (DSPs) , discrete logic, software, hardware, firmware, other equivalent integrated or discrete logic circuitry, or any combinations thereof. If the techniques are implemented partially in software, the processing unit 1420 may store instructions for the software in a suitable, non-transitory computer-readable storage medium (e.g., the internal memory 1421) and may execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Any of the foregoing, including hardware, software, a combination of hardware and software, etc., may be considered to be one or more processors.
  • the content generation system 1400 can include a communication interface 1426.
  • the communication interface 1426 may include a receiver 1428 and a transmitter 1430.
  • the receiver 1428 may be configured to perform any receiving function described herein with respect to the device 1404. Additionally, the receiver 1428 may be configured to receive information (e.g., eye or head position information, rendering commands, and/or location information) from another device.
  • the transmitter 1430 may be configured to perform any transmitting function described herein with respect to the device 1404. For example, the transmitter 1430 may be configured to transmit information to another device, which may include a request for content.
  • the receiver 1428 and the transmitter 1430 may be combined into a transceiver 1432. In such examples, the transceiver 1432 may be configured to perform any receiving function and/or transmitting function described herein with respect to the device 1404.
  • the graphical content from the processing unit 1420 for display via the display client 1431 is not static and may be changing. Accordingly, the display processor 1427 may periodically refresh the graphical content displayed via the display client 1431. For example, the display processor 1427 may periodically retrieve graphical content from the memory 1424, where the graphical content may have been updated by the execution of an application (and/or the processing unit 1420) that outputs the graphical content to the memory 1424.
  • the display client 1431 (sometimes referred to as a “display panel” ) may include the display processor 1427.
  • the processing unit 1420 may include the display processor 1427.
  • the processing unit 1420 may be configured to perform frame latency reducing techniques disclosed herein.
  • the processing unit 1420 may include a frame latency reducing component 1498 configured to facilitate reducing frame latency in the rendering pipeline. Aspects of the frame latency reducing component 1498 may be implemented by the device 300 of FIG. 3.
  • the frame latency reducing component 1498 may be configured to perform first processor rendering tasks for rendering a frame.
  • the example frame latency reducing component 1498 may also be configured to perform second processor rendering tasks for rendering the frame at a first start time based on an estimated duration for performing the second processor rendering tasks.
  • the example frame latency reducing component 1498 may also be configured to perform compositing rendering tasks for rendering the frame at a second start time based on an estimated duration for performing the compositing rendering tasks.
  • the example frame latency reducing component 1498 may also be configured to perform display rendering tasks to display the frame.
  • the example frame latency reducing component 1498 may be configured to synchronize completion of the performing of the second processor rendering tasks and completion of the performing of the compositing rendering tasks with an occurrence of a same VSYNC pulse. In some examples, the example frame latency reducing component 1498 may be configured to start the performing of the display rendering tasks after the occurrence of the same VSYNC pulse.
  • the example frame latency reducing component 1498 may be configured to select the first start time by estimating the duration for performing the second processor rendering tasks.
  • the example frame latency reducing component 1498 may also be configured to determine a first processor sleep duration based on a difference in a VSYNC pulse period and the estimated second processor rendering tasks duration.
  • the example frame latency reducing component 1498 may also be configured to determine the first start time for performing the second processor rendering tasks based on the first processor sleep duration and an end time of the performing of the first processor rendering tasks.
  • the example frame latency reducing component 1498 may be configured to estimate the duration for performing the second processor rendering tasks by sampling second processor rendering tasks durations for a quantity of previous frames, selecting a sampled second processor rendering tasks duration, and adding a padding duration to the sampled second processor rendering tasks duration. In some examples, the example frame latency reducing component 1498 may be configured to select the sampled second processor rendering tasks duration by identifying a maximum duration of the sampled second processor rendering tasks durations. In some examples, the example frame latency reducing component 1498 may be configured to select the sampled second processor rendering tasks duration by calculating an average duration of the sampled second processor rendering tasks durations.
  • the example frame latency reducing component 1498 may be configured to sample second processor rendering tasks duration for a first previous frame by calculating a difference between a stop time of the second processor rendering tasks of the first previous frame and a start time of the second processor rendering tasks of the first previous frame.
  • the example frame latency reducing component 1498 may be configured to select the second start time by estimating the duration for performing the compositing rendering tasks, determining a compositing sleep duration based on a difference in a VSYNC pulse period and the estimated compositing rendering tasks duration, and determining the second start time for performing the compositing rendering tasks based on the compositing sleep duration and a timestamp associated with an occurrence of a previous VSYNC pulse.
  • the example frame latency reducing component 1498 may be configured to estimate the duration for performing the compositing rendering tasks by sampling compositing rendering tasks durations for a quantity of previous frames, selecting a sampled compositing rendering tasks duration, and adding a padding duration to the sampled compositing rendering tasks duration.
  • the first processor rendering tasks may be performed by an application processor or a CPU.
  • the second processor rendering tasks may be performed by a graphics processor or a GPU.
  • a device such as the device 1404, may refer to any device, apparatus, or system configured to perform one or more techniques described herein.
  • a device may be a server, a base station, user equipment, a client device, a station, an access point, a computer (e.g., a personal computer, a desktop computer, a laptop computer, a tablet computer, a computer workstation, or a mainframe computer), an end product, an apparatus, a phone, a smart phone, a video game platform or console, a handheld device (e.g., a portable video game device or a personal digital assistant (PDA)), a wearable computing device (e.g., a smart watch, an augmented reality device, or a virtual reality device), a non-wearable device, a display or display device, a television, a television set-top box, an intermediate network device, a digital media player, a video streaming device, a content streaming device, an in-car computer, or any other device configured to perform one or more techniques described herein.
  • a method or apparatus for graphics processing may be implemented by a processing unit, an application processor, a CPU, a graphics processor, a GPU, a display processor, a DPU, a video processor, or some other processor that can perform display processing.
  • the apparatus may be the processing unit 1420 within the device 1404, or may be some other hardware within the device 1404, or another device.
  • the apparatus may include means for performing first processor rendering tasks for rendering a frame.
  • the apparatus may also include means for performing second processor rendering tasks for rendering the frame at a first start time based on an estimated duration for performing the second processor rendering tasks.
  • the apparatus may also include means for performing compositing rendering tasks for rendering the frame at a second start time based on an estimated duration for performing the compositing rendering tasks.
  • the apparatus may also include means for performing display rendering tasks to display the frame.
  • the apparatus may also include means for synchronizing completion of the performing of the second processor rendering tasks and completion of the performing of the compositing rendering tasks with an occurrence of a same VSYNC pulse.
  • the apparatus may also include means for starting the performing of the display rendering tasks after the occurrence of the same VSYNC pulse.
  • the apparatus may also include means for estimating the duration for performing the second processor rendering tasks.
  • the apparatus may also include means for determining a first processor sleep duration based on a difference in a VSYNC pulse period and the estimated second processor rendering tasks duration.
  • the apparatus may also include means for determining the first start time for performing the second processor rendering tasks based on the first processor sleep duration and an end time of the performing of the first processor rendering tasks.
  • the apparatus may also include means for sampling second processor rendering tasks durations for a quantity of previous frames.
  • the apparatus may also include means for selecting a sampled second processor rendering tasks duration.
  • the apparatus may also include means for adding a padding duration to the sampled second processor rendering tasks duration.
  • the apparatus may also include means for selecting the sampled second processor rendering tasks duration by identifying a maximum duration of the sampled second processor rendering tasks durations.
  • the apparatus may also include means for selecting the sampled second processor rendering tasks duration by calculating an average duration of the sampled second processor rendering tasks durations.
  • the apparatus may also include means for sampling the second processor rendering tasks duration for a first previous frame by calculating a difference between a stop time of the second processor rendering tasks of the first previous frame and a start time of the second processor rendering tasks of the first previous frame.
  • the apparatus may also include means for estimating the duration for performing the compositing rendering tasks.
  • the apparatus may also include means for determining a compositing sleep duration based on a difference in a VSYNC pulse period and the estimated compositing rendering tasks duration.
  • the apparatus may also include means for determining the second start time for performing the compositing rendering tasks based on the compositing sleep duration and a timestamp associated with an occurrence of a previous VSYNC pulse.
  • the apparatus may also include means for sampling compositing rendering tasks durations for a quantity of previous frames.
  • the apparatus may also include means for selecting a sampled compositing rendering tasks duration.
  • the apparatus may also include means for adding a padding duration to the sampled compositing rendering tasks duration.
  • the described frame latency reducing techniques can be used by an application processor, a CPU, a graphics processor, a GPU, a display processor, a DPU, or a video processor or some other processor that can perform graphical rendering of a frame.
  • the frame latency reducing techniques disclosed herein can improve or speed up data processing or execution.
  • the frame latency reducing techniques herein can improve resource or data utilization and/or resource efficiency.
  • aspects of the present disclosure can reduce frame latency in a rendering pipeline.
  • the term “or” may be interpreted as “and/or” where context does not dictate otherwise. Additionally, while phrases such as “one or more” or “at least one” or the like may have been used for some features disclosed herein but not others, the features for which such language was not used may be interpreted to have such a meaning implied where context does not dictate otherwise.
  • the functions described herein may be implemented in hardware, software, firmware, or any combination thereof.
  • although the term “processing unit” has been used throughout this disclosure, such processing units may be implemented in hardware, software, firmware, or any combination thereof. If any function, processing unit, technique described herein, or other module is implemented in software, the function, processing unit, technique described herein, or other module may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
  • Computer-readable media may include computer data storage media or communication media including any medium that facilitates transfer of a computer program from one place to another. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media, which is non-transitory, or (2) a communication medium such as a signal or carrier wave.
  • Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this disclosure.
  • such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store computer executable code in the form of instructions or data structures that can be accessed by a computer.
  • Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
  • a computer program product may include a computer-readable medium.
  • the code may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), arithmetic logic units (ALUs), field programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry.
  • the techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC) or a set of ICs (e.g., a chip set).
  • Various components, modules or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily need realization by different hardware units. Rather, as described above, various units may be combined in any hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.
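By way of illustration only, the following C++ sketch shows one way the estimation recited above (sampling second processor rendering tasks durations for a quantity of previous frames, selecting the maximum or average sampled duration, and adding a padding duration) might be organized. The class name, the sampling window, and the padding value are hypothetical and are not part of the disclosure.

    #include <algorithm>
    #include <cstddef>
    #include <deque>
    #include <numeric>

    // Hypothetical estimator for the next frame's second processor (e.g., GPU)
    // rendering tasks duration, in milliseconds.
    class DurationEstimator {
     public:
      DurationEstimator(std::size_t window, double padding_ms)
          : window_(window), padding_ms_(padding_ms) {}

      // Sample one previous frame: duration = stop time - start time.
      void AddSample(double start_ms, double stop_ms) {
        samples_.push_back(stop_ms - start_ms);
        if (samples_.size() > window_) samples_.pop_front();
      }

      // Select the maximum sampled duration, then add the padding duration.
      double EstimateFromMax() const {
        if (samples_.empty()) return padding_ms_;
        return *std::max_element(samples_.begin(), samples_.end()) + padding_ms_;
      }

      // Alternatively, select the average sampled duration plus the padding.
      double EstimateFromAverage() const {
        if (samples_.empty()) return padding_ms_;
        double sum = std::accumulate(samples_.begin(), samples_.end(), 0.0);
        return sum / static_cast<double>(samples_.size()) + padding_ms_;
      }

     private:
      std::deque<double> samples_;
      std::size_t window_;
      double padding_ms_;
    };

The same sampling, selection, and padding steps would apply when estimating the compositing rendering tasks duration.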

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Multimedia (AREA)
  • Computer Graphics (AREA)
  • Image Generation (AREA)

Abstract

The present disclosure relates to methods and apparatus for graphics processing. For example, disclosed techniques facilitate reducing frame latency in a graphical rendering pipeline. Aspects of the present disclosure can perform first processor rendering tasks for rendering a frame. Aspects of the present disclosure can also perform second processor rendering tasks for rendering the frame at a first start time based on an estimated duration for performing the second processor rendering tasks. Further, aspects of the present disclosure can perform compositing rendering tasks for rendering the frame at a second start time based on an estimated duration for performing the compositing rendering tasks. Additionally, aspects of the present disclosure can perform display rendering tasks to display the frame.

Description

METHODS AND APPARATUS FOR REDUCING FRAME LATENCY TECHNICAL FIELD
The present disclosure relates generally to processing systems and, more particularly, to one or more techniques for graphics or display processing.
INTRODUCTION
Computing devices often utilize a graphics processing unit (GPU) to accelerate the rendering of graphical data for display. Such computing devices may include, for example, computer workstations, mobile phones such as so-called smartphones, embedded systems, personal computers, tablet computers, and video game consoles. GPUs execute a graphics processing pipeline that includes one or more processing stages that operate together to execute graphics processing commands and output a frame. An application processor or a central processing unit (CPU) may control the operation of the GPU by issuing one or more graphics processing commands to the GPU. Modern day CPUs are typically capable of concurrently executing multiple applications, each of which may need to utilize the GPU during execution.
A user’s experience on a computing device can be affected by how smoothly the user interface (UI) animation runs on the device for any particular application. In some examples, an application may generate a frame rendering instruction to facilitate rendering a frame for display. However, there may be a frame latency between when the frame rendering instruction is generated and when the corresponding rendered frame is presented. Accordingly, there is an increased need for techniques that reduce the frame latency associated with presenting graphical content on displays.
SUMMARY
The following presents a simplified summary of one or more aspects in order to provide a basic understanding of such aspects. This summary is not an extensive overview of all contemplated aspects, and is intended to neither identify key elements of all aspects nor delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more aspects in a simplified form as a prelude to the more detailed description that is presented later.
In an aspect of the disclosure, a method, a computer-readable medium, and an apparatus are provided. The apparatus may be an application processor, a CPU, a graphics processor, a graphics processing unit (GPU), a display processor, a display processing unit (DPU), or a video processor. The apparatus can perform first processor rendering tasks for rendering a frame. The apparatus can perform second processor rendering tasks for rendering the frame at a first start time based on an estimated duration for performing the second processor rendering tasks. Additionally, the apparatus can perform compositing rendering tasks for rendering the frame at a second start time based on an estimated duration for performing the compositing rendering tasks. Further, the apparatus can perform display rendering tasks to display the frame.
In some examples, the apparatus can synchronize completion of the performing of the second processor rendering tasks and completion of the performing of the compositing rendering tasks with an occurrence of a same VSYNC pulse. In some examples, the apparatus can start the performing of the display rendering tasks after the occurrence of the same VSYNC pulse.
In some examples, the apparatus can select the first start time by estimating the duration for performing the second processor rendering tasks. The apparatus can also determine a first processor sleep duration based on a difference in a VSYNC pulse period and the estimated second processor rendering tasks duration. The apparatus can also determine the first start time for performing the second processor rendering tasks based on the first processor sleep duration and an end time of the performing of the first processor rendering tasks. In some examples, the apparatus can estimate the duration for performing the second processor rendering tasks by sampling second processor rendering tasks durations for a quantity of previous frames, selecting a sampled second processor rendering tasks duration, and adding a padding duration to the sampled second processor rendering tasks duration. In some examples, the apparatus can select the sampled second processor rendering tasks duration by identifying a maximum duration of the sampled second processor rendering tasks durations. In some examples, the apparatus can select the sampled second processor rendering tasks duration by calculating an average duration of the sampled second processor rendering tasks durations. In some examples, the apparatus can sample the second processor rendering tasks duration for a first previous frame by calculating a difference between a stop time of the second processor rendering tasks of the first previous frame and a start time of the second processor rendering tasks of the first previous frame.
In some examples, the apparatus can select the second start time by estimating the duration for performing the compositing rendering tasks, determining a compositing sleep duration based on a difference in a VSYNC pulse period and the estimated compositing rendering tasks duration, and determining the second start time for performing the compositing rendering tasks based on the compositing sleep duration and a timestamp associated with an occurrence of a previous VSYNC pulse. In some examples, the apparatus can estimate the duration for performing the compositing rendering tasks by sampling compositing rendering tasks durations for a quantity of previous frames, selecting a sampled compositing rendering tasks duration, and adding a padding duration to the sampled compositing rendering tasks duration.
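By way of illustration only, the start-time arithmetic described in the two preceding paragraphs may be sketched as follows. The function and variable names are hypothetical, and all times are expressed in milliseconds.

    // First start time: the first processor sleep duration is the difference
    // between the VSYNC pulse period and the estimated second processor
    // rendering tasks duration, applied at the end of the first processor
    // rendering tasks.
    double FirstStartTime(double vsync_period_ms, double est_gpu_ms,
                          double cpu_tasks_end_ms) {
      double cpu_sleep_ms = vsync_period_ms - est_gpu_ms;
      return cpu_tasks_end_ms + cpu_sleep_ms;
    }

    // Second start time: the compositing sleep duration is the difference
    // between the VSYNC pulse period and the estimated compositing rendering
    // tasks duration, applied at the timestamp of the previous VSYNC pulse.
    double SecondStartTime(double vsync_period_ms, double est_compositing_ms,
                           double prev_vsync_timestamp_ms) {
      double compositing_sleep_ms = vsync_period_ms - est_compositing_ms;
      return prev_vsync_timestamp_ms + compositing_sleep_ms;
    }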
In some examples, the apparatus may perform the first processor rendering tasks with an application processor or a CPU. In some examples, the apparatus may perform the second processor rendering tasks with a graphics processor or a GPU.
The details of one or more examples of the disclosure are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the disclosure will be apparent from the description and drawings, and from the claims.
BRIEF DESCRIPTION OF DRAWINGS
FIG. 1 is an example timing diagram depicting active periods for an application processor, a graphics processor, a compositing component, and a display operating on a frame in a rendering pipeline.
FIG. 2 is another example timing diagram depicting active periods for an application processor, a graphics processor, a compositing component, and a display operating on a frame in a rendering pipeline.
FIG. 3 is a block diagram that illustrates an example device, in accordance with one or more techniques of this disclosure.
FIG. 4 is an example timing diagram depicting active periods for an application processor, a graphics processor, a compositing component, and a display operating on a frame in a rendering pipeline, in accordance with one or more techniques of this disclosure.
FIG. 5 is another example timing diagram depicting active periods for an application processor, a graphics processor, a compositing component, and a display operating on a frame in a rendering pipeline, in accordance with one or more techniques of this disclosure.
FIGs. 6 to 13 illustrate example flowcharts of example methods, in accordance with one or more techniques of this disclosure.
FIG. 14 is a block diagram that illustrates an example content generation system, in accordance with one or more techniques of this disclosure.
DETAILED DESCRIPTION
In general, examples disclosed herein provide techniques for reducing frame latency and, thus, improving user experience. In some examples, an apparatus may include a rendering pipeline to facilitate rendering a frame and for the presentment of the rendered frame. For example, an application (such as a game) executing via the apparatus may generate a rendering instruction to facilitate rendering a frame. In some such examples, a first stage of the rendering pipeline may include an application rendering stage to process an application rendering workload based on the rendering instruction. For example, a processing unit (or a component (s) of the processing unit) may be configured to generate a rendered frame based on the rendering instruction. In some examples, the application rendering workload may be split between an application processor (e.g., a CPU) and a graphics processor (e.g., a GPU) . For example, the application processor may perform an application processor rendering task (or tasks) associated with the application rendering workload and the graphics processor may perform a graphics processor rendering task (or tasks) associated with the application rendering workload. In some examples, the application processor rendering task may include the application processor generating one or more rendering commands for execution by the graphics processor and/or the application processor generating a rendered frame based on the rendering instruction. In some examples, the graphics processor rendering task may include the graphics processor executing the one or more rendering commands and/or the graphics processor performing post-processing techniques on the rendered frame generated by the application processor.
The example rendering pipeline may also include a second stage during which composition on the rendered frame may be performed. For example, a compositing component may perform a rendering task to facilitate performing composition on the rendered frame. In some examples, the compositing component may also provide information to a display to facilitate the presentment of the rendered frame. For example, the rendering task performed by the compositing component may cause the compositing component to provide information regarding a buffer for presentment via the display. For example, the compositing component may identify a particular buffer storing the rendered frame to the display for presentment.
The example rendering pipeline may also include a third stage during which presentment of the rendered frame may be performed. For example, the display may perform a rendering task to facilitate the presentment of the rendered frame. In some examples, the display may monitor the buffer identified by the compositing component to determine when the rendering of the frame is complete and the rendered frame is ready for presentment. For example, the display may wait until the graphics processor completes performing the graphics processor rendering task before the display attempts to present the corresponding frame.
As used herein, the term “render” (and variants thereof) may refer to 3D rendering and/or 2D rendering. For example, the graphics processor may utilize OpenGL instructions to render 3D graphics surfaces, or may utilize OpenVG instructions to render 2D graphics surfaces. However, it should be appreciated that in additional or alternative examples, any standards, methods, or techniques for rendering graphics may be utilized by the graphics processor.
FIG. 1 is an example timing diagram 100 depicting active periods for an application processor 102, a graphics processor 104, a compositing component 106, and a display 108 operating on a frame (frame A) in a rendering pipeline. In the illustrated example of FIG. 1, the timing diagram 100 illustrates example stages of an example rendering pipeline for rendering a frame. For example, an application executing on the application processor 102 may generate a rendering instruction 110 for rendering a frame (e.g., the frame A) . In some examples, the application processor 102 and the graphics processor 104 may split certain of the rendering tasks associated with the rendering instruction. For example, the application processor 102 may perform application processor rendering tasks 112 and the graphics processor 104 may perform graphics processor rendering tasks 114. For example, after receiving the rendering instruction, the application processor 102 (or a component of the application processor 102) may be configured to perform the application processor rendering tasks 112 by generating rendering commands for the graphics processor 104 based on the rendering instruction and/or performing some level of rendering on a frame.
As shown in FIG. 1, after the application processor 102 completes performing the application processor rendering task 112, the graphics processor 104 begins performing the graphics  processor rendering tasks 114 for the frame. For example, the graphics processor 104 may be configured to execute the rendering commands generated by the application processor 102 and/or perform post-processing techniques on the frame rendered by the application processor 102. In some examples, the application processor rendering task 112 may include a command (e.g., a “swap buffer” command) that instructs the graphics processor 104 to begin performing the graphics processor rendering tasks 114.
In the illustrated example of FIG. 1, the compositing component 106 begins performing compositing tasks 116 after the application processor 102 completes performing the application processor rendering tasks 112. In some examples, the compositing tasks 116 may include configuring the display 108 to display a composited frame. For example, the application processor rendering tasks 112 executed by the application processor 102 may include a command (e.g., a “queue buffer” command) that indicates to the compositing component 106 that there is a frame being prepared for display (e.g., the application processor 102 has provided rendering commands to the graphics processor 104 for generating a rendered frame) . The compositing component 106 may then configure the display 108 for the displaying of the corresponding frame by passing information regarding the respective buffer to the display 108. For example, the graphics processor 104 may store the output of the graphics processor rendering tasks 114 in a first frame buffer and the performing of the compositing tasks 116 may include the compositing component 106 providing information identifying the first frame buffer to the display 108.
In the illustrated example of FIG. 1, the display 108 begins performing display rendering tasks 118 after the compositing component 106 completes performing the compositing tasks 116. In some examples, the display rendering tasks 118 may include determining when the graphics processor 104 has completed performing the graphics processor rendering tasks 114 (e.g., by monitoring the buffer identified by the compositing component 106 (e.g., via the compositing tasks 116) ) and displaying the rendered frame (e.g., frame A) .
In the illustrated example of FIG. 1, VSYNC pulses 120 are indicated by vertical lines in the figure. It should be appreciated that the VSYNC pulses 120 may be associated with a periodicity based on the refresh rate of the display 108. For example, a display with a 60 Hz refresh rate may have a VSYNC pulse period of 16.67 ms (e.g., 1/60 of a second). That is, a duration between a first VSYNC pulse 120a and a second VSYNC pulse 120b may be 16.67 ms.
As used herein, a “VSYNC” is a pulse within a computing system that synchronizes certain events to the refresh cycle of the display. Applications may start drawing on a VSYNC boundary, and a compositing component (hardware or software) may start compositing on VSYNC boundaries. This allows for smooth application rendering (time-based animation) synchronized by the periodicity of the VSYNC pulse. In some examples, the VSYNC pulses 120 may be generated by the display 108. For example, the display 108 may generate a VSYNC pulse 120 after completing the performing of the display rendering tasks 118 (and/or as a step of the display rendering tasks 118) . In some examples, the VSYNC pulses 120 may instruct the application processor 102 to begin performing application processor rendering tasks 112 for a subsequent frame.
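As a minimal illustration of this timing relationship (the helper names are illustrative only):

    // The VSYNC pulse period is the inverse of the display refresh rate,
    // e.g., 1000.0 / 60 = ~16.67 ms between consecutive pulses at 60 Hz.
    double VsyncPeriodMs(double refresh_rate_hz) {
      return 1000.0 / refresh_rate_hz;
    }

    // Timestamp of the next VSYNC pulse, given the previous pulse's timestamp.
    double NextVsyncMs(double prev_vsync_ms, double refresh_rate_hz) {
      return prev_vsync_ms + VsyncPeriodMs(refresh_rate_hz);
    }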
In the illustrated example of FIG. 1, the performing of the application processor rendering tasks 112 is initiated when the rendering instruction 110 is received (e.g., by the application processor 102 from an application) . The performing of the graphics processor rendering tasks 114 is initiated when the application processor rendering tasks 112 are completed. For example, the application processor rendering tasks 112 may include a command (e.g., a “swap buffer” command) that instructs the graphics processor 104 to begin performing the graphics processor rendering tasks 114.
As shown in FIG. 1, the performing of the compositing tasks 116 and the performing of the display rendering tasks 118 are synchronized with VSYNC pulses 120. For example, the compositing component 106 may wait until the next VSYNC pulse (e.g., a second VSYNC pulse 120b) after the application processor 102 completes performing the application processor rendering tasks 112 before performing the compositing tasks 116.
As described above, the display rendering tasks 118 may include the presentment of the rendered frame output by the graphics processor 104. However, the display 108 uses information provided by the compositing component 106 to determine which buffer to monitor (e.g., information provided via the compositing tasks 116) . Accordingly, the display 108 may wait until the next VSYNC pulse after the graphics processor 104 performs the graphics processor rendering tasks 114 and the compositing component 106 performs the compositing tasks 116 before performing the display rendering tasks 118. For example, in the illustrated example of FIG. 1, the display 108 waits until a third VSYNC pulse 120c before performing the display rendering tasks 118.
In the illustrated example of FIG. 1, a frame latency 150 corresponds to the period between when the frame A is displayed (e.g., at a fourth VSYNC pulse 120d) and when the rendering instruction 110 was received by the application processor 102. For example, in the illustrated example of FIG. 1, the frame latency 150 includes at least three VSYNC pulse periods and the duration between when the rendering instruction 110 is received and the first VSYNC pulse 120a.
FIG. 2 is another example timing diagram 200 depicting active periods for an application processor 202, a graphics processor 204, a compositing component 206, and a display 208 operating on a frame (frame A) in a rendering pipeline. One or more aspects of the application processor 202 may be implemented by the application processor 102 of FIG. 1. One or more aspects of the graphics processor 204 may be implemented by the graphics processor 104 of FIG. 1. One or more aspects of the compositing component 206 may be implemented by the compositing component 106 of FIG. 1. One or more aspects of the display 208 may be implemented by the display 108 of FIG. 1.
The example timing diagram 200 of FIG. 2 is similar to the example timing diagram 100 of FIG. 1 and includes application processor rendering tasks 212 executed by the application processor 202, graphics processor rendering tasks 214 executed by the graphics processor 204, compositing tasks 216 executed by the compositing component 206, and display rendering tasks 218 executed by the display 208. Furthermore, as shown in FIG. 2, the application processor rendering tasks 212 start when a rendering instruction 210 is received by the application processor 202 and the application processor rendering tasks 212 complete after a first VSYNC pulse 220a. As the performing of the graphics processor rendering tasks 214 is triggered via the completion of the application processor rendering tasks 212 (e.g., via a “swap buffer” command of application processor rendering tasks 212) , the graphics processor rendering tasks 214 start after the completion of the application processor rendering tasks 212. In the illustrated example of FIG. 2, the duration of the graphics processor rendering tasks 214 (e.g., the interval between when the graphics processor rendering tasks 214 complete and when the graphics processor rendering tasks 214 begin) causes the completion of the graphics processor rendering tasks 214 to occur after a second VSYNC pulse 220b.
As explained above, the performing of the compositing tasks 216 may also be triggered by the performing of the application processor rendering tasks 212. For example, the application processor rendering tasks 212 executed by the application processor 202 may include a command (e.g., a “queue buffer” command) that indicates to the compositing component 206 that there is a frame being prepared for display (e.g., the application processor 202 has provided rendering commands to the graphics processor 204 for generating a rendered frame). The compositing component 206 may then perform the compositing tasks 216 to configure the display 208 for the displaying of the corresponding frame by passing information regarding the respective buffer to the display 208. Accordingly, the compositing component 206 may wait until the next VSYNC pulse (e.g., the second VSYNC pulse 220b) after the application processor 202 completes performing the application processor rendering tasks 212 before starting to perform the compositing tasks 216.
The display rendering tasks 218 may include the presentment of the rendered frame output by the graphics processor 204. However, the display 208 uses information provided by the compositing component 206 to determine which buffer to monitor (e.g., information provided via the compositing tasks 216) . Accordingly, the display 208 may wait until the next VSYNC pulse after the graphics processor 204 performs the graphics processor rendering tasks 214 and the compositing component 206 performs the compositing tasks 216 before performing the display rendering tasks 218. For example, in the illustrated example of FIG. 2, the display 208 waits until a third VSYNC pulse 220c before performing the display rendering tasks 218.
Similar to the example timing diagram 100 of FIG. 1, the example timing diagram 200 of FIG. 2 includes a frame latency 250 corresponding to the period between when the frame A is displayed (e.g., at a fourth VSYNC pulse 220d) and when the rendering instruction 210 was received by the application processor 202. For example, in the illustrated example of FIG. 2, the frame latency 250 includes at least three VSYNC pulse periods and the duration between when the rendering instruction 210 is received and the first VSYNC pulse 220a.
As shown in the example timing diagrams 100 and 200 of FIGs. 1 and 2, respectively, it should be appreciated that the start of the graphics processor rendering tasks may be triggered by the completion of the application processor rendering tasks and is not synchronized with a VSYNC pulse. Also, it should be appreciated that the start of the compositing tasks may also be triggered by the completion of the application processor rendering tasks, but is synchronized with a VSYNC pulse (e.g., the compositing component starts performing the compositing tasks at the next VSYNC pulse after the application processor rendering tasks are complete). It should also be appreciated that the display rendering tasks are performed after the graphics processor rendering tasks and the compositing tasks are complete, but their start is also synchronized with a VSYNC pulse (e.g., the display starts performing the display rendering tasks at the next VSYNC pulse after the graphics processor rendering tasks are complete and the compositing tasks are complete).
Furthermore, as shown in FIG. 2, as the performing of the compositing tasks 216 is synchronized with a VSYNC pulse (e.g., the second VSYNC pulse 220b) , it should be appreciated that in some examples, the performing of the compositing tasks 216 may include the compositing component 206 being configured to operate in a work mode 216a and an idle mode 216b. For example, the compositing component 206 may be configured to operate in the work mode 216a at the start of the second VSYNC pulse 220b and may correspond to the duration of the compositing tasks 216 during which the compositing component 206 is performing the compositing tasks 216. The compositing component 206 may be configured to operate in the idle mode 216b after the work mode 216a and may correspond to the remaining duration of the respective VSYNC pulse period (e.g., the period between the second VSYNC pulse 220b and the third VSYNC pulse 220c) during which the compositing component 206 is not performing compositing tasks 216.
Thus, it should be appreciated that the example rendering pipeline depicted in the example timing diagrams 100 and 200 of FIGs. 1 and 2, respectively, may include inefficiencies resulting in a relatively large frame latency. For example, a relatively large duration corresponding to the idle mode 216b associated with the compositing tasks 216 may result in a relatively large frame latency. In some examples, a relatively large frame latency may be caused by a relatively large gap between when the graphics processor completes the graphics processor rendering tasks and the display is able to start the display rendering tasks.
It should be appreciated that a relatively large frame latency between when an application generates a rendering instruction for a frame and when the corresponding frame is displayed via a display may result in a decreased user experience. For example, a user may be playing a game on a computing device and the game may cause a sequence of frames to be rendered. In some such examples, the game may generate a rendering instruction to render a first frame, but the computing device may be unable to display the corresponding first frame for three or more VSYNC pulse periods. Meanwhile, the game may continue to generate rendering instructions to render subsequent frames of the sequence of frames. However, if the user interacts with a frame (e.g., via a touch input), the frame that the user is interacting with may not be the correct frame for processing purposes.
For example, the frame latency 250 associated with the rendering pipeline of FIG. 2 may be four VSYNC pulse periods. In some such examples, the game may generate the rendering instruction 210 for the frame A during a first VSYNC pulse period, but the corresponding frame A may not be displayed by the display 208 until a fourth VSYNC pulse period. However, during the fourth VSYNC pulse period, the game may be processing a different subsequent frame (e.g., a frame D). Accordingly, if the user provides a touch input during the fourth VSYNC pulse period, the game applies the touch input to the most recent frame (e.g., the frame D), which may be a frame that the user has not been presented. Thus, the touch input is applied to an incorrect frame, which may result in a negative user experience for the user.
Examples disclosed herein provide techniques for reducing frame latency by improving the rendering pipeline. For example, disclosed techniques may modify the performing of the application processor rendering tasks to include an application processor sleep interval prior to the performing of the command (e.g., a “swap buffer” command) that instructs the graphics processor to begin performing the graphics processor rendering tasks. In some examples, the application processor sleep interval may be selected so that the completion of the graphics processor rendering tasks aligns with a VSYNC pulse. For example, disclosed techniques may estimate a duration for completing the graphics processor rendering tasks for a frame. Disclosed techniques may then select the application processor sleep interval and modify the performing of the application processor rendering tasks for the frame (e.g., via the application processor sleep interval) so that the start of the graphics processor rendering tasks for the frame is delayed, which may result in the completion of the graphics processor rendering tasks for the frame aligning with a VSYNC pulse.
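By way of illustration only, the application processor sleep interval described above might be inserted on the rendering thread as sketched below. SubmitAppRenderingTasks and SwapBuffers are hypothetical stand-ins for the application processor rendering tasks and the “swap buffer” command; the sketch follows the recited arithmetic (sleep duration equals the VSYNC pulse period minus the estimated graphics processor rendering tasks duration) without addressing corner cases.

    #include <chrono>
    #include <thread>

    // Illustrative stubs standing in for the real pipeline steps.
    void SubmitAppRenderingTasks() { /* generate and store rendering commands */ }
    void SwapBuffers() { /* instructs the graphics processor to begin rendering */ }

    // Delay the "swap buffer" command so that the estimated completion of the
    // graphics processor rendering tasks aligns with a VSYNC pulse.
    void FinishFrameAligned(double vsync_period_ms, double est_gpu_ms) {
      SubmitAppRenderingTasks();                       // application processor rendering tasks
      double sleep_ms = vsync_period_ms - est_gpu_ms;  // application processor sleep interval
      if (sleep_ms > 0.0) {
        std::this_thread::sleep_for(
            std::chrono::duration<double, std::milli>(sleep_ms));
      }
      SwapBuffers();  // graphics processor rendering tasks begin here
    }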
Example techniques disclosed herein may also modify the performing of the compositing tasks by delaying the start of the work mode of the compositing component within a VSYNC pulse period. For example, disclosed techniques may modify the performing of the compositing tasks to cause the compositing component to operate first in the idle mode for a compositing sleep duration and then operate in the work mode so that the completion of the compositing tasks aligns with a VSYNC pulse.
For example, disclosed techniques may estimate a duration for completing the compositing tasks for a frame. Disclosed techniques may then select the compositing sleep duration and modify the performing of the compositing tasks for the frame (e.g., via the compositing sleep duration) so that the start of the work mode of the compositing component for performing the compositing tasks for the frame is delayed, which may result in the completion of the compositing tasks for the frame aligning with a VSYNC pulse. In some examples, the disclosed techniques may cause the completion of the compositing tasks and the graphics processor rendering tasks to align with the same VSYNC pulse.
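Again by way of illustration only, the idle-then-work ordering of the compositing component might be sketched as follows; ComposeAndProgramDisplay is a hypothetical stand-in for the compositing tasks, and all times are in milliseconds.

    #include <chrono>
    #include <thread>

    // Illustrative stub standing in for the compositing work mode.
    void ComposeAndProgramDisplay() { /* compose and program the display */ }

    // Operate in the idle mode for the compositing sleep duration, then enter
    // the work mode so that the compositing tasks complete near the next
    // VSYNC pulse.
    void RunCompositorCycle(double prev_vsync_ms, double vsync_period_ms,
                            double est_compositing_ms, double now_ms) {
      double compositing_sleep_ms = vsync_period_ms - est_compositing_ms;
      double work_start_ms = prev_vsync_ms + compositing_sleep_ms;  // second start time
      if (work_start_ms > now_ms) {
        std::this_thread::sleep_for(std::chrono::duration<double, std::milli>(
            work_start_ms - now_ms));  // idle mode
      }
      ComposeAndProgramDisplay();      // work mode
    }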
Thus, it should be appreciated that by aligning the completion of the compositing tasks and the graphics processor rendering tasks with the same VSYNC pulse, the gap between when the graphics processor completes performing the graphics processor rendering tasks and the performing of the display rendering tasks may be reduced. For example, because disclosed techniques facilitate aligning the completion of the compositing tasks and the graphics processor rendering tasks, when the graphics processor completes the performing of the graphics processor rendering tasks, the display may begin performing the display rendering tasks, including the presentment of the corresponding rendered frame. Accordingly, disclosed techniques may facilitate reducing frame latency due to the rendering pipeline from, for example, four or more VSYNC pulse periods to, for example, two or three VSYNC pulse periods.
Various aspects of systems, apparatuses, computer program products, and methods are described more fully hereinafter with reference to the accompanying drawings. This disclosure may, however, be embodied in many different forms and should not be construed as limited to any specific structure or function presented throughout this disclosure. Rather, these aspects are provided so that this disclosure will be thorough and complete, and will fully convey the scope of this disclosure to those skilled in the art. Based on the teachings herein one skilled in the art should appreciate that the scope of this disclosure is intended to cover any aspect of the systems, apparatuses, computer program products, and methods disclosed herein, whether implemented independently of, or combined with, other aspects of the disclosure. For example, an apparatus may be implemented or a method may be practiced using any number of the aspects set forth herein. In addition, the scope of the disclosure is intended to cover such an apparatus or method which is practiced using other structure, functionality, or structure and  functionality in addition to or other than the various aspects of the disclosure set forth herein. Any aspect disclosed herein may be embodied by one or more elements of a claim.
Although various aspects are described herein, many variations and permutations of these aspects fall within the scope of this disclosure. Although some potential benefits and advantages of aspects of this disclosure are mentioned, the scope of this disclosure is not intended to be limited to particular benefits, uses, or objectives. Rather, aspects of this disclosure are intended to be broadly applicable to different wireless technologies, system configurations, networks, and transmission protocols, some of which are illustrated by way of example in the figures and in the following description. The detailed description and drawings are merely illustrative of this disclosure rather than limiting, the scope of this disclosure being defined by the appended claims and equivalents thereof.
Several aspects are presented with reference to various apparatus and methods. These apparatus and methods are described in the following detailed description and illustrated in the accompanying drawings by various blocks, components, circuits, processes, algorithms, and the like (collectively referred to as “elements” ) . These elements may be implemented using electronic hardware, computer software, or any combination thereof. Whether such elements are implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system.
By way of example, an element, or any portion of an element, or any combination of elements may be implemented as a “processing system” that includes one or more processors (which may also be referred to as processing units) . Examples of processors include microprocessors, microcontrollers, graphics processors, graphics processing units (GPUs) , general purpose GPUs (GPGPUs) , central processing units (CPUs) , application processors, digital signal processors (DSPs) , reduced instruction set computing (RISC) processors, systems-on-chip (SOC) , baseband processors, application specific integrated circuits (ASICs) , field programmable gate arrays (FPGAs) , programmable logic devices (PLDs) , state machines, gated logic, discrete hardware circuits, and other suitable hardware configured to perform the various functionality described throughout this disclosure. One or more processors in the processing system may execute software. Software can be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software components, applications, software applications, software packages, routines,  subroutines, objects, executables, threads of execution, procedures, functions, etc., whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise. The term application may refer to software. As described herein, one or more techniques may refer to an application (e.g., software) being configured to perform one or more functions. In such examples, the application may be stored on a memory (e.g., on-chip memory of a processor, memory, system memory, or any other memory) . Hardware described herein, such as a processor may be configured to execute the application. For example, the application may be described as including code that, when executed by the hardware, causes the hardware to perform one or more techniques described herein. As an example, the hardware may access the code from a memory and execute the code accessed from the memory to perform one or more techniques described herein. In some examples, components are identified in this disclosure. In such examples, the components may be hardware, software, or a combination thereof. The components may be separate components or sub-components of a single component.
Accordingly, in one or more examples described herein, the functions described may be implemented in hardware, software, or any combination thereof. If implemented in software, the functions may be stored on or encoded as one or more instructions or code on a computer-readable medium. Computer-readable media includes computer storage media. Storage media may be any available media that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise a random access memory (RAM) , a read-only memory (ROM) , an electrically erasable programmable ROM (EEPROM) , optical disk storage, magnetic disk storage, other magnetic storage devices, combinations of the aforementioned types of computer-readable media, or any other medium that can be used to store computer executable code in the form of instructions or data structures that can be accessed by a computer.
In general, examples disclosed herein provide techniques for reducing frame latency. Example techniques may improve performance and reduce power consumption by reducing the interval between a rendering instruction for a frame being generated and the corresponding frame being presented. For example, disclosed techniques may estimate a duration for performing graphics processor rendering tasks for a frame and modify the performing of the application processor rendering tasks for the frame to delay the start of the graphics processor rendering tasks so that the completion of the graphics processor rendering tasks aligns with a VSYNC pulse. Disclosed techniques may also estimate a duration for performing compositing tasks for the frame and cause the compositing component to first operate in an idle mode for a duration so that the subsequent work mode results in the completion of the compositing tasks aligning with the same VSYNC pulse as the graphics processor rendering tasks. Thus, it should be appreciated that examples disclosed herein provide techniques for reducing the frame latency associated with displaying a frame. For example, this disclosure describes techniques for graphics processing in any device that utilizes a rendering pipeline. Other example benefits are described throughout this disclosure.
As used herein, instances of the term “content” may refer to “graphical content, ” “image, ” and vice versa. This is true regardless of whether the terms are being used as an adjective, noun, or other parts of speech. In some examples, as used herein, the term “graphical content” may refer to content produced by one or more processes of a graphics processing pipeline. In some examples, as used herein, the term “graphical content” may refer to content produced by a processing unit configured to perform graphics processing. In some examples, as used herein, the term “graphical content” may refer to content produced by a graphics processing unit.
In some examples, as used herein, the term “display content” may refer to content generated by a processing unit configured to perform display processing. In some examples, as used herein, the term “display content” may refer to content generated by a display processing unit. Graphical content may be processed to become display content. For example, a graphics processing unit may output graphical content, such as a frame, to a buffer (which may be referred to as a framebuffer) . A display processing unit may read the graphical content, such as one or more frames from the buffer, and perform one or more display processing techniques thereon to generate display content. For example, a display processing unit may be configured to perform composition on one or more rendered layers to generate a frame. As another example, a display processing unit may be configured to compose, blend, or otherwise combine two or more layers together into a single frame. A display processing unit may be configured to perform scaling (e.g., upscaling or downscaling) on a frame. In some examples, a frame may refer to a layer. In other examples, a frame may refer to two or more layers that have already been blended together to form the frame (e.g., the frame includes two or more layers and the frame that includes two or more layers may subsequently be blended) .
FIG. 3 is a block diagram illustrating components of a device 300, in accordance with aspects of this disclosure. In the illustrated example of FIG. 3, the device 300 includes an application processor 310, a memory 320, a graphics processor 330, and a display 340. In some examples, the application processor 310, the memory 320, the graphics processor 330, and the display 340 may be in communication via one or more busses that may be implemented using any combination of bus structures and/or bus protocols.
In the illustrated example of FIG. 3, the application processor 310 may include one or more processors that are configured to execute an application 312, an application rendering component 314, and a compositing component 316. In some examples, the application processor 310 may be configured to execute instructions that cause the application processor 310 to perform one or more of the example techniques disclosed herein.
In the illustrated example of FIG. 3, the memory 320 may store one or more commands 322 and a rendered frame buffer 324. In some examples, the memory 320 may also store instructions that, when executed, cause the application processor 310, the graphics processor 330, and/or the display 340 to perform one or more of the example techniques disclosed herein.
In the illustrated example of FIG. 3, the graphics processor 330 may include one or more processors that are configured to render a frame. For example, the graphics processor 330 may be configured to execute one or more rendering commands to render a frame. In some examples, the graphics processor 330 may be configured to execute instructions that cause the graphics processor 330 to perform one or more of the example techniques disclosed herein.
In the illustrated example of FIG. 3, the display 340 may include a display panel, a display client, and/or a screen to facilitate presentment of a rendered frame. In some examples, the display 340 may be configured to execute instructions that cause the display 340 to perform one or more example techniques disclosed herein.
In the illustrated example, the application 312 may be a graphics application that may use the graphics processor 330 to render one or more graphics objects into an image or frame to be displayed (e.g., via the display 340) . For example, the application 312 may include operations that are performed via a rendering pipeline. In the illustrated example, the application 312 generates a rendering instruction to cause the rendering of a frame, such as example frame A of FIGs. 1 and/or 2. The rendering instruction is passed from the application 312 to the application rendering component 314.
In the illustrated example, the application rendering component 314 may be configured to perform the application processor rendering tasks disclosed herein. In some examples, the application processor rendering tasks may be based on the rendering instruction received from the application 312. For example, the application rendering component 314 may be configured to analyze the rendering instruction and generate one or more rendering commands that may be executed by the graphics processor 330. In some examples, the application processor rendering tasks may additionally or alternatively include rendering a frame based on the rendering instruction. In some examples, one or more aspects of the application rendering component 314 may be implemented by an application programming interface (API) and/or a driver. For example, an API may be a runtime service that translates the rendering instruction received from the application 312 into a format that is consumable by a driver and/or the graphics processor 330.
In the illustrated example, the application rendering component 314 stores the rendering commands in the commands buffer 322 of the memory 320. In some examples, the rendering commands may include draw call commands and/or other graphics commands to cause the graphics processor 330 to perform graphics operations to render one or more frames for presentment (e.g., via the display 340) . For example, a draw call command may instruct the graphics processor 330 to render an object defined by a group of one or more vertices stored in the memory 320 (e.g., in a vertices buffer) . The geometry defined by the group of one or more vertices may, in some examples, correspond to one or more primitives (e.g., points, lines, triangles, patches, etc. ) to be rendered. In general, a draw call command may cause the graphics processor 330 to render all of the vertices stored in a section of the memory 320 (e.g., in the vertices buffer) .
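By way of example only, a draw call of the kind described above might be issued through the OpenGL ES API as sketched below; the vertex buffer object and the attribute location are assumed to have been configured elsewhere, and the sketch is not the specific command format of the disclosure.

    #include <GLES2/gl2.h>

    // Issue one draw call over a vertex buffer object: the graphics processor
    // renders 'vertexCount' vertices stored in 'vbo' as a list of triangles.
    void IssueDrawCall(GLuint vbo, GLint positionAttrib, GLsizei vertexCount) {
      glBindBuffer(GL_ARRAY_BUFFER, vbo);
      glEnableVertexAttribArray(positionAttrib);
      glVertexAttribPointer(positionAttrib, 3, GL_FLOAT, GL_FALSE, 0, nullptr);
      glDrawArrays(GL_TRIANGLES, 0, vertexCount);
    }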
In some examples, the application rendering component 314 performing the application processor rendering tasks may include the application rendering component 314 providing a command (e.g., a “swap buffer” command) to the graphics processor 330 that instructs the graphics processor 330 to begin performing the graphics processor rendering tasks associated with a respective frame. In some examples, the application rendering component 314 performing the application processor rendering tasks may include the application rendering component 314 providing a command (e.g., a “queue buffer” command) to the compositing component 316 that indicates to the compositing component 316 that there is a frame being prepared for display (e.g., the application rendering component 314 provided a command to the graphics processor 330 to perform graphics processor rendering tasks to render a frame).
It should be appreciated that in some examples, the application rendering component 314 may be implemented via a library, such as an application processor-side-application-rendering library. For example, the application rendering component 314 may be configured to use the graphics processor 330 to provide hardware acceleration by using an application programming interface (API) , such as the OpenGLES API. In this manner, the application processor 310 may use the graphics processor 330 to perform hardware accelerated application rendering.
In the illustrated example, the compositing component 316 may be configured to perform one or more compositing tasks disclosed herein. For example, the compositing component 316 may be configured to receive a command (e.g., the “queue buffer” command) from the application rendering component 314 that indicates to the compositing component 316 that there is a frame being prepared (e.g., rendered) for display. The compositing component 316 may then configure the display 340 for the displaying of the corresponding frame by passing information regarding the respective buffer to the display 340. For example, the graphics processor 330 may store the output of the graphics processor rendering tasks in the rendered frame buffer 324 and the performing of the compositing tasks by the compositing component 316 may include the compositing component 316 providing information identifying the rendered frame buffer 324 (and/or a location of the rendered frame buffer corresponding to the rendered frame) to the display 340.
As used herein, a compositing component (sometimes referred to as a “compositing engine, ” “composition engine, ” “compositing hardware, ” or “composition hardware” ) refers to an analogue or digital circuit that programs display hardware to display rendered frame data or animation data to a display (e.g., the display 340) . The compositing component may include an input for the rendered data and an output for the data and/or instructions to the display hardware (e.g., the display 340) . In some examples, the compositing component may reside in hardware or may be implemented in software running on the application processor 310.
It should be appreciated that in some examples, a “Surface Flinger” (SF) (sometimes referred to as a “Surface Flinger component” or a “Surface Flinger engine” ) is a software equivalent of the compositing component running at a user-space level in the application processor (e.g.,  CPU) in the ANDROID operating system. The Surface Flinger may additionally or alternatively reside at a kernel level of an application processor (e.g., a CPU) .
It should be appreciated that in some examples, composition functionality and/or programming of the display hardware (e.g., the display 340) may be distributed between two or more of hardware components, software components, and/or firmware components.
In the illustrated example of FIG. 3, the graphics processor 330 may be configured to perform the graphics processor rendering tasks disclosed herein. For example, the graphics processor 330 may be configured to execute the rendering commands stored in the commands buffer 322 and render a frame. It should be appreciated that in some examples, one or more aspects of the graphics processor rendering tasks may be implemented via a graphics processing pipeline. In the illustrated example, the graphics processor 330 stores the rendered frame in the rendered frame buffer 324 of the memory 320.
In some examples, the graphics processor 330 may monitor for a command (e.g., the “swap buffer” command) received from the application rendering component 314 before starting to perform the graphics processor rendering tasks. For example, as described above, the application processor rendering tasks performed by the application rendering component 314 may include providing the swap buffer command to the graphics processor 330 after the application rendering component 314 generates and stores the rendering commands for a frame in the commands buffer 322. In some such examples, the graphics processor 330 may monitor commands received from the application processor 310 and/or the application rendering component 314 for the swap buffer command.
In some examples, the graphics processor 330 may be configured to generate an indication at the completion of the performing of the graphics processor rendering tasks to indicate to the application processor 310 and/or the display 340 that the rendering of the frame is complete. In some such examples, the indication may indicate to the display 340 that a rendered frame is stored at the rendered frame buffer 324 and that the rendered frame is available for presentment.
It should be appreciated that in some examples, the indication generated by the graphics processor 330 may be associated with a synchronization fence that is available to the application rendering component 314. For example, when the application rendering component 314 issues the queue buffer command (e.g., to the compositing component 316) , the respective  buffer may be associated with a synchronization fence. In some such examples, the synchronization fence may indicate to components of the device 300 that the graphics processor 330 is operating on a buffer (e.g., the rendered frame buffer 324) and to prevent the other components of the device 300 from also operating on the buffer. For example, the application rendering component 314 may also issue a command enabling a synchronization fence for the buffer. In this manner, the application processor 310 and the graphics processor 330 may perform tasks concurrently (e.g., in parallel or nearly in parallel) and without concerns of performing tasks that overlap on the buffer. In some examples, when the graphics processor 330 completes writing to the buffer (e.g., completes performing the graphics processor rendering tasks) , an indication may be signaled indicating that the corresponding buffer is available and that the corresponding synchronization fence is disabled. For example, the display 340 may monitor for the indication to determine when the rendered frame stored at the buffer is ready for presentment. In some examples, the graphics processor 330 may signal the indication indicating that the synchronization fence is disabled.
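As a non-limiting sketch of one way such a synchronization fence may be realized, the EGL 1.5 fence-sync API may be used; the display handle is assumed to be an initialized EGLDisplay, and error handling is elided.

```cpp
#include <EGL/egl.h>

// Insert a fence after the frame's rendering commands; the fence signals
// when the graphics processor has finished writing the buffer.
EGLSync createFrameFence(EGLDisplay display)
{
    return eglCreateSync(display, EGL_SYNC_FENCE, nullptr);
}

// A consumer (e.g., a component waiting on the rendered frame) may block
// until the fence signals, i.e., until rendering is complete.
void waitForFrame(EGLDisplay display, EGLSync fence)
{
    eglClientWaitSync(display, fence, EGL_SYNC_FLUSH_COMMANDS_BIT, EGL_FOREVER);
    eglDestroySync(display, fence);
}
```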
In the illustrated example of FIG. 3, the display 340 may be configured to perform one or more display rendering tasks disclosed herein. For example, the display 340 may be configured to display the rendered frame. In some examples, the display 340 may be configured to receive information from the compositing component 316 that identifies a rendered frame buffer (and/or a location of the rendered frame buffer corresponding to the rendered frame) . In some such examples, the display 340 may be configured to monitor for an indication that the rendering of the frame is complete and that the rendered frame is available at the identified rendered frame buffer for presentment. For example, the display 340 may monitor for an indication that a synchronization fence associated with the identified buffer is disabled. In some examples, after receiving the indication that the synchronization fence is disabled, the display 340 may be configured to display the corresponding rendered frame. For example, the display 340 may access the rendered frame at the rendered frame buffer 324 for presentment.
In some examples, after the presentment of the rendered frame, the display 340 may generate a VSYNC pulse. In some examples, the VSYNC pulse may indicate, for example, that the corresponding buffer is available. For example, when executing the rendering pipeline, a frame may be associated with a corresponding buffer. For example, when performing the application processor rendering tasks for a frame, the application rendering component 314 may designate  a rendered frame buffer 324 for storing the rendered frame. When performing the graphics processor rendering tasks, the graphics processor 330 may store the rendered frame in the designated rendered frame buffer 324. When performing the compositing tasks, the compositing component 316 may provide information to the display 340 that identifies the designated rendered frame buffer 324. When performing the display rendering tasks, the display 340 may monitor for an indication that the designated rendered frame buffer 324 is available for presentment. In some such examples, generating the VSYNC pulse after the presentment of the rendered frame enables the application rendering component 314 to determine that the designated rendered frame buffer 324 may be designated for storing a subsequent rendered frame.
It should be appreciated that the generating of the VSYNC pulse may be a periodic occurrence, may be an aperiodic occurrence, may be a one-time occurrence, and/or may be an event-based occurrence. For example, the occurrences of the VSYNC pulses may be associated with a periodicity based on the refresh rate of the display 340. For example, a display with a 60 Hz refresh rate may have a VSYNC pulse period of 16.67 milliseconds (ms) (e.g., 1/60 of a second) . That is, a duration between a first VSYNC pulse and a second VSYNC pulse may be 16.67 ms.
In operation, the application 312 may generate a rendering instruction to facilitate rendering a frame. The application rendering component 314 may receive the rendering instruction and start performing application processor rendering tasks (e.g., the example application processor rendering tasks 112 of FIG. 1 and/or the example application processor rendering tasks 212 of FIG. 2) . For example, the application rendering component 314 may designate a buffer for storing the rendered frame (e.g., may enable a synchronization fence for the designated buffer) , may generate rendering commands for storage in the commands buffer 322 and execution by the graphics processor 330, may send a queue buffer command to the compositing component 316 (e.g., to identify the designated buffer) , and may send a swap buffer command to the graphics processor 330 to indicate to the graphics processor 330 that the graphics processor 330 may start performing the graphics processor rendering tasks.
The graphics processor 330 may receive the swap buffer command and start performing graphics processor rendering tasks (e.g., the example graphics processor rendering tasks 114, 214) . For example, the graphics processor 330 may execute the rendering commands stored in the commands buffer 322 and associated with the frame, may write the rendered frame to the designated buffer, and may signal an indication when the writing to the designated buffer is complete (e.g., may disable the corresponding synchronization fence) .
The compositing component 316 may receive the queue buffer command and start performing compositing tasks (e.g., the example compositing tasks 116, 216) . For example, the compositing component 316 may provide information identifying the designated buffer to the display 340, and may perform compositing of the rendered frame.
The display 340 may receive the information identifying the designated buffer and start performing display rendering tasks (e.g., the example display rendering tasks 118, 218) . For example, the display 340 may monitor for the indication that the writing to the designated buffer is complete, may display the rendered frame, and may generate a VSYNC pulse.
As described above, the interval between when the application 312 generates the rendering instruction to render a frame and the display 340 displays the corresponding rendered frame may be referred to as frame latency. In some examples, and as described above in connection with FIGs. 1 and/or 2, the compositing component 316 may synchronize the performing of the compositing tasks based on occurrences of VSYNC pulses. For example, the display 340 may generate a VSYNC pulse that is received by the compositing component 316. In some such examples, the compositing component 316 may begin the performing of the compositing tasks after the VSYNC pulse is received. For example, the compositing component 316 may receive a queue buffer command from the application rendering component 314 and wait for receipt of a subsequent VSYNC pulse before starting the performing of the compositing tasks. However, as described above, in some examples, the compositing component 316 may be capable of performing the compositing tasks in a duration that is less than the VSYNC pulse period, which may result in the compositing component 316 operating in a work mode and an idle mode.
Furthermore, the graphics processor 330 may begin performing graphics processor rendering tasks after receiving the swap buffer command from the application rendering component 314 and the display 340 may not begin displaying a rendered frame until the display 340 receives an indication that the performing of the graphics processor rendering tasks is complete (e.g., until the synchronization fence associated with the corresponding buffer is disabled) . In some examples, and as described above in connection with FIGs. 1 and/or 2, the display 340 may synchronize the presentment of a rendered frame based on occurrences of VSYNC pulses. For example, the display 340 may receive an indication indicating that the rendered frame is ready for presentment and wait for the next VSYNC pulse before starting the presentment of the rendered frame. However, as described above, in some examples, the performing of the graphics processor rendering tasks may not be synchronized with a VSYNC pulse, which may result in an interval after the graphics processor 330 completes performing the graphics processor rendering tasks and before the display 340 begins performing the display rendering tasks.
Examples disclosed herein provide techniques for reducing frame latency by improving the timing of performing tasks within the rendering pipeline. For example, disclosed techniques facilitate modifying the performing of certain tasks so that the completion of the graphics processor rendering tasks and the completion of the compositing tasks may be synchronized with the same VSYNC pulse. As described above, the display rendering tasks depend on the completion of the graphics processor rendering tasks (e.g., the display waits for the rendered frame to be stored in the rendered frame buffer 324 before displaying the rendered frame) and waits for information from the compositing tasks (e.g., information identifying which buffer to monitor for completion before presentment of the rendered frame) . By aligning the completion of the graphics processor rendering tasks and the completion of the compositing tasks with the same VSYNC pulse, disclosed techniques facilitate reducing the wait time between the graphics processor rendering tasks completing and the display rendering tasks starting (as shown in the example timing diagrams 100, 200 of FIGs. 1 and 2, respectively) .
FIG. 4 is an example timing diagram 400 depicting active periods for an application processor 402, a graphics processor 404, a compositing component 406, and a display 408 operating on a frame (frame A) in a rendering pipeline, in accordance with one or more techniques of this disclosure. One or more aspects of the application processor 402 may be implemented by the application rendering component 314 of FIG. 3. One or more aspects of the graphics processor 404 may be implemented by the graphics processor 330 of FIG. 3. One or more aspects of the compositing component 406 may be implemented by the compositing component 316 of FIG. 3. One or more aspects of the display 408 may be implemented by the display 340 of FIG. 3.
The example timing diagram 400 of FIG. 4 is similar to the example timing diagrams 100, 200 of FIGs. 1 and 2, respectively, and includes application processor rendering tasks 412 executed by the application processor 402, graphics processor rendering tasks 414 executed by the graphics processor 404, compositing tasks 416 executed by the compositing component 406, and display rendering tasks 418 executed by the display 408. Furthermore, as shown in FIG. 4, the application processor rendering tasks 412 start when a rendering instruction 410 is received by the application processor 402 and the application processor rendering tasks 412 complete after a first VSYNC pulse 420a. As the performing of the graphics processor rendering tasks 414 is triggered via the completion of the application processor rendering tasks 412 (e.g., via a “swap buffer” command of the application processor rendering tasks 412) , the graphics processor rendering tasks 414 start after the completion of the application processor rendering tasks 412. In the illustrated example of FIG. 4, the start time of the graphics processor rendering tasks 414 and the duration of the graphics processor rendering tasks 414 (e.g., the interval between when the graphics processor rendering tasks 414 start and when the graphics processor rendering tasks 414 complete) cause the completion of the graphics processor rendering tasks 414 to align with the occurrence of a second VSYNC pulse 420b. Furthermore, as shown in FIG. 4, the completion of the compositing tasks 416 aligns with the occurrence of the second VSYNC pulse 420b. The performing of the display rendering tasks 418 may then begin after the occurrence of the second VSYNC pulse 420b as the display 408 receives information identifying a buffer configured to store the rendered frame A (e.g., during the performing of the compositing tasks 416) and also receives an indication that the rendered frame A is stored in the identified buffer (e.g., during the performing of the graphics processor rendering tasks 414) .
To facilitate the aligning of the completion of the graphics processor rendering tasks 414 and the compositing tasks 416 with the occurrence of the second VSYNC pulse 420b, example techniques disclosed herein facilitate modifying the performing of the application processor rendering tasks 412. For example, disclosed techniques modify the performing of the application processor rendering tasks 412 to include an application processor sleep duration during the performing of the application processor rendering tasks. The example application processor sleep duration may be configured to delay the end of the application processor rendering tasks and, thus, to delay the start of the performing of the graphics processor rendering tasks (e.g., by delaying the transmitting of the swap buffer command from the application processor to the graphics processor) .
Example techniques disclosed herein may also modify the performing of the compositing tasks 416 by delaying the start of the work mode of the compositing component 406 during the performing of the compositing tasks 416 within a VSYNC pulse period (e.g., the period between the occurrence of the first VSYNC pulse 420a and the occurrence of the second VSYNC pulse 420b) . For example, disclosed techniques may modify the performing of the compositing tasks 416 to cause the compositing component 406 to operate first in the idle mode for a compositing sleep duration and then operate in the work mode so that the completion of the compositing tasks 416 aligns with the occurrence of the second VSYNC pulse 420b.
FIG. 5 is an example timing diagram 500 depicting active periods for an application processor 502, a graphics processor 504, a compositing component 506, and a display 508 operating on a frame (frame A) in a rendering pipeline, in accordance with one or more techniques of this disclosure. One or more aspects of the application processor 502 may be implemented by the application rendering component 314 of FIG. 3 and/or the application processor 402 of FIG. 4. One or more aspects of the graphics processor 504 may be implemented by the graphics processor 330 of FIG. 3 and/or the graphics processor 404 of FIG. 4. One or more aspects of the compositing component 506 may be implemented by the compositing component 316 of FIG. 3 and/or the compositing component 406 of FIG. 4. One or more aspects of the display 508 may be implemented by the display 340 of FIG. 3 and/or the display 408 of FIG. 4.
The example timing diagram 500 of FIG. 5 is similar to the example timing diagram 400 of FIG. 4 and includes application processor rendering tasks 512 executed by the application processor 502, graphics processor rendering tasks 514 executed by the graphics processor 504, compositing tasks 516 executed by the compositing component 506, and display rendering tasks 518 executed by the display 508. Furthermore, as shown in FIG. 5, the application processor rendering tasks 512 start when a rendering instruction 510 is received by the application processor 502 and the application processor rendering tasks 512 complete after a first VSYNC pulse 520a.
To facilitate the aligning of the completion of the graphics processor rendering tasks 514 and the compositing tasks 516 with the occurrence of the second VSYNC pulse 520b, example techniques disclosed herein facilitate modifying the performing of the application processor rendering tasks 512. For example, disclosed techniques modify the performing of the application processor rendering tasks 512 to include an application processor sleep duration  513 during the performing of the application processor rendering tasks 512. The example application processor sleep duration 513 may be configured to delay the end of the application processor rendering tasks 512 and, thus, to delay the start of the performing of the graphics processor rendering tasks 514 (e.g., by delaying the transmitting of the swap buffer command from the application processor 502 to the graphics processor 504) .
In some such examples, the application processor rendering tasks 512 may be modified so that the transmitting of the swap buffer command that instructs the graphics processor 504 to begin performing the graphics processor rendering tasks 514 occurs after the application processor sleep duration 513. In some examples, the application processor sleep duration 513 may be selected so that the completion of the graphics processor rendering tasks 514 aligns with a VSYNC pulse (e.g., the occurrence of the second VSYNC pulse 520b) .
Referring back to the illustrated example of FIG. 3, to facilitate the aligning of the completion of the graphics processor rendering tasks with the occurrence of a VSYNC pulse, the example application rendering component 314 of FIG. 3 includes an example application processor delay component 350 configured to delay the transmitting of the swap buffer command. For example, the application processor delay component 350 may determine an application processor delay duration and include the application processor delay duration in the performing of the application processor rendering tasks so that the transmitting of the swap buffer command from the application rendering component 314 to the graphics processor 330 is delayed.
In some examples, to determine the duration of the application processor delay to include in the performing of the application processor rendering tasks, the example application processor delay component 350 may estimate a duration of performing the graphics processor rendering tasks. For example, the application processor delay component 350 may sample the duration of performing the graphics processor rendering tasks for a quantity of previous frames and estimate a duration of performing the graphics processor rendering tasks for the current frame. In some examples, the quantity of previous frames may be a preconfigured quantity, such as five frames, ten frames, or any other suitable quantity of frames. In some examples, the quantity of previous frames may vary.
In some examples, the application processor delay component 350 may estimate the duration of performing the graphics processor rendering tasks for the current frame by selecting the longest duration of the sampled durations. In some examples, the application processor delay component 350 may add a buffer duration (e.g., a padding duration) to the estimated duration. For example, the application processor delay component 350 may estimate the duration of performing the graphics processor rendering tasks for the current frame by sampling the duration of performing the graphics processor rendering tasks for five previous frames, selecting the duration of the five sampled durations that is the longest duration (e.g., a maximum duration) , and adding a buffer duration to the selected duration. However, it should be appreciated that in other examples, additional or alternative techniques for estimating the duration of performing the graphics processor rendering tasks for the current frame based on sampled durations of previous frames may be used. For example, the application processor delay component 350 may estimate the duration of performing the graphics processor rendering tasks for the current frame by averaging the sampled durations.
In some examples, to sample the duration of performing the graphics processor rendering tasks for a previous frame, the application processor delay component 350 may maintain information regarding the start time of the performing of the graphics processor rendering tasks for the previous frame and the stop time of the performing of the graphics processor rendering tasks for the previous frame. For example, when the application rendering component 314 transmits the swap buffer command to the graphics processor 330 for a frame N, the application processor delay component 350 may record a timestamp associated with the start of the graphics processor rendering tasks for the frame N. The application processor delay component 350 may also monitor for the indication from the graphics processor 330 disabling the synchronization fence associated with the buffer to which the graphics processor 330 stores the rendered frame N. For example, when the application processor delay component 350 detects the indication disabling the synchronization fence associated with the respective buffer, the application processor delay component 350 may record a timestamp associated with the completion of the graphics processor rendering tasks for the frame N. The example application processor delay component 350 may then calculate the difference between the recorded completion timestamp and the start timestamp for the frame N to sample the duration of the performing of the graphics processor rendering tasks for the frame N.
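A minimal C++ sketch of this sampling-and-estimation scheme follows; the class name, the five-sample window, and the 1 ms padding are illustrative assumptions rather than values required by the disclosure.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <deque>

using Clock = std::chrono::steady_clock;
using Ms = std::chrono::duration<double, std::milli>;

// Keeps the last few sampled graphics processor rendering durations and
// estimates the next one as the maximum sample plus a padding duration.
class GpuDurationEstimator {
public:
    // Called when the swap buffer command is transmitted (rendering starts).
    void onSwapBufferSent() { start_ = Clock::now(); }

    // Called when the synchronization fence is disabled (rendering done).
    void onFenceDisabled() {
        samples_.push_back(Ms(Clock::now() - start_));
        if (samples_.size() > kSampleCount) samples_.pop_front();
    }

    // Estimate for the current frame: maximum sampled duration plus padding.
    Ms estimate() const {
        Ms longest{0.0};
        for (const Ms& s : samples_) longest = std::max(longest, s);
        return longest + kPadding;
    }

private:
    static constexpr std::size_t kSampleCount = 5;  // e.g., five previous frames
    static constexpr Ms kPadding{1.0};              // illustrative 1 ms buffer duration
    std::deque<Ms> samples_;
    Clock::time_point start_;
};
```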
In some examples, after the application processor delay component 350 estimates the duration for performing the graphics processor rendering tasks, the application processor delay component 350 may determine the application processor sleep duration to facilitate aligning the completion of the graphics processor rendering tasks with a VSYNC pulse. For example, the application processor delay component 350 may determine the application processor sleep duration based on the VSYNC pulse period, the estimated duration of performing the graphics processor rendering tasks, and the duration of performing the unmodified application processor rendering tasks. For example, the application processor delay component 350 may calculate a start time for the performing of the graphics processor rendering tasks, relative to an occurrence of a VSYNC pulse, based on a difference between the VSYNC pulse period and the estimated duration of performing the graphics processor rendering tasks. The application processor delay component 350 may then calculate the application processor sleep duration as a difference between the start time for the performing of the graphics processor rendering tasks and the end time of the performing of the unmodified application processor rendering tasks.
As an illustrative example, and referring to FIG. 5, the example timing diagram 500 illustrates the application processor 502 (e.g., the application rendering component 314) receiving the rendering instruction 510 at a time t0 and initiating the performing of the application processor rendering tasks 512. In the illustrated example, the completion of the unmodified application processor rendering tasks 512 (e.g., the application processor rendering tasks without the application processor sleep duration) occurs at a time t2. The estimated duration of the graphics processor rendering tasks is the difference between time t5 and time t3, and the start time of the graphics processor rendering tasks is at the time t3. The application processor delay component 350 may then determine that the application processor sleep duration 513 is the duration between the time t3 (e.g., the determined start time for performing the graphics processor rendering tasks 514) and the time t2 (e.g., the completion time of the unmodified application processor rendering tasks 512) . After determining the application processor sleep duration 513, the example application processor delay component 350 may then modify the application processor rendering tasks so that the application processor sleep duration 513 is included in the application processor rendering tasks and is performed before the transmitting of the swap buffer command to the graphics processor 504 to trigger the start of the graphics processor rendering tasks 514.
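The calculation above may be expressed compactly as follows; this is a minimal sketch assuming all times are expressed in milliseconds relative to the most recent VSYNC pulse, with names chosen for the example.

```cpp
// Compute the sleep to insert before transmitting the swap buffer command,
// so that the graphics processor rendering tasks complete at the next VSYNC.
double computeAppProcessorSleepMs(double vsyncPeriodMs,
                                  double estimatedGpuMs,
                                  double unmodifiedEndMs)
{
    // Start the graphics processor so that it finishes at the next VSYNC pulse
    // (e.g., time t3 in FIG. 5).
    double gpuStartMs = vsyncPeriodMs - estimatedGpuMs;
    // Sleep from the unmodified end of the application processor rendering
    // tasks (e.g., time t2 in FIG. 5) until that start time.
    double sleepMs = gpuStartMs - unmodifiedEndMs;
    return sleepMs > 0.0 ? sleepMs : 0.0;  // no delay if already past the start time
}
```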
In some examples, the application processor delay component 350 may also modify the performing of the application processor rendering tasks so that the queue buffer command is transmitted from the application rendering component 314 to the compositing component 316 when the swap buffer command is transmitted from the application rendering component 314 to the graphics processor 330. That is, in some examples, the application processor delay component 350 may also delay the transmitting of the queue buffer command to facilitate delaying the start of the work mode of the compositing component 316.
For example, as shown in FIG. 5, disclosed techniques may cause the compositing component 506 to wait for a compositing sleep duration 517 prior to performing the compositing tasks 516 for the current frame (e.g., frame A) . By waiting for the compositing sleep duration 517 prior to starting the performing of the compositing tasks 516, disclosed techniques enable the completion of the performing of the compositing tasks 516 to align with the occurrence of a VSYNC pulse (e.g., the second example VSYNC pulse 520b) . For example, disclosed techniques may modify the performing of the compositing tasks to cause the compositing component to operate first in the idle mode for a compositing sleep duration and then operate in the work mode so that the completion of the compositing tasks aligns with a VSYNC pulse.
Referring back to the illustrated example of FIG. 3, to facilitate the aligning of the completion of the compositing tasks with the occurrence of a VSYNC pulse, the example compositing component 316 of FIG. 3 includes an example compositing sleep delay component 352 configured to delay the performing of the compositing tasks within a VSYNC cycle. For example, the compositing sleep delay component 352 may determine a compositing sleep duration and include the compositing sleep duration in the performing of the compositing tasks so that the compositing component 316 operates in the idle mode for the compositing sleep duration and then operates in the work mode to facilitate performing and completing the performing of the compositing tasks at the occurrence of a VSYNC pulse.
In some examples, to determine the duration of the compositing sleep delay, the example compositing sleep delay component 352 may estimate a duration of performing the compositing tasks. For example, the compositing sleep delay component 352 may sample the duration of performing the compositing tasks for a quantity of previous frames and estimate a duration of performing the compositing tasks for the current frame. In some examples, the quantity of previous frames may be a preconfigured quantity, such as five frames, ten frames,  or any other suitable quantity of frames. In some examples, the quantity of previous frames may vary.
In some examples, the compositing sleep delay component 352 may estimate the duration of performing the compositing tasks for the current frame by selecting the longest duration of the sampled durations. In some examples, the compositing sleep delay component 352 may add a buffer duration (e.g., a padding duration) to the estimated duration. For example, the compositing sleep delay component 352 may estimate the duration of performing the compositing tasks for the current frame by sampling the duration of performing the compositing tasks for five previous frames, selecting the duration of the five sampled durations that is the longest duration (e.g., a maximum duration) , and adding a buffer duration to the selected duration. However, it should be appreciated that in other examples, additional or alternative techniques for estimating the duration of performing the compositing tasks for the current frame based on sampled durations of previous frames may be used. For example, the compositing sleep delay component 352 may estimate the duration of performing the compositing tasks for the current frame by averaging the sampled durations.
In some examples, to sample the duration of performing the compositing tasks for a previous frame, the compositing sleep delay component 352 may maintain information regarding the start time of the performing of the compositing tasks for the previous frame and the stop time of the performing of the compositing tasks for the previous frame. For example, when the compositing component 316 starts performing the compositing tasks for a frame N, the compositing sleep delay component 352 may record a timestamp associated with the start of the compositing tasks for the frame N. The compositing sleep delay component 352 may also monitor for an indication of when the performing of the compositing tasks for the frame N is complete. For example, when the compositing sleep delay component 352 detects the indication that the performing of the compositing tasks is complete, the compositing sleep delay component 352 may record a timestamp associated with the completion of the compositing tasks for the frame N. The example compositing sleep delay component 352 may then calculate the difference between the recorded completion timestamp and the start timestamp for the frame N to sample the duration of the performing of the compositing tasks for the frame N.
In some examples, after the compositing sleep delay component 352 estimates the duration for performing the compositing tasks for the current frame, the compositing sleep delay component 352 may determine the compositing sleep duration to facilitate aligning the end of the completion of the compositing tasks with a VSYNC pulse. For example, the compositing sleep delay component 352 may determine the compositing sleep duration based on the VSYNC pulse period and the estimated duration of performing the compositing tasks. For example, the compositing sleep delay component 352 may calculate a start time for the performing of the compositing tasks, relative to an occurrence of a VSYNC pulse, based on a difference in the VSYNC pulse period and the estimated duration of performing the compositing tasks. The compositing sleep delay component 352 may then calculate the compositing sleep duration as a difference between the start time for the performing of the compositing tasks and the occurrence of a previous VSYNC pulse.
As an illustrative example, and referring to FIG. 5, the example timing diagram 500 illustrates that the estimated duration of the compositing tasks is the difference between time t5 and time t4, and the start time of the compositing tasks is at the time t4. The compositing sleep delay component 352 may then determine that the compositing sleep duration is the difference between the VSYNC pulse period and the estimated duration of the compositing tasks. The example compositing sleep delay component 352 may then determine the start time of the compositing tasks 516 based on the compositing sleep duration and a timestamp associated with a previous VSYNC pulse (e.g., the occurrence of the first VSYNC pulse 520a at the time t1) .
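A corresponding sketch of the compositing start-time calculation, under the same illustrative assumptions (millisecond timestamps taken from a single monotonic clock):

```cpp
// Compute when the compositing tasks should leave the idle mode so that
// they complete at the next VSYNC pulse (e.g., time t4 in FIG. 5).
double computeCompositingStartMs(double previousVsyncTimestampMs,
                                 double vsyncPeriodMs,
                                 double estimatedCompositingMs)
{
    double sleepMs = vsyncPeriodMs - estimatedCompositingMs;  // idle-mode (sleep) duration
    return previousVsyncTimestampMs + sleepMs;                // work mode begins here
}
```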
Thus, it should be appreciated that by aligning the completion of the compositing tasks and the completion of the graphics processor rendering tasks with the same VSYNC pulse, the gap between when the graphics processor completes performing the graphics processor rendering tasks and when the display begins performing the display rendering tasks may be reduced. For example, since disclosed techniques facilitate aligning the completion of the compositing tasks and the completion of the graphics processor rendering tasks, when the graphics processor completes the performing of the graphics processor rendering tasks, the display may begin performing the display rendering tasks, including the presentment of the corresponding rendered frame. Accordingly, disclosed techniques may facilitate reducing frame latency due to the rendering pipeline from, for example, four or more VSYNC pulse periods to, for example, two or three VSYNC pulse periods.
Referring again to FIG. 3, in operation, the example device 300 may include the application processor 310 configured to execute the application 312, which may generate a rendering instruction to facilitate the rendering of a frame. The application rendering component 314 may receive the rendering instruction. In some examples, the application rendering component 314 may determine whether to employ example frame latency reducing techniques disclosed herein. For example, the application rendering component 314 (and/or the application processor delay component 350) may access a data structure (e.g., a file, a list, etc. ) that indicates which applications are permitted to utilize the example frame latency reducing techniques disclosed herein and/or which applications are not permitted to utilize the example frame latency reducing techniques disclosed herein. For example, applications that may be permitted to access the example frame latency reducing techniques disclosed herein may include applications that employ frame rendering, while applications that may not be permitted to access the example frame latency reducing techniques disclosed herein may include applications that do not employ frame rendering. In some examples, the data structure may include a white list of applications that are permitted to use the example frame latency reducing techniques disclosed herein. However, it should be appreciated that in other examples, additional or alternative techniques for determining which applications may use the example frame latency reducing techniques disclosed herein and/or which applications may not use the example frame latency reducing techniques disclosed herein may also be used. In some examples, when an application is determined to be permitted to employ the example frame latency reducing techniques disclosed herein, the application rendering component 314 (and/or the application processor delay component 350) may enable a frame latency reducing indicator that may be used by the application processor delay component 350 to determine whether to modify the performing of the application processor rendering tasks and/or may be used by the compositing sleep delay component 352 to determine whether to perform the compositing tasks after a compositing sleep duration during a VSYNC cycle. In some examples, the application rendering component 314 (and/or the application processor delay component 350) may disable the frame latency reducing indicator when an application is determined to be not permitted to employ the example frame latency reducing techniques disclosed herein.
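One possible realization of such a permission check is sketched below; the identifier format and the example entries are hypothetical, and the disclosure requires only some data structure indicating which applications may employ the techniques.

```cpp
#include <string>
#include <unordered_set>

// Return whether the frame latency reducing indicator should be enabled
// for a given application, based on a white list of application identifiers.
bool frameLatencyReducingEnabled(const std::string& appId)
{
    static const std::unordered_set<std::string> whiteList = {
        "com.example.game",         // hypothetical applications that
        "com.example.videoplayer",  // employ frame rendering
    };
    return whiteList.count(appId) != 0;
}
```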
After receiving the rendering instruction, the example application rendering component 314 may start performing the application processor rendering tasks associated with rendering a frame. In some examples (e.g., when the frame latency reducing indicator is enabled) , the application processor delay component 350 may be configured to select a start time for performing graphics processor rendering tasks for the frame. For example, the application processor delay component 350 may estimate a duration for performing the graphics processor rendering tasks for the current frame and determine a start time for performing the graphics processor rendering tasks based on the estimated duration for performing the graphics processor rendering tasks. Based on the selected start time, the application processor delay component 350 may be configured to delay the transmitting of the swap buffer command to the graphics processor 330 to trigger the graphics processor 330 to start performing the graphics processor rendering tasks. In some examples, the application processor delay component 350 may also be configured to delay the transmitting of the queue buffer command to the compositing component 316 to delay the start of the work mode of the compositing component 316 when performing the compositing tasks for the frame.
After receiving the swap buffer command, the example graphics processor 330 may start performing the graphics processor rendering tasks. In some examples, the application processor delay component 350 may record a timestamp associated with the start of the graphics processor 330 performing the graphics processor rendering tasks. At the completion of the performing of the graphics processor rendering tasks, the graphics processor 330 may store the rendered frame in the rendered frame buffer 324 and signal an indication to disable the synchronization fence associated with the rendered frame buffer 324. In some examples, the application processor delay component 350 may record a timestamp associated with the indication indicating that the synchronization fence associated with the frame and the rendered frame buffer 324 is disabled.
The example compositing component 316 may start performing the compositing tasks after receiving the queue buffer command from the application rendering component 314. In some examples (e.g., when the frame latency reducing indicator is enabled) , the compositing sleep delay component 352 may be configured to delay the start of the work mode of the compositing component 316 during the performing of the compositing tasks.
For example, the compositing sleep delay component 352 may estimate a duration for performing the compositing tasks for the current frame and determine a start time for performing the compositing tasks based on the estimated duration for performing the compositing tasks. Based on the selected start time, the compositing sleep delay component 352 may be configured to cause the compositing component 316 to start the VSYNC cycle in the idle mode and then cause the compositing component 316 to transition to the work mode at the selected start time.
The example display 340 may be configured to start the performing of the display rendering tasks. In some examples, the completion of the graphics processor rendering tasks and the compositing tasks may be aligned with the occurrence of a VSYNC pulse and the start of the display rendering tasks may be aligned with the occurrence of the same VSYNC pulse.
FIG. 6 illustrates an example flowchart 600 of an example method in accordance with one or more techniques of this disclosure. The method may be performed by an apparatus, such as the example device 300 of FIG. 3 and/or a component of the device 300, such as the example application processor 310, the example application rendering component 314, the example application processor delay component 350, the example compositing component 316, the example compositing sleep delay component 352, the example memory 320, the example graphics processor 330, and/or the example display 340.
At 602, the apparatus may perform application processor rendering tasks for rendering a frame, as described in connection with the examples of FIGs. 3, 4, and/or 5. For example, the application rendering component 314 may be configured to perform the application processor rendering tasks 412, 512 of FIGs. 4 and 5, respectively.
At 604, the apparatus may select a start time for performing graphics processor rendering tasks (e.g., performed by a GPU) for the frame, as described in connection with the examples of FIGs. 3, 4, and/or 5. For example, the application processor delay component 350 may be configured to determine the start time for performing the graphics processor rendering tasks for the frame. Example techniques for selecting the start time for performing the graphics processor rendering tasks are described in connection with FIGs. 7 and 8.
At 606, the apparatus may perform the graphics processor rendering tasks for the frame at the selected start time, as described in connection with the examples of FIGs. 3, 4, and/or 5. For example, the graphics processor 330 may be configured to perform the graphics processor rendering tasks 414, 514.
At 608, the apparatus may select a start time for performing compositing tasks for the frame, as described in connection with the examples of FIGs. 3, 4, and/or 5. For example, the compositing sleep delay component 352 may be configured to determine the start time for performing the compositing tasks for the frame. Example techniques for selecting the start time for performing the compositing tasks are described in connection with FIGs. 9 and 10.
At 610, the apparatus may perform the compositing tasks for the frame at the selected start time, as described in connection with the examples of FIGs. 3, 4, and/or 5. For example, the compositing component 316 may be configured to perform the compositing tasks 416, 516.
At 612, the apparatus may perform the display rendering tasks for the frame, as described in connection with the examples of FIGs. 3, 4, and/or 5. For example, the display 340 may be configured to perform the display rendering tasks 418, 518.
FIG. 7 illustrates an example flowchart 700 of an example method in accordance with one or more techniques of this disclosure. The method may be performed by an apparatus, such as the example device 300 of FIG. 3 and/or a component of the device 300, such as the example application processor 310, the example application rendering component 314, and/or the example application processor delay component 350. The example flowchart 700 may facilitate selecting the start time for performing graphics processor rendering tasks.
At 702, the apparatus may estimate a duration for performing the graphics processor rendering tasks for a current frame, as described in connection with the examples of FIGs. 3, 4, and/or 5. For example, the application processor delay component 350 may be configured to estimate the duration for performing the graphics processor rendering tasks for the current frame. Example techniques for estimating the duration for performing the graphics processor rendering tasks for the current frame are described in connection with FIG. 8.
At 704, the apparatus may determine an application processor sleep duration based on a difference between a VSYNC pulse period and the estimated graphics processor rendering tasks duration, as described in connection with the examples of FIGs. 3, 4, and/or 5. For example, the application processor delay component 350 may be configured to calculate a difference between the VSYNC pulse period and the estimated graphics processor rendering tasks duration to determine the application processor sleep duration.
At 706, the apparatus may determine the start time for performing the graphics processor rendering tasks for the current frame based on the application processor sleep duration and an end time of the unmodified application processor rendering tasks, as described in connection with the examples of FIGs. 3, 4, and/or 5. For example, the application processor delay component 350 may be configured to determine the start time for performing the graphics processor rendering tasks by calculating a difference between a timestamp corresponding to the end of the unmodified application processor rendering tasks and the application processor sleep duration.
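As an illustrative, non-limiting numeric example, for a 60 Hz display the VSYNC pulse period is approximately 16.67 ms. If the estimated graphics processor rendering tasks duration is 9 ms, the difference determined at 704 indicates that the graphics processor rendering tasks should start approximately 16.67 ms − 9 ms = 7.67 ms after a VSYNC pulse. If the unmodified application processor rendering tasks would end 3 ms after that VSYNC pulse, the apparatus may sleep for approximately 7.67 ms − 3 ms = 4.67 ms before transmitting the swap buffer command.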
FIG. 8 illustrates an example flowchart 800 of an example method in accordance with one or more techniques of this disclosure. The method may be performed by an apparatus, such as the example device 300 of FIG. 3 and/or a component of the device 300, such as the example application processor 310, the example application rendering component 314, and/or the example application processor delay component 350. The example flowchart 800 may facilitate estimating a duration for performing the graphics processor rendering tasks for a current frame.
At 802, the apparatus may sample a graphics processor rendering tasks duration for a previous frame, as described in connection with the examples of FIGs. 3, 4, and/or 5. For example, the application processor delay component 350 may be configured to calculate a difference between a stop time of the graphics processor rendering tasks of the previous frame and a start time of the graphics processor rendering tasks of the previous frame.
At 804, the apparatus may sample graphics processor rendering tasks durations for a quantity of previous frames, as described in connection with the examples of FIGs. 3, 4, and/or 5. For example, the application processor delay component 350 may sample a preconfigured quantity of frames (e.g., five frames, ten frames, etc. ) .
At 806, the apparatus may select a maximum sampled graphics processor rendering tasks duration, as described in connection with the examples of FIGs. 3, 4, and/or 5. For example, the application processor delay component 350 may determine which of the sampled durations is a longest duration and select the corresponding duration.
At 808, the apparatus may add a padding duration to the selected sampled graphics processor rendering tasks duration to estimate the duration for performing the graphics processor rendering tasks for the current frame, as described in connection with the examples of FIGs. 3, 4, and/or 5. For example, the application processor delay component 350 may add a buffer duration to account for examples in which the duration of performing the graphics processor rendering tasks for the current frame is longer than the maximum duration sampled from the previous frames.
However, it should be appreciated that other techniques for estimating the duration of performing the graphics processor rendering tasks for the current frame may additionally or alternatively be used.
FIG. 9 illustrates an example flowchart 900 of an example method in accordance with one or more techniques of this disclosure. The method may be performed by an apparatus, such as the example device 300 of FIG. 3 and/or a component of the device 300, such as the example application processor 310, the example compositing component 316, and/or the example compositing sleep delay component 352. The example flowchart 900 may facilitate selecting the start time for performing compositing tasks.
At 902, the apparatus may estimate a duration for performing the compositing tasks for a current frame, as described in connection with the examples of FIGs. 3, 4, and/or 5. For example, the compositing sleep delay component 352 may be configured to estimate the duration for performing the compositing tasks for the current frame. Example techniques for estimating the duration for performing the compositing tasks for the current frame are described in connection with FIG. 10.
At 904, the apparatus may determine a compositing sleep duration based on a difference between a VSYNC pulse period and the estimated compositing tasks duration, as described in connection with the examples of FIGs. 3, 4, and/or 5. For example, the compositing sleep delay component 352 may be configured to calculate a difference between the VSYNC pulse period and the estimated compositing tasks duration to determine the compositing sleep duration.
At 906, the apparatus may determine the start time for performing the compositing tasks for the current frame based on the compositing sleep duration and a timestamp associated with a previous VSYNC pulse, as described in connection with the examples of FIGs. 3, 4, and/or 5. For example, the compositing sleep delay component 352 may be configured to determine the start time for performing the compositing tasks by adding the compositing sleep duration to a timestamp corresponding to the previous VSYNC pulse.
FIG. 10 illustrates an example flowchart 1000 of an example method in accordance with one or more techniques of this disclosure. The method may be performed by an apparatus, such as the example device 300 of FIG. 3 and/or a component of the device 300, such as the example application processor 310, the example compositing component 316, and/or the example compositing sleep delay component 352. The example flowchart 1000 may facilitate estimating a duration for performing the compositing tasks for a current frame.
At 1002, the apparatus may sample a compositing tasks duration for a previous frame, as described in connection with the examples of FIGs. 3, 4, and/or 5. For example, the compositing sleep delay component 352 may be configured to calculate a difference between a stop time of the compositing tasks of the previous frame and a start time of the compositing tasks of the previous frame.
At 1004, the apparatus may sample compositing tasks durations for a quantity of previous frames, as described in connection with the examples of FIGs. 3, 4, and/or 5. For example, the compositing sleep delay component 352 may sample a preconfigured quantity of frames (e.g., five frames, ten frames, etc. ) .
At 1006, the apparatus may select a maximum sampled compositing tasks duration, as described in connection with the examples of FIGs. 3, 4, and/or 5. For example, the compositing sleep delay component 352 may determine which of the sampled durations is a longest duration and select the corresponding duration.
At 1008, the apparatus may add a padding duration to the selected sampled compositing tasks duration to estimate the duration for performing the compositing tasks for the current frame, as described in connection with the examples of FIGs. 3, 4, and/or 5. For example, the compositing sleep delay component 352 may add a buffer duration to account for examples in which the duration of performing the compositing tasks for the current frame is longer than the maximum duration sampled from the previous frames.
However, it should be appreciated that other techniques for estimating the duration of performing the compositing tasks for the current frame may additionally or alternatively be used.
FIG. 11 illustrates an example flowchart 1100 of an example method in accordance with one or more techniques of this disclosure. The method may be performed by an apparatus, such as the example device 300 of FIG. 3 and/or a component of the device 300, such as the example application processor 310, the example application 312, the example application rendering component 314, and/or the example application processor delay component 350. The example flowchart 1100 may facilitate initializing the example frame latency reducing techniques disclosed herein for different applications.
At 1102, the apparatus may initialize a display connection for an application, as described in connection with the examples in FIGs. 3, 4, and/or 5. For example, the application rendering component 314 may initialize the display connection for the application 312 after receiving a rendering instruction from the application 312. In some examples, the apparatus may utilize an API to initialize the display connection. For example, EGL, an interface layer between rendering APIs (e.g., OpenGL ES) and the underlying native platform window system, may provide mechanisms for initializing the display connection. As an illustrative example, the apparatus may call “eglInitialize” to initialize an EGL display connection. However, it should be appreciated that other examples may use additional or alternative APIs and/or interfaces for initializing the display connection.
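A minimal sketch of that initialization, using the standard EGL entry points with error handling elided:

```cpp
#include <EGL/egl.h>

// Initialize an EGL display connection, as at 1102.
EGLDisplay initDisplayConnection()
{
    EGLDisplay display = eglGetDisplay(EGL_DEFAULT_DISPLAY);  // default display
    EGLint major = 0, minor = 0;
    eglInitialize(display, &major, &minor);                   // the call named above
    return display;
}
```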
At 1104, the apparatus may determine whether the application is approved for utilizing the example frame latency reducing techniques disclosed herein, as described in connection with the examples in FIGs. 3, 4, and/or 5. For example, the application rendering component 314 and/or the application processor delay component 350 may compare an identifier associated with the application 312 to applications included in a data structure to determine whether the application 312 is permitted to use the example frame latency reducing techniques disclosed herein.
If, at 1104, the apparatus determines that the application is not permitted to use the example frame latency reducing techniques disclosed herein, then, at 1106, the apparatus may disable a frame latency reducing indicator, as described in connection with the examples in FIGs. 3, 4, and/or 5. For example, the application rendering component 314 and/or the application processor delay component 350 may disable the frame latency reducing indicator.
If, at 1104, the apparatus determines that the application is permitted to use the example frame latency reducing techniques disclosed herein, then, at 1108, the apparatus may enable the frame latency reducing indicator, as described in connection with the examples in FIGs. 3, 4, and/or  5. For example, the application rendering component 314 and/or the application processor delay component 350 may enable the frame latency reducing indicator.
At 1110, the apparatus may register the display connection to receive VSYNC pulse timestamps, as described in connection with the examples in FIGs. 3, 4, and/or 5. For example, the application rendering component 314 and/or the application processor delay component 350 may register the display connection with an operating system so that the application rendering component 314 (and/or the application processor delay component 350) may receive timestamps for VSYNC pulses. It should be appreciated that in some examples, the apparatus may use operating system-level APIs to allow the application to register and to receive VSYNC pulse occurrences.
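On Android, one such operating system-level mechanism is the NDK Choreographer API, sketched below under the assumption that the calling thread has an ALooper; the callback delivers the VSYNC (frame) time in nanoseconds, which may be saved as the previous-VSYNC-pulse timestamp used in the calculations above.

```cpp
#include <android/choreographer.h>  // NDK, API level 24 and above

// Save each VSYNC timestamp and re-register for the next pulse.
static void onVsync(long frameTimeNanos, void* data)
{
    *static_cast<long*>(data) = frameTimeNanos;  // record the VSYNC pulse timestamp
    AChoreographer_postFrameCallback(AChoreographer_getInstance(), onVsync, data);
}

// Register the callback; must be called on a thread with an ALooper.
void registerForVsync(long* lastVsyncNanos)
{
    AChoreographer_postFrameCallback(AChoreographer_getInstance(), onVsync, lastVsyncNanos);
}
```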
FIG. 12 illustrates an example flowchart 1200 of an example method in accordance with one or more techniques of this disclosure. The method may be performed by an apparatus, such as the example device 300 of FIG. 3 and/or a component of the device 300, such as the example application processor 310, the example application rendering component 314, and/or the example application processor delay component 350. The example flowchart 1200 may facilitate applying the example frame latency reducing techniques disclosed herein by synchronizing the completion of the graphics processor rendering tasks with a VSYNC pulse.
At 1202, the apparatus may receive an indication that the performing of the application processor rendering tasks is complete, as described in connection with the examples in FIGs. 3, 4, and/or 5. For example, the application processor delay component 350 may determine that the application rendering component 314 is ready to transmit the swap buffer command to the graphics processor 330.
At 1204, the apparatus may determine whether the frame latency reducing indicator is enabled for the application, as described in connection with the examples in FIGs. 3, 4, and/or 5.
If, at 1204, the apparatus determines that the frame latency reducing indicator is not enabled for the application (e.g., the frame latency reducing indicator is disabled), then control proceeds to 1208 to call the swap buffer command, as described in connection with the examples in FIGs. 3, 4, and/or 5. For example, the application processor delay component 350 may not modify the performing of the application processor rendering tasks by inserting an application processor sleep duration. In some examples, calling the swap buffer command may also include calling the queue buffer command.
If, at 1204, the apparatus determines that the frame latency reducing indicator is enabled for the application, then, at 1206, the apparatus may schedule and apply an application processor sleep duration based on the estimated duration for performing the graphics processor rendering tasks for the current frame, as described in connection with the examples in FIGs. 3, 4, and/or 5. For example, the application processor delay component 350 may utilize the example techniques described in connection with FIGs. 7 and/or 8 to schedule and apply the application processor sleep duration.
At 1208, the apparatus may call the swap buffer command, as described in connection with the examples in FIGs. 3, 4, and/or 5. For example, the application processor delay component 350 may cause the application rendering component 314 to transmit the swap buffer command to the graphics processor 330 after the application processor sleep duration. In some examples, calling the swap buffer command may also include calling the queue buffer command.
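A minimal sketch of the logic at 1204 through 1208 follows; the function and parameter names are assumptions, and the swap buffer command is shown only as a comment because the display and surface handles are outside the sketch:

    #include <chrono>
    #include <thread>

    // If the indicator is enabled, delay the swap buffer command so that the
    // graphics processor rendering tasks complete near the next VSYNC pulse.
    void sleepThenSwapBuffers(bool frameLatencyReducingIndicator,
                              std::chrono::nanoseconds vsyncPulsePeriod,
                              std::chrono::nanoseconds estimatedGpuTasksDuration) {
      if (frameLatencyReducingIndicator &&
          estimatedGpuTasksDuration < vsyncPulsePeriod) {
        // Application processor sleep duration: the VSYNC pulse period minus
        // the estimated graphics processor rendering tasks duration.
        std::this_thread::sleep_for(vsyncPulsePeriod - estimatedGpuTasksDuration);
      }
      // eglSwapBuffers(display, surface);  // swap buffer (and queue buffer) command
    }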
FIG. 13 illustrates an example flowchart 1300 of an example method in accordance with one or more techniques of this disclosure. The method may be performed by an apparatus, such as the example device 300 of FIG. 3 and/or a component of the device 300, such as the example application processor 310, the example compositing component 316, and/or the example compositing sleep delay component 352. The example flowchart 1300 may facilitate applying the example frame latency reducing techniques disclosed herein by synchronizing the completion of the compositing tasks with a VSYNC pulse.
At 1302, the apparatus may save a VSYNC pulse timestamp for a received VSYNC pulse, as described in connection with the examples in FIGs. 3, 4, and/or 5. For example, the compositing component 316 may be configured to save the VSYNC pulse timestamp when a VSYNC pulse is received from the display 340.
At 1304, the apparatus may receive a queue buffer command, as described in connection with the examples in FIGs. 3, 4, and/or 5. For example, the compositing component 316 may receive the queue buffer command from the application rendering component 314.
At 1306, the apparatus may determine whether the frame latency reducing indicator is enabled for the application, as described in connection with the examples in FIGs. 3, 4, and/or 5.
If, at 1306, the apparatus determines that the frame latency reducing indicator is not enabled for the application (e.g., the frame latency reducing indicator is disabled), then control proceeds to 1310 to save a timestamp at a start of the performing of the compositing tasks.
If, at 1306, the apparatus determines that the frame latency reducing indicator is enabled for the application, then, at 1308, the apparatus may schedule and apply a compositing sleep duration based on the saved VSYNC pulse timestamp and the estimated compositing task duration, as described in connection with the examples in FIGs. 3, 4, and/or 5. For example, the compositing sleep delay component 352 may utilize the example techniques described in connection with FIGs. 9 and/or 10 to schedule and apply the compositing sleep duration.
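A minimal sketch of the scheduling at 1308, assuming a monotonic clock and illustrative names: the compositor sleeps until the saved VSYNC pulse timestamp plus the compositing sleep duration, so that compositing completes near the next VSYNC pulse:

    #include <chrono>
    #include <thread>

    using Clock = std::chrono::steady_clock;

    // Sleep until (saved VSYNC timestamp + VSYNC period - estimated compositing
    // duration), i.e., until the second start time.
    void delayCompositingStart(Clock::time_point savedVsyncTimestamp,
                               std::chrono::nanoseconds vsyncPulsePeriod,
                               std::chrono::nanoseconds estimatedCompositingDuration) {
      const auto secondStartTime =
          savedVsyncTimestamp + (vsyncPulsePeriod - estimatedCompositingDuration);
      if (secondStartTime > Clock::now()) {
        std::this_thread::sleep_until(secondStartTime);
      }
    }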
At 1310, the apparatus may save a timestamp at a start of the performing of the compositing tasks, as described in connection with the examples in FIGs. 3, 4, and/or 5. For example, the compositing component 316 may be configured to save the timestamp associated with the start time of the performing of the compositing tasks.
At 1312, the apparatus may perform the compositing tasks, as described in connection with the examples in FIGs. 3, 4, and/or 5. For example, the compositing component 316 may be configured to perform the compositing tasks 416, 516 of FIGs. 4 and 5, respectively.
At 1314, the apparatus may save a timestamp at a completion of the performing of the compositing tasks, as described in connection with the examples in FIGs. 3, 4, and/or 5. For example, the compositing component 316 may be configured to save the timestamp associated with the completion of the performing of the compositing tasks.
FIG. 14 is a block diagram that illustrates an example content generation system 1400 configured to implement one or more techniques of this disclosure. The content generation system 1400 includes a device 1404. The device 1404 may include one or more components or circuits for performing various functions described herein. Aspects of the device 1404 may be implemented by the example device 300 of FIG. 3. In some examples, one or more components of the device 1404 may be components of an SOC. The device 1404 may include one or more components configured to perform one or more techniques of this disclosure. In the example shown, the device 1404 includes a processing unit 1420 and a memory 1424. In some examples, the device 1404 can include a number of additional or alternative components, such as a communication interface 1426, a transceiver 1432, a receiver 1428, a transmitter 1430, a display processor 1427, and a display client 1431.
In the illustrated example of FIG. 14, the processing unit 1420 includes an internal memory 1421. The processing unit 1420 may be configured to perform graphics processing, such as in a graphics processing pipeline 1407. In some examples, aspects of the rendering pipeline may be implemented by the graphics processing pipeline 1407. In some examples, the device 1404 may include a display processor, such as the display processor 1427, configured to perform display processing, e.g., one or more display processing techniques on one or more frames generated by the processing unit 1420 before presentment by the display client 1431.
Reference to the display client 1431 may refer to one or more displays, such as the example display 340 of FIG. 3. For example, the display client 1431 may include a single display or multiple displays. The display client 1431 may include a first display and a second display. In further examples, the results of the graphics processing may not be displayed on the device (e.g., the first and second displays may not receive any frames for presentment thereon). Instead, the frames or graphics processing results may be transferred to another device. The display client 1431 may be configured to display or otherwise present frames processed by the display processor 1427. In some examples, the display client 1431 may include one or more of: a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, a projection display device, an augmented reality display device, a virtual reality display device, a head-mounted display, or any other type of display device.
Memory external to the processing unit 1420, such as the memory 1424, may be accessible to the processing unit 1420. For example, the processing unit 1420 may be configured to read from and/or write to external memory, such as the memory 1424. The processing unit 1420 may be communicatively coupled to the memory 1424 over a bus. In some examples, the processing unit 1420 and the memory 1424 may be communicatively coupled to each other over the bus or a different connection.
It should be appreciated that in some examples, the device 1404 may include a content encoder/decoder configured to receive graphical and/or display content from any source, such as the memory 1424 and/or the communication interface 1426. The memory 1424 may be configured to store received encoded or decoded content. In some examples, the content encoder/decoder may be configured to receive encoded or decoded content (e.g., from the memory 1424 and/or the communication interface 1426) in the form of encoded pixel data. In some examples, the content encoder/decoder may be configured to encode or decode any content.
The internal memory 1421 or the memory 1424 may include one or more volatile or non-volatile memories or storage devices. In some examples, the internal memory 1421 or the memory 1424 may include RAM, SRAM, DRAM, erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, magnetic data media, optical storage media, or any other type of memory.
The internal memory 1421 or the memory 1424 may be a non-transitory storage medium according to some examples. The term “non-transitory” may indicate that the storage medium is not embodied in a carrier wave or a propagated signal. However, the term “non-transitory” should not be interpreted to mean that the internal memory 1421 or the memory 1424 is non-movable or that its contents are static. As one example, the memory 1424 may be removed from the device 1404 and moved to another device. As another example, the memory 1424 may not be removable from the device 1404.
The processing unit 1420 may be an application processor, a central processing unit (CPU), a graphics processor, a graphics processing unit (GPU), a general purpose GPU (GPGPU), or any other processing unit that may be configured to perform system processing, such as graphics processing, compute processing, etc. For example, aspects of the application processor 310 and/or the graphics processor 330 may be implemented by the processing unit 1420. In some examples, the processing unit 1420 may be integrated into a motherboard of the device 1404. In some examples, the processing unit 1420 may be present on a graphics card that is installed in a port in a motherboard of the device 1404, or may be otherwise incorporated within a peripheral device configured to interoperate with the device 1404. The processing unit 1420 may include one or more processors, such as one or more microprocessors, GPUs, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), arithmetic logic units (ALUs), digital signal processors (DSPs), discrete logic, software, hardware, firmware, other equivalent integrated or discrete logic circuitry, or any combinations thereof. If the techniques are implemented partially in software, the processing unit 1420 may store instructions for the software in a suitable, non-transitory computer-readable storage medium (e.g., the internal memory 1421) and may execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Any of the foregoing, including hardware, software, a combination of hardware and software, etc., may be considered to be one or more processors.
In some aspects, the content generation system 1400 can include a communication interface 1426. The communication interface 1426 may include a receiver 1428 and a transmitter 1430. The receiver 1428 may be configured to perform any receiving function described herein with respect to the device 1404. Additionally, the receiver 1428 may be configured to receive information (e.g., eye or head position information, rendering commands, and/or location information) from another device. The transmitter 1430 may be configured to perform any transmitting function described herein with respect to the device 1404. For example, the transmitter 1430 may be configured to transmit information to another device, which may include a request for content. The receiver 1428 and the transmitter 1430 may be combined into a transceiver 1432. In such examples, the transceiver 1432 may be configured to perform any receiving function and/or transmitting function described herein with respect to the device 1404.
In some examples, the graphical content from the processing unit 1420 for display via the display client 1431 is not static and may be changing. Accordingly, the display processor 1427 may periodically refresh the graphical content displayed via the display client 1431. For example, the display processor 1427 may periodically retrieve graphical content from the memory 1424, where the graphical content may have been updated by the execution of an application (and/or the processing unit 1420) that outputs the graphical content to the memory 1424.
It should be appreciated that while shown as separate components in FIG. 14, in some examples, the display client 1431 (sometimes referred to as a “display panel”) may include the display processor 1427. Furthermore, in some examples, the processing unit 1420 may include the display processor 1427.
Referring again to FIG. 14, in certain aspects, the processing unit 1420 may be configured to perform frame latency reducing techniques disclosed herein. In the illustrated example of FIG. 14, the processing unit 1420 may include a frame latency reducing component 1498 configured to facilitate reducing frame latency in the rendering pipeline. Aspects of the frame latency reducing component 1498 may be implemented by the device 300 of FIG. 3.
For example, the frame latency reducing component 1498 may be configured to perform first processor rendering tasks for rendering a frame. The example frame latency reducing component 1498 may also be configured to perform second processor rendering tasks for rendering the frame at a first start time based on an estimated duration for performing the second processor rendering tasks. The example frame latency reducing component 1498 may also be configured to perform compositing rendering tasks for rendering the frame at a second start time based on an estimated duration for performing the compositing rendering tasks. The example frame latency reducing component 1498 may also be configured to perform display rendering tasks to display the frame.
In some examples, the example frame latency reducing component 1498 may be configured to synchronize completion of the performing of the second processor rendering tasks and completion of the performing of the compositing rendering tasks with an occurrence of a same VSYNC pulse. In some examples, the example frame latency reducing component 1498 may be configured to start the performing of the display rendering tasks after the occurrence of the same VSYNC pulse.
In some examples, the example frame latency reducing component 1498 may be configured to select the first start time by estimating the duration for performing the second processor rendering tasks. The example frame latency reducing component 1498 may also be configured to determine a first processor sleep duration based on a difference in a VSYNC pulse period and the estimated second processor rendering tasks duration. The example frame latency reducing component 1498 may also be configured to determine the first start time for performing the second processor rendering tasks based on the first processor sleep duration and an end time of the performing of the first processor rendering tasks. In some examples, the example frame latency reducing component 1498 may be configured to estimate the duration for performing the second processor rendering tasks by sampling second processor rendering tasks durations for a quantity of previous frames, selecting a sampled second processor rendering tasks duration, and adding a padding duration to the sampled second processor rendering tasks duration. In some examples, the example frame latency reducing component 1498 may be configured to select the sampled second processor rendering tasks duration by identifying a maximum duration of the sampled second processor rendering tasks durations. In some examples, the example frame latency reducing component 1498 may be configured to select the sampled second processor rendering tasks duration by calculating an average duration of the sampled second processor rendering tasks durations. In some examples, the example frame latency reducing component 1498 may be configured to sample a second processor rendering tasks duration for a first previous frame by calculating a difference between a stop time of the second processor rendering tasks of the first previous frame and a start time of the second processor rendering tasks of the first previous frame.
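The sampling-based duration estimation described above (and the analogous compositing estimation below) may be sketched as follows; this is a non-limiting illustration with assumed names, and the sampling window size and padding duration are tunable parameters:

    #include <algorithm>
    #include <chrono>
    #include <cstddef>
    #include <deque>
    #include <numeric>

    class TaskDurationEstimator {
     public:
      TaskDurationEstimator(std::size_t windowSize, std::chrono::nanoseconds padding)
          : windowSize_(windowSize), padding_(padding) {}

      // Sample one previous frame: the stop time minus the start time.
      void addSample(std::chrono::nanoseconds startTime,
                     std::chrono::nanoseconds stopTime) {
        samples_.push_back(stopTime - startTime);
        if (samples_.size() > windowSize_) {
          samples_.pop_front();
        }
      }

      // Estimate as the maximum sampled duration plus a padding duration.
      std::chrono::nanoseconds estimateFromMaximum() const {
        if (samples_.empty()) return padding_;
        return *std::max_element(samples_.begin(), samples_.end()) + padding_;
      }

      // Alternative: the average sampled duration plus a padding duration.
      std::chrono::nanoseconds estimateFromAverage() const {
        if (samples_.empty()) return padding_;
        const auto total = std::accumulate(samples_.begin(), samples_.end(),
                                           std::chrono::nanoseconds{0});
        return total / static_cast<std::chrono::nanoseconds::rep>(samples_.size()) +
               padding_;
      }

     private:
      std::size_t windowSize_;
      std::chrono::nanoseconds padding_;
      std::deque<std::chrono::nanoseconds> samples_;
    };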
In some examples, the example frame latency reducing component 1498 may be configured to select the second start time by estimating the duration for performing the compositing rendering tasks, determining a compositing sleep duration based on a difference in a VSYNC pulse period and the estimated compositing rendering tasks duration, and determining the second start time for performing the compositing rendering tasks based on the compositing sleep duration and a timestamp associated with an occurrence of a previous VSYNC pulse. In some examples, the example frame latency reducing component 1498 may be configured to estimate the duration for performing the compositing rendering tasks by sampling compositing rendering tasks durations for a quantity of previous frames, selecting a sampled compositing rendering tasks duration, and adding a padding duration to the sampled compositing rendering tasks duration.
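For instance, assuming a 60 Hz display (a VSYNC pulse period of approximately 16.7 ms) and an estimated compositing rendering tasks duration of 4 ms, the compositing sleep duration would be approximately 12.7 ms, so the compositing rendering tasks would start approximately 12.7 ms after the previous VSYNC pulse timestamp and complete near the next VSYNC pulse.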
In some examples, the first processor rendering tasks may be performed by an application processor or a CPU. In some examples, the second processor rendering tasks may be performed by a graphics processor or a GPU.
As described herein, a device, such as the device 1404, may refer to any device, apparatus, or system configured to perform one or more techniques described herein. For example, a device may be a server, a base station, user equipment, a client device, a station, an access point, a computer (e.g., a personal computer, a desktop computer, a laptop computer, a tablet computer, a computer workstation, or a mainframe computer), an end product, an apparatus, a phone, a smart phone, a video game platform or console, a handheld device (e.g., a portable video game device or a personal digital assistant (PDA)), a wearable computing device (e.g., a smart watch, an augmented reality device, or a virtual reality device), a non-wearable device, a display or display device, a television, a television set-top box, an intermediate network device, a digital media player, a video streaming device, a content streaming device, an in-car computer, any mobile device, any device configured to generate graphical content, or any device configured to perform one or more techniques described herein. Processes herein may be described as performed by a particular component (e.g., a GPU), but, in further embodiments, can be performed using other components (e.g., a CPU), consistent with disclosed embodiments.
In one configuration, a method or apparatus for graphics processing is provided. The apparatus may be a processing unit, an application processor, a CPU, a graphics processor, a GPU, a display processor, a DPU, a video processor, or some other processor that can perform display processing. In some examples, the apparatus may be the processing unit 1420 within the device 1404, or may be some other hardware within the device 1404, or another device. The apparatus may include means for performing first processor rendering tasks for rendering a frame. The apparatus may also include means for performing second processor rendering tasks for rendering the frame at a first start time based on an estimated duration for performing the second processor rendering tasks. The apparatus may also include means for performing compositing rendering tasks for rendering the frame at a second start time based on an estimated duration for performing the compositing rendering tasks. The apparatus may also include means for performing display rendering tasks to display the frame. The apparatus may also include means for synchronizing completion of the performing of the second processor rendering tasks and completion of the performing of the compositing rendering tasks with an occurrence of a same VSYNC pulse. The apparatus may also include means for starting the performing of the display rendering tasks after the occurrence of the same VSYNC pulse. The apparatus may also include means for estimating the duration for performing the second processor rendering tasks. The apparatus may also include means for determining a first processor sleep duration based on a difference in a VSYNC pulse period and the estimated second processor rendering tasks duration. The apparatus may also include means for determining the first start time for performing the second processor rendering tasks based on the first processor sleep duration and an end time of the performing of the first processor rendering tasks. The apparatus may also include means for sampling second processor rendering tasks durations for a quantity of previous frames. The apparatus may also include means for selecting a sampled second processor rendering tasks duration. The apparatus may also include means for adding a padding duration to the sampled second processor rendering tasks duration. The apparatus may also include means for selecting the sampled second processor rendering tasks duration by identifying a maximum duration of the sampled second processor rendering tasks durations. The apparatus may also include means for selecting the sampled second processor rendering tasks duration by calculating an average duration of the sampled second processor rendering tasks durations. The apparatus may also include means for sampling the second processor rendering tasks duration for a first previous frame by calculating a difference between a stop time of the second processor rendering tasks of the first previous frame and a start time of the second processor rendering tasks of the first previous frame. The apparatus may also include means for estimating the duration for performing the compositing rendering tasks. The apparatus may also include means for determining a compositing sleep duration based on a difference in a VSYNC pulse period and the estimated compositing rendering tasks duration.
The apparatus may also include means for determining the second start time for performing the compositing rendering tasks based on the compositing sleep duration and a timestamp associated with an occurrence of a previous VSYNC pulse. The apparatus may also include means for sampling compositing rendering tasks durations for a quantity of previous frames. The apparatus may also include means for selecting a sampled compositing rendering tasks duration. The apparatus may also include means for adding a padding duration to the sampled compositing rendering tasks duration.
The subject matter described herein can be implemented to realize one or more benefits or advantages. For instance, the described frame latency reducing techniques can be used by an application processor, a CPU, a graphics processor, a GPU, a display processor, a DPU, a video processor, or some other processor that can perform graphical rendering of a frame. Moreover, the frame latency reducing techniques disclosed herein can improve or speed up data processing or execution. Further, the frame latency reducing techniques herein can improve resource or data utilization and/or resource efficiency. For example, aspects of the present disclosure can reduce frame latency in a rendering pipeline.
In accordance with this disclosure, the term “or” may be interpreted as “and/or” where context does not dictate otherwise. Additionally, while phrases such as “one or more” or “at least one” or the like may have been used for some features disclosed herein but not others, the features for which such language was not used may be interpreted to have such a meaning implied where context does not dictate otherwise.
In one or more examples, the functions described herein may be implemented in hardware, software, firmware, or any combination thereof. For example, although the term “processing unit” has been used throughout this disclosure, such processing units may be implemented in hardware, software, firmware, or any combination thereof. If any function, processing unit, technique described herein, or other module is implemented in software, the function, processing unit, technique described herein, or other module may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media may include computer data storage media or communication media including any medium that facilitates transfer of a computer program from one place to another. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media, which is non-transitory, or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementation of the techniques described in this disclosure. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage, or other magnetic storage devices. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media. A computer program product may include a computer-readable medium.
The code may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), arithmetic logic units (ALUs), field programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term “processor,” as used herein, may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. Also, the techniques could be fully implemented in one or more circuits or logic elements.
The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chip set). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily need realization by different hardware units. Rather, as described above, various units may be combined in any hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.
Various examples have been described. These and other examples are within the scope of the following claims.

Claims (24)

  1. A method of graphics processing, comprising:
    performing first processor rendering tasks for rendering a frame;
    performing second processor rendering tasks for rendering the frame at a first start time based on an estimated duration for performing the second processor rendering tasks;
    performing compositing rendering tasks for rendering the frame at a second start time based on an estimated duration for performing the compositing rendering tasks; and
    performing display rendering tasks to display the frame.
  2. The method of claim 1, wherein completion of the performing of the second processor rendering tasks and completion of the performing of the compositing rendering tasks are synchronized with an occurrence of a same VSYNC pulse.
  3. The method of claim 2, wherein the performing of the display rendering tasks starts after the occurrence of the same VSYNC pulse.
  4. The method of claim 1, wherein selecting the first start time comprises:
    estimating the duration for performing the second processor rendering tasks;
    determining a first processor sleep duration based on a difference in a VSYNC pulse period and the estimated second processor rendering tasks duration; and
    determining the first start time for performing the second processor rendering tasks based on the first processor sleep duration and an end time of the performing of the first processor rendering tasks.
  5. The method of claim 4, wherein estimating the duration for performing the second processor rendering tasks comprises:
    sampling second processor rendering tasks durations for a quantity of previous frames;
    selecting a sampled second processor rendering tasks duration; and
    adding a padding duration to the sampled second processor rendering tasks duration.
  6. The method of claim 5, wherein the selecting of the sampled second processor rendering tasks duration comprises identifying a maximum duration of the sampled second processor rendering tasks durations.
  7. The method of claim 5, wherein the selecting of the sampled second processor rendering tasks duration comprises calculating an average duration of the sampled second processor rendering tasks durations.
  8. The method of claim 5, wherein sampling the second processor rendering tasks duration for a first previous frame comprises calculating a difference between a stop time of the second processor rendering tasks of the first previous frame and a start time of the second processor rendering tasks of the first previous frame.
  9. The method of claim 1, wherein selecting the second start time comprises:
    estimating the duration for performing the compositing rendering tasks;
    determining a compositing sleep duration based on a difference in a VSYNC pulse period and the estimated compositing rendering tasks duration; and
    determining the second start time for performing the compositing rendering tasks based on the compositing sleep duration and a timestamp associated with an occurrence of a previous VSYNC pulse.
  10. The method of claim 9, wherein estimating the duration for performing the compositing rendering tasks comprises:
    sampling compositing rendering tasks durations for a quantity of previous frames;
    selecting a sampled compositing rendering tasks duration; and
    adding a padding duration to the sampled compositing rendering tasks duration.
  11. The method of claim 1, wherein the performing of the first processor rendering tasks is performed by an application processor or a central processing unit, and wherein the performing of the second processor rendering tasks is performed by a graphics processor or a graphics processing unit.
  12. An apparatus for graphics processing, comprising:
    a memory; and
    at least one processor coupled to the memory and configured to:
    perform first processor rendering tasks for rendering a frame;
    perform second processor rendering tasks for rendering the frame at a first start time based on an estimated duration for performing the second processor rendering tasks;
    perform compositing rendering tasks for rendering the frame at a second start time based on an estimated duration for performing the compositing rendering tasks; and
    perform display rendering tasks to display the frame.
  13. The apparatus of claim 12, wherein the at least one processor is configured to synchronize completion of the performing of the second processor rendering tasks and completion of the performing of the compositing rendering tasks with an occurrence of a same VSYNC pulse.
  14. The apparatus of claim 13, wherein the at least one processor is configured to start the performing of the display rendering tasks after the occurrence of the same VSYNC pulse.
  15. The apparatus of claim 12, wherein the at least one processor is configured to select the first start time by:
    estimating the duration for performing the second processor rendering tasks;
    determining a first processor sleep duration based on a difference in a VSYNC pulse period and the estimated second processor rendering tasks duration; and
    determining the first start time for performing the second processor rendering tasks based on the first processor sleep duration and an end time of the performing of the first processor rendering tasks.
  16. The apparatus of claim 15, wherein the at least one processor is configured to estimate the duration for performing the second processor rendering tasks by:
    sampling second processor rendering tasks durations for a quantity of previous frames;
    selecting a sampled second processor rendering tasks duration; and
    adding a padding duration to the sampled second processor rendering tasks duration.
  17. The apparatus of claim 16, wherein the at least one processor is configured to select the sampled second processor rendering tasks duration by identifying a maximum duration of the sampled second processor rendering tasks durations.
  18. The apparatus of claim 16, wherein the at least one processor is configured to select the sampled second processor rendering tasks duration by calculating an average duration of the sampled second processor rendering tasks durations.
  19. The apparatus of claim 16, wherein the at least one processor is configured to sample the second processor rendering tasks duration for a first previous frame by calculating a difference between a stop time of the second processor rendering tasks of the first previous frame and a start time of the second processor rendering tasks of the first previous frame.
  20. The apparatus of claim 12, wherein the at least one processor is configured to select the second start time by:
    estimating the duration for performing the compositing rendering tasks;
    determining a compositing sleep duration based on a difference in a VSYNC pulse period and the estimated compositing rendering tasks duration; and
    determining the second start time for performing the compositing rendering tasks based on the compositing sleep duration and a timestamp associated with an occurrence of a previous VSYNC pulse.
  21. The apparatus of claim 20, wherein the at least one processor is configured to estimate the duration for performing the compositing rendering tasks by:
    sampling compositing rendering tasks durations for a quantity of previous frames;
    selecting a sampled compositing rendering tasks duration; and
    adding a padding duration to the sampled compositing rendering tasks duration.
  22. The apparatus of claim 12, wherein the apparatus includes a wireless communication device.
  23. The apparatus of claim 12, further comprising:
    an application processor configured to perform the first processor rendering tasks; and
    a graphics processor configured to perform the second processor rendering tasks.
  24. A non-transitory computer-readable medium storing computer executable code for graphics processing, comprising code to:
    perform first processor rendering tasks for rendering a frame;
    perform second processor rendering tasks for rendering the frame at a first start time based on an estimated duration for performing the second processor rendering tasks;
    perform compositing rendering tasks for rendering the frame at a second start time based on an estimated duration for performing the compositing rendering tasks; and
    perform display rendering tasks to display the frame.

Priority Applications (1)

PCT/CN2020/072777 (WO2021142780A1) — Priority date: 2020-01-17 — Filing date: 2020-01-17 — Methods and apparatus for reducing frame latency


Publications (1)

Publication Number Publication Date
WO2021142780A1 (en)

Family

ID=76863307


Country Status (1)

Country Link
WO (1) WO2021142780A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130120424A1 (en) * 2011-11-14 2013-05-16 Qualcomm Innovation Center, Inc. Method and apparatus for improved rendering of images
CN107220019A (en) * 2017-05-15 2017-09-29 努比亚技术有限公司 A kind of rendering intent, mobile terminal and storage medium based on dynamic VSYNC signals
CN109474768A (en) * 2017-09-08 2019-03-15 中兴通讯股份有限公司 A kind of method and device improving image fluency
CN110503708A (en) * 2019-07-03 2019-11-26 华为技术有限公司 A kind of image processing method and electronic equipment based on vertical synchronizing signal
CN110520819A (en) * 2017-04-13 2019-11-29 微软技术许可有限责任公司 Real-time frequency controls in frame


Similar Documents

Publication Publication Date Title
US20230073736A1 (en) Reduced display processing unit transfer time to compensate for delayed graphics processing unit render time
WO2021000220A1 (en) Methods and apparatus for dynamic jank reduction
US20240242690A1 (en) Software vsync filtering
US11625806B2 (en) Methods and apparatus for standardized APIs for split rendering
US20200311859A1 (en) Methods and apparatus for improving gpu pipeline utilization
WO2021151228A1 (en) Methods and apparatus for adaptive frame headroom
US20230074876A1 (en) Delaying dsi clock change based on frame update to provide smoother user interface experience
WO2021142780A1 (en) Methods and apparatus for reducing frame latency
US20220013087A1 (en) Methods and apparatus for display processor enhancement
US20210358079A1 (en) Methods and apparatus for adaptive rendering
WO2021096883A1 (en) Methods and apparatus for adaptive display frame scheduling
US20230169938A1 (en) Video data processing based on sampling rate
WO2021000226A1 (en) Methods and apparatus for optimizing frame response
WO2021056364A1 (en) Methods and apparatus to facilitate frame per second rate switching via touch event signals
WO2021102772A1 (en) Methods and apparatus to smooth edge portions of an irregularly-shaped display
US11238772B2 (en) Methods and apparatus for compositor learning models
WO2021248370A1 (en) Methods and apparatus for reducing frame drop via adaptive scheduling
WO2021196175A1 (en) Methods and apparatus for clock frequency adjustment based on frame latency
WO2021042331A1 (en) Methods and apparatus for graphics and display pipeline management
US12002142B2 (en) Performance overhead optimization in GPU scoping
WO2021232328A1 (en) Methods and apparatus for tickless pre-rendering
US12045910B2 (en) Technique to optimize power and performance of XR workload
WO2023230744A1 (en) Display driver thread run-time scheduling
US20220284536A1 (en) Methods and apparatus for incremental resource allocation for jank free composition convergence
US20240169953A1 (en) Display processing unit (dpu) pixel rate based on display region of interest (roi) geometry

Legal Events

Code — Description
121 — EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20914750; Country of ref document: EP; Kind code of ref document: A1)
NENP — Non-entry into the national phase (Ref country code: DE)
122 — EP: PCT application non-entry in European phase (Ref document number: 20914750; Country of ref document: EP; Kind code of ref document: A1)