CN117812346A - Display equipment and media asset playing method - Google Patents


Info

Publication number
CN117812346A
Authority
CN
China
Prior art keywords
data
target
video data
decoding
media asset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310817525.3A
Other languages
Chinese (zh)
Inventor
周彪
吴耀华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hisense Electronic Technology Shenzhen Co ltd
Original Assignee
Hisense Electronic Technology Shenzhen Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hisense Electronic Technology Shenzhen Co ltd filed Critical Hisense Electronic Technology Shenzhen Co ltd
Priority to CN202310817525.3A priority Critical patent/CN117812346A/en
Publication of CN117812346A publication Critical patent/CN117812346A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H04N21/43072Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen of multiple content streams on the same device
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/8166Monomedia components thereof involving executable data, e.g. software
    • H04N21/8193Monomedia components thereof involving executable data, e.g. software dedicated tools, e.g. video decoder software or IPMP tool

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The application provides a display device and a media asset playing method. Target media asset data are loaded in response to a playing instruction input by a user. If the decoding duration of the target video data exceeds a decoding duration threshold, preset media asset data are played until the target video data finish decoding; playback of the preset media asset data is then cancelled, and the target video data and target audio data are played, so that a long black screen during decoding of the target video data is avoided. If the decoding duration of the target video data does not exceed the threshold, waiting for decoding to finish before displaying the picture will not cause a long black screen, so the preset media asset data need not be played. In this way, whether the preset media asset data need to be played can be judged according to the actual decoding duration of the target video data, so that the display device need not remain in a long black-screen silent state when playing broadcast television programs whose video data require a long decoding time.

Description

Display equipment and media asset playing method
Technical Field
The present application relates to the technical field of display devices, and in particular to a display device and a media asset playing method.
Background
Broadcast television programs refer to the basic organization and play-out forms of all play-out contents of a broadcast television station. Broadcast television programs typically include video data and audio data, which are required to be played simultaneously when the broadcast television program is played. After acquiring the video data and the audio data, the display device needs to decode the video data and the audio data before being able to play the video data and the audio data.
After the video data and the audio data are decoded, the chip layer of the display device uploads status information to the playing software once the decoded video data and audio data are synchronized; this status information is usually an audio-video-synchronized status message. After receiving the audio-video-synchronized status message, the play controller of the playing software turns on, according to that message, the peripheral switches that control picture display and sound output, after which the display device can display the picture and output the sound.
However, in practical applications, although audio data can usually be decoded within about 1 s after playback starts, decoding video data often takes longer. If the video data of some broadcast television programs take too long to decode, the playing software has to wait a long time for the audio-video-synchronized status message, and the display device correspondingly remains in a black-screen silent state for a long time. For example, for a still-pictures type of broadcast television program whose first frame of video data takes about 5 s to decode, the display device must remain in a black-screen silent state for about 5 s while waiting to play audio and video synchronously.
Disclosure of Invention
The application provides a display device and a media asset playing method, which can prevent the display device from remaining in a long black-screen silent state when playing a broadcast television program whose video data require an overly long decoding time.
In a first aspect, some embodiments of the present application provide a display device, including:
a display;
a controller configured to:
loading target media asset data in response to a play instruction input by a user, wherein the target media asset data comprise target audio data and target video data, and the decoding duration required by the target audio data is shorter than the decoding duration required by the target video data;
if the decoding duration required by the target video data exceeds a decoding duration threshold, playing at least preset media asset data; when the target video data finish decoding, cancelling playback of the preset media asset data and playing the target video data and the target audio data, wherein the preset media asset data are media asset data that are stored in the display device in advance and do not need decoding;
and if the decoding duration required by the target video data does not exceed the decoding duration threshold, not playing the preset media asset data, and playing the target video data and the target audio data when the target video data finish decoding.
In a second aspect, some embodiments of the present application provide a media asset playing method, including:
loading target media asset data in response to a play instruction input by a user, wherein the target media asset data comprise target audio data and target video data, and the decoding duration required by the target audio data is shorter than the decoding duration required by the target video data;
if the decoding duration required by the target video data exceeds a decoding duration threshold, playing at least preset media asset data; when the target video data finish decoding, cancelling playback of the preset media asset data and playing the target video data and the target audio data, wherein the preset media asset data are media asset data that are stored in the display device in advance and do not need decoding;
and if the decoding duration required by the target video data does not exceed the decoding duration threshold, not playing the preset media asset data, and playing the target video data and the target audio data when the target video data finish decoding.
As can be seen from the above technical solution, in the media asset playing method and the display device provided in the foregoing embodiments, target media asset data are loaded in response to a play instruction input by a user. The target media asset data include target audio data and target video data, and the decoding duration required by the target audio data is shorter than that required by the target video data, so only the relation between the decoding duration of the target video data and the decoding duration threshold needs to be considered. If the decoding duration of the target video data exceeds the threshold, then waiting for the target video data to finish decoding before displaying the picture would produce a long black screen, so the preset media asset data need to be played. When the target video data finish decoding, playback of the preset media asset data is cancelled and the target video data and target audio data are played, avoiding a long black screen while waiting for the target video data to decode. If the decoding duration of the target video data does not exceed the threshold, waiting for decoding to finish will not produce a long black screen, so the preset media asset data need not be played. In this way, whether the preset media asset data need to be played can be judged according to the actual decoding duration of the target video data, so that the display device need not remain in a long black-screen silent state when playing broadcast television programs whose video data require a long decoding time.
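The decision described above can be sketched as a small decision function. This is a minimal illustration, not the device's real API: the threshold value and the action names below are assumptions chosen for clarity.

```python
DECODE_DURATION_THRESHOLD = 2.0  # seconds; hypothetical threshold value

def playback_plan(expected_video_decode_s: float) -> list:
    """Return the ordered playback actions for one target media asset.

    Mirrors the embodiment above: bridge a long video decode with preset
    (pre-stored, decode-free) media asset data, otherwise simply wait.
    """
    if expected_video_decode_s > DECODE_DURATION_THRESHOLD:
        return [
            "play_preset_media",       # shown while the target video decodes
            "wait_video_decoded",
            "cancel_preset_media",     # cancel playback of the preset media
            "play_target_audio_video",
        ]
    return [
        "wait_video_decoded",          # short wait, no long black screen
        "play_target_audio_video",
    ]
```

For a still-pictures program whose first video frame takes about 5 s to decode, `playback_plan(5.0)` takes the preset-media branch, so the user sees the preset picture instead of a black screen.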
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present application, and a person skilled in the art can derive other drawings from them without inventive effort.
Fig. 1 is a schematic diagram of an operation scenario between a display device and a control device provided in some embodiments of the present application;
Fig. 2 is a block diagram of a hardware configuration of a display device provided in some embodiments of the present application;
Fig. 3 is a block diagram of a hardware configuration of a control device provided in some embodiments of the present application;
Fig. 4 is a schematic diagram of software configuration in a display device according to some embodiments of the present application;
Fig. 5 is an audio/video playing schematic diagram provided in some embodiments of the present application;
Fig. 6 is a signaling interaction diagram of an underlying framework performing an audio-video playing process according to some embodiments of the present application;
Fig. 7 is a block diagram of the hardware connections of some functional modules of the display device 200 provided in some embodiments of the present application;
Fig. 8 is a flowchart illustrating a method for playing media by the display device 200 according to some embodiments of the present application;
Fig. 9 is a signaling interaction diagram of an audio-video playing process performed by another underlying framework provided in some embodiments of the present application;
Fig. 10 is a signaling diagram illustrating an audio-video playback process performed by another underlying framework according to some embodiments of the present application;
Fig. 11 is a schematic diagram of a user interface of a display device 200 provided in some embodiments of the present application;
Fig. 12 is a schematic diagram of a user interface of yet another display device 200 provided in some embodiments of the present application;
Fig. 13 is a schematic diagram of a user interface of yet another display device 200 provided in some embodiments of the present application;
Fig. 14 is a schematic diagram of a user interface of yet another display device 200 provided in some embodiments of the present application;
Fig. 15 is a schematic diagram of a user interface of yet another display device 200 provided in some embodiments of the present application;
Fig. 16 is a schematic diagram of a user interface of yet another display device 200 provided in some embodiments of the present application;
Fig. 17 is a schematic diagram of a user interface of yet another display device 200 provided in some embodiments of the present application;
Fig. 18 is a schematic diagram illustrating decoding principles of multiple target media data according to some embodiments of the present application;
Fig. 19 is a schematic application flow chart of a media asset playing method according to some embodiments of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of some embodiments of the present application more clear, the technical solutions of some embodiments of the present application will be clearly and completely described below with reference to specific embodiments of the present application and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present application.
It should be noted that the brief description of the terms in some embodiments of the present application is only for convenience in understanding the embodiments described below, and is not intended to limit the implementation of some embodiments of the present application. Unless otherwise indicated, these terms should be construed in their ordinary and customary meaning.
The terms "first", "second", "third", and the like in the description, in the claims, and in the above-described figures are used to distinguish between similar objects or entities and do not necessarily describe a particular order or chronological sequence, unless otherwise indicated. It is to be understood that the terms so used are interchangeable under appropriate circumstances.
The terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements is not necessarily limited to all elements explicitly listed, but may include other elements not expressly listed or inherent to such product or apparatus.
The term "module" refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware or/and software code that is capable of performing the function associated with that element.
Fig. 1 is a schematic diagram of an operation scenario between a display device and a control device according to some embodiments of the present application. As shown in fig. 1, a user may operate the display device 200 through the mobile terminal 300 and the control device 100.
In some embodiments, the mobile terminal 300 and the display device 200 may each install a software application, establishing connection and communication through a network communication protocol and achieving one-to-one control operation and data communication. The audio/video content displayed on the mobile terminal 300 can also be transmitted to the display device 200, realizing a synchronous display function.
As also shown in fig. 1, the display device 200 also performs data communication with the server 400 via various communication means. The display device 200 may be allowed to establish communication connections via a local area network (LAN), a wireless local area network (WLAN), and other networks.
In addition to the broadcast-receiving television function, the display device 200 may also provide smart network television functions with computer support, including, but not limited to, network television, smart television, Internet Protocol television (IPTV), and the like.
Fig. 2 is a block diagram of a hardware configuration of the display device 200 of fig. 1 provided in some embodiments of the present application.
In some embodiments, display apparatus 200 includes at least one of a modem 210, a communicator 220, a detector 230, an external device interface 240, a controller 250, a display 260, an audio output interface 270, memory, a power supply, a user interface.
In some embodiments, the detector 230 is used to collect signals of the external environment or interaction with the outside. For example, detector 230 includes a light receiver, a sensor for capturing the intensity of ambient light; alternatively, the detector 230 includes an image collector such as a camera, which may be used to collect external environmental scenes, user attributes, or user interaction gestures, or alternatively, the detector 230 includes a sound collector such as a microphone, or the like, which is used to receive external sounds.
In some embodiments, the display 260 includes a display screen component for presenting pictures and a driving component for driving image display; it receives image signals output from the controller and displays video content, image content, menu manipulation interfaces, user manipulation UI interfaces, and the like.
In some embodiments, communicator 220 is a component for communicating with external devices or servers 400 according to various communication protocol types.
In some embodiments, the controller 250 includes a processor, a video processor, an audio processor, a graphic processor, a RAM, a ROM, first to nth interfaces for input/output, and the controller 250 controls the operation of the display device and responds to the user's operation through various software control programs stored on the memory. The controller 250 controls the overall operation of the display apparatus 200.
In some embodiments, the controller 250 and the modem 210 may be located in separate devices, i.e., the modem 210 may also be located in an external device to the main device in which the controller 250 is located, such as an external set-top box or the like.
In some embodiments, a user may input a user command through a Graphical User Interface (GUI) displayed on the display 260, and the user input interface receives the user input command through the Graphical User Interface (GUI).
In some embodiments, user interface 280 is an interface that may be used to receive control inputs.
Fig. 3 is a block diagram of a hardware configuration of the control device in fig. 1 according to some embodiments of the present application. As shown in fig. 3, the control device 100 includes a controller 110, a communication interface 130, a user input/output interface, a memory, and a power supply.
The control device 100 is configured to control the display device 200: it can receive a user's input operation instruction and convert the operation instruction into an instruction that the display device 200 can recognize and respond to, acting as an intermediary between the user and the display device 200.
In some embodiments, the control device 100 may be a smart device. Such as: the control apparatus 100 may install various applications for controlling the display apparatus 200 according to user's needs.
In some embodiments, as shown in fig. 1, a mobile terminal 300 or other intelligent electronic device may function similarly to the control device 100 after installing an application that manipulates the display device 200.
The controller 110 includes a processor 112, RAM 113, ROM 114, a communication interface 130, and a communication bus. The controller 110 is used to control the running and operation of the control device 100, as well as communication and collaboration among the internal components and the internal and external data processing functions.
The communication interface 130 enables communication of control signals and data signals with the display device 200 under the control of the controller 110. The communication interface 130 may include at least one of a WiFi chip 131, a bluetooth module 132, an NFC module 133, and other near field communication modules.
A user input/output interface 140, wherein the input interface includes at least one of a microphone 141, a touchpad 142, a sensor 143, keys 144, and other input interfaces.
In some embodiments, the control device 100 includes at least one of the communication interface 130 and the input-output interface 140. The control device 100 is provided with a communication interface 130, such as a WiFi, Bluetooth, or NFC module, and may encode a user input instruction according to the WiFi, Bluetooth, or NFC protocol and send it to the display device 200.
A memory 190 for storing various operation programs, data and applications for driving and controlling the control device 100 under the control of the controller. The memory 190 may store various control signal instructions input by a user.
A power supply 180 for providing operating power support for the various elements of the control device 100 under the control of the controller.
Broadcast television programs refer to the basic organization and play-out forms of all play-out content of a broadcast television station. Broadcast television programs typically include video data and audio data, which must be played simultaneously when the program is played. The video data acquired by the display device 200 are data obtained by encoding the original video in a video encoding format, and the audio data acquired by the display device 200 are data obtained by encoding the original audio in an audio encoding format. Audio encoding formats include the WAV (Waveform Audio File Format) format, the MIDI (Musical Instrument Digital Interface) format, the MP3 (MPEG Audio Layer-3) format, the WMA (Windows Media Audio) format, and the like. Video encoding formats include the MP4 (MPEG-4) format, the RMVB (RealMedia Variable Bitrate) format, the ASF (Advanced Streaming Format) format, and the like.
Audio coding techniques achieve data compression by removing statistical redundancy from an audio signal. Audio coding (or audio compression) algorithms aim to store or transmit high-quality audio efficiently: their main purpose is to describe the original signal with as few bits as possible while keeping the reconstructed signal as free of distortion as possible. Speech coding converts an analog speech signal into a digitized speech signal. Converting an analog, continuous sound signal into a digital signal, known as audio digitization, requires three steps: sampling, quantization, and encoding. Sampling takes a number of representative sample values from a continuously varying analog signal to represent it. Quantization divides the amplitude range of the sampled signal into a finite number of levels and maps each sample to one of them. The sampled, quantized data are not yet a digital bit stream; they must be converted into digital code words, a process called encoding.
The decoding process of audio is the inverse of the audio encoding process. When decoding the audio, the coded bit stream is subjected to frame splitting to obtain a data stream and side information, the data stream and the side information are subjected to entropy decoding to obtain frequency domain parameters, and then the frequency domain parameters are subjected to time-frequency inverse transformation to form the reconstructed audio.
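The three digitization steps above (sampling, quantization, encoding) can be illustrated with a minimal sketch. The sample rate, bit depth, and test signal below are arbitrary choices for illustration, not parameters of the patented device:

```python
import math

def digitize(signal, duration_s=0.01, sample_rate=8000, bits=8):
    """Sample a continuous signal, quantize to 2**bits levels, return codes."""
    n = int(duration_s * sample_rate)
    levels = 2 ** bits
    codes = []
    for i in range(n):
        t = i / sample_rate                      # sampling: discrete instants
        x = max(-1.0, min(1.0, signal(t)))       # clamp amplitude to [-1, 1]
        q = int((x + 1.0) / 2.0 * (levels - 1))  # quantization: finite levels
        codes.append(q)                          # encoding: integer code words
    return codes

tone = lambda t: math.sin(2 * math.pi * 440 * t)  # 440 Hz test tone
pcm = digitize(tone)  # 80 eight-bit samples for 10 ms of audio
```

Decoding would run these steps in reverse: code words back to amplitudes, then reconstruction of a continuous waveform.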
Video coding is also known as video compression. As users' demand for high-definition video grows, the volume of video multimedia data keeps increasing; without compression, these videos would be difficult to store and transmit in practice. Video contains much redundant information, and if it were not coded the data volume would be very large and hard to store or transmit directly, so compression must be used to reduce the bit rate. Video coding includes three steps: sampling, encoding, and compression. Sampling captures picture data from the video stream; encoding processes this picture data and compresses it into smaller packets for transmission.
Similarly, the video decoding process is the inverse of the video encoding process. The receiving end unpacks the received video data packets, restores them into pictures, and then arranges the pictures frame by frame in sequence to form a video stream.
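The receiving-end reassembly step can be sketched as sorting restored pictures back into display order. The `(sequence_number, frame)` pair format is an assumption for illustration; real packets also carry timestamps and codec payloads:

```python
def reassemble_video(packets):
    """Restore a frame-ordered video stream from out-of-order packets.

    Each packet is a hypothetical (sequence_number, frame) pair; sorting
    by sequence number arranges the restored pictures frame by frame.
    """
    return [frame for _, frame in sorted(packets, key=lambda p: p[0])]

# Packets often arrive out of order over the network.
packets = [(2, "frame-C"), (0, "frame-A"), (1, "frame-B")]
stream = reassemble_video(packets)
```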
Accordingly, based on the above principles, the display device 200 needs to decode the video data and audio data after acquiring them before it can play them.
Fig. 4 is a schematic software configuration diagram of the display device in fig. 1 provided in some embodiments of the present application, and in some embodiments, the system of the display device 200 may be divided into three layers, namely, an application layer, a middleware layer, and a hardware layer from top to bottom.
The application layer mainly comprises common applications on the television and an application framework (Application Framework). The common applications are mainly browser-based applications, such as HTML5 apps, and native applications (Native APPs). Common applications of the present application may also include a media asset application, such as a live-broadcast application, with which a user can view media assets.
The application framework (Application Framework) is a complete program model with all the basic functions required by standard application software, such as: file access, data exchange, and the interface for the use of these functions (toolbar, status column, menu, dialog box).
Native applications (Native APPs) may support online or offline, message pushing, or local resource access.
The middleware layer includes middleware such as various television protocols, multimedia protocols, and system components. Middleware can use the basic services (functions) provided by the system software to connect the parts of an application system or different applications on a network, achieving resource sharing and function sharing. The middleware of the present application further includes a controller, for example a play controller, which controls the turning on of the peripheral video player and audio player by sending control signals.
The hardware layer mainly comprises the HAL interface, hardware, and drivers. The HAL interface is a unified interface to which all television chips are docked, with the specific logic implemented by each chip. The hardware layer in the present application also comprises a video decoder, an audio decoder, a video output controller, and an audio output controller. The video decoder decodes the video data and generates a video-data-appearance message; the audio decoder decodes the audio data and generates an audio-data-appearance message; the video output controller outputs the decoded video data to the driver layer after the video data appear, and the audio output controller outputs the decoded audio data to the driver layer after the audio data appear.
The drivers mainly comprise: an audio driver, a display driver, a Bluetooth driver, a camera driver, a WiFi driver, a USB driver, an HDMI driver, sensor drivers (e.g., fingerprint sensor, temperature sensor, pressure sensor), a power supply driver, and the like. The audio driver of the driver layer drives the audio player to play sound according to the decoded audio data, and the video driver drives the video player to play video pictures according to the decoded video data.
Based on the system software framework shown in fig. 4, in some embodiments, as shown in the audio-video playing schematic diagram of fig. 5 and the audio-video playing signaling diagram of fig. 6, when the video decoder finishes decoding the video data (here, the first frame of video data), it reports through a decoding-status monitoring interface and sends a video-data-appearance message up to the play controller of the middleware layer. Likewise, when the audio decoder finishes decoding the audio data (here, the first frame of audio data), it reports through the decoding-status monitoring interface and sends an audio-data-appearance message to the play controller. By monitoring the status information of the audio decoder and the video decoder, the play controller determines whether to turn on the peripheral display and loudspeaker. After receiving the audio-video-synchronized message, the play controller turns on the display and the loudspeaker (calling the audio/video data interface to inject video data into the display while injecting audio data into the power amplifier), so that the display shows the picture and the loudspeaker plays the sound; the display device can then display the picture and output sound, realizing playback of the broadcast television program.
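The synchronization behavior described above can be modeled as a controller that waits for both appearance messages before opening the peripherals. This is a toy sketch; the method and attribute names are illustrative assumptions, not the device's real interfaces:

```python
class PlayController:
    """Toy model of the middleware play controller described above.

    The chip layer would invoke on_video_decoded / on_audio_decoded;
    the display and loudspeaker open only once both streams are ready,
    i.e. on the audio-video-synchronized state.
    """

    def __init__(self):
        self.video_ready = False
        self.audio_ready = False
        self.peripherals_open = False

    def on_video_decoded(self):
        self.video_ready = True
        self._try_open()

    def on_audio_decoded(self):
        self.audio_ready = True
        self._try_open()

    def _try_open(self):
        # Open peripherals only when both appearance messages have arrived.
        if self.video_ready and self.audio_ready:
            self.peripherals_open = True

pc = PlayController()
pc.on_audio_decoded()           # audio typically decodes first (~1 s)
assert not pc.peripherals_open  # still black and silent: waiting for video
pc.on_video_decoded()           # now synchronized: peripherals open
```

The gap between the first and second calls is exactly the black-screen silent period the embodiments of this application aim to bridge with preset media asset data.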
Although the audio data can usually be decoded within about 1 s after playback starts, the decoding time of the video data is often longer than that of the audio data. If the video data of certain broadcast television programs takes too long to decode, the playing software has to wait a long time for the audio-video-synchronized status information, and the display device correspondingly remains in a black-screen, silent state for that long.
For example, for a still-picture type broadcast television program whose first frame of video data takes about 5 s to decode, the display device must remain black and silent for about 5 s so that the audio can be played in sync. The effect presented to the user is that, after inputting a play instruction, the user waits about 5 s before seeing a picture and hearing sound on the display device.
In view of the above, some embodiments of the present application provide a display device 200. To facilitate understanding of the technical solutions in some embodiments of the present application, each step is described in detail below with reference to some specific embodiments and the accompanying drawings. Fig. 7 is a block diagram of the hardware connections of some functional modules of the display device 200 according to some embodiments of the present application. Fig. 8 is a flowchart of a media asset playing method executed by the display device 200 according to some embodiments of the present application.
As shown in fig. 7, the functional modules of the display device 200 according to the embodiment of the present application mainly include: a controller 250, a power supply, a display 260, an audio system, a video decoder, an audio decoder, and a memory. The above functional modules are described merely for illustration and do not represent all of the functional modules of the display device 200.
The controller 250 is the control and signal-processing core of the entire display device 200 and is responsible for controlling its system operation, including: receiving external image signals, decoding them, processing the image quality, and outputting the image; receiving, processing, and outputting audio signals to the power amplifier 500; controlling the backlight assembly; and ensuring the normal operation of peripherals such as Wi-Fi and Bluetooth.
The power supply is the power output module of the entire display device 200 and supplies power to all of its modules. The display 260 is used for displaying video pictures, and the audio system is used for playing audio data. The video decoder is configured to decode the received video data and then transmit the decoded video data to the display 260 for displaying the video picture. The audio decoder is configured to decode the received audio data and then transmit the decoded audio data to the audio system, which plays sound according to it. The memory may store video data and audio data that need no decoding, or video data and audio data with a short decoding duration.
Based on the functional modules shown in fig. 7, as shown in fig. 8, the media asset playing method executed by the display device 200 provided in the embodiment of the present application includes the following steps:
Step S100: loading target media asset data in response to a play instruction input by a user, wherein the target media asset data includes target audio data and target video data, and the decoding duration required for the target audio data is shorter than the decoding duration required for the target video data.
The present application can be applied to a scenario in which the display device 200 plays a broadcast television program, a scenario in which it plays a network television program, and a scenario in which it acquires media assets from an external media resource device through HDMI (High-Definition Multimedia Interface). For example, a set-top box, a DVD player, a computer, or the like is connected to the HDMI interface of the display device 200; the media assets are then acquired from the external set-top box, DVD player, computer, or other media resource device and played on the display device 200.
In the scenario of playing a broadcast television program, the play instruction may be input by selecting a broadcast television program channel; after the server corresponding to that channel receives the play instruction, it delivers the corresponding media asset data to the display device 200. In the scenario of playing a network television program, the play instruction may be input by selecting a network media asset on a media resource platform; after the server corresponding to the platform receives the play instruction, it delivers the corresponding media asset data to the display device 200. In the scenario of acquiring media assets through an HDMI external media resource device, the play instruction may be input by selecting a media asset icon in the external device's supporting application; after receiving the play instruction, the external media resource device transmits the corresponding media asset data to the display device 200.
In each of the above scenarios, the media asset data acquired by the display device 200 is the target media asset data of the embodiments of the present application. In every scenario, after acquiring the target media asset data, the display device 200 needs to decode it. Since the target media asset data includes both target audio data and target video data, decoding the target media asset data means decoding the target video data and the target audio data at the same time.
Before the target audio data and the target video data are decoded separately, the target media asset data also needs to be demultiplexed into target audio data and target video data, or the target video data is processed to extract the target audio data from it. For example, the target video data may be processed by FFmpeg (an open-source program that records, converts, and streams digital audio and video) to extract signed 16-bit little-endian Pulse Code Modulation (PCM) data in blocks of 1600 samples. That is, the target video data and the target audio data in the target media asset data may be loaded separately, or the target audio data and the target video data may be loaded together.
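As a rough illustration of the extraction step described above, an FFmpeg command line can be assembled as follows; the file names, sample rate, and channel count here are illustrative assumptions, not values taken from the embodiment:

```python
def build_pcm_extract_cmd(src: str, dst: str, sample_rate: int = 16000) -> list:
    """Assemble an ffmpeg command that drops the video stream and writes
    signed 16-bit little-endian PCM, as in the FFmpeg extraction above."""
    return [
        "ffmpeg", "-i", src,
        "-vn",                    # ignore the video stream
        "-acodec", "pcm_s16le",   # signed 16-bit little-endian samples
        "-f", "s16le",            # raw PCM output format
        "-ar", str(sample_rate),  # output sampling rate (assumed 16 kHz)
        "-ac", "2",               # two audio channels (assumed)
        dst,
    ]

cmd = build_pcm_extract_cmd("program.ts", "audio.pcm")
# subprocess.run(cmd, check=True) would perform the actual extraction,
# provided ffmpeg is installed on the device.
```

Running the command via `subprocess` is left commented out because it requires an ffmpeg installation on the target system.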
The target audio data acquired by the display device 200 consists of audio frames. An audio frame is a data block of fixed or variable length produced by compressing a segment of audio data with a particular compression algorithm; such a data block is called a frame. The audio data before compression may be PCM (Pulse Code Modulation) data. After being encoded and compressed, an audio file consists of a large number of audio frames and frame headers, where each frame header contains descriptive information about the audio frame, such as the coding type and the number of channels.
Video is a continuous sequence of images composed of successive frames, one frame being one image. Because of the persistence of human vision, when a sequence of frames is played at a certain rate, the human eye sees continuously moving video. Because successive frames are extremely similar, the original video is generally encoded and compressed to remove redundancy in the spatial and temporal dimensions, which facilitates storage and transmission. Video decoding is essentially the inverse process of video encoding.
As can be seen from the above description, audio frames and video frames are the basic units of encoding and decoding in audio-video technology, and both audio decoding and video decoding take time. The target audio data is decoded frame by frame with the audio frame as the basic unit; likewise, the target video data is decoded frame by frame with the video frame as the basic unit. As shown in the audio-video playing schematic diagram of fig. 5, the play controller controls the display and the speaker to open only after receiving the audio-video-synchronized messages sent by the audio decoder and the video decoder (i.e., both audio data and video data have appeared). Therefore, the decoding duration required for the target audio data in the embodiments of the present application refers to the time consumed to decode the first audio frame of the target audio data, and the decoding duration required for the target video data refers to the time consumed to decode the first video frame of the target video data.
That is, in the embodiment of the present application, the audio decoder generates the audio-data-appeared message when decoding of the first audio frame is completed, and the video decoder generates the video-data-appeared message when decoding of the first video frame is completed. When both the first video frame and the first audio frame have been decoded, the audio-video-synchronized message is generated.
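The message flow above can be sketched as a small state machine; the class and method names are hypothetical and stand in for the middleware-layer play controller and the decoders' status callbacks:

```python
class PlayController:
    """Opens the display and speaker only after both the
    audio-data-appeared and video-data-appeared messages arrive."""

    def __init__(self):
        self.audio_ready = False
        self.video_ready = False
        self.outputs_on = False

    def on_audio_data_appeared(self):
        self.audio_ready = True
        self._check_sync()

    def on_video_data_appeared(self):
        self.video_ready = True
        self._check_sync()

    def _check_sync(self):
        # the "audio and video synchronized" condition of the embodiment:
        # the first frame of each stream has been decoded
        if self.audio_ready and self.video_ready:
            self.outputs_on = True  # inject data, open display + speaker

pc = PlayController()
pc.on_audio_data_appeared()  # first audio frame decoded (~1 s in)
# still black and silent here: the first video frame is not decoded yet
pc.on_video_data_appeared()  # first video frame decoded (~5 s in)
# outputs_on is now True: picture and sound start together
```

This also shows why the slower stream dominates: the outputs stay off until the later of the two messages arrives.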
One frame of audio generally includes only the data content of that frame, so the decoding duration of one frame of audio data is short. A video frame, however, may be an I frame (intra-coded frame), a P frame (predictive-coded frame), or a B frame (bi-predictive-coded frame). An I frame is an independent frame carrying all of its information; it can be decoded on its own without reference to other images, and can be simply understood as a still picture. A P frame must reference the preceding I frame (or P frame) to be decoded: it records the difference between the current picture and the previous frame, and when decoding, this difference is superimposed on the previously buffered picture to generate the final picture. P frames typically occupy fewer data bits than I frames, but have complex dependencies on the preceding P and I reference frames. B frames are bi-predictive-coded frames: a B frame records the difference between the current frame and both the preceding and the following frames. That is, to decode a B frame, not only the previously buffered picture but also the decoded picture that follows must be obtained, and the final picture is produced by superimposing the preceding and following pictures with the data of the current frame. B frames achieve a high compression rate but demand high decoding performance.
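Because a B frame needs a later reference picture, decode order differs from display order. A simplified sketch (assuming each run of B frames references only the next I or P frame, with no further reordering):

```python
def decode_order(display_order: list) -> list:
    """Reorder a display-order frame sequence so each B frame comes
    after the forward reference it needs (the next I or P frame)."""
    out, pending_b = [], []
    for f in display_order:
        if f == "B":
            pending_b.append(f)   # must wait for the next I/P frame
        else:
            out.append(f)         # I and P frames decode immediately
            out.extend(pending_b) # now the buffered B frames can decode
            pending_b = []
    out.extend(pending_b)
    return out

print(decode_order(["I", "B", "B", "P", "B", "P"]))
# → ['I', 'P', 'B', 'B', 'P', 'B']
```

The P frame that the B frames depend on is pulled ahead of them in decode order, which is part of why video decoding latency exceeds audio decoding latency.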
The first frame in a video sequence is always an I frame; since it is a key frame carrying all the information of an independent image, the decoding duration of the first frame of the target video data is usually longer than that of the first frame of the target audio data. Some embodiments of the present application do not consider the case in which the decoding duration required for the target audio data is longer than that required for the target video data.
Step S200: if the decoding duration required for the target video data exceeds a decoding duration threshold, at least playing preset media asset data; when decoding of the target video data is completed, canceling the playing of the preset media asset data and playing the target video data and the target audio data, wherein the preset media asset data is media asset data pre-stored in the display device that needs no decoding.
Step S300: if the decoding duration required for the target video data does not exceed the decoding duration threshold, not playing the preset media asset data, and playing the target video data and the target audio data when decoding of the target video data is completed.
After acquiring the target media asset data, the display device 200 may determine the decoding durations required for the target audio data and the target video data it contains according to the type of the target media asset data. For example, the target media asset data itself carries a media asset identifier, and a configuration file is stored in the memory of the display device 200. The configuration file includes media asset identifiers and the decoding durations required for the corresponding target audio data and target video data. Therefore, when the target media asset data is acquired, the decoding durations required for the target audio data and the target video data can be looked up in the configuration file according to the media asset identifier of the target media asset data.
For example, the configuration file includes three media asset identifiers A, B, and C, and records the decoding durations required for the target audio data and the target video data corresponding to each of them. Media asset identifier A corresponds to one type of media asset data, media asset identifier B to another, and media asset identifier C to a third.
If the target media asset data M1 loaded by the display device 200 carries media asset identifier A, this indicates that M1 belongs to the type of media asset identifier A; the decoding durations X1 and Y1 of the corresponding target video data and target audio data are then looked up in the configuration file. If the target media asset data M2 loaded by the display device 200 carries media asset identifier B, this indicates that M2 belongs to the type of media asset identifier B; the decoding durations X2 and Y2 of the corresponding target video data and target audio data are then looked up in the configuration file.
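The configuration-file lookup described above might look as follows; the dictionary contents and field names are invented for illustration:

```python
# Hypothetical configuration file: each media asset identifier maps to
# the decoding durations (in seconds) of its target video and audio data.
CONFIG = {
    "A": {"video_decode_s": 5.0, "audio_decode_s": 1.0},
    "B": {"video_decode_s": 2.5, "audio_decode_s": 0.8},
    "C": {"video_decode_s": 1.2, "audio_decode_s": 0.5},
}

def lookup_decode_durations(asset_id: str) -> tuple:
    """Return (video decode duration, audio decode duration) for the
    media asset identifier carried by the target media asset data."""
    entry = CONFIG[asset_id]
    return entry["video_decode_s"], entry["audio_decode_s"]

x1, y1 = lookup_decode_durations("A")  # e.g., target media asset data M1
```

Here X1 and Y1 fall out of a single dictionary lookup keyed by the identifier the asset carries.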
In the embodiment of the present application, it has already been established that the decoding duration required for the target video data is longer than that required for the target audio data, so only the decoding duration required for the target video data needs to be compared with the decoding duration threshold. For example, for the target media asset data M1 carrying media asset identifier A, the required decoding duration X1 of the target video data is compared with the decoding duration threshold; if X1 exceeds the threshold, then waiting for both the target video data and the target audio data to finish decoding would force the user to wait a long time while the display device remains in the black-screen, silent state.
The specific implementation of this process at the bottom layer is as follows: the decoding duration required by the audio decoder for the target audio data is Y1, for example 1 second; that is, the audio decoder takes 1 second to decode the first audio frame of the target audio data, after which it can throw the audio-data-appeared message up to the play controller. The decoding duration required by the video decoder for the target video data is X1, for example 5 seconds; that is, the video decoder takes 5 seconds to decode the first video frame of the target video data, after which it can throw the video-data-appeared message up to the play controller. In other words, the play controller has to wait at least 5 seconds before receiving the audio-video-synchronized message. Only upon receiving that message does the play controller turn on the display and the speaker, so the user experiences a 5-second black-screen, silent state after inputting the play instruction before picture and sound are played.
In the audio-video playing signaling diagram shown in fig. 9, the play controller receives from the chip layer the information that the decoding duration required for the target video data exceeds the decoding duration threshold, and directly injects the preset media asset data into the peripheral devices for playing. The audio decoder then sends the audio-data-appeared message to the play controller when decoding of the first frame of audio data is completed, and the video decoder sends the video-data-appeared message when decoding of the first frame of video data is completed. From these two messages the play controller judges that audio and video are synchronized, and then injects the target media asset data into the peripheral devices.
In the audio-video playing signaling diagram shown in fig. 10, the play controller receives from the chip layer the information that the decoding duration required for the target video data exceeds the decoding duration threshold; the play controller then waits for the audio-data-appeared message sent by the audio decoder, determines that the audio data has appeared, and injects the target audio data and the preset media asset data (containing only video data) into the peripheral devices. Afterwards, when decoding of the first frame of video data is completed, the video decoder sends the video-data-appeared message to the play controller, which then stops injecting the preset media asset data into the peripheral devices and injects the target video data and the target audio data instead.
The decoding duration threshold may be a parameter pre-stored in the display device 200 or set by the user. In the user interface shown in fig. 11, the user may set the decoding duration threshold according to the actual usage scenario, and may also set different decoding duration thresholds for different media asset types. For example, a decoding duration threshold N1 is set for target media asset data of the type corresponding to media asset identifier A, a threshold N2 for the type corresponding to media asset identifier B, and a threshold N3 for the type corresponding to media asset identifier C.
After loading the target media asset data, the display device 200 looks up the decoding duration required for the corresponding target video data in the configuration file according to the media asset identifier carried by the target media asset data, then looks up the corresponding decoding duration threshold according to the same identifier, and compares the required decoding duration of the target video data with the decoding duration threshold.
In the above example, if the found decoding duration threshold is 2 seconds and the decoding duration required for the target video data is 5 seconds, the required duration exceeds the threshold, so the step of playing the preset media asset data is performed. If the found threshold is 6 seconds and the required duration is 5 seconds, the required duration does not exceed the threshold, so the preset media asset data is not played and the step of playing the target video data and the target audio data simultaneously is performed.
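The branching between steps S200 and S300 reduces to a single comparison; the return labels below are hypothetical names for the two paths:

```python
def playback_strategy(video_decode_s: float, threshold_s: float) -> str:
    """Step S200/S300 decision: play preset media asset data while the
    target video data decodes only if decoding would take too long."""
    if video_decode_s > threshold_s:
        return "play_preset_then_target"  # step S200
    return "wait_and_play_target"         # step S300

# The worked example above: a 5-second decode against the two thresholds.
strategy_a = playback_strategy(5.0, 2.0)  # threshold 2 s is exceeded
strategy_b = playback_strategy(5.0, 6.0)  # threshold 6 s is not exceeded
```

With a 2-second threshold the preset media asset data is played first; with a 6-second threshold the device simply waits for the target streams.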
The specific implementation of this process at the bottom layer is as follows: the decoding duration required by the audio decoder for the target audio data is Y1, for example 1 second; that is, the audio decoder takes 1 second to decode the first audio frame of the target audio data, after which it can throw the audio-data-appeared message up to the play controller. The decoding duration required by the video decoder for the target video data is 5 seconds; that is, the video decoder takes 5 seconds to decode the first video frame of the target video data, after which it can throw the video-data-appeared message up to the play controller. In other words, the play controller has to wait at least 5 seconds before receiving the audio-video-synchronized message.
If the decoding duration threshold is 2 seconds, it is judged that the decoding duration of the target video data exceeds the threshold. This judging process may be executed by the bottom-layer chip, which throws the judgment result up to the play controller; the play controller can then control the display and/or the speaker to turn on and play the preset media asset data. The preset media asset data includes at least one of video data, picture data, and audio data.
If the preset media asset data includes only video data or picture data, the play controller may control only the display to turn on; that is, before the target video data and the target audio data are played, only the preset video data or preset picture data is played on the display. If the preset media asset data includes video data and audio data, or picture data and audio data, the play controller may control both the display and the speaker to turn on; that is, before the target video data and the target audio data are played, preset video data or preset picture data is played on the display while the speaker plays sound according to the preset audio data. If the preset media asset data includes only audio data, the play controller may control only the speaker to turn on; that is, before the target video data and the target audio data are played, the speaker plays sound according to the preset audio data.
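The three cases above reduce to a simple mapping from the kinds of data in the preset media asset to the peripherals to open; the names are illustrative:

```python
def outputs_to_open(preset_contents: set) -> set:
    """Decide which peripherals the play controller opens, given which
    kinds of data ('video', 'picture', 'audio') the preset media asset
    data contains."""
    opened = set()
    if preset_contents & {"video", "picture"}:
        opened.add("display")  # something visual to show
    if "audio" in preset_contents:
        opened.add("speaker")  # something audible to play
    return opened
```

For example, a preset asset with only a picture opens the display alone, while one with a picture and music opens both display and speaker.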
In all three cases, when the decoding duration required for the target video data exceeds the decoding duration threshold, preset video data and preset audio data are played together, or preset video data or preset audio data is played alone, so that the display device is prevented from remaining in the black-screen, silent state for a long time. It should be noted that the aim of the embodiments of the present application here is to avoid the display device remaining black and/or silent for a long time: playing preset video data and preset audio data simultaneously avoids both the long black screen and the long silence; playing only preset video data avoids the long black screen, without the silence needing to be avoided; playing only preset audio data avoids the long silence, without the black screen needing to be avoided.
In some embodiments, when decoding of the target audio data is completed but the decoding duration required for the target video data exceeds the decoding duration threshold, the target audio data and the preset media asset data may be played simultaneously. For example, the target media asset data is of a still-picture video type that actually plays a group of pictures; that is, the target video data is a group of still pictures rather than moving pictures, and the target audio data is music. The user actually cares little about the video data to be played and more about the target audio data, so the target audio data can be played first (without waiting for the target audio data and the target video data to be synchronized) together with the preset media asset data. In the user interface shown in fig. 12, before the target video data is loaded, the target audio data is played and a still music cover picture is displayed. Alternatively, in the user interface shown in fig. 13, before the target video data is loaded, the target audio data is played and a prompt message "still-picture program is loading picture …" is displayed, from which the user can learn that the target media asset data being played is of the still-picture type.
The specific implementation of this process at the bottom layer is as follows: the audio decoder takes 1 second to decode the first audio frame of the target audio data, after which it can throw the audio-data-appeared message up to the play controller. The video decoder takes 5 seconds to decode the first video frame of the target video data, after which it can throw the video-data-appeared message up to the play controller. In other words, the play controller has to wait at least 5 seconds before receiving the audio-video-synchronized message. If the target media asset data is determined to be of the still-picture type, the play controller can control the display and the speaker to turn on simultaneously: the display presents the music cover picture or the prompt message, and the speaker plays sound according to the target audio data.
In this case, the display device 200 needs to wait for the target audio data to finish decoding after acquiring it; the music cover picture or prompt message may be displayed either after the target audio data has been decoded, or as soon as the target audio data is acquired, with the target audio data played once its decoding completes. If the preset media asset data includes preset audio data and preset video data, they can be played simultaneously as soon as the target media asset data is acquired, without waiting for the target audio data to finish decoding.
If the decoding duration required for the target video data exceeds the decoding duration threshold, then after at least the preset media asset data has been played, once decoding of the target video data is completed, the following playing cases arise in order to synchronize the audio data and the video data:
If the target audio data was being played while the preset media asset data was played, the playing of the preset media asset data is canceled when decoding of the target video data is completed. The target audio data is then played again from its first frame while the target video data is played from its first frame, so that the target audio data and the target video data are played synchronously.
If the target audio data was not being played while the preset media asset data was played, the playing of the preset media asset data is canceled when decoding of the target video data is completed. The target audio data is then played from its first frame while the target video data is played from its first frame, so that the target audio data and the target video data are played synchronously.
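Both cases end in the same switch-over, which can be sketched as follows (hypothetical names; the positions are frame indices):

```python
class SwitchOver:
    """Cancels the preset media asset data and starts the target audio
    and video together from their first frames, per the cases above."""

    def __init__(self):
        self.playing = "preset"  # preset media asset data is on screen
        self.audio_pos = None    # target audio frame index; None = not started
        self.video_pos = None    # target video frame index; None = not started

    def on_target_video_decoded(self):
        # cancel the preset media asset data...
        self.playing = "target"
        # ...and start both target streams from frame 0, in sync
        self.audio_pos = 0
        self.video_pos = 0

sw = SwitchOver()
sw.on_target_video_decoded()
# sw.playing is now "target" and both streams start at frame 0
```

Restarting both streams at frame 0 is what guarantees the synchronized-playback effect even when the target audio had already been heard alongside the preset media.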
In some embodiments, if the decoding duration required for the target video data exceeds the decoding duration threshold but no preset media asset data is stored in the current system, a prompt message may be presented to inform the user of that fact. For example, a play instruction input by the user is acquired, and the media asset identifier carried by the play instruction is judged to belong to a media asset type that requires playing preset media asset data, so it is determined that the decoding duration required for the target video data exceeds the decoding duration threshold. At the same time, it is determined that no preset media asset data is stored in the current system; at this point, as shown in fig. 14, a prompt message such as "No preset media asset data is stored in the system. Please confirm whether to play this media asset." can be displayed in the user interface.
An "OK" button and a "Cancel" button are also arranged below the prompt message. If the user selects the "OK" button, then in response to this determination instruction the device waits for both the audio data and the video data to finish decoding; that is, the user is shown a long black-screen, silent state before the display presents the picture and the speaker plays the sound. If the user selects the "Cancel" button, then in response to this cancel instruction the user interface jumps back from the interface shown in fig. 14 to the interface for selecting media assets, such as the media resource platform main page.
In some embodiments, if the decoding duration required for the target video data exceeds the decoding duration threshold and preset media asset data is stored in the current system, a prompt message may also be displayed to inform the user that preset media asset data is stored and that, if the current media asset is selected, the preset media asset data will be played before it. For example, a play instruction input by the user is acquired, and the media asset identifier carried by the play instruction is judged to belong to a media asset type that requires playing preset media asset data, so it is determined that the decoding duration required for the target video data exceeds the decoding duration threshold. At the same time, it is determined that preset media asset data is stored in the current system; at this point, as shown in fig. 15, a prompt message such as "Preset media asset data is stored in the system. Please confirm whether to play this media asset." can be displayed in the user interface.
An "OK" button and a "Cancel" button are also arranged below the prompt message. If the user selects the "OK" button, then in response to this determination instruction the device waits for both the audio data and the video data to finish decoding, during which time the preset media asset data is presented to the user. If the user selects the "Cancel" button, then in response to this cancel instruction the user interface jumps back from the interface shown in fig. 15 to the interface for selecting media assets, such as the media resource platform main page.
In some embodiments, the settings function of the display device 200 also allows configuring whether, when a media asset is clicked, it should first be judged whether the decoding duration of the target video data in its media asset data exceeds the decoding duration threshold. This setting may be made globally, per media asset type, or for a single media asset.
For example, the user interface shown in fig. 16 includes an "overall setting" option, a "media asset type" option, and a "media asset name" option, each with an "OK" button and a "Cancel" button below it. If the user selects the "OK" button under the "overall setting" option, then after this determination instruction is input, no matter what media asset the user selects to play, it is first judged whether the decoding duration required for the target video data exceeds the decoding duration threshold: if it does, at least the preset media asset data is played; if it does not, the preset media asset data is not played.
If the user inputs media asset identifier A in the input box of the "media asset type" option and selects the "OK" button below that option, then after the setting takes effect, whenever the user selects a media asset of the type corresponding to media asset identifier A, the device judges whether the decoding duration required by the target video data exceeds the decoding duration threshold. If it does, at least the preset media asset data is played; if it does not, the preset media asset data is not played. If the user selects a media asset of any other type, the judgment is skipped, and the device simply waits for the target video data and the target audio data to finish decoding and then plays them synchronously. The type corresponding to media asset identifier A may be a media asset type for which, in the user's experience, the decoding duration required by the target video data exceeds the decoding duration threshold. The user interface shown in fig. 14 only shows one input box for the media asset type; in practical applications, a plurality of such input boxes may be provided.
For example, the type corresponding to media asset identifier A, the type corresponding to media asset identifier B, and so on may be set at the same time, so that whenever the user selects to play a media asset of one of these types, the device first judges whether the decoding duration required by the target video data exceeds the decoding duration threshold.
If the user inputs the media asset name XXX in the input box of the "media asset name" option and selects the "OK" button below that option, then after the setting takes effect, whenever the user selects to play a media asset whose name is, or contains, XXX, the device judges whether the decoding duration required by the target video data exceeds the decoding duration threshold. If it does, at least the preset media asset data is played; if it does not, the preset media asset data is not played. If the user selects to play a media asset with any other name, the judgment is skipped, and the device simply waits for the target video data and the target audio data to finish decoding and then plays them synchronously.
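The three setting scopes described above (overall, by media asset type, by media asset name) can be sketched as a single predicate. This is a minimal illustration only; the class and field names below are invented for the sketch and are not the patent's actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class ThresholdCheckSettings:
    """Which media assets trigger the decoding-duration check (hypothetical model)."""
    check_all: bool = False                             # "overall setting" option
    checked_types: set = field(default_factory=set)     # e.g. {"A"} from the "media asset type" option
    checked_names: set = field(default_factory=set)     # e.g. {"XXX"} from the "media asset name" option

    def should_check(self, asset_type: str, asset_name: str) -> bool:
        # The name rule matches both exact names and names containing the configured string.
        if self.check_all:
            return True
        if asset_type in self.checked_types:
            return True
        return any(n in asset_name for n in self.checked_names)
```

With such settings in place, the device would run the decoding-duration judgment only for matching media assets; all others go straight to synchronized playback once decoding completes.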
In some embodiments, if the decoding duration required by the target video data exceeds the decoding duration threshold, at least the preset media asset data is played. If the target video data finishes decoding before the preset media asset data finishes playing, an option may be provided to either continue playing the preset media asset data or switch to the target media asset data.
For example, if the target video data has finished decoding while the preset media asset data is still playing, a prompt such as "The target media asset data has finished loading. Play it now?" may be displayed, as in the user interface shown in fig. 17. An "OK" button and a "Cancel" button are arranged below the prompt. If the user selects the "OK" button, then in response to the confirmation instruction input by the user, playback of the preset media asset data is canceled and the target media asset data is played. If the user selects the "Cancel" button, then in response to the cancel instruction input by the user, playback of the preset media asset data is not canceled; the device waits for the preset media asset data to finish playing and then plays the target media asset data.
In some embodiments, if a plurality of preset media asset data are stored in the display device 200, the preset media asset data may be classified, for example according to the media asset type of the target media asset data. Suppose the target media asset data includes the type corresponding to media asset identifier A, the type corresponding to media asset identifier B, the type corresponding to media asset identifier C, and so on; the preset media asset data can then be classified into the same types. In this way, if the decoding duration required by the target video data exceeds the decoding duration threshold, preset media asset data of the same media asset type as the target video data can be looked up in the system and played.
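Classifying the stored preset media asset data by media asset type and looking up a same-type preset reduces to a keyed lookup. The sketch below is illustrative only; the identifiers, file names, and the fallback default are assumptions, not content of the patent:

```python
# Hypothetical registry: preset media asset data grouped by media asset type.
preset_by_type = {
    "A": ["preset_A_1.ts", "preset_A_2.ts"],
    "B": ["preset_B_1.ts"],
}

def pick_preset(target_type: str, registry: dict, default: str = "generic_preset.ts") -> str:
    """Prefer a preset whose media asset type matches the target video data's type."""
    candidates = registry.get(target_type)
    return candidates[0] if candidates else default
```

If no same-type preset exists, a generic preset (here the assumed `default`) could be used instead, preserving the goal of avoiding a black-screen silent state.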
In some embodiments, if the play instruction input by the user carries the media asset identifiers of a plurality of target media asset data, that is, the user selects to play a plurality of target media asset data, the plurality of target media asset data may be played sequentially in the selected order or in the order in which loading completes. When a plurality of target media asset data are played in sequence, the video frame intervals of different target media asset data may differ. When one target media asset data finishes playing, the next may not yet be loaded, so a black-screen silent state may exist for a certain period because of the differing video frame intervals. Therefore, when different target media asset data are played consecutively, the judgment of whether the decoding duration required by the target video data exceeds the decoding duration threshold is needed not only for the first target media asset data but also for each subsequent one.
For example, the play instruction carries media asset identifier A and media asset identifier B, where media asset identifier A corresponds to first target media asset data and media asset identifier B corresponds to second target media asset data; the first target media asset data includes first target video data and first target audio data, and the second target media asset data includes second target video data and second target audio data. If the first target media asset data is played before the second, the device first judges whether the decoding duration of the first target video data exceeds the decoding duration threshold; if it does, the preset media asset data is played, and when the first target video data finishes decoding, the first target video data and the first target audio data are played synchronously.
Then, when playback of the first target media asset data completes, the device judges whether the difference between the single-frame decoding duration of the second target video data and the single-frame playback duration of the first target video data exceeds the decoding duration threshold. As shown in the decoding schematic of fig. 18, this is because the video decoder starts decoding the first video frame of the second target video data as soon as the last video frame of the first target video data finishes decoding; that is, while the last frame of the first target media asset data is playing, the first frame of the second target video data is being decoded.
Let t1 be the single-frame playback duration of the first target video data and t2 the single-frame decoding duration of the second target video data. If t1 is greater than or equal to t2, the first video frame of the second target video data finishes decoding before, or exactly when, the last video frame of the first target video data finishes playing. In this case a black-screen silent state cannot occur regardless of whether the decoding duration of the second target video data exceeds the threshold, so the judgment is unnecessary.
If t1 is smaller than t2, the last frame of the first target video data finishes playing before the first frame of the second target video data finishes decoding; that is, the second target video data cannot seamlessly follow the first. In this case the device judges whether the value of t2 - t1 exceeds the decoding duration threshold: if it does, the preset media asset data needs to be played; if it does not, the preset media asset data does not need to be played.
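The t1/t2 comparison for back-to-back target media assets reduces to the following check. This is a minimal sketch; the function name and the choice of seconds as the unit are assumptions for illustration:

```python
def needs_preset_between(t1: float, t2: float, threshold: float) -> bool:
    """
    t1: single-frame playback duration of the first target video data (seconds).
    t2: single-frame decoding duration of the second target video data (seconds).
    Returns True only when the next asset's first frame cannot finish decoding
    while the current asset's last frame is still playing, AND the resulting
    gap t2 - t1 exceeds the decoding duration threshold.
    """
    if t1 >= t2:
        return False  # next frame is decoded in time; no black-screen gap
    return (t2 - t1) > threshold
```

For instance, with a 25 fps first asset (t1 = 0.04 s) and a second asset whose frames take 0.6 s to decode, the 0.56 s gap would trigger the preset media asset under a 0.5 s threshold.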
Based on the media asset playing method described in the foregoing embodiments, fig. 19 shows a flowchart of a specific application flow of the method provided in the embodiments of the present application. The flow specifically includes:
s3010: starting the display device 200 to prepare to play the media asset data;
s3011: after the target media asset data is acquired, decoding of the target video data and the target audio data included in the target media asset data is started.
S3012: after the target audio data is decoded, the bottom chip sends an audio data appearance message to the play controller.
S3013: whether the media is still pictures type is judged, if not, step S3014 is executed, and if yes, step S3016 is executed.
S3014: and waiting for the audio and video synchronization message.
S3015: and the play controller receives the audio and video synchronization message, opens audio and video output and plays the target audio data and the target video data.
S3016: and opening the audio output, playing the target audio data, and informing the application to display default pictures or prompt information.
S3017: waiting for a video data presence message, if a video data presence message is received, executing step S3018, and if a video data presence message is not received, continuing executing step S3017.
S3018: and notifying the application to stop displaying the default picture or the prompt information.
S3019: and opening the video output, and playing the target video data.
For identical or similar parts among the embodiments in this specification, reference may be made to one another; they are not described again herein.
It will be apparent to those skilled in the art that the techniques of the embodiments of the present invention may be implemented by software plus a necessary general-purpose hardware platform. Based on such understanding, the technical solutions in the embodiments of the present invention, in essence or the parts contributing to the prior art, may be embodied in the form of a software product. The software product may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions to cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments, or in parts of the embodiments, of the present invention.
Finally, it should be noted that the above embodiments are only intended to illustrate, not to limit, the technical solutions of the present application. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and that such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.
The foregoing description, for purposes of explanation, has been presented in conjunction with specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the embodiments to the precise forms disclosed above. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles and the practical application, to thereby enable others skilled in the art to best utilize the embodiments and various embodiments with various modifications as are suited to the particular use contemplated.

Claims (10)

1. A display device, characterized by comprising:
a display;
a controller configured to:
loading target media asset data in response to a play instruction input by a user, wherein the target media asset data comprises target audio data and target video data, and the decoding duration required by the target audio data is less than the decoding duration required by the target video data;
if the decoding duration required by the target video data exceeds a decoding duration threshold, playing at least preset media asset data, and when the target video data is completely decoded, canceling playback of the preset media asset data and playing the target video data and the target audio data, wherein the preset media asset data is media asset data that is stored in the display device in advance and does not need decoding;
and if the decoding duration required by the target video data does not exceed the decoding duration threshold, not playing the preset media asset data, and playing the target video data and the target audio data when the target video data is completely decoded.
2. The display device of claim 1, wherein the controller executing at least playing preset media asset data is configured to:
and playing the target audio data and the preset media asset data when the target audio data is completely decoded, wherein the preset media asset data comprises at least one of video data, picture data, and audio data.
3. The display device of claim 2, wherein the controller executing playing the target video data and the target audio data is configured to:
and when playing the target video data, restarting playback of the target audio data from its first frame, so that the target video data and the target audio data are played synchronously.
4. The display device of claim 1, wherein the controller executing at least playing preset media asset data is configured to:
and playing the preset media data, and not playing the target audio data when the target audio data is decoded.
5. The display device of claim 4, wherein the preset media asset data comprises preset video data and preset audio data.
6. The display device of claim 4, wherein the controller executing playing the target video data and the target audio data is configured to:
and synchronously playing the target video data and the target audio data.
7. The display device according to claim 1, wherein the controller performing the determination of whether the decoding duration required for the target video data exceeds the decoding duration threshold is configured to:
Acquiring a media asset identifier of the target media asset data;
if the media asset identifier belongs to the media asset type of the preset media asset data to be played, determining that the decoding duration required by the target video data exceeds the decoding duration threshold;
and if the media asset identifier does not belong to the media asset type of the preset media asset data to be played, determining that the decoding duration required by the target video data does not exceed the decoding duration threshold.
8. The display device of claim 7, wherein if the required decoding duration of the target video data exceeds a decoding duration threshold, the controller is further configured to:
and displaying prompt information on the display, wherein the prompt information is used to prompt the user that the media asset type of the currently playing media asset belongs to the media asset type for which the preset media asset data needs to be played.
9. The display device according to claim 1, wherein the display device further comprises:
an audio decoder configured to: decoding the target audio data and transmitting an audio data presence message to the controller when the target audio data is completely decoded;
a video decoder configured to: decoding the target audio data and transmitting a video data presence message to the controller when the target video data is completely decoded:
The controller performs the cancel play of the preset media asset data and is configured to:
and canceling playback of the preset media asset data when both the audio data presence message and the video data presence message are received.
10. A method for playing media assets, which is applied to a display device, comprising:
loading target media asset data in response to a play instruction input by a user, wherein the target media asset data comprises target audio data and target video data, and the decoding duration required by the target audio data is less than the decoding duration required by the target video data;
if the decoding duration required by the target video data exceeds a decoding duration threshold, playing at least preset media asset data, and when the target video data is completely decoded, canceling playback of the preset media asset data and playing the target video data and the target audio data, wherein the preset media asset data is media asset data that is stored in the display device in advance and does not need decoding;
and if the decoding duration required by the target video data does not exceed the decoding duration threshold, not playing the preset media asset data, and playing the target video data and the target audio data when the target video data is completely decoded.
Application CN202310817525.3A (priority date 2023-07-05, filing date 2023-07-05): Display equipment and media asset playing method. Status: Pending.


Publications (1)

Publication Number: CN117812346A
Publication Date: 2024-04-02



Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination