CN110545460B - Media file preloading method and device and storage medium - Google Patents


Info

Publication number
CN110545460B
Authority
CN
China
Prior art keywords
media file
media
playing
segmented
file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810530636.5A
Other languages
Chinese (zh)
Other versions
CN110545460A (en)
Inventor
银国徽
Current Assignee
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN201810530636.5A
Publication of CN110545460A
Application granted
Publication of CN110545460B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/433 Content storage operation, e.g. storage operation in response to a pause request, caching operations
    • H04N21/4331 Caching operations, e.g. of an advertisement for later insertion during playback
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47217 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for controlling playback functions for recorded or on-demand content, e.g. using progress bars, mode or play-point indicators or bookmarks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456 Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments

Abstract

The present disclosure provides a media file preloading method, including: displaying a playing window of a player for playing a media file; displaying the playing progress of the media file in the playing window; and, in response to the playing progress of the media file reaching a playing point in real time, displaying an identifier of a preloaded segmented media file in the playing window. The media file corresponds to a plurality of segmented media files, and the playing time of the preloaded segmented media file is later than that of the playing point. The disclosure also provides a media file preloading apparatus and a storage medium.

Description

Media file preloading method and device and storage medium
Technical Field
The present disclosure relates to a media file preloading technology, and in particular, to a media file preloading method, apparatus, and storage medium.
Background
When multimedia information is played through a web page, buffering or loading of the multimedia information is performed by the web browser. Specifically, the browser loads segmented multimedia data from the current playing point through to the end point, and cannot control the amount of multimedia data cached or loaded in the process. Thus, when the user views only part of the loaded multimedia data, unnecessary traffic consumption results.
Disclosure of Invention
In view of this, embodiments of the present disclosure provide a method and an apparatus for preloading a media file, and a storage medium, which can reduce unnecessary consumption of traffic when playing multimedia information.
In one aspect, an embodiment of the present disclosure provides a method for preloading a media file, including:
displaying a playing window of a player for playing the media file;
displaying the playing progress of the media file in the playing window; and
in response to the playing progress of the media file reaching a playing point in real time, displaying an identifier of a preloaded segmented media file in the playing window;
wherein the media file corresponds to a plurality of segmented media files, and the playing time of the preloaded segmented media file is later than that of the playing point.
In another aspect, an embodiment of the present disclosure provides an apparatus for preloading a media file, including:
a display unit, configured to display a playing window of a player for playing the media file and to display the playing progress of the media file in the playing window; and
a loading unit, configured to display, in response to the playing progress of the media file reaching a playing point in real time, an identifier of a preloaded segmented media file in the playing window;
wherein the media file corresponds to a plurality of segmented media files, and the playing time of the preloaded segmented media file is later than that of the playing point.
In another aspect, an embodiment of the present disclosure provides an apparatus for preloading a media file, including:
a memory for storing executable instructions;
and a processor, configured to implement the media file preloading method by executing the executable instructions stored in the memory.
In another aspect, the present disclosure provides a storage medium storing executable instructions which, when executed by a processor, implement the media file preloading method according to the present disclosure.
In the embodiments of the present disclosure, a playing window of a player for playing a media file is displayed; the playing progress of the media file is displayed in the playing window; and, in response to the playing progress of the media file reaching a playing point in real time, the identifier of the preloaded segmented media file is displayed in the playing window. The media file is divided in advance into a plurality of segmented media files, and the identifiers of the preloaded segmented media files are displayed in the playing window, so that the user can selectively watch the segmented media files after the playing point based on those identifiers. Since the player preloads only segmented media files after the playing point, unnecessary traffic consumption is avoided.
Drawings
FIG. 1 is an alternative schematic construction of a container provided by embodiments of the present disclosure;
fig. 2 is a schematic diagram of an alternative package structure of an MP4 file provided in the embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of a media data container storing media data in a media file according to an embodiment of the present disclosure;
fig. 4 is a schematic diagram of an alternative packaging structure of an FMP4 file provided by an embodiment of the disclosure;
FIG. 5 is an alternative structural diagram of a preloading device for media files provided by the embodiment of the disclosure;
FIG. 6 is a schematic diagram of an alternative processing flow of a preloading method for media files according to an embodiment of the present disclosure;
FIG. 7A is an alternative diagram of an embodiment of the present disclosure showing a playback window for playing back a media file;
FIG. 7B is another alternative diagram illustrating a display window for playing a media file in accordance with an embodiment of the present disclosure;
FIG. 8A is an alternative diagram of the display of the playing progress of a media file on the playing progress bar of the player according to the embodiment of the disclosure;
FIG. 8B is another alternative diagram illustrating the playing progress of a media file displayed on the playing interface of the player according to the embodiment of the disclosure;
FIG. 9A is an alternative diagram of an embodiment of the present disclosure displaying an identification of a pre-loaded segmented media file in a play window;
FIG. 9B is another alternative diagram illustrating the display of an identification of a pre-loaded segmented media file in a play window according to an embodiment of the present disclosure;
FIG. 9C is yet another alternative diagram of an embodiment of the present disclosure displaying an identification of a pre-loaded segmented media file in a play window;
fig. 10 is a schematic processing flow diagram of another alternative preloading method of a media file applied to a preloading device of the media file according to an embodiment of the disclosure;
fig. 11 is a schematic processing flow diagram of yet another alternative preloading method of a media file applied to a preloading device of the media file according to an embodiment of the disclosure;
FIG. 12 is an alternative flow diagram of encapsulating segmented media files provided by examples of the present disclosure;
FIG. 13 is a schematic diagram of an alternative process for parsing media information from a metadata container according to an embodiment of the present disclosure;
FIG. 14 is a schematic view of a process flow of acquiring media data of a corresponding segmented media file from a server through a network request according to an embodiment of the present disclosure;
fig. 15 is a schematic flowchart of a player sending a segmented media file to a media element of a web page for decoding and playing through a media source extension interface of the web page according to an embodiment of the present disclosure;
FIG. 16 is an alternative diagram of a player according to an embodiment of the present disclosure playing a segmented media file through a media source extension interface of a web page;
fig. 17 is a schematic diagram of converting an MP4 file into an FMP4 file and playing the FMP4 file through a media source extension interface according to an embodiment of the disclosure;
fig. 18 is a schematic structural diagram of a preloading device for media files according to an embodiment of the present disclosure.
Detailed Description
To make the purpose, technical solutions, and advantages of the present disclosure clearer, the present disclosure is described in further detail below with reference to the accompanying drawings. The described embodiments should not be construed as limiting the present disclosure; all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present disclosure.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure.
Before the present disclosure is explained in further detail, the terms and expressions used in the embodiments of the present disclosure are explained; they should be understood according to the following explanations.
1) Media file: a file storing encoded media data (e.g., at least one of audio data and video data) in containers (Boxes). A media file also includes metadata, i.e., data describing the media data, which carries the media information needed to ensure that the media data is decoded correctly.
For example, a file that encapsulates multimedia data in the MP4 container format is called an MP4 file. Typically, an MP4 file stores video data encoded according to the Advanced Video Coding (AVC, i.e., H.264) or MPEG-4 (Part 2) specification and audio data encoded according to the Advanced Audio Coding (AAC) specification, although other video and audio encodings are not excluded.
2) Container (Box): an object-oriented component defined by a unique type identifier and a length. Referring to fig. 1, an optional structural diagram of a container provided by an embodiment of the present disclosure, a container includes a container header (Box Header) and container data (Box Data), which are filled with binary data expressing various information.
The container header includes a size and a type. The size indicates the size (also referred to herein as the capacity or length) of the storage space occupied by the container, and the type indicates the type of the container. Fig. 2 is a schematic diagram of an optional package structure of an MP4 file provided by an embodiment of the present disclosure; the basic container types involved in an MP4 file include the file type container (ftyp box), the metadata container (moov box), and the media data container (mdat box).
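As a concrete illustration, the 8-byte container header described above can be read directly from binary data. The sketch below uses Node's Buffer; the helper names are illustrative, and the special size values 0 (box extends to end of file) and 1 (64-bit size follows) are ignored for brevity:

```javascript
// Reads the header of a container (box): a 32-bit big-endian size
// followed by a 4-character ASCII type.
function readBoxHeader(buf, offset) {
  const size = buf.readUInt32BE(offset);                      // total box size in bytes
  const type = buf.toString("ascii", offset + 4, offset + 8); // e.g. "ftyp", "moov", "mdat"
  return { size, type };
}

// Walks the top-level boxes of a file held in memory by hopping
// from one header to the next using each box's size field.
function listTopLevelBoxes(buf) {
  const boxes = [];
  for (let off = 0; off + 8 <= buf.length; ) {
    const { size, type } = readBoxHeader(buf, off);
    boxes.push(type);
    off += size;
  }
  return boxes;
}
```

Because every box declares its own length, a parser can skip containers it does not understand, which is what makes the format extensible.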
The container data portion may store specific data, in which case the container is referred to as a "data container," or may encapsulate other containers, in which case the container is referred to as a "container of containers."
3) Track: a time-ordered sequence of related samples (Sample) in a media data container. For media data, a track represents a sequence of video frames or a sequence of audio frames; a subtitle track synchronized with the video frame sequence may also be included. A set of consecutive samples in the same track is called a chunk (Chunk).
4) A file type container, a container for storing the capacity (i.e. length of occupied bytes) and type of a file in a media file, as shown in fig. 2, binary data stored in the file type container describes the type and capacity of the container according to the specified byte length.
5) A metadata container, a container in a media file for storing metadata (i.e., data describing multimedia data stored in the media data container), and information expressed by binary data stored in the metadata container in an MP4 file is referred to as media information.
As shown in fig. 2, the header of the metadata container indicates, in binary data, that the type of the container is "moov box". The container data part encapsulates an mvhd container, which stores general information of the MP4 file that is independent of the media data but relevant to its playing, including the duration, creation time, modification time, and the like.
The metadata container also includes sub-containers corresponding to the multiple tracks of the media file, such as an audio track container (audio track box) and a video track container (video track box), whose container data parts contain references to and descriptions of the media data of the corresponding track. The necessary sub-containers include: a container describing the characteristics and overall information of the track (e.g., duration, width, height), denoted the tkhd box, and a container recording the media information of the track (e.g., media type and sample information), denoted the mdia box.
The sub-containers encapsulated in the mdia box may include: a container recording the relevant attributes and content of the track (denoted the mdhd box), a container recording the playing procedure information of the media (denoted the hdlr box), and a container describing the media information of the media data in the track (denoted the minf box). The minf box in turn encapsulates a sub-container explaining how to locate the media information (denoted the dinf box) and a sub-container recording all the time information (decoding time/display time), position information, and codec information of the samples in the track (denoted the stbl box).
Referring to fig. 3, a schematic structural diagram of a media data container storing media data in a media file according to an embodiment of the present disclosure: the time, type, capacity, and location of each sample in the media data container can be interpreted using the media information identified from the binary data in the stbl box. Each sub-container of the stbl box is described below.
The stsd box contains a sample description table. Depending on the coding scheme and the number of files storing the data, each media file may have one or more description tables. The description information of each sample can be found through these tables, and this information ensures that the sample is decoded correctly. Different media types store different description information; for video media, for example, the description information is the structure of the image.
The stts box stores the duration information of the samples and provides a table mapping time (decoding time) to sample sequence numbers; through the stts box, the sample at any time in the media file can be located. The stts box compresses this mapping: each entry in its table gives the number of consecutive samples sharing the same time offset (duration) together with that offset, and accumulating these offsets builds the complete time-to-sample mapping table. The calculation formula is as follows:
DT(n+1)=DT(n)+STTS(n) (1)
where STTS(n) is the duration of the n-th sample and DT(n) is the decoding time of the n-th sample. Since the samples are ordered by time, the offsets are always non-negative, and DT generally starts at 0. Taking the decoding time DT(i) of the i-th sample as an example, the calculation formula is as follows:
DT(i)=SUM(for j=0 to i-1 of delta(j)) (2)
the sum of all offsets is the duration of the media data in the track.
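Formulas (1) and (2) can be checked with a short sketch that expands compressed stts entries (a count of consecutive samples plus their shared duration delta) into per-sample decoding times; the helper name and entry shape are illustrative, not from the patent:

```javascript
// Expands stts-style entries [{count, delta}, ...] into the decoding time
// DT(i) of every sample, following DT(0) = 0 and DT(n+1) = DT(n) + STTS(n).
function decodingTimes(entries) {
  const dt = [0];
  for (const { count, delta } of entries) {
    for (let i = 0; i < count; i++) {
      dt.push(dt[dt.length - 1] + delta);
    }
  }
  // One DT per sample; the value popped here is the sum of all deltas,
  // i.e. the total duration of the media data in the track.
  dt.pop();
  return dt;
}
```

For three samples with durations 10, 10, 5, this yields DT = [0, 10, 20], and the popped value 25 is the track duration, matching the statement above that the sum of all offsets is the duration of the media data in the track.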
The stss box records the sequence numbers of the key frames in the media file.
The stsc box records the mapping between samples and the chunks storing them; the relation between sample sequence numbers and chunk sequence numbers is mapped through a table, and the chunk containing a specified sample can be found by table lookup.
The stco box defines the position of each chunk in the track, expressed as the offset of its starting byte in the media data container and the length (i.e., size) relative to that starting byte.
The stsz box records the capacity (i.e., size) of each sample in the media file.
6) Media data container: the container storing the multimedia data in a media file, for example, the media data container in an MP4 file shown in fig. 3. A sample is the unit stored in the media data container; samples are stored in the chunks of the media file, and the lengths of chunks and of samples may differ from one another.
7) Segmented media file: a sub-file formed by dividing a media file; each segmented media file can be decoded independently.
Taking an MP4 file as an example, the media data in the MP4 file is divided according to key frames, and each piece of divided media data is encapsulated together with its corresponding metadata to form a Fragmented MP4 (FMP4) file; the metadata in each FMP4 file ensures that the media data it contains is decoded correctly.
For example, when an MP4 file as shown in fig. 2 is converted into multiple FMP4 files, refer to fig. 4, a schematic diagram of an optional packaging structure of FMP4 files provided by this disclosure: one MP4 file may be converted into multiple FMP4 files, and each FMP4 file includes three basic containers: the moov container, the moof container, and the mdat container.
The moov container includes MP4 file level metadata describing all media data in the MP4 file from which the FMP4 file is derived, such as duration, creation time, and modification time of the MP4 file.
The moof container stores metadata at a segment level for describing media data encapsulated in the FMP4 file where the moof container is located, and ensures that the media data in the FMP4 file can be decoded.
One moof container and one mdat container constitute one segment of a fragmented MP4 file; a fragmented MP4 file may include one or more such segments, and the metadata encapsulated in each segment ensures that the media data encapsulated in that segment can be decoded independently.
8) Media Source Extensions (MSE) interface: a player-oriented interface implemented in a web page, interpreted by the browser's interpreter while the web page loads and implemented by executing a front-end programming language (e.g., JavaScript). It provides the player with the ability to play media streams through HyperText Markup Language (HTML) media elements (Media Elements); for example, the video element <video> and the audio element <audio> are used to implement video/audio playback.
9) Streaming media encapsulation format: an encapsulation technology that packages media data into a streaming media file which can be decoded and played without being completely downloaded and without extra transcoding; that is, the format natively supports playing while downloading. Typical streaming media encapsulation format files include TS media file segments based on HTTP Live Streaming (HLS) technology, FLV (Flash Video) files, and the like.
10) Non-streaming media encapsulation format: an encapsulation technology that packages media data into a media file which can only be decoded and played after being completely downloaded. Typical files in non-streaming media encapsulation formats include MP4 files, Windows Media Video (WMV) files, Advanced Streaming Format (ASF) files, and the like.
It should be noted that an MP4 file does not natively support streaming media playing, but the technical effect of playing while downloading can still be achieved by online transcoding followed by delivery of the transcoded media stream to the player, or by filling the missing part of a partially downloaded file with invalid binary data (for example, when the ftyp container and moov container are fully downloaded, the missing part of the mdat container is replaced with invalid binary data). Herein, the encapsulation format of such files that do not natively support streaming media playing is referred to as a non-streaming media encapsulation format.
The following describes a flow of a player implementing an embodiment of the present disclosure to acquire media data in a given period.
When playing a movie or a track, the player must be able to correctly parse the data stream, acquire the media data corresponding to a given time, and ensure that this piece of media data can be decoded independently.
1. Determine the time interval corresponding to the media data to be acquired, i.e., a period of continued playing from the current playing point; the time corresponding to the playing point is measured relative to the media time coordinate system (with the playing start time of the media file as the time origin).
2. The stts box is checked to determine the sequence number of the samples for a given period of decoding time.
For audio frames, the stts box is checked to determine the sequence number of the audio frame for a given period of decoding time.
For video frames, due to the compression algorithm, if the first frame in the given period is not a key frame, it is necessary to trace back, in time order, to the key frame before the start time of the given period, to ensure that the frames in the given period can be decoded.
3. The sequence number of the chunk containing the sample is determined by querying the stsc box according to the sample sequence number determined above.
4. The offset of the chunk is looked up from the stco box.
5. The stsz box is searched according to the sequence number of the sample to find the offset of the sample within the chunk and the size of the sample.
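Steps 3 to 5 above can be sketched as a single lookup. For simplicity the sketch assumes a uniform samples-per-chunk value, whereas a real stsc table stores runs of chunks with differing values; the function and parameter names are illustrative, not from the patent:

```javascript
// Locates sample n (1-based) given simplified stsc/stco/stsz data:
//  - samplesPerChunk: samples in every chunk (uniform, simplified stsc)
//  - chunkOffsets:    byte offset of each chunk in the file (stco)
//  - sampleSizes:     byte size of each sample (stsz)
function locateSample(n, samplesPerChunk, chunkOffsets, sampleSizes) {
  const chunk = Math.floor((n - 1) / samplesPerChunk); // step 3: chunk holding the sample
  let offset = chunkOffsets[chunk];                    // step 4: chunk start offset
  const firstInChunk = chunk * samplesPerChunk;        // 0-based index of chunk's first sample
  for (let i = firstInChunk; i < n - 1; i++) {
    offset += sampleSizes[i];                          // step 5: skip earlier samples in the chunk
  }
  return { offset, size: sampleSizes[n - 1] };
}
```

With chunk offsets [100, 300], two samples per chunk, and sample sizes [10, 20, 30, 40], sample 2 is found at offset 110 with size 20, and sample 3 at offset 300 with size 30.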
The process of finding key frames implementing the embodiments of the present disclosure is described.
1. The sequence number of the sample at the given time is determined.
2. The stss box is checked to find the key frame after this sample.
3. The stsc box is checked to find the chunk corresponding to the key frame.
4. The offset of the chunk is extracted from the stco box.
5. The stsz box is used to find the offset of the key frame sample within the chunk and the size of the key frame.
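Step 2 of this flow, and the backtracking described earlier for video frames, reduce to a scan of the stss table, which lists key-frame sample numbers in ascending order; the helper names below are illustrative assumptions:

```javascript
// Returns the first key frame at or after sample n (step 2 above), or null.
function keyFrameAfter(stss, n) {
  for (const s of stss) if (s >= n) return s;
  return null;
}

// Returns the last key frame at or before sample n (the backtracking used
// when the first frame of a given period is not a key frame), or null.
function keyFrameBefore(stss, n) {
  let found = null;
  for (const s of stss) {
    if (s <= n) found = s;
    else break;
  }
  return found;
}
```

For example, with key frames at samples 1, 25, and 50, a period starting at sample 10 backtracks to key frame 1, while the next key frame after it is sample 25.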
First, a media file preloading device implementing an embodiment of the present disclosure is described. The device displays a playing window of a player for playing a media file; displays the playing progress of the media file in the playing window; and, in response to the playing progress of the media file reaching a playing point in real time, displays the identifier of the preloaded segmented media file in the playing window. The media file corresponds to a plurality of segmented media files, and the playing time of the preloaded segmented media file is later than that of the playing point.
The structure of the media file preloading device implementing the embodiments of the present disclosure is described below.
Referring to fig. 5, an alternative structural diagram of a media file preloading device 100 according to an embodiment of the present disclosure, the media file preloading device shown in fig. 5 includes: at least one processor 150, at least one communication bus 160, a user interface 180, at least one network interface 170, and a memory 190. The various components of the media file preloading device 100 are coupled together by the communication bus 160. It will be appreciated that the communication bus 160 is used to enable communications among these components; besides a data bus, it includes a power bus, a control bus, and a status signal bus. For clarity of illustration, however, the various buses are all labeled in fig. 5 as the communication bus 160.
The user interface 180 may include a display, a keyboard, a mouse, a trackball, a click wheel, keys, buttons, a touch pad, or a touch screen. The network interface 170 may include a standard wired interface, and the wireless interface may be a WiFi interface.
It is understood that the memory 190 may be a high-speed RAM memory or a non-volatile memory (Non-Volatile Memory), such as at least one disk memory. The memory 190 may also be at least one storage system physically located remotely from the processor 150.
The media file preloading method provided by the embodiments of the present disclosure may be applied to, or implemented by, the processor 150. The processor 150 may be an integrated circuit chip having signal processing capabilities. In implementation, the different operations of the media file preloading method may be accomplished by integrated logic circuits (in the form of hardware) or by instructions (in the form of software) in the processor 150. The processor 150 may be a general-purpose processor, a DSP, another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like; it may implement or execute the media file preloading methods, steps, and logic block diagrams provided by the embodiments of the present disclosure. A general-purpose processor may be a microprocessor, any conventional processor, or the like. The media file preloading method provided by the embodiments of the present disclosure may also be directly implemented by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor.
By way of example, the software modules may be located in a storage medium, which may be the memory 190 shown in fig. 5. The processor 150 reads the executable instructions in the memory 190 and, in combination with its hardware, completes an optional process flow of the media file preloading method provided by the embodiment of the present disclosure, as shown in fig. 6, including the following steps:
step S101, displaying a play window of the player for playing the media file.
In some embodiments, the player may take the form of an H5 player embedded in a web page, or of a dedicated video playing application (APP); the web page may be a browser web page or a web page of an APP with an embedded browser kernel.
When the player takes the form of an H5 player embedded in a web page, one playing window or two or more playing windows can be displayed in a single web page. Taking the display of two playing windows on one web page as an example, a schematic diagram of displaying playing windows for playing media files is shown in fig. 7A.
When the player is configured as a dedicated video playing application, a schematic diagram of a playing window for playing a media file is shown in fig. 7B. In practical applications, the display interface of an electronic device in which the video playing application is installed may display only one playing window, or two or more playing windows simultaneously.
Here, the media file corresponds to a plurality of segmented media files. When the media file supports streaming media playing, it is a collection of a series of segmented media files; for example, when the media file is an HLS streaming media file, it is a collection of a series of consecutive TS files. When the media file does not support streaming media playing, e.g., an MP4 file, a segmented media file is a fragmented MP4 file formed by repackaging data extracted from the media file; such a fragmented MP4 file can be independently decoded and played.
Step S102, displaying the playing progress of the media file in the playing window.
In some embodiments, the playing progress of the media file is displayed on the playing progress bar of the player; as shown in fig. 8A, the played portion and the unplayed portion of the media file are displayed on the progress bar in different colors.
In other embodiments, the playing progress of the media file is displayed on the playing interface of the player, as shown in fig. 8B, the playing time corresponding to the current playing point and the total playing duration of the media file are displayed in a blank area on the playing interface of the player.
Step S103, in response to the playing point reached by the playing progress of the media file in real time, displaying the identifier of the preloaded segmented media file in the play window.
In some embodiments, an optional schematic diagram of displaying the identifier of the preloaded segmented media file in the play window is shown in fig. 9A: the identifier of the preloaded segmented media file is displayed on the playing progress bar of the player, in a display mode different from that of the segmented media files already played on the progress bar. In specific implementations, different display parameters such as color and transparency can be set for the preloaded segmented media files to display them distinctly.
In other embodiments, another optional schematic diagram of displaying the identifier of the preloaded segmented media file in the play window is shown in fig. 9B: in addition to setting different display parameters such as color and transparency for the preloaded segmented media files to display them distinctly, the duration of each segmented media file may be prompted on the display interface corresponding to that segmented media file in the player.
In still other embodiments, as shown in fig. 9C, in addition to setting different display parameters such as color and transparency for the preloaded segmented media files and prompting the duration of each segmented media file on its corresponding display interface in the player, key information of the segmented media file, such as main characters and scenes, can be displayed on the corresponding display interface as a text or thumbnail prompt. Here, a scene may be displayed as a thumbnail, and a character may be displayed as text.
In the embodiment of the present disclosure, the identifier of the segmented media file may persist, or may disappear after a timeout is reached. The identifier of the segmented media file can be prompted only once, prompted multiple times, or the prompting function can be turned off. When the identifier of the segmented media file is prompted multiple times, the time interval between two prompts may be preset.
In some embodiments, when the player comprises at least two play windows, the identifier of the segmented media file is displayed in the play window that is in the focused state, the displayed segmented media file having a playing time later than that of the playing point. The playing point may be a time reached by a skip operation on the playing progress, for example, the original playing point is at 20% of the playing progress and the skipped-to playing point is at 30%; the playing point may also be a time reached by continuous playing, for example, playing from the 30th minute to the 40th minute.
Here, in a video playing APP, acquiring the play window in the focused state means acquiring the play window that most recently received a mouse click event or touch event. In a player embedded in a web page, the play window in the focused state refers to the play window currently in view when multiple play windows are open; for example, when one page in an H5 player displays multiple play windows, some play windows may be hidden as the page is scrolled, and the windows displayed in the page are the focused windows. Therefore, by displaying the identifier of the segmented media file only in the play window in the focused state, preloading segmented media files for all play windows can be avoided, and traffic consumption can be accurately controlled.
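As a pure-logic illustration, the focus rule above can be sketched as follows; the window descriptors and field names (`inViewport`, `lastEventTime`) are hypothetical stand-ins for the page-visibility and click/touch signals described above, not part of the disclosure.

```javascript
// Select the focused play window(s) eligible for preloading.
function focusedWindows(playWindows, embeddedInWebPage) {
  if (embeddedInWebPage) {
    // H5 player: windows scrolled out of the page are not focused;
    // only windows currently displayed in the page are preloaded.
    return playWindows.filter((w) => w.inViewport);
  }
  // Dedicated video APP: the focused window is the one that most
  // recently received a mouse click event or touch event.
  return [playWindows.reduce((a, b) =>
    b.lastEventTime > a.lastEventTime ? b : a)];
}
```

Restricting preloading to the returned windows is what lets the player cap traffic consumption precisely.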
In the embodiment of the present disclosure, the number of the preloaded segmented media files may be one or more, and specifically, the number of the preloaded segmented media files may be preset in the following manner:
1. The number of preloaded segmented media files is set globally by the player, e.g., the number of preloaded segmented media files is set to 10.
2. The number of preloaded segmented media files is set according to the network condition and the software/hardware capabilities of the device hosting the player, where the software/hardware capabilities include: available bandwidth, available transmission traffic, cache capacity, player version, and the like.
Taking network bandwidth as an example, the greater the number of preloaded segmented media files, the more media data is requested from the server after each switch of the playing point, and the larger the occupied downlink network bandwidth; that is, the number of preloaded segmented media files is positively correlated with the occupied downlink network bandwidth, and the number of segmented media files that the player can preload can be determined based on this positive correlation.
Taking available transmission traffic as an example, the number of preloaded segmented media files is positively correlated with the available transmission traffic; that is, the larger the available transmission traffic, the more segmented media files can be preloaded, and the smaller the available transmission traffic, the fewer segmented media files are preloaded, so as to reduce the network load and ensure the transmission performance of the network.
3. The player requests information about the segmented media files from the server. The server may form the segmentation information by recording the start times of different content units of the media file according to the content representation of the media file, or, according to the viewing frequency at different time points in the media file, record continuous spans of time points at which the viewing frequency tends to be uniform as segmented media files.
4. The media file is divided evenly at a set time granularity according to the duration of the media file.
In some embodiments, the set time granularity is positively correlated with the number of preloaded segmented media files; that is, the longer the duration of the media file, the larger the number of preloaded segmented media files that is set, and the shorter the duration of the media file, the smaller the number that is set.
Here, the time granularity may be set once and remain valid, be valid for a certain period of time or for a certain media file (e.g., a certain type of media file), or be determined adaptively according to the settings of the logged-in user.
When the time granularity is determined adaptively, at least two situations are included:
A. The amount of data that the player can preload is determined based on the positive correlation between the player's downlink network bandwidth and the capacity the cache can preload; the playing duration of that amount of data is then calculated according to the size of each video/audio frame in the media file, and that duration is taken as the time granularity.
B. For the case where the player is embedded in a web page for playing, the time granularity can be determined based on the negative correlation between the number of play windows and the time granularity; that is, the larger the number of play windows, the smaller the time granularity. In this way, an overly long preloading time for the segmented media files of any play window can be avoided, so that each play window can start playing when clicked and does not become unresponsive.
The above modes may be selected arbitrarily or applied in a specific order, for example: the segmentation information queried from the server is used preferentially, and if it is unavailable, the segmented media files are formed according to the time granularity.
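A minimal sketch of combining the above modes in the stated order; the function name, field names, and the bandwidth-per-segment and global-cap parameters (`kbpsPerSegment`, `globalMax`) are illustrative assumptions, not part of the disclosure.

```javascript
// Decide how many segmented media files to preload.
function preloadCount(opts) {
  // Mode 3: segmentation information queried from the server is preferred.
  if (opts.serverSegments) return opts.serverSegments.length;
  // Mode 4: otherwise divide the media duration evenly by a time granularity.
  let count = Math.ceil(opts.durationSec / opts.granularitySec);
  // Mode 2: the positive correlation with available bandwidth/traffic caps
  // how many segments can actually be preloaded.
  const byBandwidth = Math.floor(opts.bandwidthKbps / opts.kbpsPerSegment);
  // Mode 1: a global player-level maximum applies last.
  return Math.max(1, Math.min(count, byBandwidth, opts.globalMax));
}
```

For example, a 600 s file at 30 s granularity would yield 20 segments, but a 5000 kbps link at an assumed 1000 kbps per segment caps the preload at 5.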
Another optional processing flow of the method for preloading media files applied to a media file preloading device provided in the embodiment of the present disclosure, as shown in fig. 10, includes the following steps:
step S201, a playing window of the player for playing the media file is displayed.
Step S202, displaying the playing progress of the media file in the playing window.
Step S203, in response to the playing point reached by the playing progress of the media file in real time, displaying the identifier of the preloaded segmented media file in the playing window.
The specific implementation process of steps S201 to S203 in the embodiment of the present disclosure is the same as the specific implementation process of steps S101 to S103 in the above embodiment.
Step S204, receiving the event of stopping playing of the playing window.
In some embodiments, a stop-play event for the play window, triggered by the user through a mouse or a touch screen, is received.
Step S205, the display of the identifier of the segmented media file later than the real-time playback point is suspended, and the network request of the corresponding segmented media file is suspended.
In some embodiments, upon receiving a stop play event for a play window, ceasing to display the identification of the segmented media file that is later than the real-time point of play; i.e. the identification of the segmented media file after the current play point is not displayed anymore.
When the stop-play event of the play window is received, the network requests for the corresponding segmented media files are suspended; for example, the network requests for the corresponding segmented media files may be revoked, or the connections of those network requests may be maintained without acquiring new media data. When a play event of the play window is received, the network-request connections are restored and new media data is acquired. In this way, accurate control of traffic consumption can be achieved.
As shown in fig. 11, an optional process flow of the method for preloading a media file applied to a media file preloading device provided in the embodiment of the present disclosure includes the following steps:
step S301, a play window of the player for playing the media file is displayed.
Step S302, displaying the playing progress of the media file in the playing window.
Step S303, in response to the playing point reached by the playing progress of the media file in real time, displaying the identifier of the preloaded segmented media file in the playing window.
The specific implementation process of steps S301 to S303 in the embodiment of the present disclosure is the same as the specific implementation process of steps S101 to S103 in the above embodiment.
Step S304, when the player runs embedded in a web page, sending the segmented media file to the media resource extension interface of the web page.
Here, the media resource extension interface is used by the player to invoke the media element of the web page to play the segmented media file.
After the segmented media file is sent to the media resource extension interface of the web page, the media resource extension interface creates a Media Source (MediaSource) object for the segmented media file as the data source of a virtual Uniform Resource Locator (URL), and creates a cache object (SourceBuffer) as the cache of the media source; the segmented media file is added to the cache object; finally, the virtual URL is passed to the media element of the web page to play the segmented media file. The media element includes an audio element and/or a video element.
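The MediaSource/SourceBuffer/virtual-URL flow just described can be sketched in JavaScript as follows; the codec strings and the queue-driven append loop are assumptions for illustration, and the browser-only APIs are kept inside a function so that nothing executes outside a page.

```javascript
// Build the MIME type handed to addSourceBuffer for a fragmented MP4 stream.
function fmp4MimeType(videoCodec, audioCodec) {
  return `video/mp4; codecs="${videoCodec}, ${audioCodec}"`;
}

// Wire a <video> element to a queue of segmented media files via MSE
// (browser-only; a sketch, not the disclosure's exact implementation).
function playThroughMSE(videoElement, segmentQueue) {
  const mediaSource = new MediaSource();               // data source object
  videoElement.src = URL.createObjectURL(mediaSource); // virtual URL
  mediaSource.addEventListener("sourceopen", () => {
    const sb = mediaSource.addSourceBuffer(
      fmp4MimeType("avc1.64001f", "mp4a.40.2"));       // SourceBuffer cache
    const appendNext = () => {
      if (segmentQueue.length) sb.appendBuffer(segmentQueue.shift());
    };
    sb.addEventListener("updateend", appendNext);
    appendNext(); // add segmented media files to the cache object
  });
}
```

The media element never sees the real file; it only decodes whatever segments are appended behind the virtual URL.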
In some embodiments, when the segmented media file sent by the media file preloading device to the media resource extension interface is in a streaming media format, such as HLS or FLV, the segments between the playing start time and the playing end time of the segmented media file are obtained directly from the video file, and each segment can be independently decoded and played.
In other embodiments, when the segmented media file sent by the media file preloading device to the media resource extension interface is in a non-streaming media format, such as an MP4 file, the media file preloading device first requests the media data of the segmented media file from the server through the network, calculates new metadata in combination with the media information (including metadata at the segmented media file level, i.e., the moov box, and metadata at the segment level, i.e., the moof box), and encapsulates the media data and new metadata into containers according to the encapsulation structure of the segmented media file to obtain the corresponding segmented media file.
Here, the media data includes video frames and audio frames.
In one embodiment, the player obtains media data from the server and, according to the encapsulation structure of the segmented media file, encapsulates the media data and the metadata describing the media data to form a segmented media file that the media elements of the web page can independently decode.
Referring to fig. 12, fig. 12 is an alternative flow chart of packaging segmented media files provided by the disclosed example, which will be described in conjunction with the steps shown in fig. 12.
Step S401, filling data representing the type and compatibility of the segmented media file into a file type container of the segmented media file.
For example, taking an FMP4 file packaged to form the encapsulation structure shown in fig. 4 as an example, the type and length of the container (representing the entire length of the ftyp box) are filled into the header of the ftyp box, which is the file type container of the FMP4 file, and data (binary data) representing that the file type is FMP4 together with the compatible protocols is filled into the data portion of the ftyp box.
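A minimal sketch of step S401, filling a plausible ftyp box; the brand strings ("isom", "avc1") are typical values chosen here for illustration rather than taken from the disclosure.

```javascript
// Build an ftyp (file type) box: 4-byte length + 4-byte type + data portion.
function buildFtypBox() {
  const payload = Buffer.concat([
    Buffer.from("isom"),            // major brand (file type indication)
    Buffer.from([0, 0, 0, 1]),      // minor version
    Buffer.from("isomavc1"),        // compatible brands (protocols)
  ]);
  const box = Buffer.alloc(8 + payload.length);
  box.writeUInt32BE(box.length, 0); // container length: entire ftyp box
  box.write("ftyp", 4);             // container type
  payload.copy(box, 8);
  return box;
}
```

The same length-then-type header layout is what later lets a parser walk the file box by box.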
Step S402, filling metadata indicating the file level of the segmented media file into a metadata container of the segmented media file.
In one embodiment, based on the media data to be filled into the encapsulation structure of the segmented media file, the metadata describing the media data that is required to fill the nested structure is calculated according to the nested structure of the metadata container in the segmented media file.
Still taking fig. 4 as an example, metadata representing the file level of an FMP4 file is calculated and filled into a metadata container of FMP4 (i.e., moov box) in which three containers of mvhd, track, and video extension (mvex) are nested.
The metadata encapsulated in the mvhd container represents media information related to the playing of the segmented media file, including position, duration, creation time, modification time, and the like; the child containers nested in the track container represent references to and descriptions of the corresponding tracks in the media data, for example, a container (denoted tkhd box) describing the characteristics and overall information of a track (such as duration and width), and a container (denoted mdia box) recording the media information of the track (such as the media type and sample information) are nested in the track container.
Step S403, correspondingly filling the extracted media data and the metadata describing the media data into a media data container and a metadata container at a segment level in a segment container of the segmented media file.
In one embodiment, one or more segments (fragments) may be encapsulated in a segmented media file, and for media data to be filled, one or more segmented media data containers (i.e., mdat boxes) of the segmented media file may be filled, and a segment-level metadata container (denoted as moof box) is encapsulated in each segment, wherein the filled metadata is used to describe the media data filled in the segment, so that the segments can be independently decoded.
With reference to fig. 4, for example, when the media data to be filled is filled into 2 segments of the encapsulation structure of the FMP4 file, the media data is filled into each segment; the metadata that needs to be filled into the segment-level metadata container (i.e., moof box) of the corresponding segment is calculated and correspondingly filled into the child containers nested in the moof box, the header of the moof box being filled with binary data indicating that the container type is "moof box" and the length of the moof box.
In one embodiment of filling data into the corresponding containers in steps S401 to S403, when a filling operation is performed, a write operation function of a class is called to complete the writing and merging of binary data in the memory buffer of the child container, and an instance of the class is returned; the returned instance is used to merge the child container with other child containers having a nested relationship.
As an example of filling the data, a class MP4 implementing the encapsulation function is established, and each child container in the segmented media file is encapsulated as a static method of the class MP4; a class Stream implementing binary data operation functions is established, and each Stream instance is provided with a memory buffer for storing binary data to be filled. Multi-byte decimal data to be filled is converted into binary data by a static method provided by Stream; binary data to be filled into a child container is merged and filled in the memory buffer through a write operation function provided by a Stream instance; and a static method provided by Stream returns a new Stream instance, enabling the current child container to be merged with other child containers having a nested relationship.
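A minimal sketch of the Stream idea above: a memory buffer, a write operation that returns a new instance so containers can be merged, and a static method converting decimal values to binary. The method names here are assumptions; the disclosure only describes the class's responsibilities.

```javascript
// Binary-data helper in the spirit of the described class Stream.
class Stream {
  constructor(buf = Buffer.alloc(0)) { this.buffer = buf; } // memory buffer
  // Merge binary data into this stream; return a new instance so the
  // result can be merged into an enclosing (nested) container.
  write(...bufs) {
    return new Stream(Buffer.concat([this.buffer, ...bufs]));
  }
  // Convert a multi-byte decimal value into big-endian binary data.
  static uint32(n) {
    const b = Buffer.alloc(4);
    b.writeUInt32BE(n, 0);
    return b;
  }
  // Wrap this stream's contents as a box (child container) of a given type.
  box(type) {
    const header = Buffer.alloc(8);
    header.writeUInt32BE(8 + this.buffer.length, 0); // container length
    header.write(type, 4);                           // container type
    return new Stream(Buffer.concat([header, this.buffer]));
  }
}
```

Because every operation returns a fresh instance, merging a child container into its parent is just another `write` on the parent's stream.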
In the above embodiment, before the segmented media file is encapsulated, the metadata of the media data to be filled needs to be calculated in combination with the metadata in the media file, so as to obtain metadata at the segmented media file level (for an FMP4 file, corresponding to the metadata filled into the moov box) and metadata at the segment level (for an FMP4 file, corresponding to the metadata filled into the moof box).
In the following, an exemplary implementation is described in which metadata encapsulated in a metadata container of a media file is parsed to obtain media information describing media data encapsulated in a media data container of the media file.
In one embodiment of the present disclosure, the media file is an MP4 file; the nested structure of the child containers in the metadata container of the media file is parsed, binary data in each child container is read according to the nested structure, and the media information of the media data represented by each child container is parsed from the read binary data.
With reference to the structure shown in fig. 2, the moov container of the MP4 file is a nested structure. The nested structure of the child containers in the metadata container is parsed to determine the child containers nested in the moov container, such as the mvhd container, the audio track container, and the video track container; if a child container itself nests containers, parsing continues until the leaf child containers are reached, at which point the binary data encapsulated in the corresponding child containers is read and the media information represented by the binary data is parsed, such as the sequence numbers of the key frames in the media file recorded by the stss box, and the size of each sample in the media file recorded by the stsz box.
In one embodiment of the present disclosure, a manner is provided for setting parsers according to container types and parsing the child containers in the metadata container according to their container types to obtain media information, which will be described below.
Referring to fig. 13, fig. 13 is an alternative flowchart of parsing media information from a metadata container according to an embodiment of the present disclosure, which will be described with reference to the steps shown in fig. 13.
Step S501 locates the position of the metadata container in the media file.
In one embodiment, binary data conforming to the container header specification is read from the binary data of the media file, and the offset and size of the metadata container in the media file are located according to the type and length of the container identified in the read binary data.
For example, in the binary data of a media file, the binary data starting from the zeroth byte corresponds to the file type container; starting from the beginning of the binary data of the media file, binary data conforming to the canonical length of the container header is read, and by parsing the read binary data, the type and length of the container following the file type container in the media file can be determined.
If the parsed type is a metadata container, the length (i.e., capacity) of the metadata container can be parsed, and the offset of the metadata container is the length of the file type container.
If the parsed type is a media data container, binary data conforming to the canonical length of the container header is read again, using the sum of the length of the media data container and the length of the file type container as the offset, so that the length (i.e., capacity) of the metadata container can be parsed; in this case the offset of the metadata container is the sum of the length of the file type container and the length of the media data container.
Except that the initial container in the media file is the file type container, the packaging order of the subsequent containers is not specified; through the above parsing, the position of the metadata container in the media file can be located accurately and efficiently, whether the packaging order of the containers in the media file is file type container, metadata container, media data container, or file type container, media data container, metadata container.
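Steps S501 and S502 can be sketched as a walk over top-level box headers, which finds the metadata container regardless of whether the media data container precedes it; the helper and synthetic layout in the test are illustrative.

```javascript
// Locate a top-level box (e.g. "moov") by reading successive 8-byte headers:
// 4-byte big-endian length followed by a 4-byte ASCII type.
function locateBox(fileBuf, wantedType) {
  let offset = 0;
  while (offset + 8 <= fileBuf.length) {
    const size = fileBuf.readUInt32BE(offset);                   // container length
    const type = fileBuf.toString("ascii", offset + 4, offset + 8);
    if (type === wantedType) return { offset, size };
    offset += size; // the next header starts right after this container
  }
  return null; // wanted container absent
}
```

Because each header carries the container's full length, the walk never over- or under-reads, matching the efficiency claim above.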
Step S502, according to the position of the metadata container in the media file, binary data corresponding to the metadata container is obtained from the binary data of the media file.
The position of the metadata container in the media file is represented by an offset and a capacity, the binary data is read from the media file at the position corresponding to the offset until the length of the read binary data conforms to the capacity of the metadata container, and therefore the binary data corresponding to the metadata container is read.
Step S503, sequentially parsing the binary data corresponding to the canonical length of the container header in the binary data of the metadata container to obtain the container type of the sub-container in the metadata container and the length of the container data of the sub-container.
In one embodiment, for the case where multiple sub-containers are nested in the metadata container, the offset of each reading of binary data is the sum of the lengths of the identified sub-containers, and the length of the read binary data conforms to the canonical length of the container header, so that the type and length of the currently processed sub-container can be parsed.
For example, when reading for the first time, the binary data is read from zero bytes of the binary data of the metadata container, and the length of the read binary data conforms to the specified length of the container header, so that the type and length of the first sub-container can be parsed; and in the second reading, the binary data is read by taking the length of the first read sub-container as an offset, and the length of the read binary data conforms to the specified length of the container header, so that the type and the length of the second sub-container can be analyzed.
By reading the binary data in this manner, neither a rollback caused by reading too much nor a second read caused by reading too little occurs, which ensures parsing efficiency and accuracy.
Step S504, a parser of the type corresponding to the container type of the child container is called, and the binary data corresponding to the length of the container data in the unparsed data is sequentially parsed to obtain the media information represented by the container data.
In one embodiment, typical container types nested in the metadata container are pre-marked to indicate whether a container is directly used for encapsulating binary data or further encapsulates containers; for example, containers such as the mvhd box, audio track box, and video track box shown in fig. 2 are marked as further encapsulating containers, and containers such as the stts box and stsd box shown in fig. 2 are marked as directly encapsulating binary data.
Setting parsers corresponding to the container types one by one for the container types marked as directly encapsulating the binary data, wherein the parsers are used for parsing the represented media information according to the binary data; in step S504, when the container type of the child container analyzed in step S503 is compared with the pre-marked container type, the following two cases are involved.
Case 1) when the container type of the child container is determined to be pre-marked by comparison and pre-marked for directly encapsulating binary data, a parser corresponding to the container type of the child container is called, and the container data in the child container is parsed by the parser, so that the media information represented by the container data is obtained.
Case 2) When the comparison determines that the container type of the child container is pre-marked as continuing to encapsulate containers, the binary data corresponding to the child container is parsed recursively according to the canonical length of the container header in the media file, until it is found that the container type of a container encapsulated in the child container is pre-marked as directly encapsulating binary data; the parser corresponding to that container type is then called to parse the binary data byte by byte, the length of the parsed binary data corresponding to the length of the container data of the container encapsulated in the child container, so as to obtain the media information represented by that container data.
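Cases 1) and 2) can be sketched with a parser table keyed by container type; the stsz "parser" here is a deliberately simplified stand-in (real stsz parsing also reads version, flags, and sample counts), and the marked type sets are assumptions for illustration.

```javascript
// Parsers for types pre-marked as directly encapsulating binary data.
const LEAF_PARSERS = {
  stsz: (data) => ({ sampleSizes: [...data] }), // simplified stand-in
};
// Types pre-marked as further encapsulating containers.
const BRANCH_TYPES = new Set(["moov", "trak", "mdia", "stbl"]);

function parseBoxes(buf, out = {}) {
  let offset = 0;
  while (offset + 8 <= buf.length) {
    const size = buf.readUInt32BE(offset);
    const type = buf.toString("ascii", offset + 4, offset + 8);
    const body = buf.subarray(offset + 8, offset + size);
    if (BRANCH_TYPES.has(type)) {
      out[type] = parseBoxes(body, {});     // case 2: recurse into children
    } else if (LEAF_PARSERS[type]) {
      out[type] = LEAF_PARSERS[type](body); // case 1: call the typed parser
    } // unmarked types are skipped by advancing past their stated length
    offset += size;
  }
  return out;
}
```

The final `offset += size` is also what implements the skip-unknown-container behavior described further below.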
In one embodiment, a manner of recording media information in the process of parsing the metadata container is described. When the binary data corresponding to the canonical length of the container header in the binary data of the metadata container is sequentially parsed to obtain the container type of a child container in the metadata container, an object is established according to the nesting relationships between the child container and its attributed container and between the child container and its encapsulated containers; when the container type of the child container is pre-marked as directly encapsulating binary data, an array including the media information is stored in the object established for the child container, the stored media information being represented by the container data of the child container.
For example, in fig. 2, when the type of the parsed sub-container is stts box, since the stts box is pre-marked as direct package binary data, an array including media information is stored in an object created corresponding to the stts box, where the media information is duration information represented by container data of the stts box.
In one embodiment, a manner of recording the nesting relationships among child containers in the process of parsing the metadata container is described. When the binary data corresponding to the canonical length of the container header in the binary data of the metadata container is sequentially parsed to obtain the container type of a child container in the metadata container, if the container type is pre-marked as directly encapsulating binary data, the parsed child container is recorded in the called parser; the recorded instance of the child container is then set into the child-container attribute of the container to which the child container belongs, the child-container attribute being used to describe the nesting relationship between the child container and its attributed container.
For example, in fig. 2, when the type of the parsed child container is stsd box, since the stsd box is pre-marked as directly encapsulating binary data, the stsd box is recorded in the parser corresponding to the stsd box, and an instance of the stsd box is set into the child-container attribute of the stbl box; by analogy, the multiple child containers nested in the stbl box, such as the stsd box, stts box, and stsc box, are finally all recorded in the child-container attribute of the stbl box.
In one embodiment, when the comparison determines that the container type of a child container is not pre-marked, or is pre-marked as directly encapsulating binary data but no parser of the corresponding type is called, the parsing of the binary data corresponding to the child container is skipped, and according to the length of the child container, parsing jumps to the part of the binary data corresponding to the next child container.
In practice, custom container types may appear in a media file; the skipping manner does not affect the overall progress of parsing the metadata container. Meanwhile, when the container types in the metadata container change, the latest metadata container can be quickly and compatibly parsed by adding, deleting, or modifying parsers of the corresponding types, so upgrading is flexible and fast.
Taking the example that the preloading device of the media file is a player, the player obtains the media data of the corresponding segmented media file from the server through the network request, as shown in fig. 14, and the method includes the following steps:
step S601, the player determines a first timestamp corresponding to the playing start time of the segmented media file and a second timestamp corresponding to the playing end time of the segmented media file.
Step S602, the player sends a network request to the server to find a first key frame whose decoding time is before the first timestamp and is closest to the first timestamp, and a second key frame whose decoding time is after the second timestamp and is closest to the second timestamp.
And when the video frame corresponding to the first timestamp is a key frame, the first key frame is the video frame corresponding to the first timestamp. And when the video frame corresponding to the second timestamp is a key frame, the second key frame is the video frame corresponding to the second timestamp.
Step S603, the server extracts the video frame between the first key frame and the second key frame from the media file stored in the server, and sends the video frame to the player.
In step S604, the player sends a network request to the server to search for a first audio frame whose decoding time is before the decoding time of the first key frame and closest to the decoding time of the first key frame, and a second audio frame whose decoding time is after the decoding time of the second key frame and closest to the decoding time of the second key frame.
Here, the video frames and the audio frames are aligned based on decoding time, avoiding the situation where the video file has picture without sound, or sound without picture.
Step S605, the server extracts the audio frame between the first audio frame and the second audio frame from the media file stored in the server, and sends the audio frame to the player.
The video frame extracted in step S603 and the audio frame extracted in step S605 together constitute media data.
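The key-frame and audio-frame lookup in steps S601 to S605 can be sketched as below; the frame representation (the `dts` and `key` fields) and function names are assumptions for illustration, not the patent's actual data model.

```javascript
// Frames are assumed sorted by decoding time (dts). The first key frame is
// the one at or before the first timestamp and closest to it; the second key
// frame is the first one at or after the second timestamp.
function findKeyFrameRange(frames, startTs, endTs) {
  let first = null;
  let second = null;
  for (const f of frames) {
    if (f.key && f.dts <= startTs) first = f;                   // closest key frame before startTs
    if (f.key && f.dts >= endTs && second === null) second = f; // first key frame after endTs
  }
  return { first, second };
}

// Audio frames are aligned to the chosen key frames by decoding time, so the
// extracted segment never has picture without sound or sound without picture.
function findAudioRange(audioFrames, firstKeyDts, secondKeyDts) {
  let first = null;
  let second = null;
  for (const a of audioFrames) {
    if (a.dts <= firstKeyDts) first = a;
    if (a.dts >= secondKeyDts && second === null) second = a;
  }
  return { first, second };
}
```

The video frames between `first` and `second`, together with the audio frames between the aligned audio boundaries, constitute the media data of the segmented media file.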
It should be noted that, in the above embodiments of the present disclosure, the method for preloading a media file is implemented by various types of players on a media file preloading device, and the media file preloading device may be various terminal devices such as a desktop computer and a notebook computer.
The process in which the player sends the segmented media file, through the media source extension interface of the web page, to the media element of the web page for decoding and playing will now be described.
Referring to fig. 15, fig. 15 is a schematic flowchart of this process; the steps shown in fig. 15 will be described.
Step S701, the player adds the segmented media file to the media source object in the media resource extension interface.
Referring to fig. 16, fig. 16 is an optional schematic diagram of a player playing a segmented media file through the media source extension interface of a web page according to an embodiment of the present disclosure. When the player receives a play event of the media file in a play window in the web page, the player creates a MediaSource object by executing the MediaSource method; it then executes the addSourceBuffer method encapsulated in the media source extension interface to create a buffer of the MediaSource object, namely a SourceBuffer object. One MediaSource object has one or more SourceBuffer objects, and each SourceBuffer object can correspond to one play window in the web page and is used for receiving the segmented media files to be played in that window.
In the process of playing the media file, a parser (Parser) in the player continuously constructs new segmented media files by parsing newly acquired media data, and adds each segmented media file to a SourceBuffer object of the same MediaSource object by executing the appendBuffer method of the SourceBuffer object.
Step S702, the player calls the media resource extension interface to create a virtual address corresponding to the media source object.
For example, the player executes the createObjectURL method encapsulated in the media source extension interface to create a virtual address, i.e. a virtual URL, corresponding to the media source object; the virtual URL encapsulates the segmented media file of the Blob type.
In addition, the player sets the virtual URL as the source (src) attribute of a media element in the web page, such as a video/audio element, i.e., binds the virtual URL to the media element; this is also referred to as associating the media source object with the media element in the web page.
Step S703, the player transmits a virtual address to the media element of the web page, where the virtual address is used for the media element to play with the media source object as a data source.
For example, the player includes a statement for calling the media element to play the virtual URL, such as: <audio src="virtual URL">. When the browser interprets the corresponding statement in the player embedded in the web page, the media element of the browser reads the segmented media file from the SourceBuffer object bound to the virtual URL, and decodes and plays it.
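Steps S701 to S703 can be sketched with the standard Media Source Extensions calls (MediaSource, addSourceBuffer, appendBuffer, URL.createObjectURL). In this sketch the browser globals are injected as parameters so the flow is explicit, and the MIME/codec string is an illustrative assumption.

```javascript
// Attach a sequence of segmented media files to a page media element via MSE.
function playSegmentsViaMSE(videoElement, segments, MediaSourceCtor, urlApi,
                            mime = 'video/mp4; codecs="avc1.64001f,mp4a.40.2"') {
  const mediaSource = new MediaSourceCtor();              // S701: create the media source object
  const virtualUrl = urlApi.createObjectURL(mediaSource); // S702: create its virtual address
  videoElement.src = virtualUrl;                          // S703: bind the virtual URL to the media element
  mediaSource.addEventListener('sourceopen', () => {
    const sourceBuffer = mediaSource.addSourceBuffer(mime);
    // The parser keeps appending newly constructed segments to the SourceBuffer;
    // the media element reads and decodes them through the bound virtual URL.
    for (const seg of segments) sourceBuffer.appendBuffer(seg);
  });
  return virtualUrl;
}
```

In a browser, this would be called as `playSegmentsViaMSE(document.querySelector('video'), segments, MediaSource, URL)`; the media element then plays with the media source object as its data source, never seeing the file's real address.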
Next, the process in which the player converts an MP4 file into an FMP4 file and plays it on the web page through the media source extension interface will be described.
Referring to fig. 17, fig. 17 is a schematic diagram of converting an MP4 file into an FMP4 file and playing the FMP4 file through the media source extension interface according to an embodiment of the present disclosure. The player requests part of the media data in the MP4 file from the server based on the real address of the media file (http://www.toutiao.com/a/b.mp4), for example, the data whose decoding time falls within a given period for the subsequent play point.
The player constructs an FMP4 file based on the acquired media data, and then adds the FMP4 file to the SourceBuffer object corresponding to the MediaSource object. Because the virtual URL is bound to the MediaSource object, when the code by which the player calls the audio/video element is executed, the audio/video element continuously reads the newly added FMP4 files from the SourceBuffer object of the MediaSource object and decodes them, realizing continuous playing of the media file.
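The partial request for media data can be sketched as an HTTP range request; the byte offsets would come from the parsed metadata (e.g. the stbl tables), and the URL, function name, and use of the Fetch API and a Range header here are assumptions for illustration, not the patent's stated wire format.

```javascript
// Fetch only the byte range of the MP4 file that holds the media data for a
// given period; the fetch implementation is injectable for testing.
async function fetchMediaDataRange(url, startByte, endByte, fetchFn = fetch) {
  const resp = await fetchFn(url, {
    headers: { Range: `bytes=${startByte}-${endByte}` }, // partial-content request
  });
  if (resp.status !== 206 && resp.status !== 200) {
    throw new Error(`unexpected status ${resp.status}`);
  }
  return new Uint8Array(await resp.arrayBuffer());
}
```

The returned bytes would then be fed to the FMP4 construction step rather than stored, which is why no additional storage beyond the original media file is needed.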
Based on the method for preloading a media file, an embodiment of the present disclosure further provides a device for preloading a media file, as shown in fig. 18, the device 800 for preloading a media file includes:
a display unit 801, configured to display a play window of a player for playing a media file, and display a play progress of the media file in the play window;
a loading unit 802, configured to display, in response to a play point reached in real time by a play progress of the media file, an identifier of a preloaded segmented media file in the play window;
the media files correspond to a plurality of segmented media files, and the playing time of the preloaded segmented media files is later than that of the playing point.
In some embodiments, the display unit 801 is configured to display an identifier of a preloaded segmented media file in a playing progress bar of the player, where the identifier is displayed in a manner different from that of the segmented media file already played in the progress bar.
In some embodiments, the display unit 801 is configured to, when the player includes at least two play windows, display the identifier of the segmented media file in the play window that has obtained the focus state, where the play time of the displayed segmented media file is later than the play time of the play point.
In some embodiments, the apparatus 800 for preloading media files further includes:
a stopping unit 803, configured to, when a play stop event of the play window is received, stop displaying the identifier of the segmented media file later than the real-time play point, and stop the network request of the corresponding segmented media file.
In some embodiments, the apparatus 800 for preloading media files further comprises:
a sending unit 804, configured to send the segmented media file to a media resource extension interface of the web page when the player runs in an embedded manner in the web page, where the media resource extension interface is used for the player to call a media element of the web page to play the segmented media file.
In some embodiments, the apparatus 800 for preloading media files further includes:
an obtaining unit 805, configured to obtain, when the media file is in a non-streaming media package format, media data corresponding to the segmented media file in the media file from a server through a network request;
and packaging the acquired media data and the corresponding metadata into a container of the segmented media file to obtain the corresponding segmented media file.
In some embodiments, the obtaining unit 805 is configured to determine a first timestamp corresponding to a playing start time of the segmented media file, and a second timestamp corresponding to a playing end time of the segmented media file;
searching a first key frame with a decoding time before the first timestamp and closest to the first timestamp, and a second key frame with a decoding time after the second timestamp and closest to the second timestamp;
extracting video frames between the first key frame and the second key frame from the media file.
In some embodiments, the obtaining unit 805 is configured to find a first audio frame whose decoding time is before the decoding time of the first key frame and closest to the decoding time of the first key frame, and a second audio frame whose decoding time is after the decoding time of the second key frame and closest to the decoding time of the second key frame;
extracting audio frames between the first audio frame and the second audio frame from the media file.
The embodiment of the present disclosure further provides a device for preloading media files, including:
a memory for storing executable instructions;
and the processor is used for realizing the preloading method of the media file by executing the executable instructions stored in the memory.
In the embodiment of the present disclosure, the preloading device of the media file may be implemented as a player, and the form of the player may be an H5 player embedded in a web page, or may be a dedicated video playing APP.
The embodiment of the present disclosure further provides a storage medium storing executable instructions, where the executable instructions are used for executing the above method for preloading a media file.
In summary, the embodiments of the present disclosure have the following technical effects:
1. The media file is divided in advance to obtain a plurality of segmented media files, and the identifiers of the preloaded segmented media files are displayed in the play window, so that the user can selectively watch, based on these identifiers, the segmented media files after the play point; since the player preloads only the segmented media files after the play point, unnecessary consumption of traffic is avoided.
2. When a given period of the media file needs to be played, only the media data of the given period needs to be extracted from the media file in the non-streaming-media format and encapsulated into a segmented media file that can be independently decoded. In this way, on the one hand, the limitation that a file in the non-streaming-media format can only be played after being completely downloaded is overcome, and playing is highly real-time; on the other hand, since segmented media files only need to be constructed for given periods, instead of converting the complete media file into the streaming media format in advance, the conversion delay is small, the segmented media files do not need to be stored in advance, no additional storage space is occupied beyond the original media file, and the occupation of storage space is significantly reduced.
3. The media data in the media file in the non-streaming-media format is converted into segmented media files, which are sent, through the media source extension interface of the web page, to the media element of the web page for decoding and playing, so that the player can play the media file in the non-streaming-media format through the web page in which it is embedded, overcoming the limitation that a file in the non-streaming-media encapsulation format can only be played after being completely downloaded.
4. The player acquires part of the media data between the key frames of the media file, realizing control over the loading of media data while the media file is being played.
5. The encapsulated segmented media file is based on part of the media data of the acquired media file rather than all of its data, so the conversion delay is small, pre-storage is not needed, no additional storage space is occupied beyond the original media file, and the occupation of storage space is significantly reduced; furthermore, no black screen or stalling occurs when the resolution is switched while the user is watching, improving the real-time performance of resolution switching.
6. The media element of the web page acquires the segmented media file for decoding and playing based on the virtual address rather than on the real address of the media file, so that the real address of the MP4 file is protected.
The above description is only for the specific embodiments of the present disclosure, but the scope of the present disclosure is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present disclosure, and all the changes or substitutions should be covered within the scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (14)

1. A method for preloading a media file, comprising:
displaying a playing window of the player for playing the media file;
displaying the playing progress of the media file in the playing window;
responding to a playing point reached by the playing progress of the media file in real time, and displaying the identifier of the pre-loaded segmented media file in the playing window;
the media files correspond to a plurality of segmented media files, and the playing time of the preloaded segmented media files is later than that of the playing point;
the method further comprises the following steps:
when the player runs in a mode of being embedded in a webpage, sending the segmented media file to a media resource extension interface of the webpage, wherein the media resource extension interface is used for enabling the player to call media elements of the webpage to play the segmented media file;
when the media file is in a non-streaming media packaging format, acquiring media data corresponding to the segmented media file in the media file through a network request; packaging the acquired media data and the corresponding metadata into a container of a segmented media file to obtain a corresponding segmented media file, wherein the corresponding metadata comprises: segmented media file-level metadata and segmented-level metadata in a segmented media file.
2. The method of claim 1, wherein displaying the identification of the preloaded segmented media files in the play window comprises:
and displaying the identifier of the preloaded segmented media files in a playing progress bar of the player, wherein the display mode is different from that of the segmented media files already played in the progress bar.
3. The method of claim 1, wherein displaying the identification of the preloaded segmented media files in the play window comprises:
when the player comprises at least two play windows,
and displaying the mark of the segmented media file in the playing window in which the focusing state is obtained, wherein the playing time of the displayed segmented media file is later than that of the playing point.
4. The method of claim 1, further comprising:
when a stop play event of the play window is received,
and stopping displaying the identification of the segmented media files later than the real-time playing point and stopping the network request of the corresponding segmented media files.
5. The method of claim 1, wherein the requesting, via the network, media data corresponding to the segmented media file from the media file comprises:
determining a first time stamp corresponding to the playing starting time of the segmented media file and a second time stamp corresponding to the playing ending time of the segmented media file;
searching a first key frame with a decoding time before the first timestamp and closest to the first timestamp, and a second key frame with a decoding time after the second timestamp and closest to the second timestamp;
extracting video frames between the first key frame and the second key frame from the media file.
6. The method of claim 5, further comprising:
searching a first audio frame with the decoding time before the decoding time of the first key frame and closest to the decoding time of the first key frame, and a second audio frame with the decoding time after the decoding time of the second key frame and closest to the decoding time of the second key frame;
extracting audio frames between the first audio frame and the second audio frame from the media file.
7. An apparatus for preloading media files, comprising:
the display unit is used for displaying a playing window of the player for playing the media file and displaying the playing progress of the media file in the playing window;
the loading unit is used for responding to a playing point reached by the playing progress of the media file in real time and displaying the mark of the pre-loaded segmented media file in the playing window;
the media files correspond to a plurality of segmented media files, and the playing time of the preloaded segmented media files is later than that of the playing point;
the preloading device of the media file further comprises:
a sending unit, configured to send the segmented media file to a media resource extension interface of the web page when the player runs in an embedded manner in the web page, where the media resource extension interface is used for the player to call a media element of the web page to play the segmented media file;
an obtaining unit, configured to obtain, through a network request, media data corresponding to the segmented media file in the media file when the media file is in a non-streaming media package format; packaging the acquired media data and the corresponding metadata into a container of a segmented media file to obtain a corresponding segmented media file, wherein the corresponding metadata comprises: segmented media file-level metadata and segmented-level metadata in a segmented media file.
8. The apparatus of claim 7,
the display unit is used for displaying the mark of the pre-loaded segmented media file in the playing progress bar of the player, and the display mode is different from the segmented media file which is played in the progress bar.
9. The apparatus of claim 7,
the display unit is used for displaying the mark of the segmented media file in the playing window in the focusing state when the player comprises at least two playing windows, and the playing time of the displayed segmented media file is later than that of the playing point.
10. The apparatus of claim 7, further comprising:
a stopping unit, configured to, when a play stop event of the play window is received,
stop displaying the identification of the segmented media files later than the real-time playing point and stop the network request of the corresponding segmented media files.
11. The apparatus of claim 7,
the acquisition unit is used for determining a first time stamp corresponding to the playing starting time of the segmented media file and a second time stamp corresponding to the playing ending time of the segmented media file;
searching a first key frame with a decoding time before the first timestamp and closest to the first timestamp, and a second key frame with a decoding time after the second timestamp and closest to the second timestamp;
extracting video frames between the first key frame and the second key frame from the media file.
12. The apparatus of claim 11,
the acquisition unit is used for searching a first audio frame with decoding time before the decoding time of the first key frame and closest to the decoding time of the first key frame, and a second audio frame with decoding time after the decoding time of the second key frame and closest to the decoding time of the second key frame;
extracting audio frames between the first audio frame and the second audio frame from the media file.
13. An apparatus for preloading media files, comprising:
a memory for storing executable instructions;
a processor for implementing a method of preloading media files as claimed in any one of claims 1 to 6 by executing executable instructions stored in said memory.
14. A storage medium, characterized in that it stores executable instructions that, when executed, implement a method for preloading media files according to any one of claims 1 to 6.
CN201810530636.5A 2018-05-29 2018-05-29 Media file preloading method and device and storage medium Active CN110545460B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810530636.5A CN110545460B (en) 2018-05-29 2018-05-29 Media file preloading method and device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810530636.5A CN110545460B (en) 2018-05-29 2018-05-29 Media file preloading method and device and storage medium

Publications (2)

Publication Number Publication Date
CN110545460A CN110545460A (en) 2019-12-06
CN110545460B true CN110545460B (en) 2023-03-24

Family

ID=68701113

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810530636.5A Active CN110545460B (en) 2018-05-29 2018-05-29 Media file preloading method and device and storage medium

Country Status (1)

Country Link
CN (1) CN110545460B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112203111B (en) * 2020-09-29 2023-03-14 北京达佳互联信息技术有限公司 Multimedia resource preloading method and device, electronic equipment and storage medium
CN113259780B (en) * 2021-07-15 2021-11-05 中国传媒大学 Holographic multidimensional audio and video playing progress bar generating, displaying and playing control method

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6405203B1 (en) * 1999-04-21 2002-06-11 Research Investment Network, Inc. Method and program product for preventing unauthorized users from using the content of an electronic storage medium

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040148627A1 (en) * 2002-10-17 2004-07-29 Samsung Electronics Co., Ltd. Data storage medium having information for controlling buffered state of markup document, and method and apparatus for reproducing data from the data storage medium
CN101188751A (en) * 2007-09-24 2008-05-28 陈勇 A playing method and device for local video program
CN101957752B (en) * 2010-09-03 2014-08-27 广州市千钧网络科技有限公司 FLASH video previewing method and system thereof, and FLASH player
US9071798B2 (en) * 2013-06-17 2015-06-30 Spotify Ab System and method for switching between media streams for non-adjacent channels while providing a seamless user experience
CN104519323A (en) * 2014-12-26 2015-04-15 桂林远望智能通信科技有限公司 Personnel and vehicle target classification system and method
CN104683857B (en) * 2015-01-23 2018-05-29 华为技术有限公司 The method and apparatus presented for the visualization that data preload
CN105100961B (en) * 2015-07-23 2018-03-13 华为技术有限公司 Video thumbnail generation method and generating means
CN107277081A (en) * 2016-04-06 2017-10-20 北京优朋普乐科技有限公司 Section method for down loading and device, the stream media system of stream medium data
CN106294827A (en) * 2016-08-17 2017-01-04 合网络技术(北京)有限公司 The caching method of multimedia resource and device
CN106507140A (en) * 2016-12-03 2017-03-15 乐视控股(北京)有限公司 A kind of video loading method, device and electronic equipment

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6405203B1 (en) * 1999-04-21 2002-06-11 Research Investment Network, Inc. Method and program product for preventing unauthorized users from using the content of an electronic storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
《An efficient video broadcasting protocol with scalable preloading scheme》; Yeonjoon Chung; 《2003 International Conference on Multimedia and Expo》; full text *
《Research and Implementation of High-Performance Mobile Streaming Media ***》; Li Yufeng; 《China Masters' Theses Full-text Database》; full text *

Also Published As

Publication number Publication date
CN110545460A (en) 2019-12-06

Similar Documents

Publication Publication Date Title
CN110545491B (en) Network playing method, device and storage medium of media file
CN110545466B (en) Webpage-based media file playing method and device and storage medium
CN110545483B (en) Method, device and storage medium for playing media file by switching resolution in webpage
CN110545456B (en) Synchronous playing method and device of media files and storage medium
CN110545490B (en) Media file conversion method, device and storage medium
CN110545479B (en) Loading control method and device for media playing and storage medium
US11025991B2 (en) Webpage playing method and device and storage medium for non-streaming media file
CN110545460B (en) Media file preloading method and device and storage medium
CN110545463B (en) Play control method and device based on media file conversion and storage medium
CN110545471B (en) Playing control method and device based on offline conversion and storage medium
CN110545468B (en) Media file playing method and device based on parameter encapsulation and storage medium
CN110545480A (en) Preloading control method and device of media file and storage medium
CN110545461A (en) Resolution switching method and device of media file and storage medium
CN110545470A (en) Media file loading method and device and storage medium
CN110545467B (en) Media file loading control method, device and storage medium
CN110545464A (en) Media file resolution switching method and device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant