WO2022227329A1 - Media file generation method and device, and media file playback method and device

Info

Publication number: WO2022227329A1
Application number: PCT/CN2021/111384
Authority: WO
Inventors: 段君; 李东朔; 徐灿
Original assignee: 北京优幕科技有限责任公司
Priority date: 2021-04-28
Filing date: 2021-08-09
Publication date: 2022-11-03
Also published as: CN112969043B; CN112969043A

Abstract

The present invention provides a media file generation method and device, and a media file playback method and device. The generation method comprises: obtaining a picture sequence; recording audio data while presenting the picture sequence to a producer through a display interface, and, during the recording, obtaining picture switching data from the producer's switching actions on the picture sequence, graffiti data from the producer's graffiti actions in the display interface, and text addition data from the producer's text addition operations in the display interface, wherein the picture switching data, the graffiti data and the text addition data each comprise at least time information based on the recording process; and packaging the picture switching data, the graffiti data, the text addition data, the audio data and the picture sequence, and/or address information thereof, into a media file.

Description

Media file generation method and device, and media file playback method and device

Technical Field

The present invention relates to the field of media file editing and playback, and in particular to methods and devices for generating and playing media files.

Background

When traditional knowledge content is organized into a digital format, it is usually recorded as video. For knowledge content, video is an unstructured form of organization. For example, the knowledge content of a lesson is usually stored in one or more video files lasting from a few minutes to several hours, and a course is usually recorded by video capture, that is, by filming it.

A video file is a general-purpose format that is not optimized for any particular content, so it consumes a large amount of storage space while carrying a low density of information. Structured content cannot be extracted from a video file for separate use or for reprocessing and reuse: pictures, circled annotations and comments, spoken explanations and the like are all mixed together. Video is also poorly suited to retrieval and querying of knowledge content and to processing by computer programs.

Summary of the Invention

In view of this, the present invention provides a media file generation method, comprising:

obtaining a picture sequence;

recording audio data, presenting the picture sequence to a producer through a display interface during the recording process, obtaining picture switching data according to the producer's switching actions on the picture sequence, obtaining graffiti data according to the producer's graffiti actions in the display interface, and obtaining text addition data according to the producer's operations of adding text in the display interface, wherein the picture switching data, the graffiti data and the text addition data each include at least time information based on the recording process;

packaging the picture switching data, the graffiti data, the text addition data, the audio data and the picture sequence, and/or address information thereof, into a media file.

Optionally, after the recording of the audio data is completed, the method further comprises:

playing back the audio data according to an operation of the producer;

presenting the picture sequence to the producer through the display interface during the playback process, obtaining picture switching data according to the producer's switching actions on the picture sequence, obtaining graffiti data according to the producer's graffiti actions in the display interface, and obtaining text addition data according to the producer's operations of adding text in the display interface, wherein the picture switching data, the graffiti data and the text addition data each include at least time information based on the playback process.

The present invention also provides a media file playback method, comprising:

obtaining a media file that includes audio data, a picture sequence, picture switching data, graffiti data and text addition data, and/or address information thereof, wherein the picture switching data, the graffiti data and the text addition data each include at least time information based on the recording process of the audio data;

parsing the media file and playing the audio data;

during playback, displaying the pictures of the picture sequence, together with the graffiti and the text, in a display interface according to the time information.

Optionally, the time information in the picture switching data includes the time point at which the producer switched pictures during the recording process; the picture switching data further includes the sequence number, within the picture sequence, of the picture switched to.

Optionally, the graffiti data further includes position information of a graffiti track in the display interface, and the time information in the graffiti data includes the appearance time and the disappearance time of the graffiti track.

Optionally, the graffiti data further includes color information and/or width information of the graffiti track.

Optionally, in the process of obtaining the graffiti data, the position information of the graffiti track is sparsely sampled in order to compress the amount of position data.

Optionally, the time information of the text addition data includes the appearance time and the disappearance time of the text; the text addition data further includes position information of the text in the display interface.

Optionally, the text addition data further includes color information and/or size information of the text.

Optionally, the media file includes a header, an element index, frame index data, element data, frame data and the audio data, wherein the header includes version information, header length information, and the length information of the element index, of the frame index data, of the element data and of the frame data; the element index includes the position information of each item of element data; the frame index data includes the playback time and the position information corresponding to the frame data; the element data is the picture switching data, the graffiti data, the text addition data and the picture sequence; the frame data includes key frame data and change frame data, where key frame data is frame data established for the current picture when the current picture changes substantially relative to the previous frame, and change frame data describes the attributes of elements that differ from those in the most recent preceding key frame.

Optionally, parsing the media file specifically comprises:

obtaining the header length information to determine the length of the header;

obtaining the header to determine the length and the starting position of each subsequent part;

obtaining the element index and the frame index data, and parsing the frame index data to determine the playback times corresponding to the subsequent key frame data and change frame data;

obtaining the frame data, asynchronously loading the corresponding element data by parsing the information on the elements the frame data depends on, and, as time progresses, playing the key frame data and the change frame data as well as the audio data.

Correspondingly, the present invention provides a media file generation device, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, cause the at least one processor to perform the media file generation method described above.

Correspondingly, the present invention provides a media file playback device, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, cause the at least one processor to perform the media file playback method described above.

According to the media file generation and playback methods and devices provided by the present invention, while audio data is being recorded, corresponding data is obtained from the producer's picture switching operations, graffiti actions and text addition operations on the displayed picture. Each of these items of data includes time information based on the recording process, so the producer's various operations on the displayed content during recording are recorded. The producer can select any material, page through pictures while giving a spoken explanation, and add content at any time. The producer does not need to shoot a video and can edit and produce knowledge content in a what-you-see-is-what-you-get manner, much like visually editing a document. The media file finally generated contains no more than audio data, picture data and some data recording the producer's operations; its data volume is 5 to 10 times smaller than that of a video file, and it lends itself to retrieval and querying of knowledge content and to processing by computer programs.

Description of the Drawings

In order to explain the specific embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed for the description of the specific embodiments or of the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic diagram of an application program interface in an embodiment of the present invention;

FIG. 2 is a schematic diagram of a producer adding graffiti content in the display interface in an embodiment of the present invention;

FIG. 3 is a schematic diagram of a producer adding text content in the display interface in an embodiment of the present invention.

Detailed Description

The technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

In addition, the technical features involved in the different embodiments of the present invention described below can be combined with one another as long as they do not conflict.

An embodiment of the present invention provides a media file generation method. The method can be executed by an electronic device such as a computer, a tablet computer, a smartphone or a server, and includes the following steps:

S1, obtain a picture sequence. The producer can prepare pictures with any content; for example, the producer can create a slideshow file and convert it into a picture sequence. In a preferred embodiment, importing a custom slideshow file is supported: the file is transcoded through a real-time transcoding queue, and one picture is generated for each page. This processing can be completed on the device that executes the method, or the file can be uploaded to a remote server for processing. To balance storage space against clarity, the picture resolution can be preset to 720p, which yields pictures of about 200kb each.

In an optional embodiment, a real-time document production function is provided: the producer selects one of several preset templates and adds text content to it, and the template and text are then automatically converted into a picture. The text can be added through a text editor. As the user types in the editor, the text size automatically shrinks as the number of lines grows, so that all of the text remains clearly displayed. Text input supports undo and redo, which makes it easy for the producer to adjust the content.

S2, record audio data. During the recording process, present the picture sequence to the producer through the display interface, obtain picture switching data according to the producer's switching actions on the picture sequence, obtain graffiti data according to the producer's graffiti actions in the display interface, and obtain text addition data according to the producer's operations of adding text in the display interface, wherein the picture switching data, the graffiti data and the text addition data each include at least time information based on the recording process.

The data obtained while the audio is being recorded serves as a description of the producer's corresponding operations. Picture switching data, for example, describes the fact that at a certain time in the recorded audio the producer switched the display to a certain picture of the picture sequence. Specifically, FIG. 1 shows an application program interface based on this method. The display interface 13 shows one picture of the picture sequence at a given time point, and the progress display bar 11 shows the current recording progress of the audio data. Suppose that at the start of recording the producer selects the first picture in the picture selection bar 12, so that the display interface 13 shows the first picture. When the recording reaches a time point t, the producer clicks "Picture 2" in the picture selection bar 12; picture switching data is then generated, including a switching action identifier and the time t, and the display interface 13 shows the second picture accordingly. The producer need not switch in the order of the picture sequence: at time point t the producer may skip some pictures and directly select the n-th picture for display. The generated data should therefore also indicate which specific picture is to be displayed.

As for the graffiti data, this solution allows the producer to draw marks such as circles, dots and lines in the display interface; these are in effect graffiti operations on the content of the picture currently displayed. Suppose that when the recording reaches time point t, the producer clicks the add graffiti button 14 and then draws graffiti content 21, such as a line, in the interface shown in FIG. 2. Graffiti data is thereby generated, describing the content, position, time and so on of the graffiti.

As for the text addition data, this solution allows the producer to add extra text content in the display interface, for example an annotation on the content of the picture currently displayed. Suppose that when the recording reaches time point t, the producer clicks the add text button 15 and then adds some text content 22 in the interface shown in FIG. 3. Text addition data is thereby generated, describing the content, position, time and so on of the added text.

While recording audio, the producer can click the pause button in the interface to suspend the recording and adjust earlier actions. To modify certain actions, the producer can select a time point in the recorded audio and then delete the corresponding picture switching action, graffiti or added text, or edit that content again.

S3, package the picture switching data, the graffiti data, the text addition data, the audio data and the picture sequence, and/or address information thereof, into a media file. Once the audio recording is complete, the data of the various actions recorded during the recording process, the picture sequence and the audio data are packaged into one media file.
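By way of illustration only, steps S2 and S3 can be sketched as a small recorder that stamps each producer action with the time elapsed since recording began and then serializes everything into one document. The class name `ActionRecorder`, its method names, the injectable `clock` parameter and the JSON layout are assumptions made for this sketch; the field names `t`, `idx`, `v`, `x`, `y`, `te`, `c`, `w`, `s` and `p` follow the examples given elsewhere in this description.

```python
import json
import time


class ActionRecorder:
    """Hypothetical sketch: collects timestamped producer actions."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable clock (an assumption, for testing)
        self._start = clock()    # moment the audio recording started
        self.switches = []       # picture switching data
        self.graffiti = []       # graffiti data
        self.texts = []          # text addition data

    def _now_ms(self):
        # Time information is relative to the recording process, in ms.
        return int(round((self._clock() - self._start) * 1000))

    def switch_picture(self, idx):
        self.switches.append({"t": self._now_ms(), "idx": idx})

    def add_graffiti(self, points, color, width):
        self.graffiti.append(
            {"c": color, "w": width, "s": self._now_ms(), "p": list(points)}
        )

    def add_text(self, value, x, y):
        self.texts.append({"v": value, "x": x, "y": y, "te": self._now_ms()})

    def package(self, audio_url, picture_urls):
        # Address-only packaging: store where the audio and pictures live,
        # plus the recorded action data, as one JSON document.
        return json.dumps(
            {
                "audio": audio_url,
                "pictures": [
                    {"idx": i, "url": u}
                    for i, u in enumerate(picture_urls, start=1)
                ],
                "switches": self.switches,
                "graffiti": self.graffiti,
                "texts": self.texts,
            }
        )
```

A real implementation would also record disappearance times and support editing or deleting earlier actions; those are omitted here to keep the sketch short.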

In an optional embodiment, the producer can add new actions to content that has already been completed. Specifically, the producer can choose to play back the audio data. During playback, the display interface presents the pictures and reproduces, at each time point, the various actions the producer performed. As in the recording process, the producer can switch pictures, draw graffiti and add text at any time point, and corresponding data is thereby obtained, each item including at least time information based on the playback process.

The audio data, the pictures, the picture switches, the graffiti and the text described above are each treated as independent elements, and these elements correspond to different fields. The field corresponding to the audio data is its storage address, which may be an address on a local hard disk or an address on a website (server). The picture sequence is stored as a form-type file that includes the address information and sequence number of each picture; it may also include a thumbnail for each picture, used for switching and indexing. The picture switching data, the graffiti data and the text addition data are likewise independent elements, each including various fields that describe the corresponding action.

As for the content of the media file, it may include only the storage addresses of the above elements on the Internet rather than the element content itself. In an optional embodiment, the elements are generated locally; after the audio data has been recorded, the picture switching data, the graffiti data, the text addition data, the audio data and the picture sequence can be uploaded to a remote storage device to obtain their storage addresses, that is, addresses on the Internet. When the media file is packaged, only this address information is packaged into it, which minimizes the data volume of the file. When such a media file is to be played, the actual content of each element is downloaded using the address information in the file.

According to the media file generation method provided by the embodiment of the present invention, while audio data is being recorded, corresponding data is obtained from the producer's picture switching operations, graffiti actions and text addition operations on the displayed picture. Each of these items of data includes time information based on the recording process, so the producer's various operations on the displayed content during recording are recorded. The producer can select any material, page through pictures while giving a spoken explanation, and add content at any time. The producer does not need to shoot a video and can edit and produce knowledge content in a what-you-see-is-what-you-get manner, much like visually editing a document. The media file finally generated contains no more than audio data, picture data and some data recording the producer's operations; its data volume is 5 to 10 times smaller than that of a video file, and it lends itself to retrieval and querying of knowledge content and to processing by computer programs.

In a preferred embodiment, the picture switching data specifically includes the time point at which the producer switched pictures during the recording process and the sequence number, within the picture sequence, of the picture switched to. For example, the picture switching data {t:0,idx:5} indicates that at second 0 of the audio recording the 5th picture of the picture sequence is to be displayed; the picture switching data {t:3529,idx:1} indicates that at millisecond 3529 of the audio recording the 1st picture of the picture sequence is to be displayed. Picture switching data of this kind allows the producer to jump to and display any picture.
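As a minimal sketch of how a player might use such records, the picture to display at a given playback time is the one named by the most recent switch at or before that time. The function name `picture_at` and the choice of a binary search are assumptions of this sketch, not part of the described method; the records reuse the `t` and `idx` fields of the example above, with `t` taken to be in milliseconds.

```python
from bisect import bisect_right


def picture_at(switches, t_ms):
    """Return the idx shown at t_ms, or None before the first switch."""
    ordered = sorted(switches, key=lambda e: e["t"])
    times = [e["t"] for e in ordered]
    # Number of switches that happened at or before t_ms.
    pos = bisect_right(times, t_ms)
    return ordered[pos - 1]["idx"] if pos else None


switches = [{"t": 0, "idx": 5}, {"t": 3529, "idx": 1}]
```

Because each record names its target picture explicitly, the lookup works regardless of the order in which the producer visited the pictures.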

The graffiti data includes position information of the graffiti track in the display interface, as well as color information and/or width information of the track; its time information includes the appearance time and the disappearance time of the graffiti track. An example is {c:3,w:3,s:17275,d:2229,p:[(36.97,15.89), (37.09,16.73), ...]}, where c represents the color, w the width, s the start time (milliseconds), d the end time (milliseconds), and p the coordinates of the points that make up the graffiti track.

To further compress the data volume of the media file, the position information is sparsely sampled when it is recorded, for example by keeping one point out of every five points of the graffiti track. This reduces the number of significant digits and the number of relative point coordinates that are stored, and thus compresses the amount of position data.
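The sparse sampling described above might be sketched as follows. The function name `sparsify`, the default of keeping every fifth point, the rounding to two decimal places and the decision to always keep the final point of the stroke are all assumptions of this sketch; the description itself only states that points are sampled sparsely.

```python
def sparsify(points, step=5, digits=2):
    """Keep every `step`-th point of a stroke and round the coordinates."""
    if step < 1:
        raise ValueError("step must be at least 1")
    if not points:
        return []
    kept = list(points[::step])
    # Keep the final point too, so the stroke still ends where it was drawn.
    if (len(points) - 1) % step != 0:
        kept.append(points[-1])
    return [(round(x, digits), round(y, digits)) for x, y in kept]
```

For a stroke of 12 points this keeps the points at positions 0, 5 and 10 plus the last one, so four coordinate pairs are stored instead of twelve.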

The text addition data includes position information of the text in the display interface and the color information and size information of the text; its time information includes the appearance time and the disappearance time of the text. An example is {"v":"xxx","c":3,"x":0,"y":0,"s":1,"te":49040,"tl":55542}, where v represents the text content, c the color, x and y the relative coordinates, s the magnification (text size), te the time at which the text appears, and tl the time at which it disappears.
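A player could select the text items to draw at a given playback time with a simple interval test on `te` and `tl`. The function name `visible_texts` and the convention that a text is shown from `te` up to, but not including, `tl` are assumptions of this sketch.

```python
def visible_texts(texts, t_ms):
    """Return the text addition records that should be on screen at t_ms."""
    return [e for e in texts if e["te"] <= t_ms < e["tl"]]


texts = [
    {"v": "xxx", "c": 3, "x": 0, "y": 0, "s": 1, "te": 49040, "tl": 55542},
]
```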

As for the packaging format (also called the container) of the media file, the present invention provides a preferred container structure. In addition to the audio and video streams, a media file needs to include some auxiliary information and information on how the video and audio are organized. The container in this embodiment includes six parts:

The first part is the header, which includes version information (1 byte), format information (2 bytes), header length information (8 bytes), and the length information of the element index (8 bytes), of the frame index data (8 bytes), of the element data (8 bytes) and of the frame data (8 bytes). The element data refers to the picture switching data, the graffiti data, the text addition data and the picture sequence described above.

The second part is the element index. Each element index entry is 8 bytes and includes element type information and the specific position of the corresponding element data in the media file.

The third part is the frame index data, which includes the playback time corresponding to each item of frame data and the specific position of that frame data in the media file. The frame index data is provided to cope with frames whose length is not fixed or not known in advance; it facilitates seek playback and serves as the basic criterion for deciding whether the stream can be played.

The fourth part is the element data, that is, the picture switching data, the graffiti data, the text addition data and the picture sequence described above.

The fifth part is the frame data, which includes key frame data and change frame data. Key frame data is frame data established for the current picture when the current picture changes substantially relative to the previous frame, for example when the producer switches to another picture; it includes all attributes of all elements displayed in the current interface. For example, if the interface displayed at 300 ms contains a new picture a together with text addition data b, the corresponding key frame data can be expressed as

300: {

a: picture index id; picture data, background color, initial width and height, shown or hidden state.

b: text font, text color, text size; position of the text relative to the width and height of the current interface; shown or hidden state.

}.

Change frame data describes the attributes of elements that differ from those in the most recent preceding key frame, for example a change in the address information of a picture or a change in width; change frame data includes only these differing attributes. For example, if the interface displayed at 310 ms differs from the one displayed at 300 ms only in an attribute of the text addition data b (say, the display position of the text), the corresponding change frame data can be expressed as

310: {

b: new position (top, left).

}.

The sixth part is the audio data; the binary data of an audio file in a format such as mp3 or aac can be appended directly after the frame data.
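The header described in the first part adds up to 1 + 2 + 8 + 4 × 8 = 43 bytes. The following sketch packs and parses such a header and derives where each later part begins. The little-endian byte order, the use of unsigned integers, the field order within the header and all Python names here are assumptions; the description gives only the field sizes.

```python
import struct
from typing import NamedTuple

# version (1 byte), format (2 bytes), then five 8-byte lengths;
# "<" selects little-endian with no padding (byte order is assumed).
HEADER = struct.Struct("<BHQQQQQ")


class Header(NamedTuple):
    version: int
    fmt: int
    header_len: int
    element_index_len: int
    frame_index_len: int
    element_data_len: int
    frame_data_len: int


def parse_header(buf):
    """Decode the fixed-size header found at the start of the file."""
    return Header(*HEADER.unpack_from(buf, 0))


def section_offsets(h):
    """Map each part to (start offset, length); audio runs to end of file."""
    offsets = {}
    pos = h.header_len
    for name, length in (
        ("element_index", h.element_index_len),
        ("frame_index", h.frame_index_len),
        ("element_data", h.element_data_len),
        ("frame_data", h.frame_data_len),
    ):
        offsets[name] = (pos, length)
        pos += length
    offsets["audio"] = (pos, None)
    return offsets
```

Since every length is stored up front, a player can compute the start of any part from the header alone and request just that byte range.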

The present invention also provides a media file playback method for playing media files generated by the method described above. The method can be executed by an electronic device such as a computer, a tablet computer or a smartphone, and includes the following steps:

obtaining a media file that includes audio data, a picture sequence, picture switching data, graffiti data and text addition data, and/or address information thereof, each of these items of data including at least time information based on the recording process of the audio data;

parsing the media file and playing the audio data;

during playback, displaying the pictures of the picture sequence, together with the graffiti and the text, in the display interface according to the time information.

对于观看者而言，与观看普通视频的体验相同，但对于播放机制而言，播放本发明实施例提供的媒体文件的方式与播放普通视频文件的方式完全不同。本实施例的媒体文件中的音频数据被持续播放，对于当前的播放时间t，获取t对应的图片切换数据、涂鸦数据和文字添加数据，进而分层渲染，在不同的层中根据这些数据中的属性信息在显示界面中渲染内容。For the viewer, the experience is the same as watching an ordinary video, but for the playing mechanism, the mode of playing the media file provided by the embodiment of the present invention is completely different from the mode of playing the ordinary video file. The audio data in the media file in this embodiment is continuously played, and for the current playback time t, the picture switching data, graffiti data and text addition data corresponding to t are obtained, and then rendered in layers, according to the data in different layers. The attribute information of the render content in the display interface.

In addition, the entire picture sequence, picture switching data, graffiti data and text addition data can be presented to the viewer. The viewer can select any item, such as a piece of text addition data or a picture in the picture sequence, to obtain the corresponding audio time point and thereby quickly locate the desired content within the whole file. For the preferred packaging format described above, the length of the header portion can be determined from the header length information during playback, and once the element data and frame index data have been read, only a few frames of data need to be loaded before playback can start. When playback reaches a given time point, the interface content for that time can be restored directly by comparing the current state with the initial state. Even if the viewer jumps to another position, the approximate position and time of the neighboring frames can be calculated, so playback can continue from the jump point after loading only the corresponding data.
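
As an illustration of jump playback, the sketch below picks the frame index entries needed to rebuild the display at a target time. It is a Python sketch under assumptions: the frame index is taken to be a time-sorted list of entries with hypothetical fields `time`, `kind`, `offset` and `length`, and each change frame is taken to be a difference against the most recent key frame, so one key frame plus at most one change frame suffices.

```python
from bisect import bisect_right

def frames_for_seek(frame_index, t):
    """frame_index: list of dicts sorted by 'time', each with 'time' (ms),
    'kind' ('key' or 'change'), 'offset' and 'length' (bytes).
    Returns (key_entry, change_entry_or_None) needed to rebuild the display at t."""
    times = [entry["time"] for entry in frame_index]
    last = bisect_right(times, t) - 1  # latest frame at or before t
    if last < 0:
        return None, None
    # Walk back to the most recent key frame.
    key_pos = last
    while key_pos >= 0 and frame_index[key_pos]["kind"] != "key":
        key_pos -= 1
    if key_pos < 0:
        return None, None
    change = frame_index[last] if last != key_pos else None
    return frame_index[key_pos], change

# Hypothetical frame index for illustration only.
index = [
    {"time": 0, "kind": "key", "offset": 0, "length": 200},
    {"time": 300, "kind": "key", "offset": 200, "length": 180},
    {"time": 310, "kind": "change", "offset": 380, "length": 20},
    {"time": 500, "kind": "change", "offset": 400, "length": 25},
    {"time": 900, "kind": "key", "offset": 425, "length": 210},
]

key_entry, change_entry = frames_for_seek(index, 600)
```

The `offset` and `length` of the returned entries give the byte ranges to load, so a jump does not require downloading the frames in between.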

The following describes a method for parsing and playing the above media file comprising six parts:

S1: download the first few bytes of the media file to determine the length of the header.

S2: download the complete header to determine the length and starting position of each subsequent part.

S3: download the complete element index data and frame index data, and parse the frame index data to determine the playback times corresponding to the subsequent key frame data and change frame data.

S4: download the frame data, parse the information on the elements it depends on, and load the corresponding element data asynchronously. Because the element data may be large, it is not loaded in advance. When playback reaches a given time point, the position of the element data is determined from the element index data and the element data is loaded asynchronously, which shortens the download time required before playback and allows playback to start as soon as possible.

S5: as time progresses, play the key frames and change frames. A key frame only needs to be parsed according to its element attributes; for a change frame, a single data merge onto the most recent key frame yields the data to draw for the current change frame. Playback continues in this way until the end.
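
Steps S1 and S2 can be sketched as follows. The description does not fix a byte layout for the header, so this Python example assumes a hypothetical one: a little-endian 16-bit version, a 32-bit header length, and four 32-bit lengths for the element index, frame index, element data and frame data, with the sections stored contiguously in that order and the audio data occupying the rest of the file. The names `HEADER_FORMAT` and `section_offsets` are likewise illustrative.

```python
import struct

# Hypothetical layout: version (uint16), header length (uint32), then the
# lengths of element index, frame index, element data and frame data (uint32 each).
HEADER_FORMAT = "<HI4I"
SECTION_NAMES = ("element_index", "frame_index", "element_data", "frame_data")

def section_offsets(header: bytes) -> dict:
    """Compute (start, length) for each section from the header bytes."""
    version, header_len, *lengths = struct.unpack_from(HEADER_FORMAT, header)
    sections = {}
    position = header_len  # the first section starts right after the header
    for name, length in zip(SECTION_NAMES, lengths):
        sections[name] = (position, length)
        position += length
    sections["audio"] = (position, None)  # audio runs to the end of the file
    return {"version": version, "sections": sections}

# Build an example header and resolve the section positions.
example_header = struct.pack(
    HEADER_FORMAT, 1, struct.calcsize(HEADER_FORMAT), 40, 60, 1000, 500
)
layout = section_offsets(example_header)
```

With the start and length of every section known, the player can fetch the two index sections in full (S3) and defer the larger element data until it is needed (S4).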

In addition, the audio data is stored in a standard audio format such as mp3, which the platform's built-in decoder can handle; the mp3 data can simply be loaded through range requests and played.
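
A range request for part of the audio section could be built as in the Python sketch below. It assumes the media file is served over HTTP and that the byte offset of the audio section within the file is already known from the header; the URL, the offset value and the function name `audio_range_request` are hypothetical. The request object is only constructed here, not sent.

```python
import urllib.request

def audio_range_request(url: str, audio_offset: int, start: int, length: int) -> urllib.request.Request:
    """Build an HTTP request for `length` bytes of audio, beginning `start`
    bytes into the audio section located at `audio_offset` in the file."""
    first = audio_offset + start
    last = first + length - 1  # the end of an HTTP byte range is inclusive
    return urllib.request.Request(url, headers={"Range": f"bytes={first}-{last}"})

# Hypothetical URL and offset: request the first 64 KiB of the audio section.
request = audio_range_request("https://example.com/lesson.media", 1622, 0, 65536)
range_header = request.get_header("Range")
```

Passing the request to `urllib.request.urlopen` would return the requested bytes with status 206 if the server supports range requests.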

Those skilled in the art will appreciate that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM and optical storage) containing computer-usable program code.

The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor or other programmable attribute processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable attribute processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable attribute processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable attribute processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Obviously, the above embodiments are merely examples given for clarity of description and do not limit the implementations. Those of ordinary skill in the art can make other changes or modifications in different forms on the basis of the above description. It is neither necessary nor possible to enumerate all implementations here. Obvious changes or modifications derived from the above remain within the scope of protection of the present invention.

Claims

A media file generation method, characterized by comprising:

obtaining a picture sequence;

recording audio data, presenting the picture sequence to a producer through a display interface during the recording process, obtaining picture switching data according to the producer's switching actions on the picture sequence, obtaining graffiti data according to the producer's graffiti actions in the display interface, and obtaining text addition data according to the producer's operations of adding text in the display interface, wherein the picture switching data, the graffiti data and the text addition data each include at least time information based on the recording process;

packaging the picture switching data, the graffiti data, the text addition data, the audio data, and the picture sequence and/or address information thereof into a media file.
The method according to claim 1, characterized in that, after the recording of the audio data is completed, the method further comprises:

playing back the audio data according to an operation of the producer;

presenting the picture sequence to the producer through the display interface during playback, obtaining picture switching data according to the producer's switching actions on the picture sequence, obtaining graffiti data according to the producer's graffiti actions in the display interface, and obtaining text addition data according to the producer's operations of adding text in the display interface, wherein the picture switching data, the graffiti data and the text addition data each include at least time information based on the playback process.
A media file playback method, characterized by comprising:

acquiring a media file that includes audio data, a picture sequence, picture switching data, graffiti data, text addition data and/or address information thereof, wherein the picture switching data, the graffiti data and the text addition data each include at least time information based on the recording process of the audio data;

parsing the media file and playing the audio data;

during playback, displaying the pictures in the picture sequence, together with the graffiti and text, in a display interface according to the time information.
The method according to any one of claims 1-3, characterized in that the time information in the picture switching data includes the time points at which the producer switches pictures during the recording process; the picture switching data further includes the sequence number, within the picture sequence, of the picture switched to.
The method according to any one of claims 1-3, characterized in that the graffiti data further includes position information of a graffiti track in the display interface, and the time information in the graffiti data includes the appearance time and disappearance time of the graffiti track.
The method according to claim 5, characterized in that the graffiti data further includes color information and/or width information of the graffiti track.
The method according to claim 5, characterized in that, in the process of obtaining the graffiti data, the position information of the graffiti track is sparsely sampled in order to reduce the data volume of the position information.
The method according to any one of claims 1-3, characterized in that the time information of the text addition data includes the appearance time and disappearance time of the text; the text addition data further includes position information of the text in the display interface.
The method according to claim 8, characterized in that the text addition data further includes color information and/or size information of the text.
The method according to claim 1, characterized in that the media file includes a header, an element index, frame index data, element data, frame data and the audio data, wherein

the header includes version information, header length information, and length information of the element index, length information of the frame index data, length information of the element data and length information of the frame data;

the element index includes position information of each item of element data;

the frame index data includes the playback time and position information corresponding to the frame data;

the element data is the picture switching data, the graffiti data, the text addition data and the picture sequence;

the frame data includes key frame data and change frame data, wherein the key frame data is frame data established for the current picture when the current picture changes significantly relative to the previous frame, and the change frame data is the attributes that differ from those of the elements in the most recent preceding key frame.
The method according to claim 10, characterized in that parsing the media file specifically comprises:

obtaining the header length information to determine the length of the header;

obtaining the header to determine the length and starting position of each subsequent part;

obtaining the element index and the frame index data, and parsing the frame index data to determine the playback times corresponding to the subsequent key frame data and change frame data;

obtaining the frame data, asynchronously loading the corresponding element data by parsing the information on the elements the frame data depends on, and, as time progresses, playing the key frame data and the change frame data and playing the audio data.
A media file generation device, characterized by comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the one processor, and the instructions are executed by the at least one processor to cause the at least one processor to perform the media file generation method according to any one of claims 1, 2-11.
A media file playback device, characterized by comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the one processor, and the instructions are executed by the at least one processor to cause the at least one processor to perform the media file playback method according to any one of claims 3-11.