CN118296178A - File generation method, device, equipment and readable storage medium - Google Patents


Info

Publication number
CN118296178A
CN118296178A (application CN202310011411.XA)
Authority
CN
China
Prior art keywords
audio · video · file · object block · video file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310011411.XA
Other languages
Chinese (zh)
Inventor
赵志立
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202310011411.XA
Publication of CN118296178A
Legal status: pending

Landscapes

  • Management Or Editing Of Information On Record Carriers (AREA)

Abstract

The embodiments of this application disclose a file generation method, apparatus, device, and readable storage medium relating to cloud technology. The method includes: acquiring an audio/video stream and encoding it to obtain a first audio/video file, where the first audio/video file comprises N audio/video fragments and N is a positive integer; adding an audio/video index object block, containing index information for the N fragments, at a first position of the first audio/video file; modifying the encoding-format object block in the first file into an audio/video object block, and adding the N fragments, together with the fragment index object block associated with each fragment, into that audio/video object block; and constructing a second audio/video file based on the added index object block and the resulting audio/video object block. With the embodiments of this application, disk usage and the complexity of file generation can both be reduced.

Description

File generation method, device, equipment and readable storage medium
Technical Field
The present application relates to the field of cloud technologies, and in particular, to a method, an apparatus, a device, and a readable storage medium for generating a file.
Background
When recording audio and video data, the recorded data must be stored as a file, for example an MP4 (MPEG-4 Part 14) file. The MP4 container is one of the most widely used audio/video storage formats, with a compact layout, little redundancy, playback-friendly structure, and a mature ecosystem. However, if MP4 file generation exits abnormally, for example because of a program crash or device power failure, the resulting incomplete file cannot be used and is difficult to repair, so the entire generated file is discarded and cannot be played.
At present, a private-extension approach is generally used: custom information is written continuously during MP4 file generation, and after an abnormal exit the incomplete file is repaired from the written extension information. However, because this approach must keep writing extension information, it increases disk usage and raises the complexity of file generation.
Disclosure of Invention
The embodiments of this application provide a file generation method, apparatus, device, and readable storage medium that can reduce disk usage and the complexity of file generation.
In a first aspect, the present application provides a file generating method, including:
Acquiring an audio/video stream and encoding it to obtain a first audio/video file, where the first audio/video file comprises, arranged in sequence, an encoding-format object block, N audio/video fragments, and a fragment index object block for each of the N fragments, N being a positive integer;
Adding an audio/video index object block at a first position of the first audio/video file, where the audio/video index object block comprises index information for the N fragments, and the first position is either the file-tail position of the first audio/video file or a position between the encoding-format object block and the fragment index object blocks;
Modifying the encoding-format object block in the first audio/video file into an audio/video object block, and adding the N fragments and the fragment index object block associated with each fragment into the audio/video object block;
And constructing a second audio/video file based on the added audio/video index object block and the resulting audio/video object block.
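Read together, these steps amount to: append an index block at the file tail, then rewrite the header of the old encoding-format box in place so that it becomes a single media-data box spanning every fragment and fragment index. The sketch below is one possible reading under simplified assumptions (generic ISO-BMFF 4-byte-size/4-byte-type box headers, illustrative `moov`/`mdat` type codes and function names); it is not the patent's normative procedure.

```python
import io
import struct

def finalize(f, fmt_box_offset, index_payload):
    """Hedged sketch of the two finishing steps on a seekable stream.

    1. Append the audio/video index object block (moov-like) at the
       file-tail position, holding index info for all N fragments.
    2. Rewrite the header of the old encoding-format object block so it
       becomes one media-data box whose size spans itself plus every
       fragment object block and fragment index object block after it.
    """
    f.seek(0, 2)
    tail = f.tell()                       # old file tail
    # step 1: audio/video index object block appended at the tail
    f.write(struct.pack(">I4s", 8 + len(index_payload), b"moov"))
    f.write(index_payload)
    # step 2: in-place header rewrite -- the encoding-format object
    # block becomes an audio/video object block covering the fragments
    f.seek(fmt_box_offset)
    f.write(struct.pack(">I4s", tail - fmt_box_offset, b"mdat"))

# demo: ftyp + moov + one moof/mdat fragment pair, fMP4-style layout
buf = io.BytesIO()
offsets = {}
for t, body in [(b"ftyp", b"isom"), (b"moov", b"cfg!"),
                (b"moof", b"idx0"), (b"mdat", b"av-0")]:
    offsets[t] = buf.tell()
    buf.write(struct.pack(">I4s", 8 + len(body), t) + body)
finalize(buf, offsets[b"moov"], b"full-index")

# re-scan the top-level box types after finalizing
data, pos, types = buf.getvalue(), 0, []
while pos < len(data):
    size, t = struct.unpack(">I4s", data[pos:pos + 8])
    types.append(t.decode())
    pos += size
print(types)
# -> ['ftyp', 'mdat', 'moov']
```

After finalizing, the layout matches a conventional MP4 (`ftyp`, `mdat`, `moov`), which is consistent with the backward-compatibility effect the summary describes.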
In a second aspect, the present application provides a file generation apparatus, including:
a data acquisition unit, configured to acquire an audio/video stream and encode it to obtain a first audio/video file, where the first audio/video file comprises, arranged in sequence, an encoding-format object block, N audio/video fragments, and a fragment index object block for each of the N fragments, N being a positive integer;
an index adding unit, configured to add an audio/video index object block at a first position of the first audio/video file, where the audio/video index object block comprises index information for the N fragments, and the first position is either the file-tail position of the first audio/video file or a position between the encoding-format object block and the fragment index object blocks;
an object modification unit, configured to modify the encoding-format object block in the first audio/video file into an audio/video object block, and to add the N fragments and the fragment index object block associated with each fragment into the audio/video object block;
and a file construction unit, configured to construct a second audio/video file based on the added audio/video index object block and the resulting audio/video object block.
In a third aspect, the present application provides a computer device comprising a processor and a memory, where the memory is configured to store a computer program comprising program instructions, and the processor is configured to invoke the program instructions to perform the above file generation method.
In a fourth aspect, the present application provides a computer readable storage medium having stored therein a computer program adapted to be loaded and executed by a processor to cause a computer device having the processor to perform the above-described file generating method.
In a fifth aspect, the present application provides a computer program product or computer program comprising computer instructions which, when executed by a processor, implement the above-described file generation method.
In the embodiments of this application, an audio/video index object block containing the index information of the N audio/video fragments is added to the first audio/video file, so that the N fragments can later be parsed and played using this index. Further, the encoding-format object block in the first file is modified into an audio/video object block, and the N fragments, together with the fragment index object block associated with each fragment, are added into it; the second audio/video file is then constructed from the added index object block and the resulting audio/video object block. Because no custom extension information needs to be written, disk usage and the complexity of file generation are both reduced. Moreover, since the second audio/video file has the same structure as a standard MP4 file, file generation becomes more reliable and the generated file is backward compatible, so a wide range of applications and terminal devices can play it. In addition, because the audio/video object block contains the fragments together with their associated fragment index object blocks, even if an abnormality during generation prevents the audio/video index object block from being added correctly, the fragments generated before the abnormality can still be parsed using their associated fragment index object blocks and played, which improves the reliability of file playback.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed for the embodiments are briefly described below. The drawings described here represent only some embodiments of the present application; a person skilled in the art may derive other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of the network architecture of a file generation system according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of a file generation method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of the structure of a first audio/video file according to an embodiment of the present application;
FIG. 4 is a schematic diagram of the structure of an audio/video file with an added audio/video index according to an embodiment of the present application;
FIG. 5 is a schematic comparison of audio/video file structures according to an embodiment of the present application;
FIG. 6 is a schematic flowchart of another file generation method according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a file generation apparatus according to an embodiment of the present application;
FIG. 8 is a schematic diagram of the structure of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art based on these embodiments without inventive effort fall within the scope of the present application.
Cloud technology refers to a hosting technology that unifies hardware, software, network, and other resources within a wide area network or local area network to realize the computation, storage, processing, and sharing of data. It is the general term for the network, information, integration, management-platform, and application technologies built on the cloud computing business model; these resources can form a pool to be used flexibly on demand. Cloud computing will become an increasingly important support: the background services of technical network systems, such as video websites, image websites, and portal sites, require large amounts of computing and storage resources. As the internet industry develops, each item may carry its own identification mark that must be transmitted to a background system for logical processing; data of different levels will be processed separately, and all kinds of industry data need strong backend support, which can only be realized through cloud computing. The solution provided by the embodiments of this application belongs to cloud security within the field of cloud technology.
Cloud security (Cloud Security) is the general term for security software, hardware, users, institutions, and secure cloud platforms based on cloud computing business model applications. Cloud security fuses emerging technologies and concepts such as parallel processing, grid computing, and unknown-virus behavior analysis. Through the abnormal-behavior monitoring of software performed by a large number of networked clients, it obtains the latest information on Trojans and malicious programs on the internet, sends that information to servers for automatic analysis and processing, and then distributes the resulting virus and Trojan solutions to every client.
The main research directions of cloud security include: 1. cloud computing security, that is, how to guarantee the security of the cloud and the applications on it, including cloud computer-system security, secure storage and isolation of user data, user access authentication, information transmission security, network attack protection, and compliance auditing; 2. the clouding of security infrastructure, which mainly studies how to build and integrate security infrastructure resources with cloud computing and optimize security protection mechanisms, using cloud computing to construct very-large-scale platforms for collecting and processing security events and information, realizing the collection and correlation analysis of massive information and improving network-wide security-event control and risk control capabilities; 3. cloud security services, which mainly studies the security services provided to users on cloud computing platforms, such as anti-virus services. For example, in this application, cloud security technology can be used to process the first audio/video file to obtain the second audio/video file, and the second audio/video file can be encrypted for transmission, improving the security of file transfer.
It should be specifically noted that the embodiments of the present application involve data related to object information (such as audio/video streams and audio/video fragments). When the embodiments are applied in a specific product or technology, the permission or consent of the object must be obtained, and the collection, use, and processing of the related data must comply with the relevant laws, regulations, and standards of the relevant countries and regions. An object may refer to a user of a terminal device or a computer device.
The technical solution of this application can be applied to video transcoding, security surveillance, or other scenarios that require recording audio/video data and storing it as a file. A first audio/video file is generated by processing the recorded audio/video data; index information for the audio/video data is added to the first file; the encoding-format object block in the first file is modified into an audio/video object block; and the N audio/video fragments in the first file, together with the fragment index object block of each fragment, are added into that audio/video object block. If an abnormality occurs while generating the second audio/video file, each already-written fragment can be decoded using its fragment index, so the fragments recorded before the abnormality can still be played. If no abnormality occurs, every fragment can be decoded using the audio/video index, so the whole file can be played, improving the reliability of playback. The technical solution can be applied in many scenarios, including but not limited to cloud technology, artificial intelligence, intelligent transportation, and assisted driving.
Referring to FIG. 1, FIG. 1 is a schematic diagram of the network architecture of a file generation system according to an embodiment of the present application. As shown in FIG. 1, a computer device can exchange data with one or more terminal devices; when there are several, they may include terminal device 101a, terminal device 101b, and terminal device 101c in FIG. 1. Taking terminal device 101a as an example, the computer device 102 may acquire an audio/video stream (for example, one containing multiple frames of audio/video data) and encode it to obtain a first audio/video file. Further, the computer device 102 may add an audio/video index object block, which may contain the index information of the N audio/video fragments, at a first position of the first file; modify the encoding-format object block in the first file into an audio/video object block; and add the N fragments and the fragment index object block associated with each fragment into it. The computer device 102 may then construct a second audio/video file based on the added audio/video index object block and the resulting audio/video object block. Optionally, the computer device 102 may send the second file to terminal device 101a for playback; for example, terminal device 101a may decode the second file and play the decoded result.
It is understood that the computer devices mentioned in the embodiments of the present application include, but are not limited to, terminal devices and servers. In other words, the computer device may be a server, a terminal device, or a system formed by a server and a terminal device together. The terminal device may be an electronic device, including but not limited to a mobile phone, tablet computer, desktop computer, notebook computer, palmtop computer, vehicle-mounted device, intelligent voice interaction device, augmented/virtual reality (AR/VR) device, head-mounted display, wearable device, smart speaker, smart home appliance, aircraft, digital camera, or other mobile internet device (MID) with network access capability. The server may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, vehicle-road collaboration, content delivery networks (CDN), big data, and artificial intelligence platforms.
Further, referring to FIG. 2, FIG. 2 is a schematic flowchart of a file generation method according to an embodiment of the present application. As shown in FIG. 2, the file generation method may be applied to a computer device and includes, but is not limited to, the following steps:
s101, acquiring an audio and video stream, and performing coding processing on the audio and video stream to obtain a first audio and video file.
In the embodiments of this application, the computer device may record the audio/video stream through an associated camera device, receive it from a terminal device, or read it from local storage; the embodiments do not limit how the stream is obtained. The audio/video stream may include an audio stream and a video stream, where the audio stream refers to continuous multi-frame audio data and the video stream refers to continuous multi-frame video data. The stream may be, but is not limited to, one recorded in the security field, in a live-streaming scenario, by a dashboard camera in the internet-of-vehicles field, or in other video shooting scenarios. After acquiring the stream, the computer device encodes it to obtain a first audio/video file, where the first file comprises N audio/video fragments and N is a positive integer. The first audio/video file may include, but is not limited to, an fMP4 (Fragmented MP4) file, which supports streaming.
In one implementation scenario, the traditional MP4 format is one of the most widely used audio/video storage formats, with a compact layout, little redundancy, playback-friendly structure, and a mature ecosystem. If generation proceeds normally, a complete MP4 file is obtained that can be played directly. However, abnormal situations easily occur during MP4 file generation: once generation exits because of a program crash, device power failure, or similar, the resulting incomplete file cannot be used, and the damaged file is difficult to repair. This is mainly because the audio/video index object block, which contains all the index information of the audio/video data, is written to the file only when MP4 generation finishes. If an abnormality occurs and the index object block is lost, the audio/video data cannot be identified or parsed, so the MP4 generation process is unreliable and errors are hard to repair.
Compared with a traditional MP4 file, the main difference of an fMP4 file is that its encoding-format object block does not contain the index information of the audio/video data, so this object block can be written at the very beginning of fMP4 generation. The audio/video data is stored piecewise in multiple audio/video fragment object blocks, one fragment per object block; a fragment may cover, for example, a few seconds. The index information is likewise stored piecewise in the fragment index object block associated with each fragment: each fragment index object block precedes its audio/video fragment object block, is associated with it, and stores the index information of that fragment. Because the encoding-format object block is written up front and the data and indexes are stored in segments, an fMP4 file is highly reliable: if file generation is terminated by an abnormality, the portion generated before termination can still be decoded and played. However, application compatibility with fMP4 is not as good as with the traditional MP4 structure, and some applications or terminal devices may not support playing an fMP4 file. Generating the MP4 file via the fMP4 format means the fMP4 file remains usable even if generation exits abnormally midway.
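The fMP4 layout described above can be visualized with a few lines of code. The sketch below assumes only the generic ISO-BMFF box convention (a 4-byte big-endian size covering the whole box, followed by a 4-byte ASCII type code); the box payloads are dummy placeholders, not real codec data.

```python
import struct
from io import BytesIO

def iter_boxes(stream):
    """Yield (type, payload) for each top-level box in the stream."""
    while True:
        header = stream.read(8)
        if len(header) < 8:
            return
        size, box_type = struct.unpack(">I4s", header)
        payload = stream.read(size - 8)       # rest of the box
        yield box_type.decode("ascii"), payload

# An fMP4-style layout: ftyp, moov (no per-frame index), then
# repeating moof/mdat pairs -- one pair per audio/video fragment.
fake = b""
for t, body in [(b"ftyp", b"isom"), (b"moov", b"\x00" * 4),
                (b"moof", b"\x00" * 4), (b"mdat", b"frame-data")]:
    fake += struct.pack(">I4s", 8 + len(body), t) + body

box_types = [t for t, _ in iter_boxes(BytesIO(fake))]
print(box_types)
# -> ['ftyp', 'moov', 'moof', 'mdat']
```

Because each `moof` carries the index of only its own fragment, a truncated file still yields a parseable prefix of this sequence, which is the reliability property the paragraph above relies on.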
In scenarios that take a relatively long time, such as video recording, generating the MP4 file takes a long time because the recording itself is long, and abnormalities are hard to avoid during generation. The technical solution of this application therefore improves on the generated fMP4 file and combines it with the traditional MP4 structure, so that the generated second audio/video file has the same structure as a traditional MP4 file. This achieves high reliability while remaining backward compatible, allowing all kinds of applications and terminal devices to play the second file. Even if an abnormality occurs during generation, because the fMP4 file contains the fragment index object block of each fragment, the fragments produced before the abnormality can still be parsed and played, improving the reliability of the file generation process.
Optionally, the audio/video stream may be divided frame by frame, with each frame forming one audio/video fragment, to obtain N fragments; or the stream may be divided by a division period, with the data falling in the same period forming one fragment, to obtain N fragments. Other division schemes may also be used; the embodiments of this application do not limit how the fragments are divided.
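As a toy illustration of the period-based division just mentioned, the sketch below groups timestamped frames into fragments by a division period. The frame representation and the function name are illustrative assumptions, not part of the patent.

```python
def split_into_fragments(frames, period):
    """Group (timestamp, data) frames into fragments of `period` seconds.

    Frames whose timestamps fall within the same division period form
    one audio/video fragment; the whole stream yields N fragments.
    """
    buckets = {}
    for ts, data in frames:
        buckets.setdefault(int(ts // period), []).append(data)
    return [buckets[k] for k in sorted(buckets)]

# four frames, 1-second division period -> three fragments
frames = [(0.0, "f0"), (0.5, "f1"), (1.2, "f2"), (2.9, "f3")]
fragments = split_into_fragments(frames, 1.0)
print(fragments)
# -> [['f0', 'f1'], ['f2'], ['f3']]
```

Frame-by-frame division is the degenerate case where the period is shorter than the frame interval, giving one fragment per frame.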
In the embodiments of this application, the first audio/video file comprises, arranged in sequence, an encoding-format object block, N audio/video fragments, and the fragment index object block of each of the N fragments. "Arranged in sequence" means arranged from front to back within the first file. The N fragments may be stored in audio/video fragment object blocks, so the object blocks of the first file are, in order, the encoding-format object block, the audio/video fragment object blocks, and the fragment index object blocks. Further, the first file may also include a file description object block, in which case the object blocks are, in order, the file description object block, the encoding-format object block, the audio/video fragment object blocks, and the fragment index object blocks.
In one embodiment, encoding the audio/video stream to obtain the first audio/video file may include: dividing the stream into N audio/video fragments and obtaining the fragment index object block of each of the N fragments; then adding a file description object block at a second position of an initial audio/video file, adding an encoding-format object block at a third position, and adding the N fragments and the fragment index object block of each fragment at a fourth position, to obtain the first audio/video file.
The file description object block indicates the file format of the first audio/video file; the encoding-format object block indicates the encoding-format information of the first audio/video file; and the fragment index object block of an audio/video fragment contains the encoding-format information of that fragment and the storage position of each fragment.
In the embodiments of this application, the initial audio/video file may be an empty file, for example a file skeleton, and the first file is obtained by adding object blocks (boxes) to this skeleton: a file description object block, an encoding-format object block, fragment index object blocks, and so on. For example, a file description object block may be added at a second position of the initial file, such as its start position, to describe the format of the file, for example the fMP4 format, the version of the audio/video file, and compatible protocols. The file description object block may include, but is not limited to, an ftyp box (a file-format identifier); by parsing the ftyp box, an application (such as a player or parser) learns which protocol should be used to parse the file, which is the basis for subsequent playback.
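A minimal ftyp box can be assembled as below. The brand strings (`isom`, `iso6`, `mp41`) are common examples rather than values the patent specifies, and the helper name is an illustrative assumption.

```python
import struct

def make_ftyp(major="isom", minor=0, compatible=("isom", "iso6", "mp41")):
    """Build an ftyp box: the file-format identifier a player parses
    first to decide which protocol to use for the rest of the file.
    Body = major brand (4 bytes) + minor version (4 bytes) + a list of
    compatible brands (4 bytes each), wrapped in a size+type header.
    """
    body = major.encode("ascii") + struct.pack(">I", minor)
    for brand in compatible:
        body += brand.encode("ascii")
    return struct.pack(">I4s", 8 + len(body), b"ftyp") + body

box = make_ftyp()
print(len(box), box[4:8])
# -> 28 b'ftyp'
```

Writing this box at the start position means even a file truncated immediately afterwards still identifies its own format.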
Further, an encoding-format object block may be added at a third position of the initial file, such as the position immediately after the file description object block, to describe the encoding-format information of the first audio/video file. The audio coding format may include, but is not limited to, Advanced Audio Coding (AAC), MP3 (MPEG-1 Audio Layer III), and so on; the video coding format may include, but is not limited to, H.264 (Advanced Video Coding, AVC), H.265 (High Efficiency Video Coding, HEVC), and so on. The encoding-format object block may include, but is not limited to, a moov box (Movie Box), which describes the metadata of the audio/video data, metadata being data that describes other data. In an fMP4 file, the moov box contains only the encoding-format information of the first audio/video file; it does not contain the storage position of each frame of audio/video data.
Further, multiple audio/video fragment object blocks, together with the fragment index object block corresponding to each, may be added at a fourth position of the initial file, for example the position after the encoding-format object block; the audio/video fragment object blocks hold the fragment data. Each audio/video fragment object block may be associated with one fragment index object block, which is used to parse the associated fragment so that it can be played. For example, a fragment index object block may contain the encoding-format information of the fragment and the storage position of each fragment. The storage position refers to the offset of each fragment within the first file: because each frame of audio/video data has a different size after compression, recording where a frame starts and ends in the file, and how large it is, makes it possible to parse each frame when the file is later analyzed, and thus to play the data. The audio/video fragment object blocks may include, but are not limited to, mdat boxes (Media Data Box), and the fragment index object blocks may include, but are not limited to, moof boxes (which store descriptive information for a fragment, such as encoding-format information and storage positions).
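Because compressed frames differ in size, a fragment index must record each frame's explicit offset and length rather than assume a fixed stride. A toy sketch, with hypothetical names (real moof boxes encode this in dedicated sub-boxes):

```python
def build_fragment_index(frame_sizes, data_offset):
    """Per-frame (offset, size) records for one fragment index block.

    `frame_sizes` are the compressed sizes of the frames in a fragment;
    `data_offset` is where the fragment's payload starts in the file.
    Each frame's start is the running sum of the sizes before it.
    """
    index, offset = [], data_offset
    for size in frame_sizes:
        index.append((offset, size))
        offset += size
    return index

# three frames of different compressed sizes, payload at byte 4096
index = build_fragment_index([1200, 850, 990], 4096)
print(index)
# -> [(4096, 1200), (5296, 850), (6146, 990)]
```

A parser can slice each frame straight out of the media-data region using these records, which is exactly why a surviving fragment index makes its fragment playable on its own.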
Optionally, the second position, the third position and the fourth position may be arranged in order from top to bottom, so that the object blocks appear in sequence as the file description object block, the encoding format object block, and the fragment index object blocks with their audio and video fragment object blocks.
As shown in fig. 3, fig. 3 is a schematic structural diagram of a first audio and video file according to an embodiment of the present application, where the second position of the first audio and video file holds the file description object block, the third position holds the encoding format object block, and the fourth position holds the audio and video fragment object blocks and fragment index object blocks, one fragment index object block being associated with one audio and video fragment object block (i.e., each fragment index object block is associated with the audio and video fragment object block it points to). The first audio and video file may include a plurality of audio and video fragment object blocks and the fragment index object block corresponding to each of them.
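The fig. 3 layout can be sketched by enumerating the top-level boxes of a toy fMP4-style byte string; the helper names and payloads below are illustrative assumptions.

```python
import struct

def make_box(box_type, payload=b""):
    """Build a toy MP4 box: 4-byte big-endian size, 4-byte type, payload."""
    return struct.pack(">I", 8 + len(payload)) + box_type + payload

def list_top_level_boxes(data):
    """Walk the file box by box and return the top-level box types in order."""
    types, offset = [], 0
    while offset < len(data):
        size = struct.unpack_from(">I", data, offset)[0]
        types.append(data[offset + 4:offset + 8].decode("ascii"))
        offset += size
    return types

# File description, encoding format, then moof+mdat pairs, as in fig. 3.
fmp4 = (make_box(b"ftyp") + make_box(b"moov")
        + make_box(b"moof", b"idx0") + make_box(b"mdat", b"frames0")
        + make_box(b"moof", b"idx1") + make_box(b"mdat", b"frames1"))
print(list_top_level_boxes(fmp4))
# ['ftyp', 'moov', 'moof', 'mdat', 'moof', 'mdat']
```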
S102, adding an audio and video index object block at a first position of a first audio and video file.
In the embodiment of the application, the encoding format object block in the first audio and video file contains only the encoding format information of the first audio and video file and does not contain the storage location of each frame of audio and video data, so the encoding format object block alone cannot be used to parse the N audio and video fragments in the first audio and video file and play the file. The computer device therefore adds an audio and video index object block at a first position of the first audio and video file. The audio and video index object block may include, but is not limited to, index information of the N audio and video fragments in the first audio and video file, and the first position may be, for example, the tail position or a middle position of the first audio and video file. For example, the first position may be the file tail of the first audio and video file, or a position between the encoding format object block and the fragment index object blocks.
It can be understood that when the audio and video index object block is added at the tail of the file, the storage location of each audio and video fragment in the first audio and video file does not change, that is, the offset of each fragment in the first audio and video file is unchanged, and the storage location of each fragment can be added to the audio and video index object block as-is. When the audio and video index object block is inserted in the middle of the file, the storage location of each audio and video fragment behind the insertion point changes; that is, its offset in the first audio and video file moves backwards, so the storage location of each fragment must first be adjusted, and the adjusted storage locations are then added to the audio and video index object block. By adding an audio and video index object block comprising the index information of the N audio and video fragments at the first position of the first audio and video file, the first audio and video file can be parsed based on the audio and video index object block, thereby enabling playback of the audio and video file.
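The two insertion positions can be contrasted with a small hypothetical sketch: appending the index at the tail leaves every fragment offset unchanged, while inserting it mid-file shifts every fragment behind the insertion point by the size of the index box. The helper and toy offsets below are illustrative only.

```python
def adjust_offsets(fragment_offsets, index_box_size, insert_at_tail):
    """Tail insertion: offsets stay as-is. Mid-file insertion: every
    fragment behind the insertion point moves back by the index box size."""
    if insert_at_tail:
        return list(fragment_offsets)
    return [off + index_box_size for off in fragment_offsets]

offsets = [40, 1040, 2040]                                 # toy fragment offsets
print(adjust_offsets(offsets, 256, insert_at_tail=True))   # [40, 1040, 2040]
print(adjust_offsets(offsets, 256, insert_at_tail=False))  # [296, 1296, 2296]
```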
Because the audio-video index object block includes the index information of the N audio-video fragments, the index information is generally generated at the tail position of the first audio-video file when the second audio-video file is generated: index information may be generated for each audio-video fragment in turn, and once the index information of all N fragments has been generated, it may be written into the audio-video index object block in the first audio-video file.
As shown in fig. 4, fig. 4 is a schematic diagram of a composition structure of an audio/video file added with an audio/video index according to an embodiment of the present application, where a second position of a first audio/video file is a file description object block, a third position of the first audio/video file is a coding format object block, a fourth position of the first audio/video file is an audio/video clip object block and a clip index object block, and the first position of the first audio/video file is added with the audio/video index object block. The audio and video index object block may include index information of N audio and video clips in the first audio and video file.
S103, modifying the coding format object block in the first audio and video file into an audio and video object block, and adding N audio and video clips and the clip index object block associated with each audio and video clip into the audio and video object block.
In the embodiment of the application, the encoding format object block in the first audio and video file contains only the encoding format information of the first audio and video file and does not contain the storage location of each frame of audio and video data, so the encoding format object block alone cannot be used to parse the N audio and video fragments in the first audio and video file. Therefore, the computer device can modify the encoding format object block in the first audio and video file into an audio and video object block, and add the N audio and video fragments and the fragment index object block associated with each fragment into the audio and video object block. Because each audio and video fragment is associated with one fragment index object block, the fragment associated with each fragment index object block in the audio and video object block can be parsed based on that fragment index object block, so that each audio and video fragment in the audio and video object block can be played.
Wherein, since the N audio/video clips and the clip index object block associated with each audio/video clip are located in the first audio/video file, for example, may be located in the fourth position in the first audio/video file, adding the N audio/video clips and the clip index object block associated with each audio/video clip to the audio/video object block may refer to: and moving the N audio and video clips and the clip index object blocks associated with each audio and video clip from a fourth position in the first audio and video file to the audio and video object blocks.
It can be seen that, by modifying the encoding format object block in the first audio and video file into an audio and video object block, the composition of the resulting second audio and video file is, from the start position to the end position (from top to bottom), the file description object block, the audio and video object block, and the audio and video index object block. As shown in fig. 5, fig. 5 is a schematic comparison of audio and video file composition structures according to an embodiment of the present application: 5a in fig. 5 is the second audio and video file obtained by adding an audio and video index object block to the first audio and video file and adding the audio and video fragments and fragment index object blocks to the audio and video object block, and 5b in fig. 5 is a conventional MP4 file. It can be seen that the composition of both files, from top to bottom, is a file description object block, an audio and video object block, and an audio and video index object block. After the modification, the original fragment index object blocks and audio and video fragment object blocks in the second audio and video file are invisible to the outside (i.e., the object blocks in the dashed box are not externally visible) and are contained in the new mdat box (the audio and video object block). Because the modified audio and video file (i.e., the second audio and video file) has the same composition structure as a conventional MP4 file, high reliability and backward compatibility can be achieved, and various application programs and terminal devices can play the second audio and video file.
In one embodiment, the encoded format object block in the first audio-video file includes an object type and an object size, and the audio-video object block includes an object type and an object size; when the encoding format object block in the first audio-video file is modified into an audio-video object block, the object type of the encoding format object block in the first audio-video file can be modified into the object type of the audio-video object block, and the object size of the encoding format object block in the first audio-video file can be modified into the object size of the audio-video object block.
Here, since the object type and object size of each object block in the first audio and video file differ, the modification of an object block can be achieved by modifying its type and size. For example, the type of an object block may be represented by 4 bytes and the size of an object block by 4 bytes, so the encoding format object block in the first audio and video file can be modified into an audio and video object block by modifying these 8 bytes of the encoding format object block. For example, the object type of the encoding format object block is moov, represented by 4 bytes, and its object size is 8 kilobytes (Kilobyte, KB), represented by 4 bytes (e.g., z1z2z3z4); the object type of the audio and video object block is mdat, represented by 4 bytes, and its object size is 100 megabytes (MegaByte, MB), represented by 4 bytes (e.g., z5z6z7z8). The encoding format object block in the first audio and video file can be modified into an audio and video object block by changing the object type from moov to mdat and changing the object size from 8KB (e.g., z1z2z3z4) to 100MB (e.g., z5z6z7z8). Since this byte modification is lightweight and involves no secondary operations such as file copying, overhead on the disk, the central processing unit (Central Processing Unit, CPU) and the like can be saved.
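The 8-byte rewrite described above can be sketched as follows; the toy file layout and offsets are illustrative assumptions, but the operation matches the text: only the size and type fields of the box header are overwritten in place, and the new size must cover the fragment data behind the header.

```python
import struct

def rewrite_moov_as_mdat(buf, moov_offset, new_size):
    """Overwrite the 8-byte box header in place: 4-byte size, then 4-byte type.
    Nothing behind the header is moved or copied."""
    struct.pack_into(">I", buf, moov_offset, new_size)
    buf[moov_offset + 4:moov_offset + 8] = b"mdat"

# Toy file: a 16-byte ftyp box, an 8-byte moov box at offset 16,
# then 24 bytes standing in for the moof/mdat fragments behind it.
data = bytearray(struct.pack(">I", 16) + b"ftyp" + b"\x00" * 8
                 + struct.pack(">I", 8) + b"moov"
                 + b"\x00" * 24)
# The new mdat box must also enclose the 24 bytes of fragments behind it.
rewrite_moov_as_mdat(data, 16, 8 + 24)
print(bytes(data[20:24]), struct.unpack_from(">I", data, 16)[0])  # b'mdat' 32
```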
S104, based on the added audio and video index object block and the added audio and video object block, constructing and obtaining a second audio and video file.
In the embodiment of the application, the audio and video index object block including the index information of the N audio and video fragments is added to the first audio and video file, so the added audio and video index object block is obtained; the added audio and video object block is obtained by modifying the encoding format object block in the first audio and video file into an audio and video object block and adding the N audio and video fragments and the fragment index object block associated with each fragment into it. The second audio and video file can then be constructed from the added audio and video index object block and the added audio and video object block, for example by directly taking them together as the second audio and video file. The second audio and video file includes the fragment index object block associated with each audio and video fragment, and each fragment index object block can be used to parse its associated fragment, thereby enabling playback of the audio and video file.
In the embodiment of the application, because this method of generating the second audio and video file has high reliability, the audio and video file obtained when the process of constructing the second audio and video file terminates abnormally is still valid, and the content produced before the abnormality can still be parsed and played. Further, since the operation is lightweight and involves no secondary operations such as file copying, the cost of the disk, the CPU and the like can be saved. In addition, the second audio and video file fully conforms to the MP4 standard and requires no private extensions when it is produced, so no private tools are needed for data recovery when the file is parsed and played. Moreover, the method only needs to output a single MP4 file once and does not need to generate multiple small fragment files for a secondary merging operation, so file parsing complexity can be reduced and efficiency improved. Finally, the generated second audio and video file is backward compatible and supports playback on old devices.
In the embodiment of the application, the audio and video index object block including the index information of the N audio and video fragments is added to the first audio and video file, so the added audio and video index object block is obtained, and the N audio and video fragments in the audio and video file can later be parsed and played using it. Further, the encoding format object block in the first audio and video file is modified into an audio and video object block, and the N audio and video fragments and the fragment index object block associated with each fragment are added into it, yielding the added audio and video object block; the second audio and video file is then constructed from the added audio and video index object block and the added audio and video object block. Because the method does not need to write custom information, disk occupation can be reduced and the complexity of file generation lowered. Further, since the composition structure of the second audio and video file is the same as that of an MP4 file, the reliability of file generation is increased, and the generated file is backward compatible, so various application programs and terminal devices can play it. In addition, because the audio and video object block in the second audio and video file includes the audio and video fragments and the fragment index object block associated with each fragment, even if an abnormality during file generation prevents the audio and video index object block from being added correctly, the fragments already generated can still be parsed using their associated fragment index object blocks, so the file can still be played, which improves the reliability of file playback.
Optionally, referring to fig. 6, fig. 6 is a flowchart of another file generating method according to an embodiment of the present application. The file generation method can be applied to computer equipment; as shown in fig. 6, the file generation method includes, but is not limited to, the steps of:
S201, obtaining an audio and video stream, and performing coding processing on the audio and video stream to obtain a first audio and video file.
S202, adding an audio and video index object block at a first position of a first audio and video file.
S203, modifying the encoding format object block in the first audio/video file into an audio/video object block, and adding N audio/video clips and the clip index object block associated with each audio/video clip into the audio/video object block.
S204, based on the added audio and video index object block and the added audio and video object block, a second audio and video file is constructed.
In the embodiment of the present application, the specific implementation manner of step S201 to step S204 may refer to the implementation manner of step S101 to step S104, and will not be described herein.
S205, if a decoding instruction for the second audio/video file is acquired, analyzing the second audio/video file based on the index information of the N audio/video clips in the audio/video index object block to obtain an analyzed second audio/video file.
In the embodiment of the present application, if no abnormality occurs in steps S201 to S204, the second audio and video file is constructed. Since the second audio and video file includes an audio and video index object block, and the audio and video index object block includes the index information of the N audio and video fragments, the audio and video index object block can be used to parse the N audio and video fragments in the second audio and video file, thereby parsing the second audio and video file and obtaining the parsed result. Optionally, the parsed second audio and video file may be played by a playback device associated with the computer device, or sent to any terminal device that needs to play it, so that the second audio and video file is played on that terminal device.
In one embodiment, the method of parsing the second audio and video file to obtain the parsed second audio and video file may include: acquiring the N audio and video fragments based on their storage locations in the audio and video index object block; and decoding the N audio and video fragments based on the encoding format information of the first audio and video file to obtain the decoded second audio and video file.
Here, since the audio and video index object block includes the index information of the N audio and video fragments, this index information includes the encoding format information of the first audio and video file and the storage location of each of the N fragments. For example, by determining the offset of each audio and video fragment in the first audio and video file and determining that the encoding format is H.265, each fragment can be located and parsed using the H.265 format. The parsing process may include, but is not limited to, decoding, so that each audio and video fragment is decoded. The N fragments are decoded separately to obtain N decoded fragments, which are then spliced according to the time order of each fragment to obtain the decoded second audio and video file.
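The lookup step can be sketched minimally, under the assumption that the index information reduces to (offset, size) pairs; the byte layout below is invented for illustration.

```python
def extract_fragments(file_bytes, index_entries):
    """index_entries: (offset, size) pairs taken from the index object block."""
    return [file_bytes[off:off + size] for off, size in index_entries]

blob = b"HEADER" + b"seg-one!" + b"seg-two!"
index = [(6, 8), (14, 8)]                 # toy storage locations
print(extract_fragments(blob, index))     # [b'seg-one!', b'seg-two!']
```

Each extracted fragment would then be handed to the decoder named by the encoding format information.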
S206, when an abnormality occurs in the process of constructing the second audio and video file, acquiring a third audio and video file, and if a decoding instruction for the third audio and video file is acquired, analyzing the associated audio and video fragment based on a fragment index object block contained in the third audio and video file to obtain the third audio and video file after analysis.
In the embodiment of the application, when detecting that an abnormality occurs in the process of constructing the second audio/video file, acquiring a third audio/video file; the third audio and video file comprises M audio and video fragments acquired before abnormality occurs and fragment index object blocks associated with the M audio and video fragments, wherein M is a positive integer less than or equal to N; and if a decoding instruction for the third audio/video file is acquired, decoding the third audio/video file based on the fragment index object blocks associated with the M audio/video fragments to obtain the decoded third audio/video file.
In the embodiment of the present application, the occurrence of the abnormality in the process of constructing the second audio/video file may refer to the occurrence of the abnormality in any of the steps S202 to S204, and when the occurrence of the abnormality in any of the steps S202 to S204 is detected, a third audio/video file may be obtained, and then the analysis processing may be performed on the associated audio/video fragment based on the fragment index object block in the third audio/video file, so as to obtain the third audio/video file after the analysis.
In one embodiment, the abnormality in the process of constructing the second audio and video file may occur at different points, and the number of audio and video fragments obtained may differ accordingly; for example, the abnormal position may include, but is not limited to, an abnormality while generating the first audio and video file, or an abnormality while adding the audio and video index object block at the first position of the first audio and video file. If the abnormality occurs while generating the first audio and video file, the resulting third audio and video file (namely the first audio and video file) includes only some of the N audio and video fragments. If the abnormality occurs while adding the audio and video index object block at the first position of the first audio and video file, the resulting third audio and video file may include some or all of the N fragments, and the generated file can be decoded based on the fragment index object block associated with each fragment it contains, yielding the decoded audio and video file.
In one embodiment, the segment index object block associated with each audio/video segment in the M audio/video segments includes coding format information of each audio/video segment and a storage location of each audio/video segment, and the method for parsing the third audio/video file based on the segment index object blocks associated with the M audio/video segments to obtain the parsed third audio/video file may include: acquiring each audio and video clip based on the storage position of the audio and video clip in the clip index object block associated with each audio and video clip in the M audio and video clips to obtain M audio and video clips; and decoding the M audio and video clips based on the encoding format information of the M audio and video clips to obtain a decoded third audio and video file.
Here, since one fragment index object block is associated with one audio and video fragment and includes the encoding format information and storage location of that fragment, each associated fragment can be decoded based on its fragment index object block. The M audio and video fragments obtained before the abnormality occurred during construction of the second audio and video file are decoded to obtain the corresponding fragment data, which is spliced according to the time order of the M fragments to obtain the decoded third audio and video file.
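The recovery idea can be sketched as a scan that keeps every complete moof+mdat pair and stops at the first truncated box; the box builder and payloads are toy stand-ins, not the patent's parser.

```python
import struct

def recover_complete_fragments(data):
    """Walk the file box by box and keep each complete moof+mdat pair;
    stop at the first truncated box, since everything before it is usable."""
    fragments, offset, pending_moof = [], 0, None
    while offset + 8 <= len(data):
        size = struct.unpack_from(">I", data, offset)[0]
        box_type = data[offset + 4:offset + 8]
        if size < 8 or offset + size > len(data):
            break  # truncated or corrupt box
        if box_type == b"moof":
            pending_moof = offset
        elif box_type == b"mdat" and pending_moof is not None:
            fragments.append((pending_moof, offset + size))
            pending_moof = None
        offset += size
    return fragments

def box(t, payload=b""):
    return struct.pack(">I", 8 + len(payload)) + t + payload

good = (box(b"moof", b"i0") + box(b"mdat", b"d0")
        + box(b"moof", b"i1") + box(b"mdat", b"d1"))
damaged = good + box(b"moof", b"i2")[:6]    # the write was cut off mid-header
print(recover_complete_fragments(damaged))  # [(0, 20), (20, 40)]
```

The two complete fragment ranges survive even though the third fragment's header was never fully written.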
It can be understood that if the process of constructing the second audio/video file is performed normally, the second audio/video file may be constructed, and since the second audio/video file includes an audio/video index object block and the audio/video index object block includes index information of N audio/video segments, the N audio/video segments may be respectively parsed based on the index information of the N audio/video segments, so as to obtain N audio/video segments, thereby obtaining the parsed second audio/video file.
If an abnormality occurs while constructing the second audio and video file, the index information in the audio and video index object block is abnormal; for example, it may be incomplete and include the index information of only some of the N fragments, so the N fragments in the first audio and video file cannot be parsed from it. In that case, the audio and video fragments can be parsed based on the fragment index information in the fragment index object blocks of the first audio and video file, yielding the parsed first audio and video file. Because different parsing methods are used under different conditions, file parsing is more flexible, playback of the audio and video file can still be achieved, and the file playback experience is improved.
In one embodiment, the file transmission environment of the second audio and video file may also be checked, and whether to process the second audio and video file is decided based on the security detection result, so as to improve the security of file transmission. Specifically, when a file acquisition request sent by the terminal device for the second audio and video file is received, the computer device can perform security detection on the file transmission environment of the second audio and video file to obtain security detection information, which indicates whether the file transmission environment is secure. If the security detection information indicates that the environment is insecure, a target encryption mode matching the security level in the security detection information is obtained, the second audio and video file is encrypted using the target encryption mode, and the encrypted file is sent to the terminal device.
The file acquisition request is used to request the second audio and video file. The security detection on the file transmission environment may include, but is not limited to, at least one of: checking the file level of the second audio and video file, checking the network type of the file transmission network, and obtaining security detection information uploaded by the terminal device. The file level of the second audio and video file reflects its importance: the higher the file level, the more important the file, and the higher the transmission security required when it is transmitted. Network types of the file transmission network include, but are not limited to, public networks and private networks; a public network has lower file transmission security than a private network. If the file level of the second audio and video file is the first level, the security detection information indicates that the file transmission environment is insecure; if the file level is the second level, the security detection information indicates that the environment is secure, the second level being less important than the first. Likewise, if the network type is a public network, the security detection information indicates that the environment is insecure; if it is a private network, the information indicates that the environment is secure.
Further, after determining whether the file transmission environment is secure, the second audio and video file may be sent to the terminal device directly if the environment is secure. If the environment is insecure, the second audio and video file can be encrypted and the encrypted file sent to the terminal device. Optionally, the computer device may obtain a target encryption mode matching the security level in the security detection information and encrypt the second audio and video file with it. The security detection information may include a security level, of which there may be several; once the level is determined, the file is encrypted with the encryption mode matching that level. For example, the security levels may include a first, second and third security level, where the encrypted file produced by the encryption mode of the first level is more secure than that of the second level, which in turn is more secure than that of the third level.
In the embodiment of the application, security detection information such as a security level can be obtained by performing security detection on the file transmission environment, so that a matching encryption mode is selected according to the security level to encrypt the second audio and video file. If the security level is high, a simple encryption mode can be used (lower-security modes include, but are not limited to, compression-based encryption), improving the efficiency of file encryption. If the security level is low, a complex encryption mode can be used (higher-security modes include, but are not limited to, public/private-key encryption, hash-based encryption, and the like), improving the security of file transmission.
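The level-to-mode matching can be sketched as a simple lookup; the level names and mode names below are hypothetical placeholders, not modes specified by this description.

```python
# Hypothetical mapping: the lower the environment's security level,
# the stronger (and more costly) the encryption mode chosen.
ENCRYPTION_BY_LEVEL = {
    "high": "compress-and-encrypt",    # environment fairly safe: simple, fast
    "medium": "symmetric-key",
    "low": "public-private-key",       # environment least safe: strongest
}

def pick_encryption_mode(security_level):
    # Default to the strongest mode when the level is unknown.
    return ENCRYPTION_BY_LEVEL.get(security_level, "public-private-key")

print(pick_encryption_mode("high"))     # compress-and-encrypt
print(pick_encryption_mode("unknown"))  # public-private-key
```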
In one possible implementation, the audio and video file may also be checked to determine whether it has been tampered with, so as to determine the security of the transmission. Specifically, the computer device may send the decoded second audio and video file to the terminal device, so that the terminal device decodes the second audio and video file based on the index information of the N audio and video fragments it contains, obtaining a fourth audio and video file; the terminal device determines a security detection result for the second audio and video file from the fourth audio and video file and the decoded second audio and video file, and sends prompt information to the computer device when the result indicates that the second audio and video file is insecure. When a fifth audio and video file is later obtained, it is encrypted, and the encrypted fifth audio and video file is sent to the terminal device.
The security detection result may be used to indicate whether the second audio and video file was transmitted securely. The computer device can decode the second audio and video file and send the decoded file to the terminal device; the terminal device can also receive the undecoded second audio and video file and decode it locally to obtain the fourth audio and video file, and then determine, based on the two decoded files, whether they are consistent. If the two decoded files are inconsistent, a security detection result indicating that the second audio and video file is insecure can be generated. Likewise, if the terminal device fails to decode the second audio and video file, a security detection result indicating that the file is insecure may be generated.
Optionally, when determining whether the two decoded audio and video files are consistent: if they are audio files, audio detection may be performed on each and consistency determined from the audio detection results; if they are video files, image detection may be performed on the image frames of each and consistency determined from the image detection results. The security of audio and video file transmission can thus be determined: if the files are inconsistent, the file may have been tampered with and the transmission is insecure, in which case the computer device can encrypt subsequent audio and video files, such as the fifth audio and video file, before transmitting them to the terminal device, improving the security of file transmission.
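As a simplified stand-in for the audio/image detection described above, the comparison of the two decoded results can be sketched with a hash check; real content-level detection would compare audio signals or image frames rather than raw bytes.

```python
import hashlib

def transmission_check(decoded_at_server: bytes, decoded_at_terminal: bytes) -> str:
    """A mismatch between the two decoded results suggests tampering in
    transit, so subsequent files should be sent encrypted."""
    same = (hashlib.sha256(decoded_at_server).hexdigest()
            == hashlib.sha256(decoded_at_terminal).hexdigest())
    return "secure" if same else "insecure"

print(transmission_check(b"frames...", b"frames..."))  # secure
print(transmission_check(b"frames...", b"frames!!!"))  # insecure
```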
In one possible scenario, MP4 is typically used as the storage format in a conventional video transcoding pipeline. Video transcoding may include, but is not limited to, video decoding, video preprocessing, video encoding, and encapsulating and storing as an MP4 file, where the stages run as a pipeline, streaming video data from one stage to the next. Video encoding is a CPU-intensive task and tends to be time consuming. If an abnormal situation occurs, the encapsulation and storage stage does not finish normally, the generated file cannot be used, and the entire transcoding must be performed again from the beginning, since recovery cannot resume from the point of failure. Therefore, with the method of the embodiment of the application, the video data can be stored as fMP4 files, and the second audio and video file can be obtained by modifying the fMP4 file, which solves the reliability problem of encapsulation and storage, saves the cost of abnormal recovery, and ensures that the real-time video stream is not lost.
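As a rough illustration of why fragmented storage is recoverable, the sketch below scans top-level boxes in an ISO-BMFF-style layout (4-byte big-endian size followed by a 4-byte type, assumed here for illustration) and stops at the first incomplete box, so every fragment fully written before a crash remains usable.

```python
import struct

def scan_boxes(data: bytes):
    """Yield (type, offset, size) for each complete top-level box; stop at truncation."""
    off = 0
    while off + 8 <= len(data):
        size, btype = struct.unpack_from(">I4s", data, off)
        if size < 8 or off + size > len(data):
            break  # truncated or malformed box: everything before it is still usable
        yield btype.decode("ascii", "replace"), off, size
        off += size

def box(btype: bytes, payload: bytes) -> bytes:
    """Serialize a box: 4-byte big-endian size + 4-byte type + payload."""
    return struct.pack(">I4s", 8 + len(payload), btype) + payload

# Simulate a file whose last box was cut short by a crash mid-write.
good = box(b"ftyp", b"isom") + box(b"moof", b"x" * 16) + box(b"mdat", b"y" * 32)
crashed = good + box(b"mdat", b"z" * 100)[:20]  # truncated tail
print([t for t, _, _ in scan_boxes(crashed)])  # ['ftyp', 'moof', 'mdat']
```

A monolithic MP4, by contrast, keeps its index in a single block that may never be written at all if the process dies, which is the reliability gap the embodiment targets.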
In one embodiment, when the audio and video stream is encoded to obtain the first audio and video file, the audio and video stream may first be preprocessed to facilitate the subsequent encoding. Specifically, the computer device may decompress the audio and video stream to obtain a decompressed audio and video stream, perform noise reduction on the decompressed audio and video stream to obtain a noise-reduced audio and video stream, and encode the noise-reduced audio and video stream to obtain the first audio and video file.
Here, taking the audio and video stream as video data as an example, the decompression process decompresses the compressed video data and restores it to its initial state. For example, when the terminal device uploads a video stream, the video stream may be compressed to improve data transmission efficiency, and the compressed video data is sent to the computer device; upon receiving the compressed video data, the computer device performs decompression to restore the data to the state it had on the terminal device. Compressing the video data before transmission reduces its bit rate and improves data transmission efficiency.
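The compress-before-upload, decompress-on-receipt round trip can be illustrated with Python's standard zlib module; real video streams use dedicated codecs rather than zlib, so this only demonstrates that lossless transport compression restores the initial state exactly.

```python
import zlib

# Illustrative payload standing in for raw video bytes on the terminal device.
raw = b"\x00\x01" * 5000

compressed = zlib.compress(raw, level=6)  # done on the terminal before upload
restored = zlib.decompress(compressed)    # done on the computer device on receipt

print(restored == raw)             # True: decompression restores the initial state
print(len(compressed) < len(raw))  # True: fewer bytes travel over the network
```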
Further, the computer device may perform video preprocessing on the decompressed video data, which may include, but is not limited to, noise reduction. Video preprocessing reduces noise in each image while preserving as much of the original information as possible. The video data may be denoised using, for example, Gaussian filtering, median filtering, or other filters. With Gaussian filtering, each pixel in each frame of the video data is scanned with a template (also called a convolution kernel or mask), and the value of the pixel at the center of the template is replaced with the weighted average gray value of the pixels in the neighborhood the template covers, thereby denoising each frame.
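The weighted-average idea behind Gaussian filtering can be sketched with a 3x3 kernel applied to a grayscale frame held as a plain list of lists; a production pipeline would use an optimized library, and the border handling here is simplified for brevity.

```python
# 3x3 Gaussian kernel (1-2-1 weights, summing to 16) applied to interior pixels.
KERNEL = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]

def gaussian_denoise(img):
    """Replace each interior pixel with the kernel-weighted average of its 3x3 neighbourhood."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # borders left unchanged for brevity
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = sum(KERNEL[dy + 1][dx + 1] * img[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = acc // 16
    return out

# A flat patch with one noisy spike: the spike is pulled toward its neighbours.
frame = [[10] * 5 for _ in range(5)]
frame[2][2] = 90
print(gaussian_denoise(frame)[2][2])  # 30
```

The spike of 90 becomes (4*90 + 12*10) // 16 = 30, showing how the weighted average suppresses isolated noise while flat regions stay untouched.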
Further, encoding the noise-reduced video stream recompresses the image data to reduce the file size. Because the noise-reduced video stream is not yet encapsulated, it can then be encapsulated, which is equivalent to packing the data into a container and facilitates subsequent distribution of the first audio and video file. Video transcoding refers to converting a video bitstream that has already been compression-encoded into another video bitstream, so as to adapt to different network bandwidths, different terminal processing capabilities, and different object requirements.
In one possible scenario, for example in the security field, a video stream transmitted by an image capturing device may be obtained. The video stream transmitted by the image capturing device may already be decoded video data, so it does not need to be decoded again; the video stream is processed frame by frame to obtain individual images, which are then encoded and encapsulated, thereby obtaining the first audio and video file.
In the embodiment of the application, an audio and video index object block including the index information of the N audio and video clips is added to the first audio and video file, so that the added audio and video index object block can later be used to parse and play the N audio and video clips in the file. Further, the coding format object block in the first audio and video file is modified into an audio and video object block, and the N audio and video clips and the clip index object block associated with each audio and video clip are added to the audio and video object block, so that the modified audio and video object block is obtained; the second audio and video file is then constructed based on the added audio and video index object block and the modified audio and video object block. Because the method does not need to write custom information, disk occupation can be reduced and the complexity of file generation is reduced. Further, since the composition structure of the second audio and video file is the same as that of an MP4 file, the reliability of file generation is increased and the generated file is backward compatible, so that it can be played by various application programs and terminal devices. In addition, because the audio and video object block in the second audio and video file includes a plurality of audio and video clips and the clip index object block associated with each clip, even if an abnormality during file generation prevents the audio and video index object block from being added, the already generated audio and video clips can still be parsed using their associated clip index object blocks and the file can still be played, which improves the reliability of file playback.
The method of the embodiment of the application is described above, and the device of the embodiment of the application is described below.
Referring to fig. 7, fig. 7 is a schematic diagram of a composition structure of a file generating apparatus according to an embodiment of the present application, where the file generating apparatus may be deployed on a computer device; the file generating device can be used for executing corresponding steps in the file generating method provided by the embodiment of the application. The file generation device 70 includes:
The data acquisition unit 701 is configured to acquire an audio and video stream, and encode the audio and video stream to obtain a first audio and video file; the first audio and video file comprises a coding format object block, N audio and video clips, and a clip index object block of each of the N audio and video clips, which are arranged in sequence, where N is a positive integer;
An index adding unit 702, configured to add an audio and video index object block at a first position of the first audio and video file; the audio/video index object block comprises index information of the N audio/video clips, and the first position comprises a file tail position of the first audio/video file or a position between the coding format object block and the clip index object block;
an object modifying unit 703, configured to modify an object block of an encoding format in the first audio/video file into an audio/video object block, and add the N audio/video clips and a clip index object block associated with each audio/video clip to the audio/video object block;
The file construction unit 704 is configured to construct a second audio and video file based on the added audio and video index object block and the added audio and video object block.
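Putting the units above together, a toy sketch of constructing the second file from the wrapped fragments and the appended index block might look as follows; all box names (`sidx`, `frag`, `avdt`, `avix`) are illustrative placeholders loosely modeled on MP4 box conventions, not the patent's actual identifiers.

```python
import struct

def box(btype: bytes, payload: bytes) -> bytes:
    """Serialize one object block: 4-byte big-endian size + 4-byte type + payload."""
    return struct.pack(">I4s", 8 + len(payload), btype) + payload

# Hypothetical fragments, each preceded by its own fragment index block.
fragments = [box(b"sidx", b"i0") + box(b"frag", b"clip-0"),
             box(b"sidx", b"i1") + box(b"frag", b"clip-1")]

# File-level index: here simply the length of each fragment, for illustration.
index_payload = b"".join(struct.pack(">I", len(f)) for f in fragments)

# Second file: one audio/video object block wrapping all fragments plus their
# index blocks, followed by the file-level audio/video index block at the tail.
av_block = box(b"avdt", b"".join(fragments))
second_file = av_block + box(b"avix", index_payload)
print(second_file[4:8])  # b'avdt'
```

Because each fragment carries its own index block inside the wrapper, a reader can still parse the fragments even if the tail-position `avix` block were missing, mirroring the fallback behaviour the embodiment describes.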
Optionally, the file generating apparatus 70 further includes: a file decoding unit 705 for:
When detecting that an abnormality occurs in the process of constructing the second audio/video file, acquiring a third audio/video file; the third audio and video file comprises M audio and video clips acquired before abnormality occurs and clip index object blocks associated with the M audio and video clips, wherein M is a positive integer less than or equal to N;
And if a decoding instruction aiming at the third audio/video file is acquired, decoding the third audio/video file based on the fragment index object blocks associated with the M audio/video fragments to obtain the decoded third audio/video file.
Optionally, the segment index object block associated with each audio-video segment in the M audio-video segments includes coding format information of the audio-video segment and a storage location of the audio-video segment; the file decoding unit 705 is specifically configured to:
acquiring each audio and video clip based on the storage position of the audio and video clip in the clip index object block associated with each audio and video clip in the M audio and video clips to obtain the M audio and video clips;
and decoding the M audio and video clips based on the encoding format information of the M audio and video clips to obtain a decoded third audio and video file.
Optionally, the encoding format object block in the first audio-video file includes an object type and an object size, and the audio-video object block includes an object type and an object size; the object modifying unit 703 is specifically configured to:
modifying the object type of the encoding format object block in the first audio-video file into the object type of the audio-video object block, and modifying the object size of the encoding format object block in the first audio-video file into the object size of the audio-video object block.
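Since an MP4-style object block begins with a 4-byte big-endian size followed by a 4-byte type, modifying the coding format object block into an audio and video object block amounts to overwriting those two header fields in place. A minimal Python sketch, in which the box names and sizes are illustrative assumptions rather than the patent's actual values:

```python
import struct

def patch_box_header(buf: bytearray, offset: int, new_type: bytes, new_size: int):
    """Overwrite the 4-byte size and 4-byte type of the box starting at `offset`."""
    struct.pack_into(">I4s", buf, offset, new_size, new_type)

# Illustrative file whose first box is rewritten as a larger 'mdat'-style block;
# the original 'styp' name and the new size are made up for this sketch.
f = bytearray(struct.pack(">I4s", 16, b"styp") + b"\x00" * 8)
patch_box_header(f, 0, b"mdat", 16 + 24)  # grow to cover appended fragments
print(bytes(f[4:8]))                       # b'mdat'
print(struct.unpack_from(">I", f, 0)[0])   # 40
```

Because only eight header bytes change, the modification is an in-place write rather than a rewrite of the whole file, which is what keeps the conversion cheap.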
Optionally, the data acquisition unit 701 is specifically configured to:
Dividing the audio and video stream into N audio and video clips, and obtaining a clip index object block of each audio and video clip in the N audio and video clips;
Adding a file description object block at a second position of an initial audio/video file, adding a coding format object block at a third position of the initial audio/video file, and adding the N audio/video fragments and a fragment index object block of each audio/video fragment in the N audio/video fragments at a fourth position of the initial audio/video file to obtain a first audio/video file;
The file description object block is used for indicating the file format of the first audio and video file, and the coding format object block is used for indicating the coding format information of the first audio and video file.
Optionally, the index information of the N audio/video clips includes coding format information of the first audio/video file and a storage location of each audio/video clip of the N audio/video clips; the file decoding unit 705 is further configured to:
if a decoding instruction aiming at the second audio and video file is acquired, acquiring N audio and video fragments based on the storage positions of the N audio and video fragments in the audio and video index object block;
And decoding the N audio and video clips based on the encoding format information of the first audio and video file to obtain a decoded second audio and video file.
Optionally, the file generating apparatus 70 further includes: a file transmission unit 706, configured to:
When a file acquisition request sent by the terminal equipment and used for requesting the second audio and video file is acquired, carrying out security detection on the file transmission environment of the second audio and video file to obtain security detection information; the security detection information is used for indicating whether the file transmission environment is secure or not;
If the security detection information indicates that the file transmission environment is unsafe, a target encryption mode matched with the security level in the security detection information is acquired;
and encrypting the second audio and video file by adopting the target encryption mode, and sending the encrypted second audio and video file to the terminal equipment.
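The level-matched encryption step can be sketched as follows; the level names, key lengths, and the SHA-256 counter keystream are illustrative assumptions only, and a real deployment would use a vetted cipher such as AES-GCM rather than this construction.

```python
import hashlib
import secrets

# Hypothetical mapping from security level to key length; not from the patent.
LEVEL_TO_KEY_BYTES = {"low": 16, "medium": 24, "high": 32}

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream; applying it twice decrypts."""
    stream = bytearray()
    for block_no in range((len(data) + 31) // 32):
        stream += hashlib.sha256(key + block_no.to_bytes(8, "big")).digest()
    return bytes(b ^ k for b, k in zip(data, stream))

def encrypt_for_level(level: str, payload: bytes):
    """Pick a key sized to the detected security level and encrypt the payload."""
    key = secrets.token_bytes(LEVEL_TO_KEY_BYTES[level])
    return key, keystream_xor(key, payload)

key, cipher = encrypt_for_level("high", b"second audio/video file bytes")
print(keystream_xor(key, cipher))  # round-trips back to the plaintext
```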
It should be noted that, in the embodiment corresponding to fig. 7, the content not mentioned may be referred to the description of the method embodiment, and will not be repeated here.
The beneficial effects of the file generating apparatus 70 are the same as those described in the foregoing method embodiments and are not repeated here.
Referring to fig. 8, fig. 8 is a schematic diagram of a composition structure of a computer device according to an embodiment of the present application. As shown in fig. 8, the above-mentioned computer device may include: a processor 801, and a memory 802. Optionally, the computer device may further include a network interface or a power module. Data may be exchanged between the processor 801 and the memory 802.
The processor 801 may be a central processing unit (CPU), and may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The network interface may include an input device, such as a control panel, a microphone, or a receiver, and/or an output device, such as a display screen or a transmitter, which are not shown in the figure.
The memory 802 may include read only memory and random access memory, and provides program instructions and data to the processor 801. A portion of memory 802 may also include non-volatile random access memory. Wherein the processor 801, when calling the program instructions, is configured to execute:
Acquiring an audio and video stream, and encoding the audio and video stream to obtain a first audio and video file; the first audio and video file comprises a coding format object block, N audio and video clips, and a clip index object block of each of the N audio and video clips, which are arranged in sequence, where N is a positive integer;
Adding an audio and video index object block at a first position of the first audio and video file; the audio/video index object block comprises index information of the N audio/video clips, and the first position comprises a file tail position of the first audio/video file or a position between the coding format object block and the clip index object block;
Modifying the encoding format object block in the first audio and video file into an audio and video object block, and adding the N audio and video fragments and fragment index object blocks associated with each audio and video fragment into the audio and video object block;
And constructing and obtaining a second audio and video file based on the added audio and video index object block and the added audio and video object block.
Optionally, when executed by the processor, the program instructions may further implement other steps of the method in the above embodiments, which are not described herein again.
The embodiments of the present application also provide a computer-readable storage medium storing a computer program. The computer program comprises program instructions which, when executed by a computer, cause the computer to perform the method of the previous embodiments; the computer is part of the computer device mentioned above. For example, the program instructions may be executed on one computer device, or on multiple computer devices located at one site, or on multiple computer devices distributed across multiple sites and interconnected by a communication network, where the multiple computer devices may constitute a blockchain network.
Embodiments of the present application also provide a computer program product or computer program comprising computer instructions which, when executed by a processor, implement some or all of the steps of the above-described method. For example, the computer instructions are stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the steps performed in the embodiments of the methods described above.
Those skilled in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware, where the program may be stored in a computer-readable storage medium, and where the program, when executed, may include the processes of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
The foregoing disclosure is illustrative of the present application and is not to be construed as limiting the scope of the application, which is defined by the appended claims.

Claims (11)

1. A method of generating a file, the method comprising:
Acquiring an audio and video stream, and performing coding processing on the audio and video stream to obtain a first audio and video file; the first audio and video file comprises a coding format object block, N audio and video clips, and a clip index object block of each of the N audio and video clips, which are arranged in sequence, wherein N is a positive integer;
adding an audio and video index object block at a first position of the first audio and video file; the audio/video index object block comprises index information of the N audio/video clips, and the first position comprises a file tail position of the first audio/video file or a position between the coding format object block and the clip index object block;
Modifying the coding format object blocks in the first audio and video file into audio and video object blocks, and adding the N audio and video clips and the clip index object blocks associated with each audio and video clip into the audio and video object blocks;
And constructing and obtaining a second audio and video file based on the added audio and video index object block and the added audio and video object block.
2. The method according to claim 1, wherein the method further comprises:
when detecting that an abnormality occurs in the process of constructing the second audio and video file, acquiring a third audio and video file; the third audio and video file comprises M audio and video clips acquired before abnormality occurs and clip index object blocks associated with the M audio and video clips, wherein M is a positive integer less than or equal to N;
And if a decoding instruction for the third audio/video file is acquired, decoding the third audio/video file based on the segment index object blocks associated with the M audio/video segments to obtain a decoded third audio/video file.
3. The method of claim 2, wherein the segment index object block associated with each of the M audio-video segments includes coding format information of each of the audio-video segments and a storage location of each of the audio-video segments;
the decoding processing is performed on the third audio/video file based on the segment index object blocks associated with the M audio/video segments to obtain a decoded third audio/video file, which includes:
Acquiring each audio and video clip based on the storage position of the audio and video clip in the clip index object block associated with each audio and video clip in the M audio and video clips to obtain the M audio and video clips;
And decoding the M audio and video clips based on the encoding format information of the M audio and video clips to obtain a decoded third audio and video file.
4. The method of claim 1, wherein the encoded format object blocks in the first audio video file comprise an object type and an object size, and wherein the audio video object blocks comprise an object type and an object size;
The modifying the encoding format object block in the first audio-video file into an audio-video object block includes:
And modifying the object type of the encoding format object block in the first audio-video file into the object type of the audio-video object block, and modifying the object size of the encoding format object block in the first audio-video file into the object size of the audio-video object block.
5. The method according to any one of claims 1-4, wherein the encoding the audio-video stream to obtain a first audio-video file includes:
Dividing the audio and video stream into N audio and video clips, and obtaining a clip index object block of each audio and video clip in the N audio and video clips;
Adding a file description object block at a second position of an initial audio/video file, adding a coding format object block at a third position of the initial audio/video file, and adding the N audio/video fragments and a fragment index object block of each audio/video fragment in the N audio/video fragments at a fourth position of the initial audio/video file to obtain a first audio/video file;
the file description object block is used for indicating the file format of the first audio and video file, and the coding format object block is used for indicating the coding format information of the first audio and video file.
6. The method according to any one of claims 1-4, wherein the index information of the N audio-video clips includes coding format information of the first audio-video file and a storage location of each of the N audio-video clips; the method further comprises the steps of:
If a decoding instruction aiming at the second audio and video file is acquired, acquiring N audio and video fragments based on storage positions of the N audio and video fragments in the audio and video index object block;
And decoding the N audio and video clips based on the coding format information of the first audio and video file to obtain a decoded second audio and video file.
7. The method of claim 6, wherein the method further comprises:
When a file acquisition request sent by a terminal device and used for requesting the second audio and video file is acquired, carrying out security detection on the file transmission environment of the second audio and video file to obtain security detection information; the security detection information is used for indicating whether the file transmission environment is secure or not;
If the security detection information indicates that the file transmission environment is unsafe, a target encryption mode matched with the security level in the security detection information is acquired;
And encrypting the second audio and video file by adopting the target encryption mode, and sending the encrypted second audio and video file to the terminal equipment.
8. A document generating apparatus, the apparatus comprising:
the data acquisition unit is used for acquiring an audio and video stream, and carrying out coding processing on the audio and video stream to obtain a first audio and video file; the first audio and video file comprises a coding format object block, N audio and video clips, and a clip index object block of each of the N audio and video clips, which are arranged in sequence, wherein N is a positive integer;
The index adding unit is used for adding an audio and video index object block at a first position of the first audio and video file; the audio/video index object block comprises index information of the N audio/video clips, and the first position comprises a file tail position of the first audio/video file or a position between the coding format object block and the clip index object block;
The object modification unit is used for modifying the encoding format object blocks in the first audio and video file into audio and video object blocks, and adding the N audio and video fragments and fragment index object blocks associated with each audio and video fragment into the audio and video object blocks;
And the file construction unit is used for constructing and obtaining a second audio and video file based on the added audio and video index object block and the added audio and video object block.
9. A computer device comprising a processor, a memory, wherein the memory is for storing a computer program, the computer program comprising program instructions, the processor being configured to invoke the program instructions to perform the method of any of claims 1-7.
10. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program adapted to be loaded and executed by a processor to cause a computer device having a processor to perform the method of any of claims 1-7.
11. A computer program product, characterized in that the computer program product comprises computer instructions which, when executed by a processor, implement the method according to any of claims 1-7.
CN202310011411.XA 2023-01-05 2023-01-05 File generation method, device, equipment and readable storage medium Pending CN118296178A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310011411.XA CN118296178A (en) 2023-01-05 2023-01-05 File generation method, device, equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN118296178A true CN118296178A (en) 2024-07-05

Family

ID=91674915



Legal Events

Date Code Title Description
PB01 Publication