CN112437315B - Audio adaptation method and system for adapting to multiple system versions - Google Patents

Audio adaptation method and system for adapting to multiple system versions

Info

Publication number
CN112437315B
Authority
CN
China
Prior art keywords
data
audio
ith
data segment
segment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010911906.4A
Other languages
Chinese (zh)
Other versions
CN112437315A (en)
Inventor
陈建宇
徐胜
朱林伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Hode Information Technology Co Ltd
Original Assignee
Shanghai Hode Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Hode Information Technology Co Ltd filed Critical Shanghai Hode Information Technology Co Ltd
Priority to CN202010911906.4A
Publication of CN112437315A
Application granted
Publication of CN112437315B
Legal status: Active (current)
Anticipated expiration

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21Server components or server architectures
    • H04N21/218Source of audio or video content, e.g. local disk arrays
    • H04N21/2187Live feed
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233Processing of audio elementary streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • H04N21/4392Processing of audio elementary streams involving audio buffer management
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • H04N21/4394Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • H04N21/4398Processing of audio elementary streams involving reformatting operations of audio signals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/854Content authoring
    • H04N21/8547Content authoring involving timestamps for synchronizing content
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)

Abstract

The present application discloses an audio adaptation method for adapting to multiple system versions, which comprises the following steps: acquiring the ith batch of audio data provided by the system live-broadcast recording tool, wherein i is a positive integer; and obtaining a corresponding first data segment and second data segment based at least on the ith batch of audio data. The audio adaptation method for adapting to multiple system versions is compatible with audio data whose amounts differ across system versions, and ensures that encoding is correct.

Description

Audio adaptation method and system for adapting to multiple system versions
Technical Field
The present disclosure relates to the field of computer technology, and in particular to an audio adaptation method, system, computer device, and computer-readable storage medium for adapting to multiple system versions.
Background
Webcasting is one of the most popular internet applications at present. A large number of live-broadcast APPs based on the Android or iOS operating system have been released for live-streaming operation. However, as Android and iOS versions iterate, these live APPs may not be applicable to multiple versions of the operating system at the same time. Taking Apple's iOS system as an example, these live APPs need to call the system live recording tool, ReplayKit, in the iOS system to obtain audio data; however, the audio data output by ReplayKit on iOS versions before 13.0 differs greatly from the audio data output by ReplayKit on iOS versions 13.0 and later. This data difference easily causes audio adaptation problems and, in turn, encoding errors.
Disclosure of Invention
An objective of the embodiments of the present application is to provide an audio adaptation method, system, computer device, and computer-readable storage medium for adapting to multiple system versions, which are used to solve the problem of audio maladaptation caused by system version differences, which in turn causes encoding errors.
An aspect of the embodiments of the present application provides an audio adaptation method for adapting to multiple system versions, the method comprising: acquiring the ith batch of audio data provided by the system live-broadcast recording tool, wherein i is a positive integer; and obtaining a corresponding first data segment and second data segment based at least on the ith batch of audio data: when i=1, splitting the 1st batch of audio data into a 1st first data segment and a 1st second data segment, wherein the data amount of the 1st first data segment is the largest integer multiple of the slice data amount that the 1st batch of audio data can provide, and the 1st second data segment is the remaining data of the 1st batch of audio data other than the 1st first data segment; transmitting the 1st first data segment to a next audio processing module; and temporarily storing the 1st second data segment in an audio buffer; when i≥2, forming an ith first data segment and an ith second data segment based on the (i-1)th second data segment remaining from the (i-1)th batch of audio data and the ith batch of audio data, wherein the data amount of the ith first data segment is the largest integer multiple of the slice data amount that the total audio data of the (i-1)th second data segment and the ith batch of audio data can provide, and the ith second data segment is the remaining data of that total audio data other than the ith first data segment; transmitting the ith first data segment to the next audio processing module; and temporarily storing the ith second data segment in the audio buffer.
Optionally, the method further comprises: determining the timestamp of the audio, wherein the timestamp of the audio is equal to the master timestamp minus the temporary data timestamp; the master timestamp is the sum of the timestamp increments of the 1st to ith batches of audio data, and the temporary data timestamp is the timestamp increment corresponding to the ith second data segment.
Optionally, the method further comprises: processing the ith batch of audio data through a plurality of tasks, wherein each task corresponds to one processing operation; and placing the plurality of tasks into a serial queue to perform the processing operations on the ith batch of audio data asynchronously.
Optionally, the method further comprises: raising the priority of the serial queue.
Optionally, the method further comprises: acquiring a data format of the ith batch of audio data; and if the data format is not the preset data format, converting the data format of the ith first data segment.
Optionally, the data format includes an endianness; if the data format is not the preset data format, converting the data format of the ith first data segment includes: if the endianness is not the preset endianness, swapping the jth byte and the (j+1)th byte in the ith first data segment, wherein j is a positive integer.
Optionally, the data format includes a number of channels; if the data format is not the preset data format, converting the data format of the ith first data segment includes: if the data format is mono and the preset data format is dual-channel, performing channel-number conversion on the ith first data segment: copying the kth byte of the ith first data segment to the (k×2)th and (k×2+2)th byte addresses of the dual-channel pointer, and copying the (k+1)th byte of the ith first data segment to the (k×2+1)th and (k×2+3)th byte addresses of the dual-channel pointer, wherein k is a positive integer.
Optionally, the data format includes a sampling rate; if the data format is not the preset data format, converting the data format of the ith first data segment includes: if the sampling rate is not equal to the preset sampling rate, performing sampling-rate conversion on the ith first data segment through a fast sampling-rate conversion strategy to obtain an ith first data segment corresponding to the preset sampling rate.
Optionally, the system live-broadcast recording tool is the ReplayKit of the iOS system.
An aspect of the embodiments of the present application further provides an audio adaptation system for adapting to multiple system versions, comprising: an acquisition module, configured to acquire the ith batch of audio data provided by the system live-broadcast recording tool, wherein i is a positive integer; and a processing module, configured to obtain a corresponding first data segment and second data segment based at least on the ith batch of audio data: when i=1, splitting the 1st batch of audio data into a 1st first data segment and a 1st second data segment, wherein the data amount of the 1st first data segment is the largest integer multiple of the slice data amount that the 1st batch of audio data can provide, and the 1st second data segment is the remaining data of the 1st batch of audio data other than the 1st first data segment; transmitting the 1st first data segment to a next audio processing module; and temporarily storing the 1st second data segment in an audio buffer; when i≥2, forming an ith first data segment and an ith second data segment based on the (i-1)th second data segment remaining from the (i-1)th batch of audio data and the ith batch of audio data, wherein the data amount of the ith first data segment is the largest integer multiple of the slice data amount that the total audio data of the (i-1)th second data segment and the ith batch of audio data can provide, and the ith second data segment is the remaining data of that total audio data other than the ith first data segment; transmitting the ith first data segment to the next audio processing module; and temporarily storing the ith second data segment in the audio buffer.
An aspect of the embodiments of the present application further provides a computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the audio adaptation method for adapting to multiple system versions as described above when executing the computer program.
An aspect of the embodiments of the present application further provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the audio adaptation method for adapting to multiple system versions as described above.
According to the audio adaptation method, system, device, and computer-readable storage medium for adapting to multiple system versions of the embodiments of the present application, each batch of audio data is segmented so that the audio data fed into the next audio processing module is regular audio data, which makes the method compatible with audio data whose amounts differ across system versions. That is, input audio data of any size undergoes normalization processing to ensure that encoding is correct.
Drawings
FIG. 1 schematically illustrates an application environment diagram of an audio adaptation method for adapting to multiple system versions according to an embodiment of the application;
fig. 2 schematically illustrates a flowchart of an audio adaptation method for adapting to a multi-system version according to an embodiment of the present application;
FIG. 3 schematically illustrates a flowchart of the added steps of an audio adaptation method for adapting to multiple system versions according to an embodiment of the present application;
FIG. 4 is a sub-step diagram of step S302 in FIG. 3;
fig. 5 schematically illustrates another flow chart of an audio adaptation method for adapting a multi-system version according to an embodiment of the present application;
FIG. 6 schematically illustrates another additional step flow diagram of an audio adaptation method for adapting to multiple system versions according to an embodiment of the present application;
FIG. 7 schematically illustrates another additional step flow diagram of an audio adaptation method for adapting to multiple system versions according to an embodiment of the present application;
FIG. 8 schematically illustrates a block diagram of an audio adaptation system for adapting to multiple system versions according to a second embodiment of the present application; and
Fig. 9 schematically illustrates a hardware architecture diagram of a computer device adapted to implement an audio adaptation method adapted to multiple system versions according to a third embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
It should be noted that the descriptions of "first," "second," etc. in the embodiments of the present application are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined by "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments may be combined with each other, provided that the combination can be realized by those skilled in the art; where a combination of technical solutions is contradictory or cannot be realized, the combination should be regarded as not existing and as falling outside the protection scope of the present application.
In the description of the present application, it should be understood that the numerical labels before the steps do not indicate the order in which the steps are performed; they are only used for convenience of description and for distinguishing the steps, and should not be construed as limiting the present application.
FIG. 1 schematically illustrates an application environment diagram of an audio adaptation method for adapting to multiple system versions according to a first embodiment of the present application. In a live-broadcast scene, the anchor terminal 2 can push live data to the viewer terminal 4 in real time.
The anchor terminal 2 is used for generating live data in real time and performing push-stream operations on the live data. The live data may include audio data or video data. The anchor terminal 2 may be a smartphone, a tablet computer, or the like based on the iOS system. In other embodiments, the anchor terminal 2 may be a live-broadcast device based on the Android system or the like.
The viewer terminal 4 may be configured to receive live data from the anchor terminal 2 in real time. The viewer terminal 4 may be any type of computing device, such as a smartphone, tablet device, laptop computer, set-top box, smart television, etc. The viewer terminal 4 may have a built-in browser or a dedicated program through which the live data is received to output content to the user. The content may include video, audio, comments, text data, and/or the like.
The anchor terminal 2 is provided with a system live-broadcast recording tool and a live-broadcast APP (for example, BilibiliLink).
The anchor can start the live APP to carry out live-broadcast operations. The live APP is an application-layer APP. In a live-broadcast scene, the live APP needs to call the system live recording tool of the system layer and obtain live data through the system live recording tool. This involves the problem of adaptation between the live APP and the system live recording tool. The system live recording tools corresponding to different system versions may differ, making it difficult for the live APP to be compatible with multiple system versions and causing problems such as encoding errors, frame loss, audio-video desynchronization, pull-stream stuttering, and unstable push streaming.
Taking the iOS system as an example: the system live recording tool in the iOS system is ReplayKit. The amount of audio data, the sampling rate, and the number of channels used by ReplayKit differ greatly between iOS versions before and after 13.0. Specifically: on iOS versions before 13.0, the ReplayKit audio callback outputs 441000 bytes of data every 0.5 s, the audio is mono, and the microphone sampling rate is 44100 Hz; on iOS versions 13.0 and later, the ReplayKit audio callback outputs 4096 bytes of data every 0.023220 s, the audio is dual-channel, and the microphone sampling rate becomes 48000 Hz. The ReplayKit differences caused by different system versions may lead to problems such as encoding errors, frame loss, audio-video desynchronization, pull-stream stuttering, and unstable push streaming.
The present application aims to provide an audio adaptation scheme for live-broadcast scenes to solve the problem of system version differences. The embodiments provided below may be used to solve one or more of the technical problems described above, so that screen-recording live broadcast with a live APP (e.g., BilibiliLink) achieves stable push streaming, no frame loss, audio-video synchronization, and stutter-free playback.
Example 1
The iOS system and ReplayKit are described below as examples. It should be understood that the audio adaptation method of the present application is not limited to the iOS system.
FIG. 2 schematically shows a flowchart of an audio adaptation method for adapting to multiple system versions according to an embodiment of the present application. It should be noted that the following exemplary description takes the computer device (the anchor terminal 2) as the execution subject. As shown in FIG. 2, the audio adaptation method for adapting to multiple system versions may include steps S200 to S206, wherein:
step S200, the ith batch of audio data provided by the live broadcast recording tool of the system is obtained, wherein i is a positive integer.
The ith batch of audio data is audio data called back in real time by the system live recording tool in a live-broadcast scene. The ith batch of audio data is also the "ith audio data packet", namely the smallest data packet provided by the system live recording tool each time. ReplayKit in iOS versions before 13.0 provides 441000 bytes of data at a time, and ReplayKit in iOS versions 13.0 and later provides 2048 bytes of data at a time. Thus: if the current system is an iOS version before 13.0, the data amount of the ith batch of audio data is 441000 bytes; if the current system is iOS 13.0 or later, the data amount of the ith batch of audio data is 2048 bytes.
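As a reference, the following minimal Swift sketch shows how such per-batch audio data typically arrives from ReplayKit inside a broadcast upload extension; the class setup and the handleAudioBatch helper are assumptions for illustration and are not part of the present application.

```swift
import ReplayKit
import CoreMedia

class SampleHandler: RPBroadcastSampleHandler {
    override func processSampleBuffer(_ sampleBuffer: CMSampleBuffer,
                                      with sampleBufferType: RPSampleBufferType) {
        switch sampleBufferType {
        case .audioApp, .audioMic:
            // Each callback delivers one "batch" of audio data; its size depends on the
            // system version, which is exactly what the normalization below must absorb.
            handleAudioBatch(sampleBuffer)   // hypothetical helper
        case .video:
            break                            // video frames are handled elsewhere
        @unknown default:
            break
        }
    }

    private func handleAudioBatch(_ sampleBuffer: CMSampleBuffer) {
        // Extract the raw bytes from the CMSampleBuffer and feed them to the
        // slicing logic sketched later in this description.
    }
}
```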
Step S202, obtaining a corresponding first data segment and a second data segment based on at least the ith batch of audio data:
when i=1, splitting the 1st batch of audio data into a 1st first data segment and a 1st second data segment, wherein the data amount of the 1st first data segment is the largest integer multiple of the slice data amount that the 1st batch of audio data can provide, and the 1st second data segment is the remaining data of the 1st batch of audio data other than the 1st first data segment; transmitting the 1st first data segment to a next audio processing module; and temporarily storing the 1st second data segment in an audio buffer;
when i≥2, forming an ith first data segment and an ith second data segment based on the (i-1)th second data segment remaining from the (i-1)th batch of audio data and the ith batch of audio data, wherein the data amount of the ith first data segment is the largest integer multiple of the slice data amount that the total audio data of the (i-1)th second data segment and the ith batch of audio data can provide, and the ith second data segment is the remaining data of that total audio data other than the ith first data segment; transmitting the ith first data segment to the next audio processing module; and temporarily storing the ith second data segment in the audio buffer.
For different versions of the operating system, the amount of audio data provided to the application layer each time may differ. Because of this difference in data amount, problems such as encoding errors and frame loss easily occur in the mixer, the encoder, and so on. Taking the iOS system as an example, ReplayKit in iOS 13.0 and later calls back a fixed amount of data, 1024 audio frames with a data size of 2048 bytes, which is equivalent to the system layer having already unpacked the audio data, so it can be used directly for subsequent mixing or encoding. The callback data amount of ReplayKit in iOS versions before 13.0 is 441000 bytes per 0.5 s. If the 441000 bytes are split in 2048-byte units, data that is not an integer multiple of 1024 bytes remains each time; this part of the data cannot be immediately mixed or sent to the encoder, which leads to encoding errors and frame loss.
In view of this, the present application performs normalization processing on the ith batch of audio data to solve the problem of inconsistent data amounts. Reference may be made to the following exemplary steps: (1) Calculate the slice data amount, which may be the data amount of one audio frame, namely 1024 × bytesPerFrame × number of channels, where bytesPerFrame denotes the number of bytes contained in each frame. (2) Establish an iteration, executed (data amount of the ith batch of audio data / slice data amount) times, to unpack the ith batch of audio data. For example, in the iOS system the encoder requires a fixed amount of input data for one audio frame, so the "iteration" refers to the process of slicing the ith batch of audio data into a certain number of audio frames to facilitate the subsequent encoding process. (3) Obtain the ith first data segment A1 and the ith second data segment A2, where the data amount of the ith first data segment A1 is the largest integer multiple of the slice data amount, namely M × slice data amount ≤ total data amount < (M+1) × slice data amount and the data amount of A1 equals M × slice data amount, M being a positive integer. The ith first data segment A1 is a regular data segment and is used for subsequent encoding, mixing, and other processing, so encoding can be performed correctly. The ith second data segment A2 is an irregular data segment and is temporarily withheld from subsequent encoding, mixing, etc., to prevent encoding errors and frame loss.
It should be understood that the next audio processing module may include various audio processing modules such as an encoding module, a mixing module, and the like.
It should be appreciated that the data amount of the ith second data segment is less than one slice data amount, and may be 0.
The computer device 2 may establish the audio buffer in advance. The audio buffer may be a char-type pointer. The initialized size of the audio buffer should be large enough to prevent data overflow. For example, the computer device 2 may create an empty pointer array as the audio buffer for receiving the ith second data segment A2. The empty pointer array is a temporary variable whose life cycle is the current iteration.
The computer device 2 integrates the ith second data segment A2 remaining from the ith batch of audio data with the (i+1)th batch of audio data, and performs an iterative operation on the integrated total audio data. The iterative operation may have the following results:
the total audio data of the first, i-th second data segment A2 and the i+1st batch of audio data is just an integer multiple of the slice data amount. That is, the i+1th first data segment B1 is the i-th second data segment a2+the i+1th audio data.
Second, the total audio data of the ith second data segment A2 and the (i+1)th batch of audio data is not divisible by the slice data amount. In this case, the (i+1)th first data segment B1 consists of the ith second data segment A2 plus most of the (i+1)th batch of audio data, and the (i+1)th second data segment B2 is the data remaining after the total audio data has been iterated over.
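A minimal Swift sketch of this normalization step is given below; the type and identifier names (AudioSlicer, sliceSize, sendToNextModule) are assumptions for illustration only, and the numerical example that follows shows the same procedure with concrete values.

```swift
import Foundation

final class AudioSlicer {
    private var buffer = Data()        // audio buffer holding the current "second data segment"
    private let sliceSize: Int         // slice data amount: 1024 frames × bytesPerFrame × channel count

    init(bytesPerFrame: Int, channelCount: Int) {
        self.sliceSize = 1024 * bytesPerFrame * channelCount
    }

    /// Called once per batch of audio data delivered by the system recording tool.
    func consume(batch: Data, sendToNextModule: (Data) -> Void) {
        // Append the new batch behind whatever remained from the previous batch
        // (the (i-1)th second data segment kept in the audio buffer).
        buffer.append(batch)

        // The "first data segment": the largest whole multiple of the slice data amount.
        let wholeSlices = buffer.count / sliceSize
        let firstSegmentLength = wholeSlices * sliceSize
        if firstSegmentLength > 0 {
            sendToNextModule(buffer.prefix(firstSegmentLength))
        }

        // The "second data segment": the remainder, retained for the next batch.
        buffer = Data(buffer.suffix(buffer.count - firstSegmentLength))
    }
}
```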
For example, when i=1, the slice data amount is 2048 bytes, and the data amount of the 1st batch of audio data is 441000 bytes, the sequence is as follows:
(1) Acquire the 1st batch of audio data;
(2) Obtain the 1st first data segment A1 of 440320 bytes (2048 bytes × 215) from the 1st batch of audio data;
(3) Obtain the 1st second data segment A2 of 680 bytes (the data at the end of the 1st batch of audio data);
(4) Send the 1st first data segment A1 from step (2) to the next audio processing module;
(5) Temporarily store the 1st second data segment A2 from step (3) in the audio buffer;
(6) Acquire the 2nd batch of audio data;
(7) Integrate the 680-byte 1st second data segment A2 from the audio buffer in step (5) with the 2nd batch of audio data, and perform the iterative operation, wherein:
(7.1) The first iteration yields the first slice of data corresponding to the slice data amount, namely the 680-byte 1st second data segment A2 from step (5) plus the first 1368 bytes of the 2nd batch of audio data;
(7.2) The second iteration yields the second slice of data corresponding to the slice data amount;
… and so on, finally giving:
the 2nd first data segment B1 (the 680-byte 1st second data segment + the first 439640 bytes of the 2nd batch of audio data), and the 2nd second data segment B2 (the last 1360 bytes of the 2nd batch of audio data).
The audio adaptation method for adapting to multiple system versions of this embodiment establishes general logic: input audio data of any size undergoes unified normalization processing to ensure the regularity of the data and correct encoding. This embodiment can therefore solve the problem of system version differences and achieve unified audio adaptation across different system versions.
The data normalization operation in the above embodiment solves one aspect of the audio adaptation problem caused by system version differences. The following embodiments address other aspects of the audio adaptation problem caused by system version differences.
In an exemplary embodiment, to solve the problem that inconsistent data formats across system versions cause audio maladaptation, as shown in FIG. 3, the audio adaptation method for adapting to multiple system versions further includes steps S300 to S302, wherein: step S300, obtaining the data format of the ith batch of audio data; step S302, if the data format is not the preset data format, converting the data format of the ith first data segment.
The data format may involve a number of aspects, such as endianness, number of channels, sampling rate, bit depth, byte stream, etc.
(1) Regarding endianness:
Big-endian (big endian) means that the high-order bytes of data are stored at low memory addresses and the low-order bytes of data are stored at high memory addresses.
Little-endian (little endian) means that the high-order bytes of data are stored at high memory addresses and the low-order bytes of data are stored at low memory addresses.
The data format includes the endianness. If the endianness in the data format differs from the endianness in the preset data format, endianness conversion is needed. As shown in FIG. 4, step S302 may be as follows: S402, if the endianness is not the preset endianness, swapping the jth byte and the (j+1)th byte in the ith first data segment, wherein j is a positive integer.
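A minimal sketch of this byte swap for 16-bit PCM samples is shown below; the function name is an assumption for illustration.

```swift
import Foundation

/// Swaps the byte order of every 16-bit sample in place (step S402).
func swapEndianness(of data: inout Data) {
    var j = 0
    while j + 1 < data.count {
        data.swapAt(j, j + 1)   // swap the jth and (j+1)th bytes of each sample
        j += 2
    }
}
```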
(2) Regarding the sampling rate:
The data format includes the sampling rate. If the sampling rate in the data format differs from the preset sampling rate, sampling-rate conversion is needed. As shown in FIG. 4, step S302 may be as follows: S404, if the sampling rate is not equal to the preset sampling rate, performing sampling-rate conversion on the ith first data segment through a fast sampling-rate conversion strategy to obtain an ith first data segment corresponding to the preset sampling rate. The fast sampling-rate conversion strategy satisfies the real-time requirement of the live-broadcast scene without affecting the performance of the screen-recording system. It should be noted that the sampling-rate conversion may use a resampling library, and the output array needs to be initialized with a sufficiently large space.
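The actual conversion is deferred to a resampling library; purely for illustration, the following simplified Swift sketch resamples 16-bit mono samples by linear interpolation. It is an assumed stand-in, not the fast sampling-rate conversion strategy itself.

```swift
/// Simplified linear-interpolation resampler for 16-bit mono PCM (illustrative only).
func resample(samples: [Int16], from sourceRate: Double, to targetRate: Double) -> [Int16] {
    guard sourceRate != targetRate, !samples.isEmpty else { return samples }
    let ratio = sourceRate / targetRate
    let outCount = Int(Double(samples.count) / ratio)
    var out = [Int16]()
    out.reserveCapacity(outCount)
    for n in 0..<outCount {
        let pos = Double(n) * ratio
        let i = Int(pos)
        let frac = pos - Double(i)
        let a = Double(samples[i])
        let b = Double(samples[min(i + 1, samples.count - 1)])
        out.append(Int16((a + (b - a) * frac).rounded()))   // interpolate neighbouring samples
    }
    return out
}
```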
(3) Regarding the number of channels:
The data format includes the number of channels. If the number of channels in the data format differs from the number of channels in the preset data format, channel-number conversion is needed. Taking conversion from mono to dual-channel as an example, as shown in FIG. 4, step S302 may be as follows: S406, if the data format is mono and the preset data format is dual-channel, performing channel-number conversion on the ith first data segment: copying the kth byte of the ith first data segment to the (k×2)th and (k×2+2)th byte addresses of the dual-channel pointer, and copying the (k+1)th byte of the ith first data segment to the (k×2+1)th and (k×2+3)th byte addresses of the dual-channel pointer, wherein k is a positive integer.
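A minimal Swift sketch of this mono-to-dual-channel duplication, assuming 16-bit PCM and an assumed function name, is shown below.

```swift
import Foundation

/// Duplicates each 2-byte mono sample into both channels of a stereo buffer.
func monoToStereo(_ mono: Data) -> Data {
    var stereo = Data(count: mono.count * 2)
    var k = 0
    while k + 1 < mono.count {
        stereo[k * 2]     = mono[k]       // left channel, low byte
        stereo[k * 2 + 1] = mono[k + 1]   // left channel, high byte
        stereo[k * 2 + 2] = mono[k]       // right channel, low byte
        stereo[k * 2 + 3] = mono[k + 1]   // right channel, high byte
        k += 2
    }
    return stereo
}
```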
The inventors also found that although data normalization solves one aspect of the audio adaptation problem caused by system version differences, it may give rise to another problem: audio and video falling out of sync. The reason is as follows: taking the first batch of audio data as an example, 440320 bytes of data (i.e., the 1st first data segment A1) are passed to the subsequent audio processing module during the slicing iteration, while the remaining 680 bytes of data (i.e., the 1st second data segment A2) are buffered in the audio buffer rather than passed on immediately. As a result, the timestamp of the audio keeps increasing during the live broadcast, and the audio and video become desynchronized.
In an exemplary embodiment, to avoid audio-video desynchronization, as shown in FIG. 5, the audio adaptation method for adapting to multiple system versions further includes step S500, determining the timestamp of the audio. The timestamp of the audio is equal to the master timestamp minus the temporary data timestamp; the master timestamp is the sum of the timestamp increments of the 1st to ith batches of audio data, and the temporary data timestamp is the timestamp increment corresponding to the ith second data segment. The temporary data timestamp corresponding to the ith second data segment A2 = timestamp increment × remainder / (1024 × bytesPerFrame × number of channels), that is, timestamp increment × 680 / (1024 × bytesPerFrame × number of channels) in the above example. It should be appreciated that the timestamp increment is the timestamp increment corresponding to each batch of audio data.
The decoding behavior of different players differs. The timestamp of the first audio frame can be set as the zero point; the timestamp of the audio then starts from zero and is increased by one audio frame's timestamp per iteration. Subtracting the time corresponding to the data of less than one audio frame (the temporary data timestamp) ensures that the picture and audio timestamps stay synchronized and that the pull stream does not stutter.
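The correction can be expressed as a one-line computation; the following Swift sketch uses assumed parameter names and follows the formula given above.

```swift
/// Timestamp handed to the muxer: the accumulated ("master") timestamp minus the
/// duration of the bytes still held back in the audio buffer (the ith second data segment).
func audioTimestamp(masterTimestamp: Double,
                    timestampIncrement: Double,   // timestamp increment of one 1024-frame slice
                    remainderBytes: Int,          // bytes retained in the audio buffer
                    bytesPerFrame: Int,
                    channelCount: Int) -> Double {
    let sliceBytes = 1024 * bytesPerFrame * channelCount
    let temporaryDataTimestamp = timestampIncrement * Double(remainderBytes) / Double(sliceBytes)
    return masterTimestamp - temporaryDataTimestamp
}
```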
The inventors also found that when the computer device 2 processes a large amount of audio data, push-stream instability easily occurs. Continuing with the iOS system as an example, the ReplayKit of the iOS system is subject to a 50 MB process memory limit; when the limit is exceeded, the program crashes and live push streaming is interrupted. Therefore, to achieve stable push streaming, as shown in FIG. 6, the audio adaptation method for adapting to multiple system versions further includes steps S600 to S602, wherein: step S600, processing the ith batch of audio data through a plurality of tasks, each task corresponding to one processing operation; step S602, placing the plurality of tasks into a serial queue to perform the processing operations on the ith batch of audio data asynchronously. That is, when the audio data stream is processed, the tasks are placed into a self-managed serial queue and executed asynchronously, so that processing time does not block the thread of the data callback. The advantage of this embodiment is that memory consumption during operation is reduced to a certain extent and push-stream stability is ensured.
To further improve stability, as shown in FIG. 7, the audio adaptation method for adapting to multiple system versions further includes step S604: raising the priority of the serial queue. This embodiment aims to avoid task accumulation and thereby further improve push-stream stability.
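A minimal sketch of steps S600 to S604 using Grand Central Dispatch is shown below; the queue label and the process(batch:) wrapper are assumptions for illustration, and the raised quality-of-service class stands in for "raising the priority of the serial queue".

```swift
import Foundation

// A single serial queue with a raised QoS so queued audio tasks do not accumulate.
let audioQueue = DispatchQueue(label: "live.audio.processing",   // assumed label
                               qos: .userInteractive)

func process(batch: Data) {
    // Submit asynchronously so the data-callback thread returns immediately.
    audioQueue.async {
        // Slicing, format conversion, and encoding of this batch run here,
        // one task at a time, off the callback thread.
    }
}
```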
Example two
FIG. 8 schematically illustrates a block diagram of an audio adaptation system for adapting to multiple system versions according to a second embodiment of the present application. The audio adaptation system may be partitioned into one or more program modules, which are stored in a storage medium and executed by one or more processors to implement the embodiments of the present application. A program module in the embodiments of the present application refers to a series of computer program instruction segments capable of implementing specific functions; the following description specifically describes the function of each program module in this embodiment.
As shown in fig. 8, the multi-system version-compliant audio adaptation system 800 may include an acquisition module 810 and a processing module 820, wherein:
the obtaining module 810 is configured to obtain the ith batch of audio data provided by the live recording tool, where i is a positive integer;
The processing module 820 is configured to obtain a corresponding first data segment and second data segment based at least on the ith batch of audio data: when i=1, split the 1st batch of audio data into a 1st first data segment and a 1st second data segment, wherein the data amount of the 1st first data segment is the largest integer multiple of the slice data amount that the 1st batch of audio data can provide, and the 1st second data segment is the remaining data of the 1st batch of audio data other than the 1st first data segment; transmit the 1st first data segment to a next audio processing module; and temporarily store the 1st second data segment in an audio buffer; when i≥2, form an ith first data segment and an ith second data segment based on the (i-1)th second data segment remaining from the (i-1)th batch of audio data and the ith batch of audio data, wherein the data amount of the ith first data segment is the largest integer multiple of the slice data amount that the total audio data of the (i-1)th second data segment and the ith batch of audio data can provide, and the ith second data segment is the remaining data of that total audio data other than the ith first data segment; transmit the ith first data segment to the next audio processing module; and temporarily store the ith second data segment in the audio buffer.
In an exemplary embodiment, the audio adaptation system 800 further comprises a timestamp determination module (not shown) configured to: determine the timestamp of the audio, wherein the timestamp of the audio is equal to the master timestamp minus the temporary data timestamp; the master timestamp is the sum of the timestamp increments of the 1st to ith batches of audio data, and the temporary data timestamp is the timestamp increment corresponding to the ith second data segment.
In an exemplary embodiment, the audio adaptation system 800 further comprises a task processing module (not shown). The task processing module is used for: processing the ith batch of audio data through a plurality of tasks, wherein each task corresponds to one processing operation; and placing the plurality of tasks into a serial queue to perform asynchronous processing operations on the ith batch of audio data.
In an exemplary embodiment, the task processing module is further configured to: raise the priority of the serial queue.
In an exemplary embodiment, the audio adaptation system 800 further comprises a format conversion module (not shown). The format conversion module is used for: acquiring a data format of the ith batch of audio data; and if the data format is not the preset data format, converting the data format of the ith first data segment.
In an exemplary embodiment, the data format includes an endianness; the format conversion module is used for: if the endianness is not the preset endianness, swapping the jth byte and the (j+1)th byte in the ith first data segment, wherein j is a positive integer.
In an exemplary embodiment, the data format includes a number of channels; the format conversion module is used for: if the data format is mono and the preset data format is dual-channel, performing channel-number conversion on the ith first data segment: copying the kth byte of the ith first data segment to the (k×2)th and (k×2+2)th byte addresses of the dual-channel pointer, and copying the (k+1)th byte of the ith first data segment to the (k×2+1)th and (k×2+3)th byte addresses of the dual-channel pointer, wherein k is a positive integer.
In an exemplary embodiment, the data format includes a sampling rate; the format conversion module is used for: if the sampling rate is not equal to the preset sampling rate, performing sampling-rate conversion on the ith first data segment through a fast sampling-rate conversion strategy to obtain an ith first data segment corresponding to the preset sampling rate.
In an exemplary embodiment, the system live-broadcast recording tool is the ReplayKit of the iOS system.
Example III
FIG. 9 schematically shows a hardware architecture diagram of a computer device 2 adapted to implement the audio adaptation method for adapting to multiple system versions according to a third embodiment of the present application. In this embodiment, the computer device 2 is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, for example, a smartphone, a tablet computer, or the like. As shown in FIG. 9, the computer device 2 includes at least, but is not limited to: a memory 910, a processor 920, and a network interface 930, which may be communicatively linked to each other through a system bus. Wherein:
the memory 910 includes at least one type of computer-readable storage medium including flash memory, hard disk, multimedia card, card memory (e.g., SD or DX memory, etc.), random Access Memory (RAM), static Random Access Memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 910 may be an internal storage module of the computer device 2, such as a hard disk or memory of the computer device 2. In other embodiments, the memory 910 may also be an external storage device of the computer device 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card (Flash Card) or the like, which are provided on the computer device 2. Of course, the memory 910 may also include both internal memory modules of the computer device 2 and external memory devices. In this embodiment, the memory 910 is typically used to store an operating system and various types of application software installed on the computer device 2, such as program codes of an audio adaptation method adapted to multiple system versions, and the like. In addition, the memory 910 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 920 may be a central processing unit (Central Processing Unit, simply CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 920 is typically used to control the overall operation of the computer device 2, such as performing control and processing related to data interaction or communication with the computer device 2, and the like. In this embodiment, the processor 920 is configured to execute program codes or process data stored in the memory 910.
The network interface 930 may include a wireless network interface or a wired network interface, which network interface 930 is typically used to establish a communication link between the computer device 2 and other computer devices. For example, the network interface 930 is used to connect the computer device 2 with an external terminal through a network, establish a data transmission channel and a communication link between the computer device 2 and the external terminal, and the like. The network may be a wireless or wired network such as an Intranet (Intranet), the Internet (Internet), a global system for mobile communications (Global System of Mobile communication, abbreviated as GSM), wideband code division multiple access (Wideband Code Division Multiple Access, abbreviated as WCDMA), a 4G network, a 5G network, bluetooth (Bluetooth), wi-Fi, etc.
It should be noted that FIG. 9 only shows a computer device having components 910-930, but it should be understood that not all of the illustrated components are required to be implemented and that more or fewer components may be implemented instead.
In this embodiment, the audio adaptation method for adapting to multiple system versions stored in the memory 910 may also be divided into one or more program modules and executed by one or more processors (the processor 920 in this embodiment) to complete the embodiments of the present application.
Example IV
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the audio adaptation method for adapting to multiple system versions of the embodiments.
In this embodiment, the computer-readable storage medium includes a flash memory, a hard disk, a multimedia card, a card memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the computer readable storage medium may be an internal storage unit of a computer device, such as a hard disk or a memory of the computer device. In other embodiments, the computer readable storage medium may also be an external storage device of a computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card (Flash Card), etc. that are provided on the computer device. Of course, the computer-readable storage medium may also include both internal storage units of a computer device and external storage devices. In this embodiment, the computer readable storage medium is typically used to store an operating system and various types of application software installed on a computer device, for example, program code for an audio adaptation method adapted to multiple system versions in the embodiment, and the like. Furthermore, the computer-readable storage medium may also be used to temporarily store various types of data that have been output or are to be output.
It will be apparent to those skilled in the art that the modules or steps of the embodiments of the present application described above may be implemented by a general-purpose computing device; they may be concentrated on a single computing device or distributed across a network of computing devices; optionally, they may be implemented by program code executable by computing devices, so that they may be stored in a storage device and executed by computing devices; in some cases, the steps shown or described may be performed in an order different from that shown or described; alternatively, they may be separately fabricated into individual integrated circuit modules, or multiple modules or steps among them may be fabricated into a single integrated circuit module. Thus, the embodiments of the present application are not limited to any specific combination of hardware and software.
It should be noted that the foregoing is only a preferred embodiment of the present application, and is not intended to limit the scope of the patent protection of the present application, and all equivalent structures or equivalent processes using the descriptions and the contents of the present application or direct or indirect application to other related technical fields are included in the scope of the patent protection of the present application.

Claims (12)

1. An audio adaptation method for adapting to multiple system versions, the method comprising:
acquiring the ith batch of audio data provided by a live broadcast recording tool of the system, wherein i is a positive integer;
obtaining corresponding first data segments and second data segments based at least on the ith batch of audio data:
when i=1, splitting the 1st batch of audio data into a 1st first data segment and a 1st second data segment, wherein the data amount of the 1st first data segment is the largest integer multiple of the slice data amount that the 1st batch of audio data can provide, and the 1st second data segment is the remaining data of the 1st batch of audio data other than the 1st first data segment; transmitting the 1st first data segment to a next audio processing module; and temporarily storing the 1st second data segment in an audio buffer;
when i≥2, forming an ith first data segment and an ith second data segment based on the (i-1)th second data segment remaining from the (i-1)th batch of audio data and the ith batch of audio data, wherein the data amount of the ith first data segment is the largest integer multiple of the slice data amount that the total audio data of the (i-1)th second data segment and the ith batch of audio data can provide, and the ith second data segment is the remaining data of that total audio data other than the ith first data segment; transmitting the ith first data segment to the next audio processing module; and temporarily storing the ith second data segment in the audio buffer;
wherein the slice data amount is an amount of data that can be immediately mixed or sent to an encoder for encoding.
2. The audio adaptation method for adapting to multiple system versions according to claim 1, further comprising:
determining the timestamp of the audio, wherein the timestamp of the audio is equal to the master timestamp minus the temporary data timestamp; the master timestamp is the sum of the timestamp increments of the 1st to ith batches of audio data, and the temporary data timestamp is the timestamp increment corresponding to the ith second data segment.
3. The audio adaptation method for adapting to multiple system versions according to claim 1, further comprising:
processing the ith batch of audio data through a plurality of tasks, wherein each task corresponds to one processing operation; and
The plurality of tasks are placed in a serial queue to perform asynchronous processing operations on the ith batch of audio data.
4. The audio adaptation method for adapting to multiple system versions according to claim 3, further comprising:
the priority of the serial queue is raised.
5. The audio adaptation method for adapting to multiple system versions according to claim 1, further comprising:
acquiring a data format of the ith batch of audio data; and
if the data format is not the preset data format, converting the data format of the ith first data segment.
6. The audio adaptation method for adapting to multiple system versions according to claim 5, wherein the data format comprises an endianness;
if the data format is not the preset data format, converting the data format of the ith first data segment, including:
if the endianness is not the preset endianness, swapping the jth byte and the (j+1)th byte in the ith first data segment, wherein j is a positive integer.
7. The audio adaptation method for adapting to multiple system versions according to claim 5, wherein the data format comprises a number of channels;
if the data format is not the preset data format, converting the data format of the ith first data segment, including:
if the data format is mono and the preset data format is dual-channel, performing channel-number conversion on the ith first data segment:
copying the kth byte of the ith first data segment to the (k×2)th and (k×2+2)th byte addresses of the dual-channel pointer; and copying the (k+1)th byte of the ith first data segment to the (k×2+1)th and (k×2+3)th byte addresses of the dual-channel pointer, wherein k is a positive integer.
8. The audio adaptation method for adapting to multiple system versions according to claim 5, wherein the data format comprises a sampling rate;
if the data format is not the preset data format, converting the data format of the ith first data segment, including:
if the sampling rate is not equal to the preset sampling rate, performing sampling-rate conversion on the ith first data segment through a fast sampling-rate conversion strategy to obtain an ith first data segment corresponding to the preset sampling rate.
9. The audio adaptation method for adapting to multiple system versions according to any one of claims 1 to 8, wherein the system live-broadcast recording tool is the ReplayKit of the iOS system.
10. An audio adaptation system for adapting to multiple system versions, comprising:
the acquisition module is used for acquiring the ith batch of audio data provided by the live broadcast recording tool of the system, wherein i is a positive integer;
the processing module is used for obtaining corresponding first data fragments and second data fragments at least based on the ith batch of audio data:
when i=1, splitting the 1st batch of audio data into a 1st first data segment and a 1st second data segment, wherein the data amount of the 1st first data segment is the largest integer multiple of the slice data amount that the 1st batch of audio data can provide, and the 1st second data segment is the remaining data of the 1st batch of audio data other than the 1st first data segment; transmitting the 1st first data segment to a next audio processing module; and temporarily storing the 1st second data segment in an audio buffer;
when i≥2, forming an ith first data segment and an ith second data segment based on the (i-1)th second data segment remaining from the (i-1)th batch of audio data and the ith batch of audio data, wherein the data amount of the ith first data segment is the largest integer multiple of the slice data amount that the total audio data of the (i-1)th second data segment and the ith batch of audio data can provide, and the ith second data segment is the remaining data of that total audio data other than the ith first data segment; transmitting the ith first data segment to the next audio processing module; and temporarily storing the ith second data segment in the audio buffer;
wherein the slice data amount is a value that can be immediately mixed or sent to an encoder for encoding.
11. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the audio adaptation method for adapting to multiple system versions according to any one of claims 1 to 9 when executing the computer program.
12. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored therein a computer program executable by at least one processor, so as to cause the at least one processor to perform the steps of the audio adaptation method for adapting to multiple system versions according to any one of claims 1 to 9.
CN202010911906.4A 2020-09-02 2020-09-02 Audio adaptation method and system for adapting to multiple system versions Active CN112437315B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010911906.4A CN112437315B (en) 2020-09-02 2020-09-02 Audio adaptation method and system for adapting to multiple system versions

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010911906.4A CN112437315B (en) 2020-09-02 2020-09-02 Audio adaptation method and system for adapting to multiple system versions

Publications (2)

Publication Number Publication Date
CN112437315A CN112437315A (en) 2021-03-02
CN112437315B true CN112437315B (en) 2023-06-27

Family

ID=74689976

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010911906.4A Active CN112437315B (en) 2020-09-02 2020-09-02 Audio adaptation method and system for adapting to multiple system versions

Country Status (1)

Country Link
CN (1) CN112437315B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113923065B (en) * 2021-09-06 2023-11-24 贵阳语玩科技有限公司 Cross-version communication method, system, medium and server based on chat room audio

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110415723A (en) * 2019-07-30 2019-11-05 广州酷狗计算机科技有限公司 Method, apparatus, server and the computer readable storage medium of audio parsing

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101344887B (en) * 2008-06-06 2011-09-14 网易有道信息技术(北京)有限公司 Audio search method and device
US9338523B2 (en) * 2009-12-21 2016-05-10 Echostar Technologies L.L.C. Audio splitting with codec-enforced frame sizes
CN102810313B (en) * 2011-06-02 2014-01-01 华为终端有限公司 Audio decoding method and device
CN108235052A (en) * 2018-01-09 2018-06-29 安徽小马创意科技股份有限公司 Multi-audio-frequency channel hardware audio mixing, acquisition and the method for broadcasting may be selected based on IOS
CN110335615B (en) * 2019-05-05 2021-11-16 北京字节跳动网络技术有限公司 Audio data processing method and device, electronic equipment and storage medium

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110415723A (en) * 2019-07-30 2019-11-05 广州酷狗计算机科技有限公司 Method, apparatus, server and the computer readable storage medium of audio parsing

Also Published As

Publication number Publication date
CN112437315A (en) 2021-03-02

Similar Documents

Publication Publication Date Title
CN108989885B (en) Video file transcoding system, segmentation method, transcoding method and device
US20130117271A1 (en) Method and Apparatus for Automatically Classifying Application in Mobile Terminal
CN110213614B (en) Method and device for extracting key frame from video file
CN110704202B (en) Multimedia recording data sharing method and terminal equipment
WO2020248375A1 (en) Method and system for synchronizing data between databases, computer device and storage medium
EP3866481A1 (en) Audio/video switching method and apparatus, and computer device and readable storage medium
CN109963167B (en) Audio and video processing method, server, device and storage medium
CN109743757B (en) Data processing method and device, wireless module and Internet of things equipment
EP3905596A1 (en) Internet speed measuring method and device, computer equipment and readable storage medium
CN109151505B (en) Video transcoding method, system, device and computer readable storage medium
CN112437315B (en) Audio adaptation method and system for adapting to multiple system versions
CN111490947A (en) Data packet transmitting method, data packet receiving method, system, device and medium
CN112069195A (en) Database-based message transmission method and device, electronic equipment and storage medium
US20230106217A1 (en) Web-end video playing method and apparatus, and computer device
CN111651338B (en) System and method for acquiring log formatting time
CN111367916B (en) Data storage method and device
CN111061518B (en) Data processing method, system, terminal equipment and storage medium based on drive node
CN112423104A (en) Audio mixing method and system for multi-channel audio in live scene
CN114143486A (en) Video stream synchronization method and device, computer equipment and storage medium
CN112423120A (en) Audio time delay detection method and system
CN111639055B (en) Differential packet calculation method, differential packet calculation device, differential packet calculation equipment and storage medium
CN111797158A (en) Data synchronization system, method and computer-readable storage medium
CN117270902B (en) OTA upgrade package generation method and device, OTA upgrade method and device
CN111770413B (en) Multi-sound-source sound mixing method and device and storage medium
CN112615869B (en) Audio data processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant