CN116155876A - Data processing method and device


Info

Publication number
CN116155876A
Authority
CN
China
Prior art keywords
video frame
frame
audio
video
real
Prior art date
Legal status
Pending
Application number
CN202310190680.7A
Other languages
Chinese (zh)
Inventor
李盼盼 (Li Panpan)
徐浩煜 (Xu Haoyu)
曹凯 (Cao Kai)
Current Assignee
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date
Filing date
Publication date
Application filed by Lenovo Beijing Ltd
Priority to CN202310190680.7A
Publication of CN116155876A

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 65/00 - Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L 65/80 - Responding to QoS

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The present disclosure provides a data processing method and apparatus. The method includes: in response to acquiring a first video frame, confirming a first audio frame corresponding to the first video frame; adding a first real-time transmission protocol packet corresponding to the first audio frame into the first video frame to generate a second video frame; and transmitting a video stream comprising the second video frame and an audio stream comprising the corresponding first audio frame. The first real-time transmission protocol packet is used to adjust the play correspondence between the video stream and the audio stream.

Description

Data processing method and device
Technical Field
The disclosure relates to the technical field of real-time communication, and in particular to a data processing method and device.
Background
In real-time audio and video communication scenarios, the audio signal and the video signal received by the receiving end are often out of sync, which mainly manifests as the video picture not matching the audio and degrades user experience.
Disclosure of Invention
The present disclosure provides a data processing method and apparatus to at least solve the above technical problems in the prior art.
According to a first aspect of the present disclosure, there is provided a data processing method, applied to a transmitting end, including:
in response to acquiring a first video frame, confirming a first audio frame corresponding to the first video frame;
adding a first real-time transmission protocol packet corresponding to the first audio frame into the first video frame to generate a second video frame;
transmitting a video stream comprising the second video frame and an audio stream comprising the corresponding first audio frame;
wherein the first real-time transmission protocol packet is used to adjust the play correspondence between the video stream and the audio stream.
According to a second aspect of the present disclosure, there is provided a data processing method, applied to a receiving end, including:
confirming, based on a first real-time transmission protocol packet included in a received second video frame, a correspondence between a first video frame included in the second video frame and a first audio frame corresponding to the first video frame;
and adjusting, based on the correspondence between the first video frame and the first audio frame, the play correspondence between a video stream comprising the first video frame and an audio stream comprising the first audio frame.
According to a third aspect of the present disclosure, there is provided a data processing apparatus, applied to a transmitting end, including:
the audio confirming unit is used for responding to the collected first video frame and confirming a first audio frame corresponding to the first video frame;
the video frame generation unit is used for adding a first real-time transmission protocol packet corresponding to the first audio frame into the first video frame to generate a second video frame;
a transmitting unit, configured to transmit a video stream including a second video frame and an audio stream including a first audio frame;
the first real-time transmission protocol packet is used for adjusting the play corresponding relation between the video stream and the audio stream.
According to a fourth aspect of the present disclosure, there is provided a data processing apparatus, applied to a receiving end, including:
a confirmation unit, configured to confirm a correspondence between a first video frame included in a received second video frame and a first audio frame corresponding to the first video frame, based on a first real-time transport protocol packet included in the second video frame;
and an adjusting unit, configured to adjust, based on the correspondence between the first video frame and the first audio frame, the play correspondence between a video stream comprising the first video frame and an audio stream comprising the first audio frame.
In the data processing method of the present disclosure, a first audio frame corresponding to a first video frame is confirmed in response to the first video frame being acquired; a first real-time transmission protocol packet corresponding to the first audio frame is added into the first video frame to generate a second video frame; and a video stream comprising the second video frame and an audio stream comprising the corresponding first audio frame are transmitted, the first real-time transmission protocol packet being used to adjust the play correspondence between the video stream and the audio stream. In this way, a delay adjustment coefficient between the first video frame and the first audio frame can be confirmed based on the first real-time transmission protocol packet, and the play correspondence between the video stream and the audio stream can be adjusted based on that coefficient, so that the video picture matches the audio being played and user experience is improved.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The above, as well as additional purposes, features, and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description when read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which:
in the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
FIG. 1 illustrates an alternative flow diagram of a data processing method provided by an embodiment of the present disclosure;
FIG. 2 shows another alternative flow diagram of a data processing method provided by an embodiment of the present disclosure;
FIG. 3 shows a schematic diagram of yet another alternative flow of a data processing method provided by an embodiment of the present disclosure;
FIG. 4 shows a schematic diagram of adding RTP packets in a video frame provided by an embodiment of the present disclosure;
FIG. 5 shows a further alternative flow diagram of a data processing method provided by an embodiment of the present disclosure;
FIG. 6 illustrates an alternative architecture diagram of a data processing apparatus provided by an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of another alternative configuration of a data processing apparatus provided by an embodiment of the present disclosure;
fig. 8 shows a schematic diagram of a composition structure of an electronic device according to an embodiment of the present disclosure.
Detailed Description
To make the objects, features and advantages of the present disclosure more comprehensible, the technical solutions in the embodiments of the disclosure are described below in conjunction with the accompanying drawings. Apparently, the described embodiments are only some, not all, of the embodiments of the disclosure. Based on the embodiments of this disclosure, all other embodiments obtained by a person skilled in the art without inventive effort fall within the scope of protection of this disclosure.
In the related art, the mainstream audio and video synchronization scheme is the synchronization algorithm of Web Real-Time Communications (WebRTC). The receiving end uses the Real-time Transport Protocol (RTP) timestamp and the corresponding Network Time Protocol (NTP) timestamp contained in a Sender Report (SR) RTCP message to calculate the correspondence between the RTP timestamps of a received stream and NTP time, thereby unifying the timestamps of the audio stream and the video stream under the same time reference. It then calculates the relative delay between the audio and video streams at the current moment and adjusts the target audio/video delay within a certain amplitude range. This process is updated periodically, gradually converging on the optimal target delay, which is applied to the corresponding audio/video playing and rendering times to achieve smooth, synchronized playback.
However, the above scheme relies on SR packets periodically received by the audio/video receiving end to estimate the correspondence between RTP timestamps and NTP timestamps. The SR transmission period is generally long (5 to 10 seconds), and the SR periods of the audio and video streams generally differ, which reduces estimation accuracy to some extent. The RTP Control Protocol (RTCP) has no retransmission mechanism, so when an SR packet is lost, the data feeding the synchronization algorithm is not updated, reducing the speed and accuracy of synchronization. Moreover, the target delay is adjusted by smoothly filtering the average of the audio-video relative delay differences, which converges slowly when the streams are badly out of sync and cannot reach the optimum quickly within the acceptable adjustment range.
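As a hedged sketch of this related-art baseline (not the patent's own method): from the (RTP timestamp, NTP time) pairs in two Sender Reports of one stream, the receiver can fit a linear RTP-to-NTP mapping per stream and compare the streams on the common NTP time base. All names and numbers below are illustrative.

```python
def rtp_to_ntp_mapper(sr1, sr2):
    """sr1, sr2: (rtp_ts, ntp_ms) pairs taken from two Sender Reports
    of the same stream. Returns a function mapping RTP ticks to NTP ms."""
    rtp1, ntp1 = sr1
    rtp2, ntp2 = sr2

    def to_ntp(rtp_ts):
        # Linear interpolation/extrapolation between the two report points.
        return ntp1 + (rtp_ts - rtp1) * (ntp2 - ntp1) / (rtp2 - rtp1)

    return to_ntp

# 90 kHz video clock and 48 kHz audio clock, both covering 1000..2000 ms NTP.
video_ntp = rtp_to_ntp_mapper((0, 1_000), (90_000, 2_000))
audio_ntp = rtp_to_ntp_mapper((0, 1_000), (48_000, 2_000))

# Relative delay between a video frame and an audio frame on the shared base:
relative_delay_ms = video_ntp(45_000) - audio_ntp(24_000)
```

Both sample timestamps above map to NTP time 1500 ms, so the measured relative delay is zero; in practice the receiver smooths this measurement over time, which is exactly the slow convergence the disclosure criticizes.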
To address these defects in the related art, the present disclosure provides a data processing method: while sending video frames, the transmitting end periodically adds the audio frame captured at the same acquisition time (or at another moment requiring synchronization) into an RTP extension header of the video stream. The receiving end parses the audio RTP packet out of the video RTP extension header and periodically adjusts the target delay of the audio and video streams, taking the video frame carrying the packet and the parsed audio frame as the synchronization reference, so that the playback times at the receiving end gradually approach the synchronized capture times of the two streams at the transmitting end, achieving smooth, synchronized playback.
Fig. 1 shows an alternative flowchart of a data processing method according to an embodiment of the present disclosure, and will be described according to the steps.
Step S101, in response to the first video frame being acquired, confirming a first audio frame corresponding to the first video frame.
In some embodiments, the transmitting end and the receiving end confirm, in a handshake phase, the information of the RTP extension header used to transmit audio frames within video frames, that is, the information of the first real-time transmission protocol extension header; this information may include an identification number (ID) and a Uniform Resource Identifier (URI). The receiving end uses this information to distinguish RTP packets whose extension header carries an embedded audio-frame RTP packet from those whose extension header does not.
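In WebRTC practice, this kind of handshake negotiation of an extension ID and URI is carried by an `a=extmap` attribute in the SDP offer/answer exchange. A hypothetical offer fragment for this scheme might look as follows; the ID `7` and the URI are purely illustrative assumptions, not values defined by the patent:

```
m=video 9 UDP/TLS/RTP/SAVPF 96
a=extmap:7 urn:example:embedded-audio-frame
```

After the exchange, both ends know that a video RTP packet whose extension element has ID 7 carries an embedded audio-frame RTP packet.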
In some embodiments, the first audio frame corresponding to the first video frame is the audio captured together with the first video frame. Optionally, during silence a video frame has no corresponding audio frame. Optionally, the corresponding audio frame may be confirmed based on the sampling timestamps of the video frame and the audio frames at the transmitting end; for example, an audio frame whose sampling timestamp is within a first threshold of the first video frame's sampling timestamp is confirmed as the audio frame corresponding to the first video frame.
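A minimal sketch of this matching step, under stated assumptions (the threshold value, the frame representation, and the function name are illustrative, not taken from the patent):

```python
FIRST_THRESHOLD_MS = 10  # assumed value; the patent leaves it to experiment

def find_matching_audio(video_ts_ms, audio_frames):
    """audio_frames: list of (capture_ts_ms, payload) tuples.
    Returns the audio frame closest in capture time to the video frame,
    or None during silence (no frame within the threshold)."""
    if not audio_frames:
        return None
    best = min(audio_frames, key=lambda f: abs(f[0] - video_ts_ms))
    if abs(best[0] - video_ts_ms) < FIRST_THRESHOLD_MS:
        return best
    return None  # mute period: skip, wait for the next binding period

frames = [(1000, b"a0"), (1020, b"a1"), (1040, b"a2")]
match = find_matching_audio(1023, frames)     # within 3 ms of (1020, b"a1")
no_match = find_matching_audio(1500, frames)  # nothing within the threshold
```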
Step S102, adding a first real-time transmission protocol packet corresponding to the first audio frame to the first video frame, and generating a second video frame.
In some embodiments, the transmitting end may periodically add the RTP packet of the corresponding audio frame into video frames of the video stream, so that the receiving end can periodically adjust the play correspondence between the video stream and the audio stream based on those periodically added audio RTP packets.
The play correspondence may be a play speed, a play delay, or a play time stamp between the video stream and a corresponding audio stream, or a play speed, a play delay, or a play time stamp between a video frame included in the video stream and an audio frame corresponding to the video frame.
In some embodiments, the first video frame includes a first real-time transmission protocol extension header whose contents are the first real-time transmission protocol packet of the first audio frame. The first real-time transmission protocol extension header is carried by a second real-time transmission protocol packet of the first video frame, so both are transmitted within the first video frame.
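One plausible wire-level realization of this step, sketched under assumptions: the audio frame's RTP packet is wrapped in an RTP header extension of a video RTP packet, here using the RFC 8285 two-byte-header form because an audio RTP packet easily exceeds the 16-byte limit of the one-byte form. The patent does not prescribe this exact layout; all concrete values are illustrative.

```python
import struct

EXT_PROFILE_TWO_BYTE = 0x1000  # RFC 8285 "defined by profile" marker

def add_audio_extension(ext_id, audio_rtp_packet):
    """Serialize an RTP header-extension block that embeds audio_rtp_packet
    as a single two-byte-header element with the negotiated ext_id."""
    # Element: ID byte, length byte, then the embedded audio RTP packet.
    body = struct.pack("!BB", ext_id, len(audio_rtp_packet)) + audio_rtp_packet
    # Pad the extension body to a 32-bit boundary, as RTP requires.
    body += b"\x00" * (-len(body) % 4)
    # Extension header: 16-bit profile marker + 16-bit length in 32-bit words.
    return struct.pack("!HH", EXT_PROFILE_TWO_BYTE, len(body) // 4) + body

# Dummy 12-byte "audio RTP packet" embedded under negotiated ID 7.
ext = add_audio_extension(7, b"\x80\x60" + b"\x00" * 10)
```

The receiving end reverses the process: it reads the profile marker, scans the elements for the negotiated ID, and hands the embedded bytes to its audio RTP depacketizer.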
Step S103, transmitting a video stream including the second video frame and an audio stream including the first audio frame.
In some embodiments, the video stream includes at least one video frame; the at least one video frame includes original video frames as well as video frames into which an RTP packet has been inserted. Optionally, if every video frame of the video stream has a corresponding audio frame, the video frames carrying inserted RTP packets appear periodically.
Thus, in the data processing method provided by this embodiment of the disclosure, the RTP packet of the corresponding audio frame is inserted into a video frame of the video stream, so that after receiving the video stream and the audio stream, the receiving end can adjust the play correspondence between the corresponding video frame and audio frame based on that RTP packet, and in turn adjust the play correspondence between the video stream and the audio stream, keeping the play delay between them below a certain value (or keeping them synchronized) and improving user experience.
Fig. 2 shows another alternative flow diagram of a data processing method according to an embodiment of the disclosure, which will be described according to the steps.
Step S201, based on a first real-time transmission protocol packet included in a received second video frame, confirms a correspondence between a first video frame included in the second video frame and a first audio frame corresponding to the first video frame.
In some embodiments, the transmitting end and the receiving end confirm, in a handshake phase, the information of the RTP extension header used to transmit audio frames within video frames, that is, the information of the first real-time transmission protocol extension header; this information may include an identification number and a Uniform Resource Identifier. The receiving end uses this information to distinguish RTP packets whose extension header carries an embedded audio-frame RTP packet from those whose extension header does not.
In some embodiments, the correspondence between the first video frame and the first audio frame may include a delay adjustment coefficient between them, which may be determined based on the first RTP extension header and the second RTP packet included in the received first video frame, the sampling timestamp of the first video frame received by the receiving end, and the sampling timestamp of the first audio frame received by the receiving end.
Step S202, based on the correspondence between the first video frame and the first audio frame, adjusting a play correspondence between a video stream including the first video frame and an audio stream including the first audio frame.
In some embodiments, when the receiving end receives a video frame whose RTP packet carries an audio frame, it establishes the synchronization time reference of the video stream and the audio stream, that is, the correspondence between the RTP timestamps of the two streams and NTP timestamps, and adjusts the play correspondence between the video stream comprising the first video frame and the audio stream comprising the first audio frame based on that correspondence. This correspondence between the RTP timestamps and NTP timestamps of the video and audio streams is the correspondence between the first video frame and the first audio frame.
In some embodiments, the receiving end may adjust a play correspondence of a video stream including the first video frame and an audio stream including the first audio frame based on a first RTP extension header, a second RTP packet included in the first video frame received by the receiving end, a sampling timestamp of the first video frame received by the receiving end, and a sampling timestamp of the first audio frame received by the receiving end.
The play correspondence may be a play speed, a play delay, or a play time stamp between the video stream and a corresponding audio stream, or a play speed, a play delay, or a play time stamp between a video frame included in the video stream and an audio frame corresponding to the video frame.
For example, after a playing reference (i.e., a unified playing timestamp) is confirmed based on the first RTP extension header and the second RTP packet, suppose the playing timestamp of the first video frame is X and that of the first audio frame is Y, and the two differ greatly, harming user experience. The receiving end may then adjust X and/or Y based on the correspondence, for example advancing or delaying Y; or adjust the playing speed of the first audio frame, for example playing it at 2x or 0.5x speed so that subsequent video frames and audio frames correspond; or delay playback of the first audio frame or the first video frame so that the two are played within a certain threshold of each other.
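A minimal sketch of such an adjustment, assuming the simplest policy of holding back whichever stream is ahead; the sync window value and function names are illustrative assumptions, not the patent's definitions:

```python
SYNC_WINDOW_MS = 40  # assumed acceptable audio/video skew

def adjust_play_timestamps(video_play_ms, audio_play_ms):
    """Return (video_delay_ms, audio_delay_ms) that brings the bound
    video/audio pair back within the sync window."""
    skew = video_play_ms - audio_play_ms
    if abs(skew) <= SYNC_WINDOW_MS:
        return (0, 0)       # already close enough: no adjustment
    if skew > 0:
        return (0, skew)    # audio is early: hold it back
    return (-skew, 0)       # video is early: hold it back

in_sync = adjust_play_timestamps(1000, 1030)     # 30 ms skew, within window
adjustments = adjust_play_timestamps(1200, 1000) # audio 200 ms early
```

Real players spread such a correction over several frames (or use speed changes, as the text notes) rather than applying it in one jump, to avoid an audible or visible glitch.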
Thus, in the data processing method provided by this embodiment of the disclosure, the RTP packet of the corresponding audio frame is inserted into a video frame of the video stream, so that after receiving the video stream and the audio stream, the receiving end can adjust the play correspondence between the corresponding video frame and audio frame based on that RTP packet, and in turn adjust the play correspondence between the video stream and the audio stream, keeping the play delay between them below a certain value (or keeping them synchronized) and improving user experience.
Fig. 3 shows a schematic flowchart of still another alternative data processing method according to an embodiment of the present disclosure, and will be described according to the steps.
In step S301, the information of the RTP extension header used to transmit audio frames within video frames is confirmed.
In some embodiments, the transmitting end and the receiving end confirm, in a handshake phase, the information of the RTP extension header used to transmit audio frames within video frames, that is, the information of the first real-time transmission protocol extension header; this information may include an identification number and a Uniform Resource Identifier. The receiving end uses this information to distinguish RTP packets whose extension header carries an embedded audio-frame RTP packet from those whose extension header does not.
Step S302, in response to the first video frame being acquired, a first audio frame corresponding to the first video frame is confirmed.
In some embodiments, the transmitting end confirms an audio frame whose sampling timestamp is within the first threshold of the first video frame's sampling timestamp as the first audio frame corresponding to the first video frame; the first threshold can be set according to actual requirements or experimental results.
In some optional embodiments, if there is no audio frame whose sampling timestamp is within the first threshold of the first video frame's sampling timestamp, a third audio frame corresponding to a third video frame is confirmed after an interval of a first period; the first video frame and the third video frame belong to the same video stream.
In practical applications there may be silent periods, in which a video frame has no corresponding audio frame, or audio frames are not being continuously acquired and transmitted. In that case no subsequent operation is performed; the transmitting end waits for the next execution period, that is, it confirms the third audio frame corresponding to the third video frame after an interval of the first period.
Next, assuming that the first video frame has a corresponding first audio frame, the transmitting end performs the following operations:
step S303, adding a first real-time transmission protocol packet corresponding to the first audio frame to the first video frame, and generating a second video frame.
In some embodiments, the transmitting end may add the first real-time transmission protocol packet of the first audio frame into the first real-time transmission protocol extension header of the first video frame to generate the second video frame; the interval between the sampling timestamps of the first video frame and the first audio frame is smaller than a second threshold, and the first extension header and the first packet are used to adjust the play correspondence between the first video frame and the first audio frame. The second threshold can be set according to actual requirements and experimental results.
In some optional embodiments, the first RTP extension header is carried by a second RTP packet of the first video frame (which may be the first RTP packet, the last RTP packet, or any RTP packet of that frame), and the second RTP packet is transmitted to the receiving end together with it.
In some embodiments, the first video frame includes a first real-time transmission protocol extension header whose contents are the first real-time transmission protocol packet of the first audio frame.
Step S304, transmitting a video stream including the second video frame and an audio stream including the first audio frame.
In step S305, a fourth audio frame corresponding to the fourth video frame is periodically confirmed.
In some embodiments, after an interval of the first period, the transmitting end, in response to acquiring a fourth video frame, confirms a fourth audio frame corresponding to the fourth video frame, adds a third real-time transmission protocol packet corresponding to the fourth audio frame into the fourth video frame to generate a fifth video frame, and transmits a video stream comprising the fifth video frame and an audio stream comprising the fourth audio frame. At least one video frame lies between the first video frame and the fourth video frame, and the two belong to the same video stream.
Fig. 4 shows a schematic diagram of adding RTP packets in a video frame provided by an embodiment of the present disclosure.
As shown in fig. 4, the transmitting end periodically (once per audio-video binding period) adds the RTP packet of the corresponding audio frame into the RTP extension header of a video frame, while the video's other RTP packets carry the RTP extension header without an embedded audio packet; after adding an audio RTP packet into the RTP extension header of video frame 1, the next audio RTP packet is added into the RTP extension header of video frame N, realizing the periodic addition.
Thus, in the data processing method provided by this embodiment of the disclosure, the sampling timestamps of audio and video are determined at the source where they are generated, namely the transmitting end, and an audio frame and a video frame captured at the same moment (with close acquisition timestamps, or at another moment requiring synchronization) are bound together and transmitted by adding the audio RTP packet into an RTP extension header. The receiving end can therefore determine the synchronization reference of the audio and video streams more accurately: the method does not depend on the SR packets of the streams and is more accurate than estimating the RTP-to-NTP timestamp correspondence from SRs; and since the RTP packet of an audio frame is usually small and the extension header is added only periodically to a single RTP packet of a video frame, the extra network bandwidth is modest. The audio packets additionally carried in video frames also improve the audio's resilience to packet loss compared with the original audio stream alone.
Fig. 5 shows a schematic flowchart of still another alternative data processing method according to an embodiment of the present disclosure, and will be described according to the steps.
In step S401, the information of the RTP extension header used to transmit audio frames within video frames is confirmed.
In some embodiments, the transmitting end and the receiving end confirm, in a handshake phase, the information of the RTP extension header used to transmit audio frames within video frames, that is, the information of the first real-time transmission protocol extension header; this information may include an identification number and a Uniform Resource Identifier. The receiving end uses this information to distinguish RTP packets whose extension header carries an embedded audio-frame RTP packet from those whose extension header does not.
In some alternative embodiments, the receiving end may identify the first RTP extension header based on the identification number and Uniform Resource Identifier corresponding to it.
Step S402, based on a first real-time transmission protocol packet included in the received second video frame, confirms a correspondence between a first video frame included in the second video frame and a first audio frame corresponding to the first video frame.
In some embodiments, the receiving end may acquire the correspondence between the first video frame and the first audio frame based on the first real-time transport protocol extension header and a second real-time transport protocol packet included in the second video frame.
In some embodiments, the correspondence between the first video frame and the first audio frame may include a delay adjustment coefficient between the first video frame and the first audio frame, where the delay adjustment coefficient may be determined based on a sampling timestamp of the receiving end receiving the first video frame and a sampling timestamp of the receiving end receiving the first audio frame. Wherein, the interval between the sampling time stamps of the first video frame and the first audio frame at the transmitting end is smaller than a second threshold value.
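A hedged interpretation of the delay adjustment coefficient: because the embedded RTP packet marks a video frame and an audio frame as sampled together at the transmitting end, the receiving end can take the difference of their receive-side timestamps directly as the correction amount, without SR-based estimation. The function names, the smoothing step, and all numbers below are an illustration, not the patent's literal formulas:

```python
def delay_adjustment_coefficient(video_recv_ms, audio_recv_ms):
    """Per-period measurement for a bound video/audio frame pair.
    Positive means video lags audio; negative means audio lags video."""
    return video_recv_ms - audio_recv_ms

def update_target_delay(prev_target_ms, coeff_ms, alpha=0.5):
    """Blend the new measurement into the running target delay so one
    noisy period does not cause an abrupt playback jump."""
    return (1 - alpha) * prev_target_ms + alpha * coeff_ms

coeff = delay_adjustment_coefficient(2120, 2050)  # video arrived 70 ms late
target = update_target_delay(50, coeff)           # move target toward 70
```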
Step S403, based on the correspondence between the first video frame and the first audio frame, adjusts the play correspondence between the video stream including the first video frame and the audio stream including the first audio frame.
In some embodiments, the receiving end may adjust a play correspondence of a video stream including the first video frame and an audio stream including the first audio frame based on a first RTP extension header, a second RTP packet included in the first video frame received by the receiving end, a sampling timestamp of the first video frame received by the receiving end, and a sampling timestamp of the first audio frame received by the receiving end.
The play correspondence may be a play speed, a play delay, or a play time stamp between the video stream and a corresponding audio stream, or a play speed, a play delay, or a play time stamp between a video frame included in the video stream and an audio frame corresponding to the video frame.
For example, after a playing reference (i.e., a unified playing timestamp) is confirmed based on the first RTP extension header and the second RTP packet, suppose the playing timestamp of the first video frame is X and that of the first audio frame is Y, and the two differ greatly, harming user experience. The receiving end may then adjust X and/or Y based on the correspondence, for example advancing or delaying Y; or adjust the playing speed of the first audio frame, for example playing it at 2x or 0.5x speed so that subsequent video frames and audio frames correspond; or delay playback of the first audio frame or the first video frame so that the two are played within a certain threshold of each other.
Step S404, periodically confirming, based on a third real-time transmission protocol packet included in a received fifth video frame, a correspondence between a fourth video frame included in the fifth video frame and a fourth audio frame corresponding to the fourth video frame.
In some embodiments, the receiving end confirms, at a certain period (considering transmission effects, this may be the first period or a third period close to the first period), the correspondence between the fourth video frame included in the fifth video frame and the fourth audio frame corresponding to the fourth video frame, based on the third real-time transmission protocol packet and a fourth real-time transmission protocol packet included in the fifth video frame (the first RTP packet, the last RTP packet, or any RTP packet in the fourth video frame); and adjusts, based on that correspondence, the play correspondence between the video stream including the fourth video frame and the audio stream including the fourth audio frame.
In this way, the play correspondence between the video stream and the audio stream is adjusted periodically according to the correspondence between the video frames and audio frames they contain. Compared with adjusting the play correspondence only once, this avoids the situation where, during playback, the correspondence between later video and audio frames drifts so far from that between earlier frames that the streams can no longer be resynchronized, thereby improving the user experience.
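The periodic check of Step S404 might look like the sketch below, where `derive_offset` and `apply_offset` stand in for the receiver's actual correspondence and adjustment logic; all names and the frame representation are hypothetical.

```python
# Illustrative sketch of Step S404: every `period`-th received video
# frame that carries an embedded audio RTP packet triggers a fresh
# derivation and application of the audio/video offset.

def periodic_resync(frames, period: int, derive_offset, apply_offset):
    """Recompute and apply the audio offset at a fixed frame period.

    `frames` is a list of dicts with keys "video_ts" (sampling timestamp)
    and "audio_rtp" (the embedded audio RTP packet, or None when the
    frame carries no such packet). Returns the (index, offset) pairs
    that were applied.
    """
    applied = []
    for i, frame in enumerate(frames):
        if i % period == 0 and frame.get("audio_rtp") is not None:
            offset = derive_offset(frame["video_ts"], frame["audio_rtp"]["ts"])
            apply_offset(offset)
            applied.append((i, offset))
    return applied
```
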
In summary, in the data processing method provided by the embodiments of the present disclosure, the RTP extension header of the corresponding audio frame is inserted into a video frame of the video stream, so that after receiving the video stream and the audio stream, the receiving end can adjust the play correspondence between the corresponding video frame and audio frame based on the RTP extension header, and thereby adjust the play correspondence between the video stream and the audio stream so that their play delay stays below a certain value (or they stay synchronized), improving the user experience.
Fig. 6 is a schematic diagram of an alternative configuration of a data processing apparatus according to an embodiment of the present disclosure; each part is described below.
In some embodiments, the data processing apparatus 600 is applied to a transmitting end, and includes an audio confirmation unit 601, a video frame generation unit 602, and a transmitting unit 603.
The audio confirmation unit 601 is configured to, in response to acquiring the first video frame, confirm a first audio frame corresponding to the first video frame;
the video frame generating unit 602 is configured to add a first real-time transmission protocol packet corresponding to the first audio frame to the first video frame to generate a second video frame;
the sending unit 603 is configured to send a video stream including a second video frame, and an audio stream including a first audio frame;
the first real-time transmission protocol packet is used for adjusting the play corresponding relation between the video stream and the audio stream.
In some embodiments, confirming the first audio frame corresponding to the first video frame includes at least one of:
confirming, as the audio frame corresponding to the first video frame, an audio frame whose sampling timestamp differs from the sampling timestamp of the first video frame by less than a first threshold.
The audio confirmation unit 601 is further configured to, if there is no audio frame whose sampling timestamp differs from that of the first video frame by less than the first threshold, confirm, in a first period, a third audio frame corresponding to a third video frame;
wherein the first video frame and the third video frame belong to the same video stream.
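The matching rule used by the audio confirmation unit — select an audio frame whose sampling timestamp lies within the first threshold of the video frame's — can be illustrated by the following sketch; the frame representation and all names are assumptions for illustration only.

```python
# Illustrative sketch of timestamp-based matching: return the audio
# frame whose sampling timestamp is closest to the video frame's and
# within the first threshold, else None (the sender would then retry
# for a later video frame in the next period).

def find_matching_audio(video_ts: int, audio_frames, first_threshold: int):
    """Pick the audio frame nearest `video_ts` within `first_threshold`."""
    best = None
    for af in audio_frames:
        gap = abs(af["ts"] - video_ts)
        if gap < first_threshold and (
            best is None or gap < abs(best["ts"] - video_ts)
        ):
            best = af
    return best
```
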
The video frame generating unit 602 is specifically configured to add the first real-time transport protocol packet of the first audio frame into a first real-time transport protocol extension header corresponding to the first video frame, so as to generate the second video frame;
wherein the interval between the sampling timestamps of the first video frame and the first audio frame is smaller than a second threshold, and the first real-time transport protocol extension header and the first real-time transport protocol packet are used to adjust the play correspondence between the first video frame and the first audio frame.
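One standard way to carry such per-frame metadata is the RTP header extension mechanism of RFC 8285. The sketch below builds a "one-byte header" extension block; the patent does not state that exactly this encoding is used, so treat the layout choice, extension id, and names as assumptions.

```python
# Illustrative sketch: build an RFC 8285 one-byte-header RTP extension
# block carrying `payload` (e.g., data identifying the corresponding
# audio RTP packet) under extension id `ext_id`.
import struct


def build_one_byte_extension(ext_id: int, payload: bytes) -> bytes:
    """Return a one-byte-header extension block, padded to 32 bits.

    `ext_id` must be 1-14 and `payload` 1-16 bytes, per RFC 8285.
    """
    assert 1 <= ext_id <= 14 and 1 <= len(payload) <= 16
    # Element header: 4-bit id, then 4-bit length-minus-one.
    element = bytes([(ext_id << 4) | (len(payload) - 1)]) + payload
    pad = (-len(element)) % 4  # zero-pad to a 32-bit boundary
    body = element + b"\x00" * pad
    # 0xBEDE marks the one-byte-header form; length is in 32-bit words.
    return struct.pack("!HH", 0xBEDE, len(body) // 4) + body
```
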
Fig. 7 is a schematic diagram of another alternative configuration of a data processing apparatus provided in an embodiment of the present disclosure; each part is described below.
In some embodiments, the data processing apparatus 700 is applied to a receiving end, and includes a confirmation unit 701 and an adjustment unit 702.
The confirmation unit 701 is configured to confirm, based on a first real-time transport protocol packet included in a received second video frame, a correspondence between a first video frame included in the second video frame and a first audio frame corresponding to the first video frame;
the adjusting unit 702 is configured to confirm the second video frame based on the identification number and the uniform resource identifier corresponding to the first real-time transport protocol packet.
The confirmation unit 701 is further configured to obtain a delay adjustment coefficient between the first video frame and the first audio frame based on the first real-time transport protocol packet and a second real-time transport protocol packet included in the second video frame.
The confirmation unit 701 is specifically configured to obtain a delay adjustment coefficient between the first video frame and the first audio frame based on the first real-time transmission protocol packet and a first real-time transmission protocol extension header included in the second video frame.
The adjusting unit 702 is specifically configured to acquire the delay adjustment coefficient between the first video frame and the first audio frame based on the timestamp of the first real-time transmission protocol extension header corresponding to the received first video frame and the timestamp of the received first real-time transmission protocol packet;
wherein the interval between the sampling timestamps of the first video frame and the first audio frame at the transmitting end is smaller than a second threshold.
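One plausible reading of the "delay adjustment coefficient" is the difference between the network delays experienced by the two streams, computed from the receive timestamps and the sampling timestamps recovered from the RTP packets. This interpretation and the names below are assumptions, not the patent's definition.

```python
# Illustrative sketch: a delay adjustment value derived from receive
# and sampling timestamps (all in milliseconds on a common clock).

def delay_adjustment(video_recv_ts_ms: int, audio_recv_ts_ms: int,
                     video_sample_ts_ms: int, audio_sample_ts_ms: int) -> int:
    """Difference between audio and video transport delays.

    A positive value means the audio experienced more delay than the
    video and would be advanced (or the video delayed) accordingly.
    """
    video_delay = video_recv_ts_ms - video_sample_ts_ms
    audio_delay = audio_recv_ts_ms - audio_sample_ts_ms
    return audio_delay - video_delay
```
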
According to embodiments of the present disclosure, the present disclosure also provides an electronic device and a readable storage medium.
Fig. 8 illustrates a schematic block diagram of an example electronic device 800 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the electronic device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the electronic device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Various components in electronic device 800 are connected to I/O interface 805, including: an input unit 806 such as a keyboard, mouse, etc.; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, etc.; and a communication unit 809, such as a network card, modem, wireless communication transceiver, or the like. The communication unit 809 allows the electronic device 800 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 801 performs the respective methods and processes described above, such as a data processing method. For example, in some embodiments, the data processing method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 800 via the ROM 802 and/or the communication unit 809. When a computer program is loaded into RAM 803 and executed by computing unit 801, one or more steps of the data processing method described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the data processing method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present disclosure, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
The foregoing describes only specific embodiments of the disclosure, but the protection scope of the disclosure is not limited thereto; any changes or substitutions that a person skilled in the art could readily conceive of within the technical scope of the disclosure are intended to fall within that scope. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (10)

1. A data processing method applied to a transmitting end, the method comprising:
in response to the acquisition of the first video frame, confirming a first audio frame corresponding to the first video frame;
adding a first real-time transmission protocol packet corresponding to the first audio frame into the first video frame to generate a second video frame;
transmitting a video stream comprising a second video frame and an audio stream comprising a corresponding first audio frame;
the first real-time transmission protocol packet is used for adjusting the play corresponding relation between the video stream and the audio stream.
2. The method of claim 1, wherein the confirming the first audio frame corresponding to the first video frame comprises at least one of:
and confirming the audio frame with the interval between the sampling time stamp and the sampling time stamp of the first video frame smaller than a first threshold value as the audio frame corresponding to the first video frame.
3. The method of claim 2, the method further comprising:
if there is no audio frame whose sampling timestamp differs from the sampling timestamp of the first video frame by less than the first threshold, confirming, in a first period, a third audio frame corresponding to a third video frame;
wherein the first video frame and the third video frame belong to the same video stream.
4. The method of claim 1, wherein the adding the first real-time transmission protocol packet corresponding to the first audio frame to the first video frame to generate a second video frame comprises:
adding the first real-time transmission protocol packet of the first audio frame into a first real-time transmission protocol extension header corresponding to the first video frame, to generate the second video frame;
the interval between the sampling time stamps of the first video frame and the first audio frame is smaller than a second threshold, and the first real-time transmission protocol packet and a second real-time transmission protocol packet corresponding to the first video frame are used for adjusting the play corresponding relation of the first video frame and the first audio frame.
5. A data processing method applied to a receiving end, the method comprising:
based on a first real-time transmission protocol packet included in a received second video frame, confirming a corresponding relation between a first video frame included in the second video frame and a first audio frame corresponding to the first video frame;
and adjusting the playing corresponding relation of the video stream comprising the first video frame and the audio stream comprising the first audio frame based on the corresponding relation between the first video frame and the first audio frame.
6. The method of claim 5, the method further comprising:
and confirming the second video frame based on the identification number and the uniform resource identifier corresponding to the first real-time transmission protocol packet.
7. The method of claim 5, wherein the validating the correspondence between the first video frame included in the second video frame and the first audio frame corresponding to the first video frame based on the first real-time transport protocol packet included in the received second video frame comprises:
and acquiring a delay adjustment coefficient between the first video frame and the first audio frame based on the first real-time transmission protocol packet and a second real-time transmission protocol packet included in the second video frame.
8. The method of claim 7, wherein the acquiring the delay adjustment coefficient between the first video frame and the first audio frame based on the first real-time transmission protocol packet and a first real-time transmission protocol extension header included in the second video frame comprises:
acquiring a delay adjustment coefficient between a first video frame and a first audio frame based on a timestamp of a first real-time transmission protocol extension head corresponding to the received first video frame and a timestamp of a received first real-time transmission protocol packet;
wherein, the interval between the sampling time stamps of the first video frame and the first audio frame at the transmitting end is smaller than a second threshold value.
9. A data processing apparatus for use at a transmitting end, the apparatus comprising:
an audio confirmation unit, configured to confirm, in response to acquiring a first video frame, a first audio frame corresponding to the first video frame;
the video frame generation unit is used for adding a first real-time transmission protocol packet corresponding to the first audio frame into the first video frame to generate a second video frame;
a transmitting unit, configured to transmit a video stream including a second video frame and an audio stream including a first audio frame;
the first real-time transmission protocol packet is used for adjusting the play corresponding relation between the video stream and the audio stream.
10. A data processing apparatus for use at a receiving end, the apparatus comprising:
a confirmation unit, configured to confirm a correspondence between a first video frame included in a received second video frame and a first audio frame corresponding to the first video frame, based on a first real-time transport protocol packet included in the second video frame;
and the adjusting unit is used for adjusting the playing corresponding relation of the video stream comprising the first video frame and the audio stream comprising the first audio frame based on the corresponding relation between the first video frame and the first audio frame.
CN202310190680.7A 2023-02-24 2023-02-24 Data processing method and device Pending CN116155876A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310190680.7A CN116155876A (en) 2023-02-24 2023-02-24 Data processing method and device

Publications (1)

Publication Number Publication Date
CN116155876A true CN116155876A (en) 2023-05-23

Family

ID=86358090

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310190680.7A Pending CN116155876A (en) 2023-02-24 2023-02-24 Data processing method and device

Country Status (1)

Country Link
CN (1) CN116155876A (en)

Similar Documents

Publication Publication Date Title
EP1775964B1 (en) Method and device for stream synchronization of real-time multimedia transport over packet network
EP2798850B1 (en) Apparatus and method for synchronized transmission of multimedia content over an asynchronous network
CN113179441B (en) Synchronous playing method, device, system and computer equipment
US20050007952A1 (en) Method, system, and computer program product for managing jitter
CN113286184B (en) Lip synchronization method for respectively playing audio and video on different devices
EP2947821A1 (en) Method for detecting network transmission status and related device
JP2007020183A (en) Method for time synchronization in residential ethernet system
JP2006014299A (en) Synchronizing method of video/audio data of mobile communication terminal
JP2001094625A (en) Data communication unit, data communication method and storage medium
WO2023071605A1 (en) Screen projection data transmission method and apparatus, electronic device, and storage medium
US8285886B1 (en) Live media playback adaptive buffer control
US20140362960A1 (en) Receiver, method of calculating time difference, and program
CN109660467B (en) Method and apparatus for controlling flow
WO2023071598A1 (en) Audio and video synchronous monitoring method and apparatus, electronic device, and storage medium
US10601914B2 (en) Method and apparatus for synchronizing applications' consumption of remote data
CN113965488B (en) Method and device for obtaining delay of data packet, electronic device and storage medium
US10848802B2 (en) IP traffic software high precision pacer
JP4042396B2 (en) Data communication system, data transmission apparatus, data reception apparatus and method, and computer program
CN116155876A (en) Data processing method and device
US6870876B1 (en) Method for sending information in a telecommunication system
CN114095771B (en) Audio and video synchronization method, storage medium and electronic equipment
CN115174981B (en) Remote joint singing method, device, equipment and storage medium based on micro-service
CN111385081A (en) End-to-end communication method, device, electronic equipment and medium
JP4023350B2 (en) Network connection device and time stamp processing method used therefor
CN112367271B (en) AI-based congestion control feature extraction method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination