WO2022111712A1 - Audio and video synchronization method and device - Google Patents

Audio and video synchronization method and device

Info

Publication number
WO2022111712A1
WO2022111712A1 (PCT/CN2021/134168, CN2021134168W)
Authority
WO
WIPO (PCT)
Prior art keywords
electronic device
audio
mobile phone
audio data
video
Prior art date
Application number
PCT/CN2021/134168
Other languages
French (fr)
Chinese (zh)
Inventor
张志军
张栋浩
王磊
张硕
王皓
方卫庆
杨峻峰
葛峰
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2022111712A1 publication Critical patent/WO2022111712A1/en

Classifications

    • H04J3/06 Synchronising arrangements
    • H04J3/0667 Bidirectional timestamps, e.g. NTP or PTP for compensation of clock drift and for compensation of propagation delays
    • H04L65/60 Network streaming of media packets
    • H04M1/72409 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality by interfacing with external accessories
    • H04M1/7243 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages
    • H04N21/236 Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data
    • H04N5/04 Synchronising
    • H04N5/265 Mixing

Definitions

  • the embodiments of the present application relate to the field of electronic technologies, and in particular, to a method and device for synchronizing audio and video.
  • When an electronic device such as a mobile phone records a short video file, the electronic device may be too far from the subject to record the audio on the subject's side clearly and completely, so the recording effect of the final short video file is poor.
  • one electronic device can be used to record the video on the side of the photographed object
  • another electronic device near the side of the photographed object can be used to record the audio on the side of the photographed object
  • the audio and video from the different electronic devices are synchronously mixed and recorded.
  • However, without a common clock reference between the two electronic devices, the audio and video may become out of sync.
  • the embodiment of the present application provides a method and device for synchronizing audio and video.
  • Based on the processing delay and transmission delay of each frame of audio data, calculated and counted on the second electronic device, and the moment each frame of audio data arrives at the first electronic device, the first electronic device calculates the generation time of each frame of audio data with reference to its own clock, thereby realizing synchronous mixed recording of the video from the first electronic device and the audio from the second electronic device.
  • an embodiment of the present application provides a method for synchronizing audio and video, which is applied in an audio and video recording scene of a first electronic device.
  • the method includes: in response to a first user operation, initiating, by the first electronic device, an audio call to a second electronic device; in response to a second user operation, collecting video data; acquiring audio data from the second electronic device, the audio data being transmitted through a plurality of audio data packets; and acquiring, from the second electronic device, delay data corresponding to each audio data packet in the plurality of audio data packets.
  • the time delay data is used to represent the time delay between the time when each audio data packet in the plurality of audio data packets is generated and the time when the first electronic device obtains each audio data packet in the plurality of audio data packets;
  • the audio data and the video data are synchronized based on the time delay data.
  • In this way, the first electronic device can obtain, from the second electronic device, the audio data and the delay data corresponding to each of the multiple audio data packets carrying that audio data, and then, according to the delay data, synchronize the video data collected by the first electronic device with the audio data collected by the second electronic device, so as to obtain a video file in which audio and video are synchronized.
  • the delay data includes first delay data and second delay data.
  • the first delay data is the processing delay of the process of processing the audio data by the second electronic device
  • the second delay data is the transmission delay of the audio data from the second electronic device to the first electronic device.
  • the processing includes encoding processing and buffering processing of audio data.
  • the processing also includes assembly processing of the audio data.
  • the process of processing the audio data by the second electronic device generally includes assembly processing, encoding processing, and buffering processing. Since the processing delay of the assembly processing is usually within an acceptable delay deviation range, in some embodiments the processing delay of the assembly processing can be ignored.
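As a toy illustration of how these component delays combine, the following Python sketch sums the encoding and buffering delays and optionally includes the assembly delay. The function and argument names are my own, not from the patent, and the millisecond values in the usage note are illustrative:

```python
def processing_delay_ms(assembly_ms, encoding_ms, buffering_ms,
                        ignore_assembly=True):
    """Total per-frame processing delay on the audio-capturing device.

    As the text above notes, when the assembly step's delay stays within
    the acceptable deviation range it can be dropped from the total.
    All names and values here are illustrative assumptions.
    """
    total = encoding_ms + buffering_ms
    if not ignore_assembly:
        total += assembly_ms
    return total
```

For example, with a 2 ms assembly, 20 ms encoding, and 10 ms buffering delay, the reported processing delay would be 30 ms when assembly is ignored and 32 ms otherwise.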
  • each audio data packet in the multiple audio data packets also transmits delay data corresponding to each audio data packet.
  • audio data and delay data can be transmitted through the same audio data packet.
  • each audio data packet is a Real-time Transport Protocol (RTP) packet.
  • the delay data is located in the extended header field of the RTP packet header.
  • the delay data may be transmitted together with the audio data through the extension header field of the RTP header.
  • each audio data packet is an RTP packet.
  • the delay data is located in the timestamp field of the RTP packet header.
  • the delay data may be transmitted together with the audio data through the timestamp field of the RTP header.
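To make the RTP-based variants concrete, here is a minimal Python sketch that packs the two delay values into a generic RFC 3550 header extension. The profile identifier (0x1000), the two 16-bit fields, and the millisecond units are illustrative assumptions on my part; the patent does not specify a wire format:

```python
import struct

def build_rtp_with_delay_ext(seq, timestamp, ssrc, payload,
                             proc_delay_ms, trans_delay_ms):
    """Build an RTP packet whose header extension carries delay data."""
    # RTP fixed header: V=2, P=0, X=1 (extension present), CC=0
    first_byte = (2 << 6) | (1 << 4)
    second_byte = 111  # M=0, payload type 111 (illustrative dynamic type)
    header = struct.pack("!BBHII", first_byte, second_byte,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)
    # Generic RFC 3550 extension: 16-bit profile id, length in 32-bit
    # words, then one word carrying two 16-bit delays in milliseconds.
    ext_payload = struct.pack("!HH", proc_delay_ms, trans_delay_ms)
    ext = struct.pack("!HH", 0x1000, len(ext_payload) // 4) + ext_payload
    return header + ext + payload

def parse_delay_ext(packet):
    """Return (proc_delay_ms, trans_delay_ms) if the X bit is set."""
    if not (packet[0] >> 4) & 0x1:
        return None
    # Skip the 12-byte fixed header (CC=0) and 4-byte extension header.
    return struct.unpack("!HH", packet[16:20])
```

A receiver that parses this extension recovers the per-packet delay data without touching the audio payload; the timestamp-field variant described above would instead overload the 32-bit timestamp word.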
  • the delay data is transmitted in a control signaling format.
  • the delay data may be transmitted in a control signaling format, that is, the delay data may be transmitted separately from the audio data.
  • That is, the audio data is still transmitted over the media channel through multiple audio data packets, while the delay data is transmitted over the control signaling channel in a control signaling format.
  • the delay data further includes sequence number information of each audio data packet.
  • the delay data includes the average value of the delay data corresponding to the N audio data packets respectively, and the start sequence number information and the end sequence number information of the N audio data packets.
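A sketch of this aggregated form, assuming a simple dictionary report; the field names are hypothetical, not the patent's actual format:

```python
from statistics import mean

def make_delay_report(packet_delays):
    """Aggregate per-packet delay data for a run of N consecutive packets.

    `packet_delays` is a list of (sequence_number, delay_ms) tuples. The
    report carries the mean delay plus the start and end sequence numbers,
    mirroring the aggregated form described above.
    """
    seqs = [seq for seq, _ in packet_delays]
    return {
        "start_seq": min(seqs),
        "end_seq": max(seqs),
        "avg_delay_ms": mean(delay for _, delay in packet_delays),
    }
```

Sending one such report per N packets trades per-packet precision for less signaling overhead, which is presumably why the averaged form exists alongside the per-packet one.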
  • synchronizing the audio data and the video data based on the delay data includes: calculating the generation time of each audio data packet based on the delay data and the moment when the first electronic device obtains each audio data packet; and synchronizing the audio data and the video data according to the generation time of each audio data packet, to obtain a synchronized video file.
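The calculation just described can be sketched as follows. The key names (`arrival_ms`, `proc_ms`, `trans_ms`) are hypothetical, and all times are assumed to be milliseconds on the first device's clock:

```python
def generation_time(arrival_ms, proc_delay_ms, trans_delay_ms):
    """Map a packet's arrival time (first device's clock) back to the
    moment the audio was generated on the second device."""
    return arrival_ms - proc_delay_ms - trans_delay_ms

def align_audio_to_video(audio_packets, video_start_ms):
    """Place each audio packet on the video recording's timeline.

    `audio_packets` is a list of dicts with hypothetical keys
    arrival_ms / proc_ms / trans_ms; returns a list of
    (offset_from_video_start_ms, packet) pairs.
    """
    return [
        (generation_time(p["arrival_ms"], p["proc_ms"], p["trans_ms"])
         - video_start_ms, p)
        for p in audio_packets
    ]
```

Because both the arrival time and the video frame timestamps are taken on the first device's clock, the subtraction never compares timestamps from two different clocks, which is the core of the approach.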
  • the video recording application displays a preset icon or a list of contact objects.
  • initiating an audio call to the second electronic device includes: in response to the user's operation on the preset icon, displaying a list of contact objects; and in response to the user's selection operation on the second electronic device in the list of contact objects, initiating an audio call to the second electronic device.
  • Alternatively, initiating an audio call to the second electronic device includes: in response to the user's selection operation on the second electronic device in the displayed list of contact objects, initiating an audio call to the second electronic device.
  • the first electronic device may initiate an audio call to the second electronic device in response to the user's operation on the preset icon or the list of contact objects.
  • the information of the contact object comes from the address book of the first electronic device.
  • a Bluetooth connection is established between the electronic device corresponding to the contact object and the first electronic device, or the electronic device corresponding to the contact object and the first electronic device are in the same Wi-Fi local area network.
  • the first electronic device stores information of a historical contact object
  • the electronic device corresponding to the historical contact object includes an electronic device that has established an audio call with the first electronic device before.
  • the first electronic device can store the information of the electronic devices with which an audio call has been established before, so as to facilitate initiating an audio call next time.
  • an embodiment of the present application provides a first electronic device, including: one or more processors; a memory; and one or more computer programs, where the one or more computer programs are stored in the memory and comprise instructions which, when executed by the first electronic device, cause the first electronic device to perform the method of any possible design of the first aspect.
  • an embodiment of the present application provides a system for synchronizing audio and video.
  • the system includes a first electronic device and a second electronic device, and is applied in the audio and video recording scene of the first electronic device.
  • the first electronic device initiates an audio call to the second electronic device in response to the operation of the first user;
  • the second electronic device accepts the audio call;
  • the second electronic device collects audio data, the audio data being transmitted through multiple audio data packets; obtains the delay data corresponding to each audio data packet in the plurality of audio data packets; and transmits the audio data and the delay data to the first electronic device;
  • the first electronic device collects the video data in response to the operation of the second user, and obtains the audio data and time delay data from the second electronic device;
  • the first electronic device synchronizes the audio data and the video data based on the time delay data in response to the third user operation.
  • the first electronic device, in response to the operation of the first user, initiates an audio call to the second electronic device.
  • the first electronic device may not be able to clearly and completely record the audio on the side of the second electronic device.
  • the first electronic device may initiate an audio call to the second electronic device, causing the second electronic device to collect the audio data.
  • When the first electronic device collects video data, the second electronic device continuously displays a reminder message.
  • the reminder message is used to remind the user that the first electronic device is collecting video data.
  • embodiments of the present application provide a computer-readable storage medium, including computer instructions, which, when the computer instructions are executed on a computer, cause the computer to execute the method in any possible design of the first aspect.
  • the embodiments of the present application provide a computer program product.
  • the computer program product when run on a computer, causes the computer to perform the method of any possible design of the first aspect.
  • FIG. 1 is a diagram of a shooting scene provided by an embodiment of this application.
  • FIG. 2 is a hardware structure diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 3 is a flowchart of a method for synchronizing audio and video according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a preview interface of a video recording mode provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of an interface provided by an embodiment of the present application.
  • FIG. 6 is another interface schematic diagram provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of an audio data processing process provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a message structure provided by an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of another electronic device provided by an embodiment of the present application.
  • first and second are only used for descriptive purposes, and should not be construed as indicating or implying relative importance or implicitly indicating the number of indicated technical features.
  • a feature defined as “first” or “second” may expressly or implicitly include one or more of that feature.
  • plural means two or more.
  • In a related solution, the electronic device that records the video and the electronic device that records the audio can periodically synchronize with a network time protocol (NTP) clock server, enabling synchronous mixed recording of video and audio from different electronic devices.
  • The above scheme relies on an NTP time server to synchronize audio and video, but the timing accuracy error over a wide area network (WAN) usually reaches 50 ms to 500 ms. Because this error may exceed the delay deviation allowed for audio and video synchronization, relying on an NTP time server for synchronous mixed recording may lead to a large delay deviation, and the audio and video may still end up out of sync.
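For background on why WAN synchronization is this imprecise (this is standard NTP behavior, not part of the patent's method): NTP estimates the clock offset from four timestamps of one request/response exchange, and the formula assumes symmetric network paths, so any path asymmetry becomes offset error directly.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Classic NTP offset/delay estimate from one exchange.

    t1: client send, t2: server receive, t3: server send,
    t4: client receive (all in ms). Under the symmetric-path
    assumption, offset is the client clock's error relative to
    the server.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    round_trip_delay = (t4 - t1) - (t3 - t2)
    return offset, round_trip_delay
```

With a 100 ms round trip split 60/40 between the two directions, the asymmetry alone contributes about 10 ms of offset error; over congested WAN paths such asymmetries grow into the tens-to-hundreds-of-milliseconds range cited above.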
  • An embodiment of the present application provides a method for synchronizing audio and video.
  • In this method, based on the delay data (for example, the processing delay and the transmission delay) of each frame of audio data calculated and counted on the second electronic device, and the moment each frame of audio data arrives at the first electronic device, the first electronic device calculates the generation time of each frame of audio data with reference to its own clock, thereby realizing synchronous mixed recording of the video from the first electronic device and the audio from the second electronic device.
  • the shooting scene 10 may include a first electronic device 11 and a second electronic device 12 and the like.
  • the first electronic device 11 is on the side of the photographer
  • the second electronic device 12 is on the side of the object to be photographed
  • the distance between the photographer and the object to be photographed is relatively far.
  • the distance between the photographer and the object to be photographed is greater than the effective sound pickup distance of a sound pickup module, such as a microphone, of the first electronic device 11 on the photographer's side.
  • a video on the side of the photographed object may be recorded by the first electronic device 11 .
  • the second electronic device 12 near the photographed object may record the audio on the side of the photographed object, and transmit the recorded audio to the first electronic device 11 .
  • the second electronic device 12 may transmit audio to the first electronic device 11 based on any one of multiple communication protocols.
  • the communication protocol may include server relay transmission, peer-to-peer (P2P) transmission, Bluetooth, ZigBee, Wi-Fi transmission, and the like. That is, an audio call can be established between the second electronic device 12 and the first electronic device 11 based on any one of the above communication protocols.
  • Based on the processing delay and transmission delay of each frame of audio data calculated and counted in the second electronic device 12, and the moment each frame of audio data reaches the first electronic device 11, the first electronic device 11 uses its own clock as a reference to calculate the generation time of each frame of audio data.
  • In this way, the synchronous mixed recording of audio and video from different electronic devices is converted into synchronous mixed recording of audio and video on the same device, and the video recorded by the first electronic device 11 and the audio recorded by the second electronic device 12 are mixed and recorded synchronously.
  • The first electronic device 11 usually enters the remote audio collection state when the distance between the photographer and the object to be photographed is greater than the effective sound pickup distance of the first electronic device 11. It can be understood that, when this distance is small, the first electronic device 11 can record video and audio normally, or can still choose to enter the remote audio collection state, in which the first electronic device 11 collects the video and the second electronic device 12 collects the audio; this is not limited in this embodiment of the present application.
  • The first electronic device and the second electronic device may each be any electronic device such as a mobile phone, a tablet computer, a wearable device, an in-vehicle device, an augmented reality (AR)/virtual reality (VR) device, a laptop computer, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA).
  • FIG. 2 shows a schematic structural diagram of the foregoing electronic device, and the electronic device may be the foregoing first electronic device or the foregoing second electronic device.
  • the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headphone jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and so on.
  • the sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, and ambient light. Sensor 180L, bone conduction sensor 180M, etc.
  • the processor 110 may include one or more processing units; for example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Different processing units may be independent devices, or may be integrated in one or more processors.
  • the controller may be the nerve center and command center of the electronic device 100 .
  • the controller can generate an operation control signal according to the instruction operation code and timing signal, and complete the control of fetching and executing instructions.
  • a memory may also be provided in the processor 110 for storing instructions and data.
  • the memory in processor 110 is cache memory. This memory may hold instructions or data that have just been used or recycled by the processor 110 . If the processor 110 needs to use the instruction or data again, it can be called directly from the memory. Repeated accesses are avoided and the latency of the processor 110 is reduced, thereby increasing the efficiency of the system.
  • the processor 110 may include one or more interfaces.
  • the interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
  • the wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modulation and demodulation processor, the baseband processor, and the like.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in electronic device 100 may be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization.
  • the antenna 1 can be multiplexed as a diversity antenna of the wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
  • the mobile communication module 150 may provide wireless communication solutions including 2G/3G/4G/5G etc. applied on the electronic device 100 .
  • the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA) and the like.
  • the mobile communication module 150 can receive electromagnetic waves from the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modulation and demodulation processor for demodulation.
  • the mobile communication module 150 can also amplify the signal modulated by the modulation and demodulation processor, and then turn it into an electromagnetic wave for radiation through the antenna 1 .
  • at least part of the functional modules of the mobile communication module 150 may be provided in the processor 110 .
  • at least part of the functional modules of the mobile communication module 150 may be provided in the same device as at least part of the modules of the processor 110 .
  • the modem processor may include a modulator and a demodulator.
  • the modulator is used to modulate the low frequency baseband signal to be sent into a medium and high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low frequency baseband signal. Then the demodulator transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the low frequency baseband signal is processed by the baseband processor and passed to the application processor.
  • the application processor outputs sound signals through audio devices (not limited to the speaker 170A, the receiver 170B, etc.), or displays images or videos through the display screen 194 .
  • the modem processor may be a stand-alone device.
  • the modem processor may be independent of the processor 110, and may be provided in the same device as the mobile communication module 150 or other functional modules.
  • the wireless communication module 160 can provide wireless communication solutions applied on the electronic device 100, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like.
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2 , frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110 .
  • the wireless communication module 160 can also receive the signal to be sent from the processor 110, perform frequency modulation on it, amplify it, and convert it into electromagnetic waves for radiation.
  • the antenna 1 of the electronic device 100 is coupled with the mobile communication module 150, and the antenna 2 is coupled with the wireless communication module 160, so that the electronic device 100 can communicate with the network and other devices through wireless communication technology.
  • the wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), broadband Code Division Multiple Access (WCDMA), Time Division Code Division Multiple Access (TD-SCDMA), Long Term Evolution (LTE), BT, GNSS, WLAN, NFC , FM, and/or IR technology, etc.
  • the GNSS may include a global positioning system (global positioning system, GPS), a global navigation satellite system (GLONASS), a Beidou navigation satellite system (BDS), a quasi-zenith satellite system (quasi satellite system) -zenith satellite system, QZSS) and/or satellite based augmentation systems (SBAS).
  • the electronic device 100 implements a display function through a GPU, a display screen 194, an application processor, and the like.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
  • Display screen 194 is used to display images, videos, and the like.
  • Display screen 194 includes a display panel.
  • the display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Miniled, a MicroLed, a Micro-oLed, a quantum dot light-emitting diode (QLED), and so on.
  • the electronic device 100 may implement a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
  • the ISP is used to process the data fed back by the camera 193 .
  • when shooting, the shutter is opened, and light is transmitted through the lens to the photosensitive element of the camera, where the light signal is converted into an electrical signal; the photosensitive element transmits the electrical signal to the ISP for processing, which converts it into an image visible to the naked eye.
  • the ISP can also perform algorithm optimization on the noise, brightness and skin tone of the image.
  • the ISP can also optimize parameters such as the exposure and color temperature of the shooting scene.
  • the ISP may be provided in the camera 193 .
  • Camera 193 is used to capture still images or video.
  • an optical image of the object is generated through the lens and projected onto the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing.
  • the DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV.
  • the digital signal processor is used to process digital signals; in addition to digital image signals, it can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform a Fourier transform on the frequency point energy and so on.
  • Video codecs are used to compress or decompress digital video.
  • the electronic device 100 may support one or more video codecs.
  • the electronic device 100 can play or record videos in various encoding formats, such as moving picture experts group (MPEG) 1, MPEG2, MPEG3, MPEG4 and so on.
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100 .
  • the external memory card communicates with the processor 110 through the external memory interface 120 to realize the data storage function, for example, to save files such as music and videos in the external memory card.
  • Internal memory 121 may be used to store computer executable program code, which includes instructions.
  • the processor 110 executes various functional applications and data processing of the electronic device 100 by executing the instructions stored in the internal memory 121 .
  • the internal memory 121 may include a storage program area and a storage data area.
  • the storage program area can store an operating system, an application program required for at least one function (such as a sound playback function, an image playback function, etc.), and the like.
  • the storage data area can store data (such as audio data, phone book, etc.) created during the use of the electronic device 100 and the like.
  • the internal memory 121 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, universal flash storage (UFS), and the like.
  • the electronic device 100 may implement audio functions through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like. Such as music playback, recording, etc.
  • the audio module 170 is used for converting digital audio information into analog audio signal output, and also for converting analog audio input into digital audio signal. Audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110 , or some functional modules of the audio module 170 may be provided in the processor 110 .
  • Speaker 170A, also referred to as a "loudspeaker", is used to convert audio electrical signals into sound signals.
  • the electronic device 100 can listen to music through the speaker 170A, or listen to a hands-free call.
  • the receiver 170B, also referred to as an "earpiece", is used to convert audio electrical signals into sound signals.
  • the voice can be answered by placing the receiver 170B close to the human ear.
  • the microphone 170C, also called a "mic" or "sound transducer", is used to convert sound signals into electrical signals.
  • the user can speak with the mouth close to the microphone 170C to input the sound signal into the microphone 170C.
  • the electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement a noise reduction function in addition to collecting sound signals. In other embodiments, the electronic device 100 may further be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and implement directional recording functions.
  • the earphone jack 170D is used to connect wired earphones.
  • the earphone interface 170D may be the USB interface 130, or may be a 3.5mm open mobile terminal platform (OMTP) standard interface or a cellular telecommunications industry association of the USA (CTIA) standard interface.
  • the keys 190 include a power-on key, a volume key, and the like. The keys 190 may be mechanical keys or touch keys.
  • the electronic device 100 may receive key inputs and generate key signal inputs related to user settings and function control of the electronic device 100 .
  • Motor 191 can generate vibrating cues.
  • the motor 191 can be used for vibrating alerts for incoming calls, and can also be used for touch vibration feedback.
  • touch operations acting on different applications can correspond to different vibration feedback effects.
  • the motor 191 can also correspond to different vibration feedback effects for touch operations on different areas of the display screen 194 .
  • different application scenarios (for example: time reminder, receiving information, alarm clock, games, etc.) can also correspond to different vibration feedback effects.
  • the touch vibration feedback effect can also support customization.
  • the indicator 192 can be an indicator light, which can be used to indicate the charging state and changes in battery level, and can also be used to indicate a message, a missed call, a notification, and the like.
  • the structures illustrated in the embodiments of the present application do not constitute a specific limitation on the electronic device 100 .
  • the electronic device 100 may include more or less components than shown, or combine some components, or separate some components, or arrange different components.
  • the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
  • when the electronic device is a device with a different role, such as the first electronic device or the second electronic device, different components may also be included.
  • for example, since the second electronic device 12 is used for recording audio, the second electronic device 12 includes the audio module 170 but may not include the camera 193; the first electronic device 11 is used for recording video, so it includes the camera 193.
  • the first electronic device 11 can record the video on the side of the photographed object through the camera 193.
  • the second electronic device 12 can record the audio on the side of the photographed object through the microphone 170C in the audio module 170 and transmit the recorded audio to the first electronic device 11.
  • the first electronic device 11 uses its own clock as a reference to calculate the generation time of the audio recorded by the second electronic device 12, and uses the processor 110 to mix the locally recorded video and the received audio synchronously.
  • the method for synchronizing audio and video provided by the embodiment of the present application will be described below by taking as an example that the first electronic device and the second electronic device are both mobile phones and the first electronic device uses a video recording application to shoot video in the shooting scene 10 .
  • the method may include:
  • the first mobile phone opens a video recording application, and enters a preview interface, where the preview interface includes a remote connection icon.
  • the first mobile phone may open the video recording application in response to the user's operation of opening the video recording application, and then enter a preview interface of a certain recording mode.
  • a certain recording mode is any one of the recording modes such as "normal recording", "delayed recording" or "live streaming".
  • the video recording application may be any application with a video recording capability. This application does not limit this.
  • the first mobile phone can open the video recording application in response to the user's click operation on the icon of the video recording application, and the first mobile phone can also open the video recording application in response to the user's voice input or gesture input.
  • the application does not limit the way of opening the video recording application on the first mobile phone.
  • the first mobile phone can enter the preview interface of a certain shooting mode in multiple ways.
  • in some embodiments, after opening the video recording application, the first mobile phone directly enters the preview interface of a certain recording mode.
  • in other embodiments, after the video recording application is opened, an interface other than the preview interface is displayed, such as a video browsing interface or a video selection interface, and the first mobile phone enters the preview interface of a certain recording mode after detecting the user's operation of entering the recording mode.
  • the operation of the user entering the recording mode may include clicking on a specific control, voice input or gesture input, etc., which is not limited in this application.
  • FIG. 4 is a schematic diagram of a preview interface after the first mobile phone enters a certain recording mode.
  • the preview interface 401 includes a viewfinder window 402 and a remote connection icon 403 .
  • what is displayed in the viewfinder window 402 is a preview image of the photographed object.
  • the remote connection icon 403 indicates that the first mobile phone has the ability to acquire remote audio.
  • the video recording application can call the system interface to determine whether the first mobile phone has the ability to acquire remote audio. If it is determined that the first mobile phone has the capability, the first mobile phone displays a remote connection icon 403 in the preview interface.
  • when the preview interface displays the remote connection icon 403, it means that the video recording application integrates an inter-device audio and video call module.
  • the inter-device audio and video call module can be Huawei's CaasKit or SoundNet's RTC SDK, etc.
  • the second mobile phone on the subject side can transmit the recorded audio to the first mobile phone. That is, the remote connection icon 403 indicates that the first cell phone can establish an audio call with the second cell phone.
  • the remote connection icon may also be called an "audio/video call icon” or a "specific icon” or other names, and the embodiment of the present application does not limit the name of the icon.
  • the first mobile phone can display the above-mentioned remote connection icon in the preview interface of any recording mode.
  • the first mobile phone displays the above-mentioned remote connection icon only in a specific recording mode that needs to acquire remote audio.
  • the video recording application can also be set to a "remote video" mode.
  • the user enters the remote video mode when the first mobile phone is far away from the object to be photographed; at this time, the first mobile phone may need to obtain remote audio. Therefore, when the first mobile phone enters "remote video", a specific recording mode that may need to acquire remote audio, it displays the remote connection icon on the preview interface.
  • the first mobile phone may also not display the above-mentioned remote connection icon.
  • a selection window may be directly displayed, so that the user can view other electronic devices capable of establishing an audio call with the first mobile phone.
  • a message that the first mobile phone can acquire the remote audio can be displayed on the screen to notify the user that the first mobile phone can acquire the remote audio.
  • the first mobile phone initiates an audio call to the second mobile phone in response to the user's triggering operation on the remote connection icon.
  • the first mobile phone displays a connection icon in the preview interface of the video recording application, it means that the first mobile phone can obtain remote audio.
  • when the first mobile phone detects the user's triggering operation on the connection icon, it can display (for example, in a pop-up selection window) contact information or electronic device information that can conduct an audio call with the first mobile phone, and the first mobile phone can, in response to the user's selection of the second mobile phone on the side of the photographed object, initiate an audio call to the second mobile phone selected by the user.
  • the electronic device capable of making a call with the first mobile phone usually has an audio call application, such as a system phone, a Changlian call, or WeChat, and the like.
  • a selection window 404 as shown in (b) of FIG. 5 may pop up.
  • the selection window 404 includes information of other electronic devices capable of establishing an audio call with the first mobile phone.
  • the selection window 404 may include identification information of other electronic devices capable of establishing an audio call with the first mobile phone.
  • each electronic device corresponds to a user, and the electronic device can also be determined by the user information. Therefore, the selection window 404 may also include user information of other electronic devices that can establish an audio call with the first mobile phone.
  • in this way, the user can select user information in the selection window 404, and an audio call is initiated to the electronic device corresponding to the selected user.
  • the first mobile phone may also trigger the remote connection icon 403 in response to the user's voice input or gesture input, etc., which is not limited in this embodiment of the present application.
  • the first mobile phone can initiate an audio call to the corresponding electronic device.
  • the second mobile phone is on the side of the object to be photographed. If the first mobile phone is to obtain the remote audio from the side of the photographed object, the first mobile phone needs to establish an audio call with the second mobile phone. Therefore, the user (photographer) on the side of the first mobile phone can select "second mobile phone" in the selection window 404 shown in (b) of FIG. 5, so that the first mobile phone initiates an audio call to the second mobile phone on the side of the photographed object.
  • the audio call from the first mobile phone can be received on the second mobile phone, as shown in (c) of FIG. 5 .
  • the information of the contact object displayed in the selection window 404 can be acquired through various ways.
  • the first mobile phone may display, in the selection window 404, the information of the contact objects in the address book of the first mobile phone whose audio call function is enabled (for example, Changlian call is enabled).
  • the first mobile phone may send broadcast information to search for other electronic devices within a preset distance that can establish an audio call with the first mobile phone, and display relevant information of these electronic devices or corresponding user information in the selection window 404.
  • the first mobile phone can send broadcast messages through Bluetooth, Wi-Fi, zigbee, etc.
  • the preset distance is generally related to the range of the first mobile phone, and the present application does not specifically limit the preset distance.
  • the electronic device corresponding to the contact object displayed in the selection window 404 may establish a Bluetooth connection with the first mobile phone, or the electronic device corresponding to the contact object displayed in the selection window 404 may be in the same Wi-Fi local area network as the first mobile phone , or the electronic device corresponding to the contact object displayed in the selection window 404 has established an audio call with the first mobile phone before. This embodiment of the present application does not limit this.
  • the first mobile phone may store identification information and other related information of these electronic devices as information of historical contact objects. In this way, when the first mobile phone displays the selection window 404 later, the selection window 404 may include the information of the historical contact object stored in the first mobile phone. That is, these electronic devices that have previously established an audio call with the first mobile phone can be displayed as contact objects in the selection window 404 .
  • the first mobile phone can initiate an audio call to the second mobile phone in other ways.
  • the remote connection icon may not be displayed, but a selection window may be directly displayed to present the user with other electronic devices capable of establishing an audio call with the first mobile phone .
  • the first mobile phone initiates an audio call to the second mobile phone.
  • the first mobile phone may directly initiate an audio call to the second mobile phone in response to a user's voice input, such as "establish an audio connection with the second mobile phone", without displaying the remote connection icon in the preview interface. This embodiment of the present application does not limit this.
  • in response to the user's answering operation, the second mobile phone records audio and transmits the audio to the first mobile phone.
  • after the audio call from the first mobile phone is received on the second mobile phone, if the user chooses to answer the audio call, the second mobile phone starts recording the audio on the side of the photographed object and transmits the audio to the first mobile phone.
  • when the second mobile phone receives an audio call from the first mobile phone, the second mobile phone displays the to-be-answered call interface shown in (c) in FIG. 5.
  • if the second mobile phone detects the user's answering operation, the second mobile phone displays the ongoing-call interface shown in (d) in FIG. 5; the second mobile phone starts to record the audio on the side of the photographed object, establishes an audio call with the first mobile phone, and transmits the recorded audio to the first mobile phone.
  • the second mobile phone detects that the user triggers the "hang up" control in (d) of FIG. 5 , the second mobile phone can end the audio call with the first mobile phone.
  • when an audio call is established between the second mobile phone and the first mobile phone, the first mobile phone also starts to obtain the remote audio from the second mobile phone. As shown in (a) in FIG. 6, the first mobile phone can display a prompt box 601 in the preview interface to prompt the photographer that the first mobile phone is acquiring the far-end audio from the second mobile phone on the side of the photographed object.
  • the remote connection icon 403 in the preview interface may continue to exist, or may not be displayed temporarily until the end of the audio collection.
  • if the first mobile phone detects that the photographer has triggered the end-audio control 604, the first mobile phone can end the audio call with the second mobile phone.
  • after the second mobile phone detects the user's answering operation, it can activate an audio pickup device such as the microphone of the audio collection module to collect the audio data on the side of the object to be photographed. As shown in (a) in FIG. 7, after the second mobile phone starts to collect audio data through the audio collection module, the collected audio data is processed through the audio assembling module, the audio coding module and the buffer sending module in sequence. After that, the second mobile phone transmits the processed audio data to the first mobile phone through one of multiple communication protocols such as server relay transmission, P2P transmission, Bluetooth, zigbee or Wi-Fi.
  • the audio assembling module assembles the collected audio data into data packets of a preset size; the audio coding module compresses the input data packets; and the compressed data packets are queued in the buffer sending module for transmission. It should be noted that if both the second mobile phone and the first mobile phone can be connected to the same router, the Wi-Fi P2P link is preferably used to transmit audio data, so as to obtain a more stable transmission effect.
  • there is a processing delay delay_remote in the process of processing the audio data by the second mobile phone.
  • various intelligent speech recognition algorithms may dynamically adjust parameters such as the audio collection frequency, the assembly ratio or the encoding method according to requirements, resulting in changes in the processing delay.
  • there is a transmission delay delay_trans in the process in which the second mobile phone transmits the processed audio data to the first mobile phone, and changes in network quality may cause the transmission delay to change.
  • the processing delay delay_remote and the transmission delay delay_trans are calculated for each frame of audio data.
  • the delay data of each frame of audio data is the sum of the processing delay and the transmission delay. It should be noted that each frame of audio data represents each audio data packet generated in the audio assembly module.
  • the following takes an audio sampling frequency of 44 kHz, an assembly ratio of 50 Hz, and an audio/video delay deviation acceptable to the user within the range of [-120ms, +40ms] as an example to describe in detail the calculation process of the processing delay and transmission delay of each frame of audio data.
  • the processing delay of each frame of audio data is the sum of the processing delays of each processing module that each frame of audio data passes through.
  • the sampling frequency of the audio collection module is 44 kHz, which means that the audio collection module can generate 44,000 small data packets per second.
  • the audio collection module outputs the collected small data packets to the audio assembly module.
  • the audio assembly module assembles data packets through the buffer mechanism, that is, when the number of small data packets output by the audio acquisition module to the audio assembly module meets the preset number, the audio assembly module generates an audio data packet, that is, a frame of audio data.
  • the preset number here is related to the assembly ratio of the audio assembly module. For example, an assembly ratio of 50 Hz means that the audio assembly module assembles 50 audio data packets per second.
  • therefore, the audio assembly module assembles every 880 small data packets into one audio data packet, that is, one frame of audio data.
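By way of illustration only, the assembly arithmetic above can be sketched as follows (the constant and variable names are ours, not from the embodiments; only the 44 kHz and 50 Hz figures come from the text):

```python
# Illustrative arithmetic for the assembly ratio described above.
SAMPLE_RATE_HZ = 44_000      # small data packets generated per second
ASSEMBLY_RATIO_HZ = 50       # audio data packets (frames) assembled per second

packets_per_frame = SAMPLE_RATE_HZ // ASSEMBLY_RATIO_HZ   # small packets per frame
frame_duration_ms = 1000 // ASSEMBLY_RATIO_HZ             # audio covered by one frame

print(packets_per_frame, frame_duration_ms)  # 880 20
```

That is, each frame of audio data holds 880 small data packets and represents 20 ms of captured audio.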
  • since delay_1 is generally within the range of the audio/video delay deviation acceptable to the user, [-120ms, +40ms], delay_1 can be ignored during calculation and processing.
  • the timestamp of each audio data packet generated in the audio assembly module is marked as t1. Since each generated audio data packet is directly output to the audio coding module, it can also be considered that the timestamp of each audio data packet entering the audio coding module is t1.
  • the audio encoding module can compress each input audio data packet, and the specific compression method is related to the encoding format and the computing power of the electronic device (eg, software encoding method or hardware encoding method, etc.).
  • each audio data packet can be queued, waiting to be transmitted to the first mobile phone via the air interface.
  • the processing delay delay_3 of the buffer sending module is related to the size of the sending buffer set in the second mobile phone.
  • the method of calculating the total processing delay by calculating the processing delay of each module is universal: when the processing procedure is adjusted or changed, the total processing delay can still be obtained by calculating the delay of each module at that time.
  • this method of directly calculating the processing delay is simpler and more convenient, and can directly obtain the processing delay corresponding to each audio data packet in one step.
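A minimal sketch of the per-module summation described above; the function name and the numeric values are assumptions for illustration, since the embodiments only state that delay_remote is the sum of the per-module delays:

```python
def total_processing_delay_ms(delay_1_ms: float, delay_2_ms: float, delay_3_ms: float) -> float:
    """delay_remote as the sum of the assembly (delay_1), encoding (delay_2)
    and buffer-sending (delay_3) delays of one audio data packet."""
    return delay_1_ms + delay_2_ms + delay_3_ms

# Assumed per-module values (not from the patent): 20 ms assembly,
# 5 ms encoding, 10 ms buffer sending.
delay_remote = total_processing_delay_ms(20, 5, 10)
print(delay_remote)  # 35
```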
  • after each audio data packet is output from the buffer sending module, it can be transmitted to the first mobile phone via the air interface. There is a delay between the time when each audio data packet is sent from the second mobile phone and the time when it is received by the first mobile phone, that is, the transmission delay delay_trans.
  • the second mobile phone may calculate the transmission delay by sending a probe packet to the first mobile phone.
  • the second mobile phone sends the probe packet to the first mobile phone, and the first mobile phone, after receiving the probe packet, immediately returns it to the second mobile phone along the original path without any processing.
  • the second mobile phone may send a probe packet at a preset frequency to calculate the current transmission delay, and usually the preset frequency is lower than the assembly ratio. In this way, fewer probe packets are transmitted, which can reduce the generation of extra traffic.
  • the second mobile phone may also periodically send probe packets at the frequency of the assembly ratio, so as to obtain the transmission delay of each audio data packet. Since audio calls have low bandwidth requirements, sending probe packets at the frequency of the assembly ratio will not affect audio data transmission.
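Since the first mobile phone echoes the probe packet back without processing, a common estimate (a sketch under that assumption; the function name is ours) is to take half of the measured round-trip time as the one-way transmission delay:

```python
def estimate_delay_trans_ms(t_send_s: float, t_echo_received_s: float) -> float:
    """Approximate the one-way transmission delay delay_trans as half of the
    probe packet's round-trip time, converted from seconds to milliseconds."""
    return (t_echo_received_s - t_send_s) / 2 * 1000.0

# Assumed timestamps in seconds: a probe sent at 10.00 s whose echo arrives
# at 10.08 s has an 80 ms round trip, i.e. roughly 40 ms one-way delay.
print(estimate_delay_trans_ms(10.00, 10.08))
```

This assumes a roughly symmetric path; the embodiments do not specify how the round trip is split.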
  • the process of calculating the processing delay and transmission delay of each audio data packet by the second mobile phone is described above, and the delay data of each frame of audio data can be obtained according to the processing delay and the transmission delay.
  • taking the clock of the first mobile phone as a reference, the first mobile phone can calculate, according to the timestamp of the arrival of each audio data packet and the above-mentioned delay data, the time when each audio data packet started to be generated in the second mobile phone, that is, the moment when each frame of audio data was generated. If the timestamp at which each audio data packet arrives at the first mobile phone (that is, when the first mobile phone receives each audio data packet) is t_arrive, the first mobile phone can calculate the generation time t_start of each frame of audio data:
  • t_start = t_arrive - (delay_remote + delay_trans).
  • in this way, the first mobile phone can obtain the generation time of each frame of audio data based on its own clock; that is to say, mixing the audio of the second mobile phone with the video of the first mobile phone synchronously approximates mixing the first mobile phone's own audio and video in synchronization.
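The generation-time formula above can be sketched directly (a hedged illustration; the function name and the example values are ours):

```python
def t_start_ms(t_arrive_ms: float, delay_remote_ms: float, delay_trans_ms: float) -> float:
    """Generation time of a frame of audio data, referenced to the first
    phone's clock: t_start = t_arrive - (delay_remote + delay_trans)."""
    return t_arrive_ms - (delay_remote_ms + delay_trans_ms)

# Assumed values: a packet arriving at t_arrive = 1000 ms with a 35 ms
# processing delay and a 40 ms transmission delay was generated at 925 ms.
print(t_start_ms(1000, 35, 40))  # 925
```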
  • in order to enable the first mobile phone to accurately calculate the moment when each audio data packet started to be generated, the second mobile phone also needs to calculate the delay data of each audio data packet, such as the processing delay and the transmission delay, and transmit it to the first mobile phone. The delay data cannot be mixed with the audio data for transmission, as this may affect the decoding of the audio data by the first mobile phone. Therefore, the embodiments of the present application provide various transmission methods for the delay data, which are described below by way of examples.
  • the delay data may be transmitted from the second mobile phone to the first mobile phone through a real-time transport protocol (RTP) message together with the audio data.
  • the audio data packet is an RTP packet.
  • the delay data may be transmitted from the second mobile phone to the first mobile phone through a timestamp field in the standard header of the RTP message.
  • FIG. 8 shows the format of the standard header of the RTP message.
  • the fields in the standard header of the message defined in Figure 8 are described as follows:
  • the timestamp occupies 32 bits, and the timestamp reflects the sampling time of the first octet of the RTP message.
  • the receiver uses the timestamp to calculate delay and delay jitter, and to perform synchronization control.
  • the version number (V) indicates the version number of the RTP protocol, occupies 2 bits, and the current version number of the RTP protocol is 2.
  • the padding flag (P) occupies 1 bit.
  • the CSRC counter occupies 4 bits and indicates the number of CSRC identifiers.
  • the marker (M) occupies 1 bit.
  • its meaning differs for different payloads; for example, for video, it can mark the end of a frame, and for audio, it can mark the start of an audio session.
  • the payload type occupies 7 bits and is used to describe the type of payload in the RTP message, such as GSM audio, JPEG image, and so on.
  • the sequence number occupies 16 bits and is used to identify the sequence number of the RTP message sent by the sender. Each time a message is sent, the sequence number increases by 1.
  • the receiver can detect the loss of the message through the serial number, so as to reorder the message and restore the data.
  • the synchronization source (SSRC) identifier occupies 32 bits and is used to identify the synchronization source.
  • the identifier is randomly selected, and two synchronization sources participating in the same video conference cannot have the same SSRC.
  • there may be 0 to 15 contributing source (CSRC) identifiers, and each CSRC identifier occupies 32 bits.
  • the CSRC identifiers identify all contributing sources contained in the payload of the RTP message.
  • in a specific implementation, each audio data packet may be transmitted from the second mobile phone to the first mobile phone through an RTP message. If the above-mentioned timestamp field in the RTP packet header is not otherwise used, the timestamp field can be directly used to transmit the delay data at the same time. Since the sum of the processing delay and transmission delay of each audio data packet is usually within 1000 ms, the delay data only needs 2 bytes of the timestamp field.
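One way this could be sketched (field layout per the RFC 3550 fixed header; carrying the delay in the otherwise-unused timestamp field follows the text above, while the function names and the payload type value are assumptions):

```python
import struct

def build_rtp_header(seq: int, ssrc: int, delay_ms: int, payload_type: int = 96) -> bytes:
    """Pack the 12-byte RTP fixed header and carry the delay data in the
    32-bit timestamp field; a delay under 1000 ms fits in 2 of its 4 bytes."""
    assert 0 <= delay_ms < 0x10000
    first_byte = 2 << 6                   # version 2; no padding, extension or CSRC
    second_byte = payload_type & 0x7F     # marker bit 0
    return struct.pack("!BBHII", first_byte, second_byte, seq, delay_ms, ssrc)

def read_delay_ms(header: bytes) -> int:
    """Extract the delay carried in the timestamp field at the receiving end."""
    return struct.unpack("!BBHII", header[:12])[3]

hdr = build_rtp_header(seq=1, ssrc=0x12345678, delay_ms=75)
print(len(hdr), read_delay_ms(hdr))  # 12 75
```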
  • alternatively, the delay data may be transmitted from the second mobile phone to the first mobile phone in an extension header of the RTP message. An extension header may be added after the RTP header and before the payload, and the corresponding delay data can be carried in this extension header field; the delay data only needs to occupy 2 bytes of it. The extension header field used in the embodiments of this application is not encrypted. This approach requires modifying the RTP header assembly logic at the sending end and the RTP header parsing logic at the receiving end, so as to insert and extract the delay data.
  • the above two methods transmit the delay data in the header of the media channel (such as the RTP header), sending the delay data corresponding to each audio data packet together with that packet. That is to say, both methods are in-band, using the single media channel.
  • alternatively, the audio data and the delay data may be transmitted through different communication channels, as in transmission method 3 below: the audio data is still carried in RTP messages, while the delay data is carried in a control signaling format.
  • the delay data may be transmitted in a control signaling format over the control signaling channel, and need not accompany the audio data on the media channel. The control signaling channel carries control information such as call start and stop, handshake, or authentication, and may transmit information and data based on the transmission control protocol (transmission control protocol, TCP); data in the control signaling format is transmitted via this channel.
  • when the delay data is transmitted through the control signaling channel, each audio data packet and its corresponding delay data no longer travel in the same packet, so a frame sequence number label must be added. Each audio data packet corresponds to a frame sequence number, and the frame sequence number corresponding to an audio data packet needs to be added to that packet's delay data, indicating which audio data packet the delay data corresponds to.
  • the second mobile phone may transmit the delay data, with the corresponding frame sequence numbers added, to the first mobile phone periodically (for example, carried synchronously in the heartbeat packet payload), or transmit it all at once.
  • the delay data may also be compressed, thereby reducing the data transmission volume. The processing delay and transmission delay of each audio data packet are relatively stable, with small changes. For example, if the variance of the delay data over a run of consecutive audio data frames is less than or equal to a preset deviation range, the per-frame sequence numbers can be omitted: the delays of this run are averaged, and only the start frame sequence number and end frame sequence number of the run are attached. In this way, the amount of delay data is reduced through compression, saving a large amount of data transmission.
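The variance-based compression just described can be sketched as follows (an illustrative reading of the embodiment, not the claimed implementation; the threshold value and record layout are assumptions):

```python
from statistics import mean, pvariance

def compress_delays(frames, max_variance=4.0):
    """Compress per-frame delay data given as (frame_seq, delay_ms) pairs.

    If the delays of a continuous run of frames vary little (population
    variance within the preset deviation range), replace them with one
    record holding the average delay plus the start and end frame
    sequence numbers of the run.
    """
    delays = [d for _, d in frames]
    if len(frames) > 1 and pvariance(delays) <= max_variance:
        return {'start_seq': frames[0][0], 'end_seq': frames[-1][0],
                'avg_delay_ms': mean(delays)}
    # otherwise keep per-frame records, each tagged with its sequence number
    return [{'seq': s, 'delay_ms': d} for s, d in frames]

run = [(100, 50), (101, 52), (102, 51), (103, 49)]  # stable delays
record = compress_delays(run)
```

Four per-frame records (each needing a sequence number) collapse here into a single averaged record, which is the saving the embodiment describes.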
  • in step S303, the second mobile phone starts to collect audio data and transmits the collected audio data to the first mobile phone, using any one of transmission methods 1 to 3 to process the audio data. The processing delay and transmission delay incurred in transmitting the audio data are also transmitted to the first mobile phone. The first mobile phone can then, based on the received processing delay and transmission delay and the time at which it receives each audio data packet, calculate the generation time of each audio data packet with the clock of the first mobile phone as the reference, so that the received audio can be synchronously mixed and recorded with the video recorded by the first mobile phone.
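The generation-time calculation described above reduces to one subtraction on the first phone's own clock (a minimal sketch; function and parameter names are hypothetical):

```python
def generation_time(arrival_ms, processing_delay_ms, transmission_delay_ms):
    """Estimate, on the first phone's own clock, when an audio packet was
    generated on the second phone: the arrival time minus the reported
    processing delay and transmission delay."""
    return arrival_ms - (processing_delay_ms + transmission_delay_ms)

# A packet arriving at t = 10 000 ms with 30 ms processing delay and
# 80 ms transmission delay is treated as generated at t = 9 890 ms.
assert generation_time(10_000, 30, 80) == 9_890
```

Because only reported delays and local arrival times are used, no shared NTP clock between the two phones is required, which is the point of the embodiment.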
  • the first mobile phone records the video on the side of the photographed object in response to the user's video recording operation, and acquires the audio from the second mobile phone.
  • the first mobile phone may start to record a video on the side of the photographed object in response to a user's recording operation.
  • after the user taps the recording control 602 shown in (a) in FIG. 6, the first mobile phone starts to record a video and displays the shooting interface shown in (b) in FIG. 6, where the recording control 602 becomes the stop recording control 603.
  • the recording operation may also be voice input or gesture input, which is not limited in this application.
  • after the first mobile phone detects the user's recording operation, it need not immediately start recording the video on the side of the photographed object. The first mobile phone can first send a request message to the second mobile phone; for example, a selection box pops up on the second mobile phone asking the user on the side of the photographed object whether he or she agrees to let the first mobile phone record.
  • if the user agrees, the first mobile phone starts to record the video on the side of the photographed object and displays the interface shown in (b) in FIG. 6; otherwise, the first mobile phone cannot record the video on the side of the photographed object.
  • a notification box may also be displayed in the first mobile phone to notify the photographer that the user on the side of the photographed object rejects the photographing request of the first mobile phone.
  • the prompt box 601 may not be displayed, so as to ensure that the shooting picture is not blocked.
  • the above-mentioned selection box may be continuously displayed on the second mobile phone to confirm whether the user on the side of the photographed object agrees to continue the current shooting. Once that user rejects the photographing request, the first mobile phone immediately stops recording the video. During the shooting process, a reminder message may also be continuously displayed on the second mobile phone to remind the user on the side of the photographed object that the first mobile phone is still recording the video on that side.
  • the first mobile phone collects the video on the side of the photographed object only when the user on the side of the photographed object agrees to the photographing request, which can ensure the validity and security of the collected video, prevent irrelevant content from being photographed, and improve user experience.
  • the first mobile phone also acquires audio from the second mobile phone. Still as shown in (a) in FIG. 7, after the first mobile phone receives each frame of audio data from the second mobile phone, it processes each frame through the receiving buffer module, the audio decoding module, and the audio disassembly module to obtain the processed audio data. At the same time, when receiving the audio data from the second mobile phone, the first mobile phone can also receive the delay data. From the arrival timestamp of each audio data packet and the above-mentioned delay data, with the clock of the first mobile phone as the reference, it calculates the generation moment of each audio data packet in the second mobile phone, that is, the generation moment of each frame of audio data.
  • after the second mobile phone records the audio in step S303 and transmits it to the first mobile phone, that is, before the first mobile phone responds to the user's recording operation in step S304, the first mobile phone can already acquire audio from the second mobile phone. This audio can be used to conduct an audio call test between the first mobile phone and the second mobile phone, to ensure that the audio call between the two is uninterrupted.
  • the first mobile phone performs synchronous mixed recording of the video recorded by the first mobile phone and the audio from the second mobile phone to generate a short video file with synchronized audio and video.
  • if the first mobile phone detects that the photographer has finished recording, for example by tapping the stop recording control 603 in (b) in FIG. 6, the recording ends and the playback interface shown in (c) of FIG. 6 is displayed. This interface plays the audio-video-synchronized short video file obtained by synchronously mixing the video data collected by the first mobile phone and the audio data from the second mobile phone. Since the first mobile phone calculates the generation time of each frame of audio data based on its own clock, the video data collected by the first mobile phone and the audio data collected by the second mobile phone are both referenced to the clock of the first mobile phone. The problem of synchronously mixing the video from the first mobile phone with the audio from the second mobile phone (heterologous audio-video synchronization) is thereby transformed into synchronizing audio and video that both originate from the first mobile phone.
  • the first mobile phone can then synchronously mix and record the video recorded by the first mobile phone and the audio recorded by the second mobile phone. To do so, the first mobile phone can use the audio data as the master stream and the video data as the slave stream, adjusting the playback state of the video data according to the timestamps of the audio data, so as to realize the synchronous mixed recording of the video recorded by the first mobile phone and the audio from the second mobile phone, and obtain a short video file with synchronized audio and video.
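A common way to realize the audio-master / video-slave arrangement described above is to compare each video frame's timestamp against the audio clock and drop, hold, or render the frame accordingly (an illustrative sketch, not the patented implementation; the tolerance value is an assumption):

```python
def video_action(video_pts_ms, audio_clock_ms, tolerance_ms=40):
    """Decide how to handle the next video frame when audio is the
    master stream: render it, drop it (video lags), or wait (video leads)."""
    drift = video_pts_ms - audio_clock_ms
    if drift < -tolerance_ms:
        return 'drop'    # frame is too old relative to the audio clock
    if drift > tolerance_ms:
        return 'wait'    # frame is ahead; hold it until audio catches up
    return 'render'

assert video_action(1000, 1010) == 'render'  # within tolerance
assert video_action(900, 1010) == 'drop'     # video fell behind the audio
assert video_action(1200, 1010) == 'wait'    # video ran ahead of the audio
```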
  • the short video file for audio and video synchronization may be a short video file with lip synchronization effect, and the state of the photographed object in the video, such as mouth shape, motion, etc., corresponds to the audio.
  • the first mobile phone may, based on the timestamp at which video recording started (i.e. the moment of the user's video recording operation), discard the audio data it acquired before the user's video recording operation, so that only the video and audio acquired after that operation are synchronously mixed and recorded.
  • the first mobile phone may also respond to the user's video recording operation, first record the video on the side of the photographed object, and then in response to the user's triggering operation on the remote connection icon, initiate an audio call to the second mobile phone, Thereby, the audio from the second mobile phone is acquired, and finally the video recorded by the first mobile phone and the audio from the second mobile phone are synchronously mixed and recorded.
  • the first mobile phone can discard the previous video data based on the calculated generation time of the audio data, so as to perform synchronous mixed recording of the audio obtained from the second mobile phone and the corresponding video.
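Both discard cases above amount to filtering packets by their computed generation time against the recording-start timestamp (a hedged sketch; the packet record layout is an assumption):

```python
def trim_to_recording(packets, record_start_ms):
    """Keep only audio packets whose computed generation time falls at or
    after the moment the user started video recording; earlier packets
    (acquired before the recording operation) are discarded."""
    return [p for p in packets if p['gen_time_ms'] >= record_start_ms]

packets = [{'seq': 1, 'gen_time_ms': 4_950},   # before recording started
           {'seq': 2, 'gen_time_ms': 5_000},
           {'seq': 3, 'gen_time_ms': 5_020}]
kept = trim_to_recording(packets, record_start_ms=5_000)
assert [p['seq'] for p in kept] == [2, 3]
```

The symmetric case (discarding early video when audio starts later) follows the same pattern with the roles of the streams exchanged.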
  • the playback interface shown in (c) of FIG. 6 may further include a return control 605, a publish control 606, and a save control 607. If the user is not satisfied with the currently playing audio-video-synchronized short video file, the return control 605 can be triggered to go back to the preview interface and re-record; if the user wants to keep the file, the save control 607 can be triggered to save it; and if the user wishes to share it, the publish control 606 can be triggered to share the currently playing file.
  • since the first mobile phone cannot record the audio on the side of the photographed object clearly and completely, its microphone can be turned on or off while it records the video on that side. For the synchronous mixed recording, only the video recorded by the first mobile phone is used, and it is synchronously mixed with the audio recorded by the second mobile phone on the subject side to generate a short video file with synchronized audio and video.
  • in summary, based on the delay data of each frame of audio data from the second mobile phone and the timestamp at which the first mobile phone receives each frame of audio data from the second mobile phone, the first mobile phone can, with its own clock as the reference, accurately calculate the generation time of each frame of audio data from the second mobile phone. The problem of synchronously mixing audio and video from different mobile phones is thereby transformed into a problem of synchronously mixing audio and video of the same mobile phone, and mature same-source audio-video synchronization methods can then be applied to achieve synchronized mixed recording of the video recorded by the first mobile phone and the audio recorded by the second mobile phone.
  • this method does not rely on an NTP time server, and avoids audio-video desynchronization caused by large delay deviations due to WAN timing accuracy errors. At the same time, this method imposes no additional requirements on equipment or links, so the cost is relatively low, avoiding the extra communication or device cost that would be incurred by requiring additional device support or increasing the synchronization period. In addition, this method can more conveniently support the mixed recording of remote audio with local video, thereby eliminating the artificial deviations introduced by prior-art approaches such as long-range microphones or post-production studio dubbing, and further improving the synchronization of the mixed audio-video recording.
  • taking the case where the audio and video call module between devices is the Huawei Changlian CaasKit component, and the audio call between the first mobile phone and the second mobile phone is conducted through a Changlian call, as an example, the method is further described below from the perspective of the device modules.
  • a video recording application in the first mobile phone may integrate an audio and video call module between devices, such as a Huawei Changlian CaasKit component.
  • the Huawei Changlian CaasKit component has the capabilities of audio and video codec and network transmission, including audio and video call interfaces and codec transmission interfaces.
  • the system or the audio and video call application (for example, the Changlian call application) in each of the first mobile phone and the second mobile phone includes an audio and video call transmission component, through which the two phones conduct the audio call; the system or application may further include a Call service module to query the audio call capability.
  • when both the first mobile phone and the second mobile phone have enabled Changlian calls, in judging whether the first mobile phone has the ability to obtain remote audio, and in searching for devices capable of establishing a connection with the first mobile phone, the CaasKit component integrated in the video recording application of the first mobile phone can communicate with the Call service module in the first mobile phone's system through inter-process communication, so as to query, through the application program interface, whether the first mobile phone can obtain remote audio.
  • the user selects the second mobile phone and initiates an audio call to the selected second mobile phone.
  • Changlian Call completes the device management and audio and video call connection through the cloud.
  • the video recording application acquires the audio data collected by the second mobile phone through the application program interface of CaasKit and, according to step S304 above, determines the generation time of the audio recorded by the second mobile phone with the clock of the first mobile phone as the reference. The media recording module in the APP base of the first mobile phone's video recording application then applies currently mature same-source audio-video synchronization technology (such as lip synchronization technology) to complete the synchronous mixed recording of the video data collected by the first mobile phone and the audio data collected by the second mobile phone.
  • the method provided by the embodiment of the present application can also be used when the distance between the first mobile phone and the second mobile phone is relatively short.
  • the method provided by the embodiment of the present application can also be applied to a scene where the distance between the photographer and the subject changes.
  • the first mobile phone can switch between recording video in the ordinary way and recording video while obtaining audio from the second mobile phone. For example, if the distance between the photographer and the subject is short when video recording starts, the first mobile phone can effectively record the audio on the subject side, and it directly records both audio and video. Later, if the distance between the photographer and the subject increases, or the photographer changes to a new subject that is farther away, so that the first mobile phone can no longer effectively record the subject's audio, the first mobile phone can automatically switch to the mode in which it records video while the second mobile phone records audio, or can do so in response to the user's switching operation. The first mobile phone then synchronously mixes and records the video it records with the audio recorded by the second mobile phone. In this way, the first mobile phone can continuously obtain the audio on the side of the photographed object, avoiding incomplete audio in the final video due to changes during the recording process.
  • the embodiments of the present application described above are based on the long-distance scenario, but the methods they provide are not limited to synchronous mixed recording of local video and remote audio. The reverse case is also applicable: the first mobile phone can initiate a video call to the second mobile phone; after the second mobile phone detects that the user accepts the video call, a video call is established between the two phones, and the second mobile phone records video and transmits it to the first mobile phone.
  • meanwhile, the audio on the side of the first mobile phone can be recorded, and the first mobile phone synchronously mixes and records the collected audio data with the video data from the second mobile phone. This process is not described in detail in this embodiment of the present application.
  • in this way, audio and video from different electronic devices can also be synchronously mixed and recorded without relying on an NTP time server, avoiding audio-video desynchronization caused by large delay deviations due to WAN timing accuracy. This method imposes no additional requirements on equipment or links, and its cost is low.
  • the electronic device includes corresponding hardware and/or software modules for executing each function.
  • the present application can be implemented in hardware or in the form of a combination of hardware and computer software in conjunction with the algorithm steps of each example described in conjunction with the embodiments disclosed herein. Whether a function is performed by hardware or computer software driving hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functionality for each particular application in conjunction with the embodiments, but such implementations should not be considered beyond the scope of this application.
  • the electronic device can be divided into functional modules according to the above method examples.
  • each functional module can be divided corresponding to each function, or two or more functions can be integrated into one processing module.
  • the above-mentioned integrated modules can be implemented in the form of hardware. It should be noted that, the division of modules in this embodiment is schematic, and is only a logical function division, and there may be other division manners in actual implementation.
  • FIG. 10 is a schematic structural diagram of a first electronic device provided by an embodiment of the present application.
  • an embodiment of the present application provides a first electronic device 100 , including a detection unit 1001 , a transceiver unit 1002 , a collection unit 1003 , a display unit 1004 , and a synchronization unit 1005 .
  • the detection unit 1001 is used to detect the user's recording operation, the operation of ending recording, and other operations.
  • the transceiver unit 1002 is configured to initiate an audio call to the second electronic device, and acquire audio data from the second electronic device and delay data corresponding to each frame of audio data.
  • the acquisition unit 1003 is used to acquire video data on the side of the photographed object.
  • the display unit 1004 is used for displaying preset icons, a list of contact objects, and recorded images of the photographed objects, and the like.
  • the synchronization unit 1005 is used for synchronizing the audio data from the second electronic device and the video data collected by the first electronic device.
  • FIG. 11 is a schematic structural diagram of a second electronic device provided by an embodiment of the present application.
  • an embodiment of the present application provides a second electronic device 110 , including a detection unit 1101 , a transceiver unit 1102 , a collection unit 1103 , a display unit 1104 , and a calculation unit 1105 .
  • the detection unit 1101 is used to detect the operation of the user accepting the audio call and other operations.
  • the transceiver unit 1102 is configured to transmit audio data and delay data corresponding to each frame of audio data to the first electronic device.
  • the acquisition unit 1103 is used to acquire audio data on the side of the photographed object.
  • the display unit 1104 is used to display interfaces such as reminder messages and call interfaces.
  • the calculation unit 1105 is used to calculate processing delay and transmission delay.
  • Embodiments of the present application further provide an electronic device, including one or more processors and one or more memories.
  • the one or more memories are coupled to the one or more processors and store computer program code comprising computer instructions that, when executed by the one or more processors, cause the electronic device to perform the above-mentioned related method steps to implement the audio and video synchronous mixed recording method in the above-mentioned embodiments.
  • Embodiments of the present application further provide a computer-readable storage medium storing computer instructions which, when executed on an electronic device, cause the electronic device to execute the above-mentioned related method steps to realize the audio and video synchronous mixed recording method in the above-mentioned embodiments.
  • Embodiments of the present application also provide a computer program product, which, when running on a computer, causes the computer to execute the above-mentioned relevant steps, so as to implement the method for synchronously mixing audio and video performed by the electronic device in the above-mentioned embodiment.
  • the embodiments of the present application also provide an apparatus, which may specifically be a chip, a component, or a module, and which may include a processor and a memory connected to each other; the memory stores computer-executable instructions, and when the apparatus runs, the processor can execute the instructions stored in the memory, so that the chip executes the audio and video synchronous mixed recording method executed by the electronic device in the above method embodiments.
  • the electronic device, computer-readable storage medium, computer program product, or chip provided in this embodiment are all used to execute the corresponding method provided above; for the beneficial effects they can achieve, reference may be made to the beneficial effects of the corresponding method, which will not be repeated here.
  • the disclosed apparatus and method may be implemented in other manners.
  • the device embodiments described above are only illustrative.
  • the division of the modules or units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or integrated into another device, or some features may be omitted or not implemented.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may be one physical unit or multiple physical units, that is, they may be located in one place or distributed to multiple different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
  • the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a readable storage medium.
  • the technical solutions of the embodiments of the present application, in essence, or the parts contributing to the prior art, or all or part of the technical solutions, can be embodied in the form of a software product stored in a storage medium, including several instructions to make a device (which may be a single-chip microcomputer, a chip, etc.) or a processor execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or other media that can store program code.


Abstract

Embodiments of the present application relate to the technical field of electronics, and provide an audio and video synchronization method and a device. A first electronic device can perform, on the basis of its own clock, synchronous and mixed recording on video data acquired by the first electronic device and audio data acquired by a second electronic device. The specific solution comprises: the first electronic device initiating an audio call to the second electronic device in response to a first user operation; acquiring video data in response to a second user operation; obtaining audio data transmitted by means of multiple audio data packets from the second electronic device, and delay data corresponding to each of the multiple audio data packets, wherein the delay data is used for indicating a delay between the moment at which each audio data packet is generated and the moment at which the first electronic device obtains each audio data packet; and synchronizing audio data and video data on the basis of the delay data in response to a third user operation. The embodiments of the present application are used in the process of video recording.

Description

一种音频与视频同步的方法及设备A method and device for synchronizing audio and video
本申请要求于2020年11月30日提交国家知识产权局、申请号为202011377423.7、申请名称为“一种音频与视频同步的方法及设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。This application claims the priority of the Chinese patent application with the application number 202011377423.7 and the application title "A method and apparatus for synchronizing audio and video" filed with the State Intellectual Property Office on November 30, 2020, the entire contents of which are incorporated by reference in this application.
技术领域technical field
本申请实施例涉及电子技术领域,尤其涉及一种音频与视频同步的方法及设备。The embodiments of the present application relate to the field of electronic technologies, and in particular, to a method and device for synchronizing audio and video.
背景技术Background technique
在手机等电子设备录制短视频文件的过程中,可能由于该电子设备与被拍摄对象之间距离较远,使得该电子设备无法清晰、完整地录制被拍摄对象侧的音频,导致最终短视频文件的录制效果较差。In the process of recording a short video file by an electronic device such as a mobile phone, the electronic device may not be able to clearly and completely record the audio on the subject side due to the long distance between the electronic device and the subject, resulting in the final short video file. The recording effect is poor.
现有技术中可以分别使用一个电子设备录制被拍摄对象侧的视频,使用被拍摄对象侧附近的另一电子设备录制被拍摄对象侧的音频,之后将来自不同电子设备的音频和视频进行同步混合录制。然而,将来自不同电子设备的音频和视频进行同步混合录制时可能出现音频和视频不同步的现象。In the prior art, one electronic device can be used to record the video on the side of the photographed object, another electronic device near the side of the photographed object can be used to record the audio on the side of the photographed object, and then the audio and video from different electronic devices are synchronously mixed. record. However, when audio and video from different electronic devices are recorded in a synchronized mix, the audio and video may become out of sync.
Summary of the invention
The embodiments of the present application provide a method and device for synchronizing audio and video. Taking its own clock as the reference, a first electronic device can calculate the generation time of each frame of audio data from the processing delay and transmission delay of that frame, calculated and recorded on a second electronic device, together with the moment at which the frame arrives at the first electronic device, thereby achieving synchronized mixed recording of the video from the first electronic device and the audio from the second electronic device.
To achieve the above purpose, the embodiments of the present application adopt the following technical solutions:
In a first aspect, an embodiment of the present application provides a method for synchronizing audio and video, applied in an audio and video recording scenario of a first electronic device. The method includes: the first electronic device initiating an audio call to a second electronic device in response to a first user operation; collecting video data in response to a second user operation; acquiring audio data from the second electronic device, the audio data being transmitted through a plurality of audio data packets; and acquiring delay data corresponding to each of the plurality of audio data packets from the second electronic device, wherein the delay data indicates the delay between the moment at which each audio data packet is generated and the moment at which the first electronic device obtains that packet; and, in response to a third user operation, synchronizing the audio data and the video data based on the delay data.
In this solution, the first electronic device can obtain the audio data from the second electronic device, as well as the delay data corresponding to each of the audio data packets carrying that audio data, and then, taking the clock of the first electronic device as the reference, synchronize the video data collected by the first electronic device with the audio data collected by the second electronic device according to the delay data, so as to obtain a video file in which audio and video are synchronized.
In a possible implementation, the delay data includes first delay data and second delay data. The first delay data is the processing delay of the second electronic device's processing of the audio data, and the second delay data is the transmission delay of the audio data from the second electronic device to the first electronic device.
In another possible implementation, the processing includes encoding processing and buffering processing of the audio data.
In another possible implementation, the processing further includes assembly processing of the audio data.
Here, the second electronic device's processing of the audio data generally includes assembly processing, encoding processing and buffering processing. Since the delay introduced by assembly processing is usually within an acceptable delay deviation range, in some embodiments the processing delay of the assembly processing can be ignored.
In another possible implementation, each of the plurality of audio data packets also carries the delay data corresponding to that packet.
That is, the audio data and the delay data can be transmitted in the same audio data packet.
In another possible implementation, each audio data packet is a real-time transport protocol (RTP) packet, and the delay data is located in the extension header field of the RTP header.
Here, the delay data can be transmitted together with the audio data through the extension header field of the RTP header.
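As an illustration of how such an extension could carry the delay data, the sketch below builds and parses an RTP packet (RFC 3550 fixed header) whose header extension holds a single delay value. The profile identifier (0x1000 here), the 4-byte delay field, and the millisecond unit are assumptions made for the example; the embodiment does not fix the layout of the extension.

```python
import struct

def build_rtp_packet(seq, timestamp, ssrc, payload,
                     delay_ms=None, payload_type=97, ext_profile=0x1000):
    """Build an RTP packet (RFC 3550 fixed header). When delay_ms is given,
    the X bit is set and the value rides in a header extension, so the
    receiver can recover the packet's generation time."""
    x_bit = 1 if delay_ms is not None else 0
    byte0 = (2 << 6) | (x_bit << 4)          # V=2, P=0, X, CC=0
    byte1 = payload_type & 0x7F              # M=0, PT
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    ext = b""
    if delay_ms is not None:
        ext_data = struct.pack("!I", int(delay_ms) & 0xFFFFFFFF)
        # 16-bit profile-defined id, 16-bit length in 32-bit words
        ext = struct.pack("!HH", ext_profile, len(ext_data) // 4) + ext_data
    return header + ext + payload

def parse_delay(packet):
    """Return the delay value from the header extension, or None."""
    byte0 = packet[0]
    if not (byte0 >> 4) & 0x1:               # X bit clear: no extension
        return None
    offset = 12 + 4 * (byte0 & 0x0F)         # skip fixed header + CSRC list
    (delay_ms,) = struct.unpack_from("!I", packet, offset + 4)
    return delay_ms
```

Because the extension travels in the same packet as the audio payload, the receiver needs no separate signaling channel to associate a delay value with its packet.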
In another possible implementation, each audio data packet is an RTP packet, and the delay data is located in the timestamp field of the RTP header.
Here, the delay data can be transmitted together with the audio data through the timestamp field of the RTP header.
In another possible implementation, the delay data is transmitted in a control signaling format.
Here, the delay data can be transmitted in a control signaling format, that is, separately from the audio data. For example, the audio data is still transmitted over the media channel through the plurality of audio data packets, while the delay data is transmitted over the control signaling channel in a control signaling format.
In another possible implementation, the delay data further includes sequence number information of each audio data packet.
In another possible implementation, if the variance of the delay data corresponding to N consecutive audio data packets is smaller than a preset threshold, the delay data includes the average of the delay data corresponding to the N audio data packets, together with the start sequence number and end sequence number of the N audio data packets.
In this way, multiple pieces of delay data can be combined into a single item for transmission, reducing the amount of data transmitted.
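A minimal sketch of this compression, assuming delays are reported in milliseconds and checking the variance of a growing run of consecutive packets (the function names and the default threshold are illustrative, not taken from the embodiment):

```python
from statistics import mean, pvariance

def flush(run):
    """Emit one (start_seq, end_seq, mean_delay) record for a run of packets."""
    if not run:
        return []
    return [(run[0][0], run[-1][0], mean(d for _, d in run))]

def compress_delays(samples, threshold_ms2=4.0):
    """Compress per-packet delay reports. `samples` is a list of
    (sequence_number, delay_ms) pairs for consecutive packets. A run of
    packets whose delay variance stays below `threshold_ms2` collapses
    into a single (start_seq, end_seq, mean_delay) record."""
    records, run = [], []
    for seq, delay in samples:
        candidate = run + [(seq, delay)]
        if pvariance([d for _, d in candidate]) < threshold_ms2:
            run = candidate          # variance still small: extend the run
        else:
            records += flush(run)    # close the run, start a new one
            run = [(seq, delay)]
    records += flush(run)
    return records
```

Since the receiver knows the start and end sequence numbers of each run, it can still attribute a delay value to every individual packet while the sender transmits far fewer records.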
In another possible implementation, synchronizing the audio data and the video data based on the delay data includes: calculating the generation time of each audio data packet based on the delay data and the moment at which the first electronic device obtained that packet; and synchronizing the audio data and the video data according to the generation time of each audio data packet, so as to obtain a synchronized video file.
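The calculation described above can be sketched as follows; the millisecond units and the function names are illustrative assumptions:

```python
def generation_time(arrival_ms, processing_delay_ms, transmission_delay_ms):
    """arrival_ms is read on the first device's clock; the two delays are
    durations reported by the second device, so subtracting them yields the
    generation time on the first device's clock without any shared clock."""
    return arrival_ms - processing_delay_ms - transmission_delay_ms

def align(audio_packets, video_start_ms):
    """Place each audio packet on the video timeline: the offset of its
    generation time from the first video frame. Each packet is a tuple
    (arrival_ms, processing_delay_ms, transmission_delay_ms)."""
    return [generation_time(*p) - video_start_ms for p in audio_packets]
```

This is why no NTP server is needed: only durations cross the device boundary, and every absolute timestamp in the result is expressed on the first device's own clock.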
In another possible implementation, after the first electronic device opens the video recording application, the video recording application displays a preset icon or a list of contact objects.
In another possible implementation, if the video recording application displays a preset icon, then initiating an audio call to the second electronic device in response to the first user operation includes: displaying a list of contact objects in response to the user's operation on the preset icon; and initiating an audio call to the second electronic device in response to the user's selection of the second electronic device from the list of contact objects.
In another possible implementation, if the video recording application displays a list of contact objects, then initiating an audio call to the second electronic device in response to the first user operation includes: initiating an audio call to the second electronic device in response to the user's selection of the second electronic device from the list of contact objects.
That is, the first electronic device can initiate an audio call to the second electronic device in response to the user's operation on the preset icon or on the list of contact objects.
In another possible implementation, the information of the contact objects comes from the address book of the first electronic device.
In another possible implementation, a Bluetooth connection is established between the electronic device corresponding to a contact object and the first electronic device, or the electronic device corresponding to a contact object and the first electronic device are in the same Wi-Fi local area network.
In another possible implementation, the first electronic device stores information of historical contact objects, the electronic devices corresponding to the historical contact objects including electronic devices that have previously established an audio call with the first electronic device.
That is, the first electronic device can store the information of electronic devices with which it has previously established an audio call, making it convenient to initiate an audio call next time.
In a second aspect, an embodiment of the present application provides a first electronic device, including: one or more processors; a memory; and one or more computer programs. The one or more computer programs are stored in the memory and include instructions which, when executed by the first electronic device, cause the first electronic device to perform the method of any possible design of the first aspect.
In a third aspect, an embodiment of the present application provides a system for synchronizing audio and video. The system includes a first electronic device and a second electronic device, and is applied in an audio and video recording scenario of the first electronic device. The first electronic device initiates an audio call to the second electronic device in response to a first user operation; the second electronic device accepts the audio call; the second electronic device collects audio data, the audio data being transmitted through a plurality of audio data packets, acquires the delay data corresponding to each of the plurality of audio data packets, and transmits the audio data and the delay data to the first electronic device; the first electronic device collects video data in response to a second user operation and obtains the audio data and the delay data from the second electronic device; and the first electronic device synchronizes the audio data and the video data based on the delay data in response to a third user operation.
In a possible implementation, when the distance between the first electronic device and the second electronic device is greater than the sound pickup distance of the first electronic device, the first electronic device initiates the audio call to the second electronic device in response to the first user operation.
Here, when the distance between the first electronic device and the second electronic device is greater than the sound pickup distance of the first electronic device, the first electronic device may not be able to clearly and completely record the audio on the second electronic device's side. Therefore, the first electronic device can initiate an audio call to the second electronic device, so that the second electronic device collects the audio data.
In another possible implementation, while the first electronic device collects video data, the second electronic device continuously displays a reminder message, which is used to remind the user that the first electronic device is collecting video data.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium including computer instructions which, when run on a computer, cause the computer to perform the method of any possible design of the first aspect.
In a fifth aspect, an embodiment of the present application provides a computer program product which, when run on a computer, causes the computer to perform the method of any possible design of the first aspect.
For the beneficial effects corresponding to the other aspects above, reference may be made to the description of the beneficial effects of the method, which will not be repeated here.
Description of drawings
Fig. 1 is a diagram of a shooting scene provided by an embodiment of the present application;
Fig. 2 is a hardware structure diagram of an electronic device provided by an embodiment of the present application;
Fig. 3 is a flowchart of a method for synchronizing audio and video provided by an embodiment of the present application;
Fig. 4 is a schematic diagram of a preview interface of a video recording mode provided by an embodiment of the present application;
Fig. 5 is a schematic diagram of an interface provided by an embodiment of the present application;
Fig. 6 is a schematic diagram of another interface provided by an embodiment of the present application;
Fig. 7 is a schematic diagram of an audio data processing process provided by an embodiment of the present application;
Fig. 8 is a schematic diagram of a packet structure provided by an embodiment of the present application;
Fig. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the present application;
Fig. 10 is a schematic structural diagram of an electronic device provided by an embodiment of the present application;
Fig. 11 is a schematic structural diagram of another electronic device provided by an embodiment of the present application.
Detailed description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings. In the description of the embodiments of the present application, unless otherwise specified, "/" means "or"; for example, A/B can mean A or B. "And/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, "A and/or B" can mean: A alone, both A and B, or B alone. In addition, in the description of the embodiments of the present application, "plurality" means two or more.
Hereinafter, the terms "first" and "second" are used for descriptive purposes only, and should not be construed as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of this embodiment, unless otherwise specified, "plurality" means two or more.
At present, when audio and video from different electronic devices are synchronized and mixed into one recording, the audio and the video may turn out to be out of sync. In the prior art, in order to synchronously mix and record audio and video from different electronic devices, the video recorded by one electronic device and the audio recorded by another can be periodically synchronized with a network time protocol (NTP) clock server, so that video and audio from different electronic devices can be mixed into one synchronized recording.
However, the above scheme relies on an NTP time server to synchronize the audio and video, and the timing accuracy error over a wide area network usually reaches 50 ms to 500 ms. Since this error may exceed the delay deviation allowed for audio and video synchronization, relying on an NTP time server for synchronized mixed recording may still produce a large delay deviation, leaving the audio and video out of sync.
An embodiment of the present application provides a method for synchronizing audio and video. Taking its own clock as the reference, the first electronic device can calculate the generation time of each frame of audio data from the delay data of that frame (for example, the processing delay and the transmission delay) calculated and recorded on the second electronic device, together with the moment at which the frame arrives at the first electronic device, thereby achieving synchronized mixed recording of the video from the first electronic device and the audio from the second electronic device.
The method for synchronizing audio and video provided in the embodiments of the present application can be applied to the shooting scene shown in Fig. 1. Referring to Fig. 1, the shooting scene 10 may include a first electronic device 11, a second electronic device 12, and so on. The first electronic device 11 is on the photographer's side, the second electronic device 12 is on the subject's side, and the distance between the photographer and the subject is relatively large; for example, it is greater than the effective sound pickup distance of a sound pickup module, such as the microphone, of the local device 11 on the photographer's side. Exemplarily, the video on the subject's side may be recorded by the first electronic device 11. At the same time, the second electronic device 12 near the subject may record the audio on the subject's side and transmit the recorded audio to the first electronic device 11. The second electronic device 12 may transmit the audio to the first electronic device 11 based on any one of multiple communication protocols, including, for example, server relay transmission, P2P transmission, Bluetooth, ZigBee and Wi-Fi transmission.
That is, an audio call can be established between the second electronic device 12 and the first electronic device 11 based on any one of the above communication protocols. Taking its own clock as the reference, the first electronic device 11 can calculate the generation time of each frame of audio data from the processing delay and transmission delay of that frame, calculated and recorded on the second electronic device 12, together with the moment at which the frame arrives at the first electronic device 11. In the end, the synchronized mixed recording of audio and video from different electronic devices is converted into something close to the synchronized mixed recording of audio and video from a single device, with the first electronic device 11 mixing the video recorded by the first electronic device 11 and the audio recorded by the second electronic device 12 into one synchronized recording.
The first electronic device 11 usually enters the remote audio collection state when the distance between the photographer and the subject is greater than the effective sound pickup distance of the phone. It can be understood that, when the distance between the photographer and the subject is small, the first electronic device 11 can record video and audio normally, or can still choose to enter the remote audio collection state, in which the first electronic device 11 collects the video and the second electronic device 12 collects the audio; this embodiment of the present application does not limit this.
For example, the first electronic device and the second electronic device may each be any one of electronic devices such as a mobile phone, a tablet computer, a wearable device, an in-vehicle device, an augmented reality (AR)/virtual reality (VR) device, a laptop computer, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA).
Exemplarily, Fig. 2 shows a schematic structural diagram of the above electronic device, which may be the first electronic device or the second electronic device. The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headphone jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and so on. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and so on.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Different processing units may be independent devices, or may be integrated in one or more processors.
The controller may be the nerve center and command center of the electronic device 100. The controller can generate operation control signals according to instruction operation codes and timing signals, completing the control of fetching and executing instructions.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache. This memory may hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instruction or data again, it can be called directly from the memory, which avoids repeated accesses and reduces the waiting time of the processor 110, thereby improving the efficiency of the system.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
电子设备100的无线通信功能可以通过天线1,天线2,移动通信模块150,无线通信模块160,调制解调处理器以及基带处理器等实现。The wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modulation and demodulation processor, the baseband processor, and the like.
天线1和天线2用于发射和接收电磁波信号。电子设备100中的每个天线可用于覆盖单个或多个通信频带。不同的天线还可以复用,以提高天线的利用率。例如:可以将天线1复用为无线局域网的分集天线。在另外一些实施例中,天线可以和调谐开关结合使用。 Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in electronic device 100 may be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization. For example, the antenna 1 can be multiplexed as a diversity antenna of the wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
移动通信模块150可以提供应用在电子设备100上的包括2G/3G/4G/5G等无线通信的解决方案。移动通信模块150可以包括至少一个滤波器,开关,功率放大器,低噪声放大器(low noise amplifier,LNA)等。移动通信模块150可以由天线1接收电磁波,并对接收的电磁波进行滤波,放大等处理,传送至调制解调处理器进行解调。移动通信模块150还可以对经调制解调处理器调制后的信号放大,经天线1转为电磁波辐射出去。在一些实施例中,移动通信模块150的至少部分功能模块可以被设置于处理器110中。在一些实施例中,移动通信模块150的至少部分功能模块可以与处理器110的至少部分模块被设置在同一个器件中。The mobile communication module 150 may provide wireless communication solutions including 2G/3G/4G/5G etc. applied on the electronic device 100 . The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA) and the like. The mobile communication module 150 can receive electromagnetic waves from the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modulation and demodulation processor for demodulation. The mobile communication module 150 can also amplify the signal modulated by the modulation and demodulation processor, and then turn it into an electromagnetic wave for radiation through the antenna 1 . In some embodiments, at least part of the functional modules of the mobile communication module 150 may be provided in the processor 110 . In some embodiments, at least part of the functional modules of the mobile communication module 150 may be provided in the same device as at least part of the modules of the processor 110 .
调制解调处理器可以包括调制器和解调器。其中,调制器用于将待发送的低频基带信号调制成中高频信号。解调器用于将接收的电磁波信号解调为低频基带信号。随后解调器将解调得到的低频基带信号传送至基带处理器处理。低频基带信号经基带处理器处理后,被传递给应用处理器。应用处理器通过音频设备(不限于扬声器170A,受话器170B等)输出声音信号,或通过显示屏194显示图像或视频。在一些实施例中,调制解调处理器可以是独立的器件。在另一些实施例中,调制解调处理器可以独立于处理器110,与移动通信模块150或其他功能模块设置在同一个器件中。The modem processor may include a modulator and a demodulator. Wherein, the modulator is used to modulate the low frequency baseband signal to be sent into a medium and high frequency signal. The demodulator is used to demodulate the received electromagnetic wave signal into a low frequency baseband signal. Then the demodulator transmits the demodulated low-frequency baseband signal to the baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and passed to the application processor. The application processor outputs sound signals through audio devices (not limited to the speaker 170A, the receiver 170B, etc.), or displays images or videos through the display screen 194 . In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be independent of the processor 110, and may be provided in the same device as the mobile communication module 150 or other functional modules.
无线通信模块160可以提供应用在电子设备100上的包括无线局域网(wireless local area networks,WLAN)(如无线保真(wireless fidelity,WiFi)网络),蓝牙(bluetooth,BT),全球导航卫星***(global navigation satellite system,GNSS),调频(frequency modulation,FM),近距离无线通信技术(near field communication,NFC),红外技术(infrared,IR)等无线通信的解决方案。无线通信模块160可以是集成至少一个通信处理模块的一个或多个器件。无线通信模块160经由天线2接收电磁波,将电磁波信号调频以及滤波处理,将处理后的信号发送到处理器110。无线通信模块160还可以从处理器110接收待发送的信号,对其进行调频,放大,经天线2转为电磁波辐射出去。The wireless communication module 160 can provide applications on the electronic device 100 including wireless local area networks (WLAN) (such as wireless fidelity (WiFi) networks), bluetooth (BT), global navigation satellite systems ( global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field communication technology (near field communication, NFC), infrared technology (infrared, IR) and other wireless communication solutions. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2 , frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110 . The wireless communication module 160 can also receive the signal to be sent from the processor 110 , perform frequency modulation on it, amplify it, and convert it into electromagnetic waves for radiation through the antenna 2 .
在一些实施例中，电子设备100的天线1和移动通信模块150耦合，天线2和无线通信模块160耦合，使得电子设备100可以通过无线通信技术与网络以及其他设备通信。所述无线通信技术可以包括全球移动通讯系统(global system for mobile communications,GSM),通用分组无线服务(general packet radio service,GPRS),码分多址接入(code division multiple access,CDMA),宽带码分多址(wideband code division multiple access,WCDMA),时分码分多址(time-division code division multiple access,TD-SCDMA),长期演进(long term evolution,LTE),BT,GNSS,WLAN,NFC,FM,和/或IR技术等。所述GNSS可以包括全球卫星定位系统(global positioning system,GPS),全球导航卫星系统(global navigation satellite system,GLONASS),北斗卫星导航系统(beidou navigation satellite system,BDS),准天顶卫星系统(quasi-zenith satellite system,QZSS)和/或星基增强系统(satellite based augmentation systems,SBAS)。In some embodiments, the antenna 1 of the electronic device 100 is coupled with the mobile communication module 150, and the antenna 2 is coupled with the wireless communication module 160, so that the electronic device 100 can communicate with networks and other devices through wireless communication technologies. The wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, and the like. The GNSS may include a global positioning system (GPS), a global navigation satellite system (GLONASS), a BeiDou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).
电子设备100通过GPU,显示屏194,以及应用处理器等实现显示功能。GPU为图像处理的微处理器,连接显示屏194和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器110可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。The electronic device 100 implements a display function through a GPU, a display screen 194, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
显示屏194用于显示图像，视频等。显示屏194包括显示面板。显示面板可以采用液晶显示屏(liquid crystal display,LCD),有机发光二极管(organic light-emitting diode,OLED),有源矩阵有机发光二极体或主动矩阵有机发光二极体(active-matrix organic light emitting diode,AMOLED),柔性发光二极管(flex light-emitting diode,FLED),Miniled,MicroLed,Micro-oLed,量子点发光二极管(quantum dot light emitting diodes,QLED)等。The display screen 194 is used to display images, videos, and the like. The display screen 194 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a mini-LED, a micro-LED, a micro-OLED, quantum dot light emitting diodes (QLED), or the like.
电子设备100可以通过ISP,摄像头193,视频编解码器,GPU,显示屏194以及应用处理器等实现拍摄功能。The electronic device 100 may implement a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
ISP用于处理摄像头193反馈的数据。例如,拍照时,打开快门,光线通过镜头被传递到摄像头感光元件上,光信号转换为电信号,摄像头感光元件将所述电信号传递给ISP处理,转化为肉眼可见的图像。ISP还可以对图像的噪点,亮度,肤色进行算法优化。ISP还可以对拍摄场景的曝光,色温等参数优化。在一些实施例中,ISP可以设置在摄像头193中。The ISP is used to process the data fed back by the camera 193 . For example, when taking a photo, the shutter is opened, the light is transmitted to the camera photosensitive element through the lens, the light signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing, and converts it into an image visible to the naked eye. ISP can also perform algorithm optimization on image noise, brightness, and skin tone. ISP can also optimize the exposure, color temperature and other parameters of the shooting scene. In some embodiments, the ISP may be provided in the camera 193 .
摄像头193用于捕获静态图像或视频。物体通过镜头生成光学图像投射到感光元件。感光元件可以是电荷耦合器件(charge coupled device,CCD)或互补金属氧化物半导体(complementary metal-oxide-semiconductor,CMOS)光电晶体管。感光元件把光信号转换成电信号,之后将电信号传递给ISP转换成数字图像信号。ISP将数字图像信号输出到DSP加工处理。DSP将数字图像信号转换成标准的RGB,YUV等格式的图像信号。Camera 193 is used to capture still images or video. The object is projected through the lens to generate an optical image onto the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. DSP converts digital image signals into standard RGB, YUV and other formats of image signals.
数字信号处理器用于处理数字信号,除了可以处理数字图像信号,还可以处理其他数字信号。例如,当电子设备100在频点选择时,数字信号处理器用于对频点能量进行傅里叶变换等。A digital signal processor is used to process digital signals, in addition to processing digital image signals, it can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform Fourier transform on the frequency point energy and so on.
视频编解码器用于对数字视频压缩或解压缩。电子设备100可以支持一种或多种视频编解码器。这样,电子设备100可以播放或录制多种编码格式的视频,例如:动态图像专家组(moving picture experts group,MPEG)1,MPEG2,MPEG3,MPEG4等。Video codecs are used to compress or decompress digital video. The electronic device 100 may support one or more video codecs. In this way, the electronic device 100 can play or record videos of various encoding formats, such as: Moving Picture Experts Group (moving picture experts group, MPEG) 1, MPEG2, MPEG3, MPEG4 and so on.
外部存储器接口120可以用于连接外部存储卡，例如Micro SD卡，实现扩展电子设备100的存储能力。外部存储卡通过外部存储器接口120与处理器110通信，实现数据存储功能。例如将音乐，视频等文件保存在外部存储卡中。The external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function, for example, saving files such as music and videos in the external memory card.
内部存储器121可以用于存储计算机可执行程序代码，所述可执行程序代码包括指令。处理器110通过运行存储在内部存储器121的指令，从而执行电子设备100的各种功能应用以及数据处理。内部存储器121可以包括存储程序区和存储数据区。其中，存储程序区可存储操作系统，至少一个功能所需的应用程序(比如声音播放功能，图像播放功能等)等。存储数据区可存储电子设备100使用过程中所创建的数据(比如音频数据，电话本等)等。此外，内部存储器121可以包括高速随机存取存储器，还可以包括非易失性存储器，例如至少一个磁盘存储器件，闪存器件，通用闪存存储器(universal flash storage,UFS)等。The internal memory 121 may be used to store computer-executable program code, where the executable program code includes instructions. The processor 110 executes various functional applications and data processing of the electronic device 100 by running the instructions stored in the internal memory 121. The internal memory 121 may include a program storage area and a data storage area. The program storage area can store an operating system and an application program required for at least one function (such as a sound playback function, an image playback function, etc.). The data storage area can store data (such as audio data, a phone book, etc.) created during the use of the electronic device 100. In addition, the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash storage (UFS), and the like.
电子设备100可以通过音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,以及应用处理器等实现音频功能。例如音乐播放,录音等。The electronic device 100 may implement audio functions through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like. Such as music playback, recording, etc.
音频模块170用于将数字音频信息转换成模拟音频信号输出,也用于将模拟音频输入转换为数字音频信号。音频模块170还可以用于对音频信号编码和解码。在一些实施例中,音频模块170可以设置于处理器110中,或将音频模块170的部分功能模块设置于处理器110中。The audio module 170 is used for converting digital audio information into analog audio signal output, and also for converting analog audio input into digital audio signal. Audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110 , or some functional modules of the audio module 170 may be provided in the processor 110 .
扬声器170A,也称“喇叭”,用于将音频电信号转换为声音信号。电子设备100可以通过扬声器170A收听音乐,或收听免提通话。Speaker 170A, also referred to as a "speaker", is used to convert audio electrical signals into sound signals. The electronic device 100 can listen to music through the speaker 170A, or listen to a hands-free call.
受话器170B,也称“听筒”,用于将音频电信号转换成声音信号。当电子设备100接听电话或语音信息时,可以通过将受话器170B靠近人耳接听语音。The receiver 170B, also referred to as "earpiece", is used to convert audio electrical signals into sound signals. When the electronic device 100 answers a call or a voice message, the voice can be answered by placing the receiver 170B close to the human ear.
麦克风170C，也称“话筒”，“传声器”，用于将声音信号转换为电信号。当拨打电话或发送语音信息时，用户可以通过人嘴靠近麦克风170C发声，将声音信号输入到麦克风170C。电子设备100可以设置至少一个麦克风170C。在另一些实施例中，电子设备100可以设置两个麦克风170C，除了采集声音信号，还可以实现降噪功能。在另一些实施例中，电子设备100还可以设置三个，四个或更多麦克风170C，实现采集声音信号，降噪，还可以识别声音来源，实现定向录音功能等。The microphone 170C, also called a "mic" or "sound transmitter", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 170C to input the sound signal into the microphone 170C. The electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement a noise reduction function in addition to collecting sound signals. In still other embodiments, the electronic device 100 may be provided with three, four, or more microphones 170C to collect sound signals, reduce noise, identify sound sources, implement a directional recording function, and the like.
耳机接口170D用于连接有线耳机。耳机接口170D可以是USB接口130,也可以是3.5mm的开放移动电子设备平台(open mobile terminal platform,OMTP)标准接口,美国蜂窝电信工业协会(cellular telecommunications industry association of the USA,CTIA)标准接口。The earphone jack 170D is used to connect wired earphones. The earphone interface 170D may be the USB interface 130, or may be a 3.5mm open mobile terminal platform (OMTP) standard interface, a cellular telecommunications industry association of the USA (CTIA) standard interface.
按键190包括开机键,音量键等。按键190可以是机械按键。也可以是触摸式按键。电子设备100可以接收按键输入,产生与电子设备100的用户设置以及功能控制有关的键信号输入。The keys 190 include a power-on key, a volume key, and the like. Keys 190 may be mechanical keys. It can also be a touch key. The electronic device 100 may receive key inputs and generate key signal inputs related to user settings and function control of the electronic device 100 .
马达191可以产生振动提示。马达191可以用于来电振动提示,也可以用于触摸振动反馈。例如,作用于不同应用(例如拍照,音频播放等)的触摸操作,可以对应不同的振动反馈效果。作用于显示屏194不同区域的触摸操作,马达191也可对应不同的振动反馈效果。不同的应用场景(例如:时间提醒,接收信息,闹钟,游戏等)也可以对应不同的振动反馈效果。触摸振动反馈效果还可以支持自定义。Motor 191 can generate vibrating cues. The motor 191 can be used for vibrating alerts for incoming calls, and can also be used for touch vibration feedback. For example, touch operations acting on different applications (such as taking pictures, playing audio, etc.) can correspond to different vibration feedback effects. The motor 191 can also correspond to different vibration feedback effects for touch operations on different areas of the display screen 194 . Different application scenarios (for example: time reminder, receiving information, alarm clock, games, etc.) can also correspond to different vibration feedback effects. The touch vibration feedback effect can also support customization.
指示器192可以是指示灯,可以用于指示充电状态,电量变化,也可以用于指示消息,未接来电,通知等。The indicator 192 can be an indicator light, which can be used to indicate the charging state, the change of the power, and can also be used to indicate a message, a missed call, a notification, and the like.
可以理解的是，本申请实施例示意的结构并不构成对电子设备100的具体限定。在本申请另一些实施例中，电子设备100可以包括比图示更多或更少的部件，或者组合某些部件，或者拆分某些部件，或者不同的部件布置。图示的部件可以以硬件，软件或软件和硬件的组合实现。It can be understood that the structure illustrated in this embodiment of the present application does not constitute a specific limitation on the electronic device 100. In other embodiments of the present application, the electronic device 100 may include more or fewer components than shown, or combine some components, or split some components, or have a different component arrangement. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
当电子设备为第一电子设备或第二电子设备等不同角色的设备时，也可以包括不同的部件。例如，在上述拍摄场景10中，由于第二电子设备12用于录制音频，因此第二电子设备12包括音频模块170，可不包括摄像头193；而第一电子设备11用于录制视频，因此本地设备包括摄像头193。When the electronic device plays a different role, such as the first electronic device or the second electronic device, it may also include different components. For example, in the above shooting scene 10, since the second electronic device 12 is used to record audio, the second electronic device 12 includes the audio module 170 and may not include the camera 193; the first electronic device 11 is used to record video, and therefore the local device includes the camera 193.
在本申请实施例中，以拍摄场景10为例，第一电子设备11可以通过摄像头193录制被拍摄对象侧的视频，第二电子设备12可以通过音频模块170中的麦克风170C录制被拍摄对象侧的音频，并将录制的音频传输至第一电子设备11，第一电子设备11以自身的时钟为基准，计算出第二电子设备12录制的音频的产生时刻，并通过处理器110将录制的视频和接收到的音频进行同步混合录制。In this embodiment of the present application, taking the shooting scene 10 as an example, the first electronic device 11 can record a video of the photographed object's side through the camera 193, and the second electronic device 12 can record the audio of the photographed object's side through the microphone 170C in the audio module 170 and transmit the recorded audio to the first electronic device 11. Using its own clock as a reference, the first electronic device 11 calculates the generation time of the audio recorded by the second electronic device 12, and synchronously mixes and records the recorded video and the received audio through the processor 110.
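The clock-referenced calculation described in this paragraph can be sketched as follows. This is a minimal, hypothetical illustration rather than the patent's implementation (all function and variable names are assumptions): the first device expresses each received audio frame's generation time on its own clock by subtracting the frame's total delay from its local arrival time, and then pairs the frame with the video frame captured closest to that time.

```python
# Hypothetical sketch of clock-referenced audio/video alignment; not the
# patent's code. All times are seconds on the first device's own clock.

def audio_generation_time(recv_time_local, processing_delay, transmission_delay):
    """Generation time of an audio frame on the first device's clock:
    its local arrival time minus the total delay the frame accumulated."""
    return recv_time_local - (processing_delay + transmission_delay)

def align(video_frames, audio_frames):
    """Pair each audio frame with the video frame captured closest to the
    audio frame's computed generation time.
    video_frames: list of (capture_time, frame_id)
    audio_frames: list of (recv_time, processing_delay, transmission_delay, samples)
    """
    pairs = []
    for recv, d_proc, d_trans, samples in audio_frames:
        t_gen = audio_generation_time(recv, d_proc, d_trans)
        nearest = min(video_frames, key=lambda vf: abs(vf[0] - t_gen))
        pairs.append((nearest[1], samples, t_gen))
    return pairs
```

Mixed recording then amounts to writing each (video frame, audio frame) pair into the output at the audio frame's computed generation time, so the two streams share one time base.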
以下将以在拍摄场景10中,第一电子设备和第二电子设备均为手机,第一电子设备采用视频录制应用拍摄视频为例,对本申请实施例提供的音频与视频同步的方法进行阐述。The method for synchronizing audio and video provided by the embodiment of the present application will be described below by taking as an example that the first electronic device and the second electronic device are both mobile phones and the first electronic device uses a video recording application to shoot video in the shooting scene 10 .
参见图3,该方法可以包括:Referring to Figure 3, the method may include:
S301、第一手机打开视频录制应用,进入预览界面,该预览界面中包括远程连接图标。S301. The first mobile phone opens a video recording application, and enters a preview interface, where the preview interface includes a remote connection icon.
第一手机可以响应于用户打开视频录制应用的操作来打开该视频录制应用，进而进入某一录像模式的预览界面。其中，某一录像模式为“普通录像”、“延时录像”或“开直播”等录像模式中的任意一种。视频录制应用可以是图示的应用(原文此处为应用图标插图Figure PCTCN2021134168-appb-000001)或其他具有视频录制能力的应用。本申请对此不作限定。The first mobile phone may open the video recording application in response to the user's operation of opening the video recording application, and then enter the preview interface of a certain recording mode, where the recording mode is any one of recording modes such as "normal recording", "time-lapse recording", or "live streaming". The video recording application may be the application whose icon is shown in the original figure (Figure PCTCN2021134168-appb-000001) or another application with video recording capability, which is not limited in this application.
在一些实施例中,第一手机可以响应于用户对视频录制应用的图标的点击操作打开该视频录制应用,第一手机还可以响应于用户的语音输入或手势输入等操作打开视频录制应用,本申请对第一手机打开视频录制应用的方式不作限定。In some embodiments, the first mobile phone can open the video recording application in response to the user's click operation on the icon of the video recording application, and the first mobile phone can also open the video recording application in response to the user's voice input or gesture input. The application does not limit the way of opening the video recording application on the first mobile phone.
在一些实施例中，第一手机可以通过多种方式进入某一拍摄模式的预览界面。可选地，第一手机打开视频录制应用后直接进入某一录像模式的预览界面。又可选地，第一手机打开视频录制应用后显示非预览界面的其他界面，例如视频浏览界面或视频选择界面等，第一手机在检测到用户进入录像模式的操作之后才进入某一录像模式的预览界面。这里，用户进入录像模式的操作可以包括点击特定控件、语音输入或手势输入等，本申请对此不作限定。In some embodiments, the first mobile phone can enter the preview interface of a certain shooting mode in multiple ways. Optionally, after opening the video recording application, the first mobile phone directly enters the preview interface of a certain recording mode. Alternatively, after opening the video recording application, the first mobile phone displays an interface other than the preview interface, such as a video browsing interface or a video selection interface, and enters the preview interface of a certain recording mode only after detecting the user's operation of entering the recording mode. Here, the user's operation of entering the recording mode may include clicking a specific control, voice input, gesture input, or the like, which is not limited in this application.
示例性的，图4为第一手机进入某一录像模式后的预览界面的示意图。以某一录像模式为“普通录像”模式为例，如图4所示，预览界面401包括取景窗口402和远程连接图标403。其中，取景窗口402中显示的是被拍摄对象的预览画面。远程连接图标403表示第一手机具有获取远程音频的能力。在一些实施例中，第一手机打开视频录制应用后，该视频录制应用可以调用系统接口，判断第一手机是否具有获取远程音频的能力。若确定第一手机具有该能力，则第一手机在预览界面中显示远程连接图标403。也可以认为，若预览界面显示了远程连接图标403，则表示该视频录制应用集成了设备间音视频通话模块，例如，该设备间音视频通话模块可以为华为的CaasKit或声网的RTC SDK等。通过设备间音视频通话模块，被拍摄对象侧的第二手机可以将录制的音频传输到第一手机上。也就是说，远程连接图标403表示第一手机可以与第二手机建立音频通话。Exemplarily, FIG. 4 is a schematic diagram of the preview interface after the first mobile phone enters a certain recording mode. Taking the "normal recording" mode as an example, as shown in FIG. 4, the preview interface 401 includes a viewfinder window 402 and a remote connection icon 403, where the viewfinder window 402 displays a preview image of the photographed object. The remote connection icon 403 indicates that the first mobile phone has the ability to acquire remote audio. In some embodiments, after the first mobile phone opens the video recording application, the video recording application can call a system interface to determine whether the first mobile phone has the ability to acquire remote audio. If it is determined that the first mobile phone has this capability, the first mobile phone displays the remote connection icon 403 in the preview interface. It can also be considered that if the preview interface displays the remote connection icon 403, the video recording application integrates an inter-device audio and video call module; for example, the inter-device audio and video call module may be Huawei's CaasKit or Agora's RTC SDK. Through the inter-device audio and video call module, the second mobile phone on the photographed object's side can transmit the recorded audio to the first mobile phone. That is, the remote connection icon 403 indicates that the first mobile phone can establish an audio call with the second mobile phone.
这里,远程连接图标还可以被称为“音视频通话图标”或“特定图标”等其他名称,本申请实施例对该图标的名称不作限定。Here, the remote connection icon may also be called an "audio/video call icon" or a "specific icon" or other names, and the embodiment of the present application does not limit the name of the icon.
在一些实施例中,若第一手机能够获取远程音频,则第一手机在任意一种录像模式 的预览界面中都可以显示上述远程连接图标。In some embodiments, if the first mobile phone can acquire remote audio, the first mobile phone can display the above-mentioned remote connection icon in the preview interface of any recording mode.
在另一些实施例中，第一手机仅在需要获取远程音频的特定录像模式下才显示上述远程连接图标。例如，视频录制应用中还可以设置有“远景录像”模式，通常在第一手机与被拍摄对象较远时用户才进入远景录像模式，此时，第一手机可能需要获取远程音频。因此，第一手机在进入远景录像这一可能需要获取远程音频的特定录像模式时，在预览界面上显示远程连接图标。In other embodiments, the first mobile phone displays the above-mentioned remote connection icon only in a specific recording mode that needs to acquire remote audio. For example, the video recording application may also provide a "long-range recording" mode; the user usually enters this mode only when the first mobile phone is far away from the photographed object, in which case the first mobile phone may need to acquire remote audio. Therefore, when entering long-range recording, a specific recording mode that may require remote audio, the first mobile phone displays the remote connection icon on the preview interface.
在另一些实施例中,第一手机还可以不显示上述远程连接图标。例如,在确定第一手机能够获取远程音频之后,可以直接显示选择窗口,使得用户查看能够与第一手机建立音频通话的其他电子设备。或者,第一手机也可以在确定能够获取远程音频之后,在屏幕上显示第一手机能够获取远程音频的消息,以通知用户第一手机能够获取远程音频。In other embodiments, the first mobile phone may also not display the above-mentioned remote connection icon. For example, after it is determined that the first mobile phone can acquire remote audio, a selection window may be directly displayed, so that the user can view other electronic devices capable of establishing an audio call with the first mobile phone. Alternatively, after determining that the first mobile phone can acquire the remote audio, a message that the first mobile phone can acquire the remote audio can be displayed on the screen to notify the user that the first mobile phone can acquire the remote audio.
S302、第一手机响应于用户对远程连接图标的触发操作,向第二手机发起音频呼叫。S302. The first mobile phone initiates an audio call to the second mobile phone in response to the user's triggering operation on the remote connection icon.
在本申请实施例中，若第一手机在视频录制应用的预览界面中显示连接图标，则表示第一手机能够获取远程音频。第一手机检测到用户对连接图标的触发操作之后，可以显示(例如在弹出的选择窗口中显示)能够与第一手机进行音频通话的联系人信息或电子设备信息，第一手机可以响应于用户对被拍摄对象侧的第二手机的选择，进而向用户选择的第二手机发起音频呼叫。In this embodiment of the present application, if the first mobile phone displays the connection icon in the preview interface of the video recording application, it means that the first mobile phone can acquire remote audio. After detecting the user's triggering operation on the connection icon, the first mobile phone can display (for example, in a pop-up selection window) contact information or electronic device information of devices capable of conducting an audio call with the first mobile phone, and in response to the user's selection of the second mobile phone on the photographed object's side, initiate an audio call to the second mobile phone selected by the user.
其中，能够与第一手机进行通话的电子设备通常具备音频通话应用，例如系统电话、畅连通话或微信等。An electronic device capable of making a call with the first mobile phone usually has an audio call application, such as the system Phone application, MeeTime (畅连通话), or WeChat.
示例性的，如图5中的(a)所示，第一手机检测到用户对远程连接图标403的点击操作之后，可以弹出如图5中的(b)所示的选择窗口404。其中，选择窗口404中包括能够与第一手机建立音频通话的其他电子设备的信息。例如，选择窗口404中可以包括能够与第一手机建立音频通话的其他电子设备的标识信息。再例如，通常每个电子设备对应一个用户，通过用户信息也可以确定电子设备，因此，选择窗口404中也可以包括能够与第一手机建立音频通话的其他电子设备的用户信息，第一手机可以向选择的用户对应的电子设备发起音频呼叫。当然，第一手机也可以响应于用户的语音输入或手势输入等其他方式触发远程连接图标403，本申请实施例对此不作限定。Exemplarily, as shown in (a) of FIG. 5, after the first mobile phone detects the user's click operation on the remote connection icon 403, a selection window 404 as shown in (b) of FIG. 5 may pop up. The selection window 404 includes information about other electronic devices capable of establishing an audio call with the first mobile phone. For example, the selection window 404 may include identification information of these electronic devices. For another example, since each electronic device usually corresponds to one user, an electronic device can also be identified by its user information; therefore, the selection window 404 may also include user information of other electronic devices capable of establishing an audio call with the first mobile phone, and the first mobile phone may initiate an audio call to the electronic device corresponding to the selected user. Of course, the first mobile phone may also trigger the remote connection icon 403 in response to the user's voice input, gesture input, or another operation, which is not limited in this embodiment of the present application.
之后，第一手机检测到用户对选择窗口404中某一联系对象(电子设备或用户)的选择操作之后，可以向对应的电子设备发起音频呼叫。例如，在场景10中，第二手机在被拍摄对象侧，若第一手机获取被拍摄对象侧的远程音频，则第一手机需要与第二手机建立音频通话。因此，第一手机侧的用户(拍摄者)可以在图5中的(b)所示的选择窗口404中选择“第二手机”，第一手机检测到用户的选择操作后，立即向被拍摄对象侧的第二手机发起音频呼叫。相应地，第二手机上可以接收到来自第一手机的音频呼叫，如图5中的(c)所示。After that, when the first mobile phone detects the user's selection of a contact object (an electronic device or a user) in the selection window 404, it can initiate an audio call to the corresponding electronic device. For example, in scene 10, the second mobile phone is on the photographed object's side; if the first mobile phone is to acquire the remote audio of the photographed object's side, it needs to establish an audio call with the second mobile phone. Therefore, the user (photographer) on the first mobile phone's side can select "second mobile phone" in the selection window 404 shown in (b) of FIG. 5, and after detecting the user's selection operation, the first mobile phone immediately initiates an audio call to the second mobile phone on the photographed object's side. Correspondingly, the second mobile phone can receive the audio call from the first mobile phone, as shown in (c) of FIG. 5.
在本申请实施例中,可以通过多种途径获取选择窗口404中显示的联系对象的信息。In this embodiment of the present application, the information of the contact object displayed in the selection window 404 can be acquired through various ways.
在一些实施例中，第一手机可以将第一手机的通讯录中开启音频通话功能(例如开启畅连通话)的联系对象的信息显示在选择窗口404中。In some embodiments, the first mobile phone may display, in the selection window 404, information about contact objects in the first mobile phone's address book for which an audio call function (for example, MeeTime) is enabled.
在另一些实施例中，第一手机可以发送广播信息搜索预设距离内能够与第一手机建立音频通话的其他电子设备，并将这些电子设备的相关信息或者对应的用户信息显示在选择窗口404中。其中，第一手机可以通过蓝牙、Wi-Fi、zigbee等方式发送广播消息，预设距离通常与第一手机拍摄的范围相关，本申请对预设距离不作具体限定。In other embodiments, the first mobile phone may send broadcast information to search for other electronic devices within a preset distance that can establish an audio call with the first mobile phone, and display related information of these electronic devices or corresponding user information in the selection window 404. The first mobile phone can send the broadcast message through Bluetooth, Wi-Fi, ZigBee, or the like; the preset distance is generally related to the shooting range of the first mobile phone, and this application does not specifically limit the preset distance.
其中,选择窗口404中显示的联系对象对应的电子设备可以与第一手机之间建立蓝牙连接,或者选择窗口404中显示的联系对象对应的电子设备可以与第一手机在同一Wi-Fi局域网内,或者选择窗口404中显示的联系对象对应的电子设备与第一手机之前已经建立过音频通话。本申请实施例对此不作限定。The electronic device corresponding to the contact object displayed in the selection window 404 may establish a Bluetooth connection with the first mobile phone, or the electronic device corresponding to the contact object displayed in the selection window 404 may be in the same Wi-Fi local area network as the first mobile phone , or the electronic device corresponding to the contact object displayed in the selection window 404 has established an audio call with the first mobile phone before. This embodiment of the present application does not limit this.
在一些实施例中，若某些电子设备与第一手机之前已经建立过音频通话，则第一手机可以存储这些电子设备的标识信息等相关信息，作为历史联系对象的信息。这样，之后第一手机显示选择窗口404时，选择窗口404中可以包括第一手机中存储的历史联系对象的信息。也就是说，与第一手机之前建立过音频通话的这些电子设备可以显示为选择窗口404中的联系对象。In some embodiments, if some electronic devices have previously established an audio call with the first mobile phone, the first mobile phone may store related information, such as the identification information of these electronic devices, as information about historical contact objects. In this way, when the first mobile phone subsequently displays the selection window 404, the selection window 404 may include the information about the historical contact objects stored in the first mobile phone. That is, the electronic devices that have previously established an audio call with the first mobile phone can be displayed as contact objects in the selection window 404.
在一些实施例中,若预览界面中不包括远程连接图标,则第一手机可以通过其他方式向第二手机发起音频呼叫。In some embodiments, if the remote connection icon is not included in the preview interface, the first mobile phone can initiate an audio call to the second mobile phone in other ways.
例如,第一手机在进入需要获取远程音频的特定录像模式,例如远景录像模式之后,可以不显示远程连接图标,而直接显示选择窗口,向用户呈现能够与第一手机建立音频通话的其他电子设备。第一手机检测到用户选择第二手机后,向第二手机发起音频呼叫。再例如,第一手机还可以在预览界面中不显示远程连接图标的情况下,响应于用户的语音输入,例如“与第二手机建立音频连接”而直接向第二手机发起音频呼叫。本申请实施例对此不作限定。For example, after the first mobile phone enters a specific video recording mode that needs to acquire remote audio, such as a remote video recording mode, the remote connection icon may not be displayed, but a selection window may be directly displayed to present the user with other electronic devices capable of establishing an audio call with the first mobile phone . After detecting that the user selects the second mobile phone, the first mobile phone initiates an audio call to the second mobile phone. For another example, the first mobile phone may directly initiate an audio call to the second mobile phone in response to a user's voice input, such as "establish an audio connection with the second mobile phone", without displaying the remote connection icon in the preview interface. This embodiment of the present application does not limit this.
S303、第二手机响应于用户的接听操作,录制音频并将该音频传输至第一手机。S303. In response to the answering operation of the user, the second mobile phone records audio and transmits the audio to the first mobile phone.
在第二手机上接收到来自第一手机的音频呼叫后,若用户选择接听该音频呼叫,则第二手机开始录制被拍摄对象侧的音频,并将该音频传输至第一手机。After receiving the audio call from the first mobile phone on the second mobile phone, if the user chooses to answer the audio call, the second mobile phone starts recording the audio on the side of the photographed object and transmits the audio to the first mobile phone.
例如，第二手机接收到来自第一手机的音频呼叫时，第二手机上显示如图5中的(c)所示的待接听通话界面。若用户选择“接受”控件，则第二手机与第一手机之间建立音频通话，第二手机上显示如图5中的(d)所示的正在进行通话的界面，第二手机开始录制被拍摄对象侧的音频，并与第一手机建立音频通话，将录制的音频传输至第一手机。其中，若第二手机检测到用户触发图5中的(d)中的“挂断”控件，则第二手机可以结束与第一手机之间的音频通话。For example, when the second mobile phone receives the audio call from the first mobile phone, the second mobile phone displays the to-be-answered call interface shown in (c) of FIG. 5. If the user selects the "accept" control, an audio call is established between the second mobile phone and the first mobile phone, the second mobile phone displays the ongoing-call interface shown in (d) of FIG. 5, and the second mobile phone starts recording the audio on the photographed object's side and transmits the recorded audio to the first mobile phone over the established audio call. If the second mobile phone detects that the user triggers the "hang up" control in (d) of FIG. 5, the second mobile phone can end the audio call with the first mobile phone.
在第二手机与第一手机之间建立音频通话时，第一手机也开始获取来自第二手机的远程音频，如图6中的(a)所示，第一手机在预览界面中可以显示提示框601，以提示拍摄者第一手机正在获取来自第二手机的被拍摄对象侧的远端音频。此时，预览界面中的远程连接图标403可以继续存在，也可以暂时不显示，直到音频采集结束时再显示。在一些实施例中，若第一手机检测到拍摄者触发结束音频控件604，则第一手机可以结束与第二手机之间的音频通话。When the audio call is established between the second mobile phone and the first mobile phone, the first mobile phone also starts to acquire the remote audio from the second mobile phone. As shown in (a) of FIG. 6, the first mobile phone may display a prompt box 601 in the preview interface to prompt the photographer that the first mobile phone is acquiring the far-end audio of the photographed object's side from the second mobile phone. At this time, the remote connection icon 403 in the preview interface may continue to be displayed, or may be temporarily hidden and displayed again after the audio collection ends. In some embodiments, if the first mobile phone detects that the photographer triggers the end-audio control 604, the first mobile phone can end the audio call with the second mobile phone.
下面将结合图7，具体描述第二手机采集音频数据并将该音频数据传输至第一手机的过程：The following specifically describes, with reference to FIG. 7, the process in which the second mobile phone collects audio data and transmits the audio data to the first mobile phone:
第二手机检测到用户的接听操作后，可以启动音频采集模块的麦克风等收音装置，以采集被拍摄对象侧的音频数据，如图7中的(a)所示，第二手机通过音频采集模块开始采集音频数据之后，采集到的音频数据依次经过音频组装模块、音频编码模块和缓存发送模块进行处理。之后，第二手机将处理后的音频数据通过服务器中转传输、P2P传输、蓝牙、zigbee、Wi-Fi等等多种通信协议之一传输至第一手机。其中，音频组装模块将采集到的音频数据组装成预设大小的数据包；音频编码模块中将输入的数据包进行压缩；在缓存发送模块中压缩后的数据包排成队列等待传输。需要说明的是，若第二手机和第一手机都可以连接至同一个路由器，优选使用WiFi-P2P链路传输音频数据，从而获得更稳定的传输效果。After detecting the user's answering operation, the second mobile phone can start a sound pickup apparatus, such as the microphone of the audio collection module, to collect the audio data on the photographed object's side. As shown in (a) of FIG. 7, after the second mobile phone starts collecting audio data through the audio collection module, the collected audio data is processed in sequence by the audio assembly module, the audio encoding module, and the buffered sending module. After that, the second mobile phone transmits the processed audio data to the first mobile phone through one of multiple communication channels, such as server relay transmission, P2P transmission, Bluetooth, ZigBee, or Wi-Fi. The audio assembly module assembles the collected audio data into data packets of a preset size; the audio encoding module compresses the input data packets; and the compressed data packets are queued for transmission in the buffered sending module. It should be noted that if both the second mobile phone and the first mobile phone can connect to the same router, a Wi-Fi P2P link is preferably used to transmit the audio data, so as to obtain a more stable transmission effect.
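A rough sketch of that sending-side pipeline (capture, assemble into fixed-size packets, encode, queue in the buffered sender) is shown below. This is an assumption-laden illustration rather than the patent's code: `zlib` merely stands in for a real audio codec so the example runs, and the 882-samples-per-packet frame size assumes the 44 kHz (i.e., 44 100 Hz) sampling rate and 50 Hz assembly ratio used in the example that follows.

```python
# Illustrative sketch of the sender pipeline; not the patent's implementation.
import zlib
from collections import deque

FRAME_SAMPLES = 882  # e.g. 44100 Hz sampling / 50 Hz assembly ratio

def assemble(sample_stream, frame_samples=FRAME_SAMPLES):
    """Group raw samples into fixed-size frames (the assembly module's job)."""
    for i in range(0, len(sample_stream) - frame_samples + 1, frame_samples):
        yield sample_stream[i:i + frame_samples]

send_queue = deque()  # buffered sending module: packets queued for transmission

def process_and_enqueue(sample_stream):
    for frame in assemble(sample_stream):
        packet = zlib.compress(bytes(frame))  # encoding/compression step
        send_queue.append(packet)

process_and_enqueue([0] * 44100)  # one second of silence at 44.1 kHz
```

In a real pipeline, a network thread would drain `send_queue` over the chosen transport (Wi-Fi P2P link, server relay, Bluetooth, and so on).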
There is a processing delay delay_remote while the second mobile phone processes the audio data. Various intelligent speech-recognition algorithms may dynamically adjust parameters such as the audio sampling frequency, the assembly rate, or the encoding method as needed, causing the processing delay to vary. Likewise, there is a transmission delay delay_trans while the second mobile phone transmits the processed audio data to the first mobile phone, and changes in network quality may cause the transmission delay to vary. To accommodate these variations, in the embodiments of this application the processing delay delay_remote and the transmission delay delay_trans are calculated for each frame of audio data. The delay data of each frame of audio data is the sum of its processing delay and its transmission delay. It should be noted that each frame of audio data corresponds to one audio data packet generated by the audio assembly module.
The calculation of the processing delay and the transmission delay of each frame of audio data is described in detail below, taking as an example an audio sampling frequency of 44 kHz, an assembly rate of 50 Hz, and a user-acceptable audio-video delay deviation within the range [-120 ms, +40 ms].
The processing delay of each frame of audio data is the sum of the processing delays of the modules that the frame passes through.
A sampling frequency of 44 kHz means the audio collection module can generate 44000 small data packets per second. The audio collection module outputs the collected small packets to the audio assembly module, which assembles them through a buffering mechanism: when the number of small packets received from the audio collection module reaches a preset number, the audio assembly module generates one audio data packet, i.e., one frame of audio data. The preset number is determined by the assembly rate of the audio assembly module. For example, an assembly rate of 50 Hz means the audio assembly module assembles 50 audio data packets per second, so once 44000/50 = 880 small packets have accumulated, the audio assembly module assembles those 880 small packets into one audio data packet, i.e., one frame of audio data. Clearly, the time the audio assembly module needs to assemble one audio data packet is a fixed value determined by the assembly rate. In this example it follows directly that the processing delay of the audio assembly module is delay_1 = 1/50 Hz = 20 ms, with no statistical measurement required.
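The arithmetic above can be sketched as follows. This is a minimal illustration using the example figures from the text; the function names are hypothetical, not part of the patent:

```python
def packets_per_frame(sample_rate_hz: int, assembly_rate_hz: int) -> int:
    """Number of small packets accumulated per assembled audio frame."""
    return sample_rate_hz // assembly_rate_hz

def assembly_delay_ms(assembly_rate_hz: int) -> float:
    """Fixed assembly delay delay_1, determined only by the assembly rate."""
    return 1000.0 / assembly_rate_hz

# Example figures from the text: 44 kHz sampling, 50 Hz assembly rate.
print(packets_per_frame(44_000, 50))  # 880 small packets per frame
print(assembly_delay_ms(50))          # delay_1 = 20.0 ms
```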
In some embodiments, since delay_1 generally falls within the user-acceptable audio-video delay deviation range [-120 ms, +40 ms], delay_1 may be ignored when calculating the processing delay.
As shown in (b) of FIG. 7, the timestamp at which the audio assembly module generates each audio data packet is denoted t1. Since each generated audio data packet is output directly to the audio encoding module, the timestamp at which each audio data packet enters the audio encoding module can also be taken as t1.
The audio encoding module compresses each input audio data packet; the specific compression method depends on the encoding format and on the computing power of the electronic device (for example, software encoding versus hardware encoding). The audio encoding module usually processes packets in a pipeline, so its processing delay delay_2 can be measured precisely from the timestamp at which each audio data packet enters the audio encoding module and the timestamp at which it leaves. If the timestamp at which each encoded audio data packet is output is denoted t2, then the processing delay of the audio encoding module is delay_2 = t2 - t1. Likewise, since each audio data packet output by the audio encoding module enters the buffered sending module directly, the timestamp at which each audio data packet enters the buffered sending module can be taken as t2.
In the buffered sending module, the audio data packets are queued, waiting to be transmitted to the first mobile phone over the air interface. The processing delay delay_3 of the buffered sending module depends on the size of the send buffer configured in the second mobile phone, and can likewise be measured precisely from the timestamp at which each audio data packet enters the buffered sending module and the timestamp at which it leaves. If the timestamp at which each audio data packet leaves the buffered sending module is denoted t3, then delay_3 = t3 - t2. Here, t3 can also be regarded as the timestamp at which each audio data packet is sent.
In this way, the second mobile phone calculates the processing delay of each audio data packet in each of the above modules. Since the processing delay delay_remote is the sum of the processing delays of these modules, delay_remote = delay_1 + delay_2 + delay_3.
In the embodiments of this application, since delay_1 can be ignored and the mixed recording of audio and video is based on the generation time of each audio data packet, it can also be taken that delay_remote = delay_2 + delay_3. Calculating the processing delay of each module and then summing them to obtain the total processing delay is a general method: even if the processing pipeline is adjusted, the total processing delay can still be obtained by calculating the delay of each module at that time.
In other embodiments, the second mobile phone may also calculate the processing delay delay_remote directly from the timestamp t1 at which each audio data packet is generated and the timestamp t3 at which it leaves the buffered sending module: delay_remote = t3 - t1. This direct method is simpler and more convenient, yielding the processing delay of each audio data packet in a single step.
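The two ways of obtaining delay_remote described above (per-module summation versus the direct t3 - t1 computation) can be sketched as follows; the timestamps are in milliseconds and the function names are illustrative only:

```python
def processing_delay_per_module(t1: float, t2: float, t3: float,
                                delay_1: float = 0.0) -> float:
    """delay_remote as the sum of per-module delays (delay_1 may be ignored)."""
    delay_2 = t2 - t1  # audio encoding module
    delay_3 = t3 - t2  # buffered sending module
    return delay_1 + delay_2 + delay_3

def processing_delay_direct(t1: float, t3: float) -> float:
    """delay_remote computed in one step from generation and send timestamps."""
    return t3 - t1

# With delay_1 ignored, both methods agree:
t1, t2, t3 = 1000.0, 1008.0, 1023.0
print(processing_delay_per_module(t1, t2, t3))  # 23.0
print(processing_delay_direct(t1, t3))          # 23.0
```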
After each audio data packet leaves the buffered sending module, it can be transmitted to the first mobile phone over the air interface. There is a delay between the time each audio data packet is sent by the second mobile phone and the time it is received by the first mobile phone, namely the transmission delay delay_trans.
Exemplarily, the second mobile phone may calculate the transmission delay by sending a probe packet to the first mobile phone. The second mobile phone sends the probe packet to the first mobile phone; upon receiving it, the first mobile phone performs no processing and immediately returns the probe packet to the second mobile phone along the original path. The second mobile phone records the time at which it sent the probe packet as the sending timestamp t4, and the time at which it received the returned probe packet as the receiving timestamp t5. The second mobile phone can then calculate the transmission delay as delay_trans = (t5 - t4)/2.
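The probe-packet measurement can be sketched as a loopback echo: the receiver returns the probe unmodified, and the sender halves the round-trip time. The following self-contained illustration runs over a local UDP socket; the port, payload, and structure are arbitrary choices for the sketch, not the patent's implementation:

```python
import socket
import threading
import time

def echo_server(sock: socket.socket) -> None:
    """First-phone side: return the probe packet unmodified, along the same path."""
    data, addr = sock.recvfrom(1024)
    sock.sendto(data, addr)

# Second-phone side: send a probe and record send/receive timestamps t4 and t5.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))  # arbitrary free port for the sketch
threading.Thread(target=echo_server, args=(server,), daemon=True).start()

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
t4 = time.monotonic()
client.sendto(b"probe", server.getsockname())
client.recvfrom(1024)
t5 = time.monotonic()

delay_trans_ms = (t5 - t4) * 1000.0 / 2.0  # delay_trans = (t5 - t4) / 2
print(f"delay_trans is approximately {delay_trans_ms:.3f} ms")
```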
In some embodiments, the second mobile phone may send probe packets at a preset frequency to calculate the current transmission delay; this preset frequency is usually lower than the assembly rate, so fewer probe packets are transmitted and less extra traffic is generated. In other embodiments, to improve accuracy, the second mobile phone may instead send probe packets periodically at the assembly rate, thereby obtaining the transmission delay of every audio data packet. Since an audio call requires little bandwidth, sending probe packets at the assembly rate does not affect the audio data transmission.
The above describes how the second mobile phone calculates the processing delay and the transmission delay of each audio data packet; the delay data of each frame of audio data is obtained from these two values. Based on the arrival timestamp of each audio data packet and the delay data, and using its own clock as the reference, the first mobile phone can calculate the moment at which each audio data packet started to be generated in the second mobile phone, i.e., the generation time of each frame of audio data. If the timestamp at which each audio data packet arrives at the first mobile phone (i.e., is received by the first mobile phone) is t_arrive, the first mobile phone can calculate the generation time of each frame of audio data as t_start = t_arrive - (delay_remote + delay_trans). In this way, the first mobile phone obtains the generation time of each frame of audio data on its own clock; that is, the task of synchronously mixing the second mobile phone's audio with the first mobile phone's video is converted, approximately, into synchronizing audio and video that both originate from the first mobile phone.
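Mapping each packet's arrival time back onto the first phone's clock can be sketched as follows; the data-class shape is an illustrative assumption:

```python
from dataclasses import dataclass

@dataclass
class AudioPacket:
    t_arrive_ms: float      # arrival timestamp on the first phone's clock
    delay_remote_ms: float  # processing delay reported by the second phone
    delay_trans_ms: float   # transmission delay from the probe measurement

def generation_time_ms(pkt: AudioPacket) -> float:
    """t_start = t_arrive - (delay_remote + delay_trans), on the first phone's clock."""
    return pkt.t_arrive_ms - (pkt.delay_remote_ms + pkt.delay_trans_ms)

pkt = AudioPacket(t_arrive_ms=5_000.0, delay_remote_ms=35.0, delay_trans_ms=30.0)
print(generation_time_ms(pkt))  # 4935.0
```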
To enable the first mobile phone to accurately calculate the moment at which each audio data packet started to be generated, the second mobile phone must also transmit the calculated delay data of each audio data packet, e.g., the processing delay and the transmission delay, to the first mobile phone. The delay data cannot be mixed into the audio data itself, as that could interfere with the first mobile phone's decoding of the audio data; therefore, the embodiments of this application provide several methods of transmitting the delay data, illustrated by the examples below.
In some embodiments, the delay data may be transmitted from the second mobile phone to the first mobile phone together with the audio data in real-time transport protocol (RTP) packets, as in transmission methods 1 and 2 described below. In this case, each audio data packet can also be regarded as an RTP packet.
Transmission method 1:
In some embodiments, the delay data may be transmitted from the second mobile phone to the first mobile phone in the timestamp field of the standard RTP packet header.
Exemplarily, (a) of FIG. 8 shows the format of the standard RTP packet header. The fields of the standard packet header defined in FIG. 8 are described as follows:
The extension bit (X) occupies 1 bit. When X = 1, an extension header may be inserted after the RTP header and before the payload.
The timestamp occupies 32 bits and reflects the sampling instant of the first octet of the RTP packet. In some embodiments, the receiver uses the timestamp to calculate delay and delay jitter and to perform synchronization control.
The version number (V) indicates the version of the RTP protocol and occupies 2 bits; the current RTP version is 2.
The padding flag (P) occupies 1 bit. When P = 1, one or more additional octets that are not part of the payload may be appended to the end of the RTP packet.
The CSRC count (CC) occupies 4 bits and indicates the number of CSRC identifiers.
The marker (M) occupies 1 bit. Its meaning depends on the payload: for video, it may mark the end of a frame; for audio, it may mark the start of an audio session.
The payload type (PT) occupies 7 bits and indicates the type of payload carried in the RTP packet, such as GSM audio or a JPEG image.
The sequence number occupies 16 bits and identifies the sequence number of the RTP packet sent by the sender; it is incremented by 1 for each packet sent. The receiver can use the sequence number to detect packet loss, reorder packets, and recover the data.
The synchronization source (SSRC) identifier occupies 32 bits and identifies the synchronization source. The identifier is chosen randomly; two synchronization sources participating in the same video conference must not have the same SSRC.
There may be 0 to 15 contributing source (CSRC) identifiers, each occupying 32 bits. The CSRC identifiers identify all the contributing sources contained in the payload of the RTP packet.
In the embodiments of this application, each audio data packet may be transmitted from the second mobile phone to the first mobile phone in one RTP packet. If the timestamp field of the RTP packet header is unused, that field can be used directly to carry the delay data as well. Since the sum of the processing delay and the transmission delay of each audio data packet is usually within 1000 ms, only 2 bytes of the timestamp field are needed.
When the timestamp field of the standard RTP header is unused, using it directly to carry the delay data is the most convenient and simple approach: the delay data corresponding to each audio data packet is transmitted together with the packet itself.
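Transmission method 1 can be sketched by building an RFC 3550-style 12-byte fixed header and writing the per-packet delay (in milliseconds, fitting in 2 bytes) into the otherwise-unused timestamp field. This is an illustration of the idea under assumed field values (PT = 0, no CSRCs), not the patent's actual implementation:

```python
import struct

def build_rtp_packet(seq: int, ssrc: int, delay_ms: int, payload: bytes) -> bytes:
    """Pack a 12-byte RTP fixed header (V=2, P=X=CC=M=0, PT=0) with the
    per-packet delay carried in the low 2 bytes of the timestamp field."""
    assert 0 <= delay_ms < 0x10000  # delays are usually well under 1000 ms
    first_byte = 2 << 6             # V=2, P=0, X=0, CC=0
    second_byte = 0                 # M=0, PT=0
    header = struct.pack("!BBHII", first_byte, second_byte, seq, delay_ms, ssrc)
    return header + payload

def parse_delay_ms(packet: bytes) -> int:
    """Receiver side: recover the delay from the timestamp field."""
    _, _, _, timestamp, _ = struct.unpack("!BBHII", packet[:12])
    return timestamp & 0xFFFF

pkt = build_rtp_packet(seq=7, ssrc=0x1234, delay_ms=65, payload=b"\x00" * 880)
print(parse_delay_ms(pkt))  # 65
```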
Transmission method 2:
When the timestamp field of the standard RTP header is already in use, in some embodiments the delay data may be transmitted from the second mobile phone to the first mobile phone in an RTP header extension field.
Exemplarily, as shown in (b) of FIG. 8, when the extension bit X is 1, an extension header may be inserted after the RTP header and before the payload. In this way, while each audio data packet is transmitted in an RTP packet, the corresponding delay data is transmitted in the extension header field. As before, the delay data only needs to occupy 2 bytes of the extension header field.
In some embodiments, the packet extension header field used in the embodiments of this application is not encrypted.
It can be understood that this approach requires modifying the RTP header assembly logic at the sender and the RTP header parsing logic at the receiver, so as to insert and extract the delay data.
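Transmission method 2 can be sketched with the generic RFC 3550 header-extension layout: a 16-bit profile-defined identifier, a 16-bit length counted in 32-bit words, then the extension body, with the 2-byte delay value padded to one full word. The profile identifier below is an arbitrary placeholder, not a value from the patent:

```python
import struct

PROFILE_ID = 0xABAC  # arbitrary placeholder profile identifier for this sketch

def build_extension(delay_ms: int) -> bytes:
    """RTP header extension: profile (16 bits), length in 32-bit words (16 bits),
    then the body; the 2-byte delay is padded to a full 32-bit word."""
    body = struct.pack("!H2x", delay_ms)  # 2 bytes of delay + 2 bytes of padding
    return struct.pack("!HH", PROFILE_ID, len(body) // 4) + body

def parse_extension(ext: bytes) -> int:
    """Receiver side: validate the extension and recover the delay value."""
    profile, length_words = struct.unpack("!HH", ext[:4])
    assert profile == PROFILE_ID and length_words == 1
    (delay_ms,) = struct.unpack("!H", ext[4:6])
    return delay_ms

print(parse_extension(build_extension(65)))  # 65
```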
Both of the above methods carry the delay data in the header of the media channel, e.g., the RTP header, transmitting each audio data packet's delay data together with the packet itself. In other words, both methods operate over a single channel, the media channel.
In other embodiments, the audio data and the delay data may be transmitted over separate communication channels, as in transmission method 3 below. Equivalently, the audio data is still transmitted in RTP packets, while the delay data is transmitted in a control signaling format.
Transmission method 3:
In some embodiments, the delay data may be transmitted in a control signaling format over the control signaling channel, rather than with the audio data over the media channel. The control signaling channel carries control information such as call start/stop, handshake, or authentication. Exemplarily, the control signaling channel may transmit information and data based on the transmission control protocol (TCP). Data in the control signaling format is transmitted via the control signaling channel.
It can be understood that when the delay data is transmitted over the control signaling channel, an audio data packet and its corresponding delay data are not carried in the same packet, so a frame sequence number label must be added when transmitting the delay data over the control signaling channel.
For example, each audio data packet corresponds to a frame sequence number. When the delay data is transmitted over the control signaling channel, the frame sequence number of the corresponding audio data packet must be attached to each packet's delay data, indicating which audio data packet the delay data belongs to. Then, exemplarily, the second mobile phone may transmit the delay data with the attached frame sequence numbers to the first mobile phone periodically (for example, piggybacked on the heartbeat packet payload). As another example, the second mobile phone may transmit the delay data with the attached frame sequence numbers to the first mobile phone all at once.
In some embodiments, the delay data may also be compressed to save transmission volume. Exemplarily, when the network is stable, the processing delay and transmission delay of each audio data packet are also stable and vary little. For example, if the variance of the delay data over a run of consecutive audio data frames is less than or equal to a preset deviation range, then instead of attaching a frame sequence number to each delay value, the average delay of the run can be taken directly, together with the start frame sequence number and end frame sequence number of the run. This compression reduces the amount of delay data and thus saves a large amount of transmission volume.
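The variance-based compression described above can be sketched as follows: if a run of per-frame delays is stable enough, replace the run with one averaged record tagged with the start and end frame sequence numbers. The variance threshold and the record shapes are illustrative assumptions:

```python
from statistics import mean, pvariance

def compress_delays(frames: list[tuple[int, float]], max_variance: float = 4.0):
    """frames: (frame_seq, delay_ms) pairs for a run of consecutive audio frames.
    Returns one averaged record when the run is stable, else per-frame records."""
    delays = [d for _, d in frames]
    if pvariance(delays) <= max_variance:
        return {"start_seq": frames[0][0], "end_seq": frames[-1][0],
                "avg_delay_ms": mean(delays)}
    return [{"seq": s, "delay_ms": d} for s, d in frames]

stable = [(100, 50.0), (101, 51.0), (102, 49.0), (103, 50.0)]
print(compress_delays(stable))
# one record covering frames 100 to 103 instead of four per-frame records
```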
In step S303, the second mobile phone starts collecting audio data and transmits the collected audio data to the first mobile phone. At the same time, the second mobile phone may also transmit, by any of transmission methods 1 to 3, the processing delay incurred while processing the audio data and the transmission delay incurred while transmitting it, so that the first mobile phone can use the received processing delay and transmission delay, together with the moment at which it received each audio data packet and its own clock as the reference, to calculate the generation time of each audio data packet, and thereby synchronously mix and record the received audio with the video recorded by the first mobile phone using the existing same-device audio-video synchronous recording method (i.e., same-source audio-video synchronization).
S304. In response to the user's video recording operation, the first mobile phone records the video on the photographed subject's side and acquires the audio from the second mobile phone.
In the embodiments of this application, after the audio call between the first mobile phone and the second mobile phone is established, the first mobile phone may start recording the video on the photographed subject's side in response to the user's recording operation.
For example, after the photographer taps the recording control 602 shown in (a) of FIG. 6, the first mobile phone starts recording video and displays the shooting interface shown in (b) of FIG. 6. At this time, the recording control 602 changes to the stop-recording control 603. It can be understood that the recording operation may also be a voice input or a gesture input, which is not limited in this application.
In some embodiments, the first mobile phone does not start recording the video on the photographed subject's side immediately after detecting the user's recording operation. Instead, the first mobile phone may first send a request message to the second mobile phone; for example, a selection box pops up on the second mobile phone asking the user on the second mobile phone's side whether to allow the first mobile phone to record the video of the photographed subject. If the user on the second mobile phone's side agrees, the first mobile phone starts recording the video on the photographed subject's side and displays the interface shown in (b) of FIG. 6. If the user on the second mobile phone's side declines, the first mobile phone cannot record the video on the photographed subject's side. Exemplarily, when the user on the second mobile phone's side declines, a notification box may also be displayed on the first mobile phone to inform the photographer that the user on the photographed subject's side has rejected the first mobile phone's shooting request. During video recording on the first mobile phone, the prompt box 601 may be hidden so that the shooting picture is not blocked.
In other embodiments, during shooting, the second mobile phone may continue to display the above selection box to confirm whether the user on the photographed subject's side agrees to let the current shooting continue. Once the user on the photographed subject's side rejects the shooting request, the first mobile phone immediately stops recording video. During shooting, the second mobile phone may also continuously display a reminder message to remind the user on the photographed subject's side that the first mobile phone is still recording the video of the photographed subject.
The first mobile phone collects the video on the photographed subject's side only when the user on that side agrees to the shooting request. This ensures the validity and security of the collected video, avoids capturing irrelevant content, and improves the user experience.
In addition, the first mobile phone acquires the audio from the second mobile phone. Still as shown in (a) of FIG. 7, after the first mobile phone receives each frame of audio data from the second mobile phone, each frame is processed in turn by the receive buffer module, the audio decoding module, and the audio disassembly module to obtain the processed audio data. Meanwhile, when receiving the audio data from the second mobile phone, the first mobile phone can also receive the delay data, and, based on the arrival timestamp of each audio data packet and that delay data, with its own clock as the reference, calculate the moment at which each audio data packet was generated in the second mobile phone, i.e., the generation time of each frame of audio data. If the timestamp at which each audio data packet arrives at the first mobile phone (i.e., is received by the first mobile phone) is t_arrive, the first mobile phone can calculate the generation time of each audio data packet as t_start = t_arrive - (delay_remote + delay_trans). The first mobile phone then synchronously mixes and records the processed audio data with the video data collected by the first mobile phone.
It should be noted that, in some embodiments, in step S303 the second mobile phone transmits the audio to the first mobile phone as soon as it records it; that is, the first mobile phone may acquire audio from the second mobile phone before it responds to the user's recording operation in step S304. The audio acquired from the second mobile phone before the first mobile phone responds to the user's recording operation can be used to test the audio call between the first mobile phone and the second mobile phone, ensuring that the call between them is working properly.
S305. In response to the photographer's operation of ending the recording, the first mobile phone synchronously mixes and records the video recorded by the first mobile phone with the audio from the second mobile phone, generating a short video file with synchronized audio and video.
In the embodiments of this application, if the first mobile phone detects the photographer's operation of ending the recording, for example, when it detects that the photographer taps the stop-recording control 603 in (b) of FIG. 6, it ends the recording and displays the playback interface shown in (c) of FIG. 6.
In some embodiments, the playback interface shown in (c) of FIG. 6 displays the audio-video-synchronized short video file obtained by synchronously mixing the video data collected by the first mobile phone with the audio data from the second mobile phone. Because the first mobile phone calculates the generation time of each frame of audio data on its own clock, the video data collected by the first mobile phone and the audio data collected by the second mobile phone are both referenced to the first mobile phone's clock. Therefore, the problem of the first mobile phone synchronously mixing the video from the first mobile phone with the audio from the second mobile phone (heterogeneous-source audio-video synchronization) is converted into the problem of the first mobile phone synchronously mixing its own audio and video (same-source audio-video synchronization). In this way, combined with the currently mature methods for synchronizing audio and video from the same electronic device (same-source audio-video synchronization), the first mobile phone can synchronously mix and record the video recorded by the first mobile phone with the audio recorded by the second mobile phone.
Exemplarily, when the first mobile phone timestamps both the captured video data and the acquired audio data according to its own clock, the first mobile phone may treat the audio data as the master stream and the video data as the slave stream, and adjust the playback state of the video data according to the timestamps of the audio data. In this way, the video recorded by the first mobile phone and the audio from the second mobile phone are synchronously mixed and recorded into a short video file with synchronized audio and video. For example, such a file may exhibit a lip-sync effect, in which the state of the photographed subject in the video, such as mouth shape and motion, corresponds to the audio.
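The master/slave adjustment described above can be sketched as follows. This is an illustrative Python sketch, not the patented implementation, and all names are hypothetical: for each audio timestamp (master stream), the closest video frame (slave stream) is selected, so that video frames are implicitly repeated or dropped to follow the audio clock.

```python
def sync_video_to_audio(audio_ts, video_ts):
    """Pair each audio timestamp (master stream) with the video frame
    (slave stream) whose timestamp is closest, so that video playback
    follows the audio clock. Timestamps are in milliseconds on the
    first device's clock; both lists are assumed sorted ascending."""
    pairs = []
    vi = 0
    for at in audio_ts:
        # advance while the next video frame is at least as close to the
        # audio clock as the current one (dropping lagging frames)
        while vi + 1 < len(video_ts) and \
                abs(video_ts[vi + 1] - at) <= abs(video_ts[vi] - at):
            vi += 1
        pairs.append((at, video_ts[vi]))  # a repeated vi repeats a frame
    return pairs
```

With 20 ms audio frames and roughly 30 fps video, one video frame may serve two audio frames (a repeat) while another is skipped (a drop), which is the usual behavior of an audio-master synchronizer.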
In some embodiments, if the first mobile phone has already acquired audio from the second mobile phone before the user's recording operation, then in step S305 the first mobile phone may, based on the timestamp at which video recording started (that is, the moment of the user's recording operation), discard the audio data acquired before that operation, so that only the video and audio acquired after the user's recording operation are synchronously mixed and recorded.
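This trimming step amounts to a simple filter once every audio frame carries a generation timestamp on the first device's clock. A minimal sketch with hypothetical names:

```python
def trim_to_recording_start(audio_frames, record_start_ts):
    """Drop audio frames whose generation moment (on the first device's
    clock) precedes the start of video recording, so that mixing begins
    exactly at the user's recording operation."""
    return [f for f in audio_frames if f["gen_ts"] >= record_start_ts]
```

The symmetric case in the next paragraph, discarding earlier video data, would apply the same filter to the video frames instead.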
In addition, in some embodiments, the first mobile phone may, in response to the user's recording operation, first record video of the photographed subject, and then, in response to the user triggering the remote connection icon, initiate an audio call to the second mobile phone to acquire audio from the second mobile phone, finally mixing the video recorded by the first mobile phone with the audio from the second mobile phone synchronously. In this case, the first mobile phone may discard the earlier video data based on the calculated generation moments of the audio data, so that the audio acquired from the second mobile phone is synchronously mixed with the corresponding video.
In some embodiments, the playback interface shown in (c) of FIG. 6 may further include a return control 605, a publish control 606, and a save control 607. If the user is not satisfied with the currently playing synchronized short video file, the user may trigger the return control 605 to go back to the preview interface and re-record; if the user wishes to save the currently playing file, the user may trigger the save control 607; and if the user wishes to share it, the user may trigger the publish control 606. It can be understood that, in this embodiment of the present application, because the distance between the first mobile phone and the photographed subject is large, the first mobile phone cannot clearly and completely record the audio on the subject's side. Therefore, while the first mobile phone records video of the subject, its microphone may be either on or off; in the final synchronized mixed recording, only the video recorded by the first mobile phone is used, and it is mixed with the audio recorded by the second mobile phone on the subject's side to generate a short video file with synchronized audio and video.
According to the method described in steps S301-S305 above, the first mobile phone can, using its own clock as the reference, accurately calculate the generation moment of each frame of audio data from the second mobile phone, based on the delay data of each frame and the timestamp at which the first mobile phone received that frame. In this way, the problem of synchronously mixing audio and video from different mobile phones is converted into approximately the problem of synchronously mixing audio and video from the same mobile phone, so the currently mature methods for same-source synchronization can be applied to achieve synchronous mixed recording of the video recorded by the first mobile phone and the audio recorded by the second mobile phone. This method does not depend on an NTP time server, and therefore avoids the audio-video desynchronization caused by the large delay deviations of wide-area-network time service. At the same time, the method imposes no additional requirements on devices or links, so its cost is low, avoiding the extra communication or equipment overhead that would result from requiring additional device support or increasing the synchronization period. In addition, the method conveniently supports mixed recording of remote audio with local video, eliminating the human error introduced by existing approaches such as boom microphones or post-production studio dubbing, and further improving the synchronization of mixed audio-video recording.
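The core timestamp conversion of steps S301-S305 amounts to subtracting each frame's total delay from its local receive timestamp. A minimal sketch (hypothetical names; all times in milliseconds; the delay is split into processing and transmission components, matching claim 2):

```python
def frame_generation_time(recv_ts_ms, processing_delay_ms, transmission_delay_ms):
    """Map a received audio frame onto the first device's clock:
    generation moment = local receive timestamp minus the total delay
    (processing delay on the second device plus transmission delay)."""
    return recv_ts_ms - (processing_delay_ms + transmission_delay_ms)

# Tag each received frame with its generation moment so it can be
# mixed with locally captured video as if both were same-source.
frames = [
    {"seq": 1, "recv_ts": 1000, "proc_delay": 12, "net_delay": 30},
    {"seq": 2, "recv_ts": 1020, "proc_delay": 12, "net_delay": 28},
]
for f in frames:
    f["gen_ts"] = frame_generation_time(f["recv_ts"], f["proc_delay"], f["net_delay"])
```

Because every `gen_ts` is expressed on the first device's own clock, the heterogeneous-source problem reduces to the same-source one, as the paragraph above describes.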
The foregoing describes the audio-video synchronization method provided by the embodiments of the present application from the perspective of the devices. The following continues the description from the perspective of the device modules, taking as an example the case in which the inter-device audio/video call module is the Huawei Changlian CaasKit component and the first and second mobile phones conduct an audio call through a Changlian call.
Exemplarily, referring to FIG. 9, the video recording application on the first mobile phone may integrate an inter-device audio/video call module, for example the Huawei Changlian CaasKit component. The CaasKit component provides audio/video codec and network transmission capabilities, including an audio/video call interface and a codec transmission interface. The system or an audio/video call application (for example, the Changlian call application) on each of the first and second mobile phones includes an audio/video call transmission component, through which the two phones conduct the audio call; the system or call application may further include a Call service module for querying audio call capability.
Exemplarily, in this embodiment of the present application, Changlian calling is enabled on both the first and second mobile phones. When determining whether the first mobile phone is capable of acquiring remote audio, and when searching for other electronic devices with which the first mobile phone can establish an audio call, the CaasKit component integrated in the video recording application communicates with the Call service module in the first mobile phone's system through inter-process communication, querying the first mobile phone's remote-audio capability through the application programming interface and simultaneously searching for other electronic devices that can establish an audio call with the first mobile phone. In the video recording application, the user selects the second mobile phone and initiates an audio call to it, after which Changlian completes device management and audio/video call connection through the cloud. Once the audio call between the first and second mobile phones is established, the video recording application acquires the audio data captured by the second mobile phone through the CaasKit application programming interface and, following step S304 above, determines the generation moments of the audio recorded by the second mobile phone using the first mobile phone's clock as the reference. The media recording module in the APP base of the first mobile phone's video recording application then applies currently mature same-source audio-video synchronization techniques (for example, lip-sync techniques) to complete the synchronous mixed recording of the video data captured by the first mobile phone and the audio data captured by the second mobile phone.
It can be understood that the method provided by the embodiments of the present application can also be used when the distance between the first mobile phone and the second mobile phone is short.
In addition, the method provided by the embodiments of the present application can also be applied to scenarios in which the distance between the photographer and the photographed subject changes. Depending on that distance, the first mobile phone can switch between ordinary video recording and the mode in which the first mobile phone records video while acquiring audio from the second mobile phone. For example, if the photographer is close to the subject when recording starts, the first mobile phone can effectively record the audio on the subject's side, and the first mobile phone records both audio and video directly. If the distance later becomes large, or the photographer switches to a new subject that is farther away, so that the first mobile phone can no longer effectively record the subject-side audio, the first mobile phone can switch, either automatically or in response to a user switching operation, to the mode in which the first mobile phone records video and the second mobile phone records audio. The first mobile phone then synchronously mixes the video it records with the audio recorded by the second mobile phone. In this way, the first mobile phone is ensured to continuously obtain the subject-side audio, avoiding incomplete audio in the final video caused by changes during the recording process.
The embodiments of the present application described above are illustrated with reference to the distance in scenario 10, and do not limit the method provided by the embodiments of the present application to synchronous mixed recording of local video with remote audio. The method is equally applicable to the scenario in which the first mobile phone synchronously mixes audio recorded by the first mobile phone with video recorded by the second mobile phone. That is, the first mobile phone can initiate a video call to the second mobile phone; after the second mobile phone detects that the user accepts the video call, a video call is established between the two phones, and the second mobile phone transmits its recorded video to the first mobile phone. After the first mobile phone detects that the user has turned on the microphone, it can record the audio on its own side. Finally, the first mobile phone synchronously mixes and records the captured audio data with the video data from the second mobile phone. This process is not described again in detail in the embodiments of the present application. With this method, audio and video from different electronic devices can likewise be synchronously mixed and recorded without relying on an NTP time server, avoiding the desynchronization caused by the large delay deviations of wide-area-network time service; moreover, the method imposes no additional requirements on devices or links and is low-cost.
It can be understood that, to implement the foregoing functions, the electronic device includes corresponding hardware and/or software modules for performing each function. In combination with the algorithm steps of the examples described in the embodiments disclosed herein, the present application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application in combination with the embodiments, but such implementations should not be considered to go beyond the scope of the present application.
In this embodiment, the electronic device may be divided into functional modules according to the foregoing method examples; for example, each functional module may correspond to one function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in the form of hardware. It should be noted that the division of modules in this embodiment is schematic and merely a logical functional division; other division manners may exist in actual implementation.
Exemplarily, FIG. 10 is a schematic structural diagram of a first electronic device provided by an embodiment of the present application. As shown in FIG. 10, an embodiment of the present application provides a first electronic device 100, including a detection unit 1001, a transceiver unit 1002, a collection unit 1003, a display unit 1004, and a synchronization unit 1005.
The detection unit 1001 is configured to detect the user's recording operation, the operation of ending recording, and other operations. The transceiver unit 1002 is configured to initiate an audio call to the second electronic device and to acquire audio data from the second electronic device together with the delay data corresponding to each frame of audio data. The collection unit 1003 is configured to collect video data on the photographed subject's side. The display unit 1004 is configured to display the preset icon, the list of contact objects, recorded images of the photographed subject, and the like. The synchronization unit 1005 is configured to synchronize the audio data from the second electronic device with the video data collected by the first electronic device.
For another example, FIG. 11 is a schematic structural diagram of a second electronic device provided by an embodiment of the present application. As shown in FIG. 11, an embodiment of the present application provides a second electronic device 110, including a detection unit 1101, a transceiver unit 1102, a collection unit 1103, a display unit 1104, and a calculation unit 1105.
The detection unit 1101 is configured to detect the user's operation of accepting the audio call, among other operations. The transceiver unit 1102 is configured to transmit audio data and the delay data corresponding to each frame of audio data to the first electronic device. The collection unit 1103 is configured to collect audio data on the photographed subject's side. The display unit 1104 is configured to display interfaces such as reminder messages and the call interface. The calculation unit 1105 is configured to calculate the processing delay and the transmission delay.
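The computation attributed to the calculation unit 1105 could be sketched as follows. This is an illustrative sketch with hypothetical names: the processing delay is measured as the interval between a frame's capture and its send on the second device, and the one-way transmission delay is approximated as half of a probed round-trip time, a symmetric-link assumption not mandated by the disclosure.

```python
class DelayTagger:
    """On the second device: compute per-frame processing delay
    (encode + buffer time) and an estimated one-way transmission
    delay, and attach both to each outgoing audio packet."""

    def __init__(self, rtt_probe_ms):
        # assume a roughly symmetric link: one-way delay ~ RTT / 2
        self.net_delay_ms = rtt_probe_ms / 2

    def tag(self, seq, capture_ts_ms, send_ts_ms):
        # processing delay is the time spent between microphone capture
        # and handing the encoded frame to the network stack
        proc_delay = send_ts_ms - capture_ts_ms
        return {
            "seq": seq,
            "processing_delay_ms": proc_delay,
            "transmission_delay_ms": self.net_delay_ms,
        }
```

The first device would add the two components of each tag and subtract the sum from its local receive timestamp to recover the frame's generation moment on its own clock.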
An embodiment of the present application further provides an electronic device, including one or more processors and one or more memories. The one or more memories are coupled to the one or more processors and are configured to store computer program code, the computer program code comprising computer instructions which, when executed by the one or more processors, cause the electronic device to perform the foregoing related method steps to implement the synchronous mixed audio-video recording method in the foregoing embodiments.
An embodiment of the present application further provides a computer-readable storage medium storing computer instructions which, when run on an electronic device, cause the electronic device to perform the foregoing related method steps to implement the synchronous mixed audio-video recording method in the foregoing embodiments.
An embodiment of the present application further provides a computer program product which, when run on a computer, causes the computer to perform the foregoing related steps to implement the synchronous mixed audio-video recording method performed by the electronic device in the foregoing embodiments.
In addition, an embodiment of the present application further provides an apparatus, which may specifically be a chip, a component, or a module, and which may include a processor and a memory connected to each other, where the memory is configured to store computer-executable instructions. When the apparatus runs, the processor may execute the computer-executable instructions stored in the memory, so that the chip performs the synchronous mixed audio-video recording method performed by the electronic device in each of the foregoing method embodiments.
The electronic device, computer-readable storage medium, computer program product, and chip provided in this embodiment are all configured to perform the corresponding methods provided above; therefore, for the beneficial effects they can achieve, reference may be made to the beneficial effects of the corresponding methods provided above, which are not repeated here.
From the description of the foregoing implementations, a person skilled in the art can understand that, for convenience and brevity of description, only the division of the foregoing functional modules is used as an example for illustration. In actual applications, the foregoing functions may be allocated to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or some of the functions described above.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; the division of the modules or units is only a logical functional division, and other division manners may exist in actual implementation; multiple units or components may be combined or integrated into another apparatus, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through some interfaces, and the indirect couplings or communication connections between apparatuses or units may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may be one physical unit or multiple physical units; that is, they may be located in one place or distributed across multiple different places. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a readable storage medium. Based on such an understanding, the technical solutions of the embodiments of the present application essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or some of the steps of the methods described in the embodiments of the present application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing content is merely specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (23)

  1. A method for synchronizing audio and video, characterized in that the method is applied to a first electronic device and comprises:
    in response to a first user operation, initiating an audio call to a second electronic device;
    in response to a second user operation, collecting video data; acquiring audio data from the second electronic device, the audio data being transmitted through a plurality of audio data packets; and acquiring, from the second electronic device, delay data corresponding to each of the plurality of audio data packets, wherein the delay data represents the delay between the moment at which each of the plurality of audio data packets is generated and the moment at which the first electronic device acquires that audio data packet; and
    in response to a third user operation, synchronizing the audio data and the video data based on the delay data.
  2. The method according to claim 1, characterized in that the delay data comprises first delay data and second delay data;
    wherein the first delay data is the processing delay of the second electronic device's processing of the audio data, and the second delay data is the transmission delay of the audio data from the second electronic device to the first electronic device.
  3. The method according to claim 2, characterized in that the processing comprises encoding processing and buffering processing of the audio data.
  4. The method according to claim 3, characterized in that the processing further comprises assembly processing of the audio data.
  5. The method according to any one of claims 1-4, characterized in that each of the plurality of audio data packets further carries the delay data corresponding to that audio data packet.
  6. The method according to claim 5, characterized in that each audio data packet is a Real-time Transport Protocol (RTP) packet, wherein the delay data is located in an extension header field of the RTP packet header.
  7. The method according to claim 5, characterized in that each audio data packet is an RTP packet, wherein the delay data is located in the timestamp field of the RTP packet header.
  8. The method according to any one of claims 1-4, characterized in that the delay data is transmitted in a control signaling format.
  9. The method according to claim 8, characterized in that the delay data further comprises sequence number information of each audio data packet.
  10. The method according to claim 9, characterized in that, if the variance of the delay data respectively corresponding to N consecutive audio data packets is less than a preset threshold, the delay data comprises the average of the delay data respectively corresponding to the N audio data packets, together with the start sequence number information and the end sequence number information of the N audio data packets.
  11. The method according to any one of claims 1-10, characterized in that synchronizing the audio data and the video data based on the delay data comprises:
    calculating the generation moment of each audio data packet based on the delay data and the moment at which the first electronic device acquired that audio data packet; and
    synchronizing the audio data and the video data according to the generation moment of each audio data packet, to obtain a synchronized video file.
  12. The method according to any one of claims 1-11, characterized in that the method further comprises:
    starting a video recording application, the video recording application displaying a preset icon or a list of contact objects.
  13. The method according to claim 12, characterized in that, if the video recording application displays the preset icon, the initiating an audio call to a second electronic device in response to a first user operation comprises:
    in response to a user operation on the preset icon, displaying the list of contact objects; and
    响应于用户对所述联系对象的列表中所述第二电子设备的选择操作,向所述第二电子设备发起音频呼叫。In response to a user selection operation on the second electronic device in the list of contact objects, an audio call is initiated to the second electronic device.
  14. The method according to claim 12, wherein, if the video recording application displays the list of contact objects, initiating an audio call to the second electronic device in response to the first user operation comprises:
    in response to a user selection of the second electronic device in the list of contact objects, initiating an audio call to the second electronic device.
  15. The method according to any one of claims 12-14, wherein the information of the contact objects comes from an address book of the first electronic device.
  16. The method according to any one of claims 12-15, wherein a Bluetooth connection is established between the electronic device corresponding to the contact object and the first electronic device, or the electronic device corresponding to the contact object is in the same Wi-Fi local area network as the first electronic device.
  17. The method according to any one of claims 12-16, wherein the first electronic device stores information of historical contact objects, and the electronic devices corresponding to the historical contact objects include electronic devices that have previously established an audio call with the first electronic device.
  18. A first electronic device, comprising: one or more processors; a memory; and one or more computer programs, wherein the one or more computer programs are stored in the memory, the one or more computer programs comprising instructions which, when executed by the first electronic device, cause the first electronic device to perform the method according to any one of claims 1-17.
  19. A system for synchronizing audio and video, comprising a first electronic device and a second electronic device, wherein:
    the first electronic device initiates an audio call to the second electronic device in response to a first user operation;
    the second electronic device accepts the audio call;
    the second electronic device collects audio data, the audio data being transmitted through a plurality of audio data packets; obtains the delay data corresponding to each of the plurality of audio data packets; and transmits the audio data and the delay data to the first electronic device;
    the first electronic device collects video data in response to a second user operation, and obtains the audio data and the delay data from the second electronic device;
    the first electronic device synchronizes the audio data and the video data based on the delay data in response to a third user operation.
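The end-to-end flow of claim 19 can be modeled as two event streams merged on a common timeline. The sketch below is an illustrative reconstruction under our own naming, not the patent's implementation:

```python
from dataclasses import dataclass, field

@dataclass
class AudioPacket:
    seq: int
    payload: bytes
    delay_ms: float  # per-packet delay reported by the capturing (second) device

@dataclass
class SyncSession:
    """Model of the claim-19 flow: the second device streams audio packets
    plus their delay data; the first device records video and merges both."""
    audio: list = field(default_factory=list)
    video_frames: list = field(default_factory=list)

    def on_audio_packet(self, pkt: AudioPacket, recv_ms: float):
        # Recover the packet's generation time from receipt time and delay.
        self.audio.append((pkt.seq, recv_ms - pkt.delay_ms, pkt.payload))

    def on_video_frame(self, frame, capture_ms: float):
        self.video_frames.append((capture_ms, frame))

    def synchronized_file(self):
        # Merge both streams ordered by generation/capture time.
        events = [(t, "audio", seq) for seq, t, _ in self.audio]
        events += [(t, "video", i) for i, (t, _) in enumerate(self.video_frames)]
        return sorted(events)
```

Because ordering happens on generation time rather than receipt time, a late-arriving audio packet still lands between the video frames that surround the moment it was actually captured.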
  20. The system according to claim 19, wherein, when the distance between the first electronic device and the second electronic device is greater than the sound pickup distance of the first electronic device, the first electronic device initiates an audio call to the second electronic device in response to the first user operation.
  21. The system according to claim 19 or 20, wherein, while the first electronic device is collecting video data, the second electronic device continuously displays a reminder message;
    wherein the reminder message is used to remind the user that the first electronic device is collecting video data.
  22. A computer-readable storage medium, comprising computer instructions which, when run on a computer, cause the computer to perform the method according to any one of claims 1-17.
  23. A computer program product which, when run on a computer, causes the computer to perform the method according to any one of claims 1-17.
PCT/CN2021/134168 2020-11-30 2021-11-29 Audio and video synchronization method and device WO2022111712A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011377423.7A CN114584648A (en) 2020-11-30 2020-11-30 Method and equipment for synchronizing audio and video
CN202011377423.7 2020-11-30

Publications (1)

Publication Number Publication Date
WO2022111712A1 true WO2022111712A1 (en) 2022-06-02

Family

ID=81754064

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/134168 WO2022111712A1 (en) 2020-11-30 2021-11-29 Audio and video synchronization method and device

Country Status (2)

Country Link
CN (1) CN114584648A (en)
WO (1) WO2022111712A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115471780A (en) * 2022-11-11 2022-12-13 荣耀终端有限公司 Method and device for testing sound-picture time delay

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2585946A1 (en) * 2006-04-21 2007-10-21 Evertz Microsystems Ltd. Systems and methods for synchronizing audio and video data signals
CN105611191A (en) * 2016-01-29 2016-05-25 高翔 Voice and video file synthesizing method, device and system
CN106231226A (en) * 2015-09-21 2016-12-14 零度智控(北京)智能科技有限公司 Audio-visual synthetic method, Apparatus and system
CN107404599A (en) * 2017-07-17 2017-11-28 歌尔股份有限公司 Audio, video data synchronous method, apparatus and system
CN107872605A (en) * 2016-09-26 2018-04-03 青柠优视科技(北京)有限公司 A kind of UAS and unmanned plane audio/video processing method
CN110771158A (en) * 2018-07-02 2020-02-07 深圳市大疆创新科技有限公司 Control method of photographing apparatus, photographing system, and storage medium
CN111225173A (en) * 2020-02-20 2020-06-02 深圳市昊一源科技有限公司 Audio and video transmission device and audio and video transmission system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008048374A (en) * 2006-07-21 2008-02-28 Victor Co Of Japan Ltd Video camera apparatus

Also Published As

Publication number Publication date
CN114584648A (en) 2022-06-03

Similar Documents

Publication Publication Date Title
CN110381345B (en) Screen projection display method and electronic equipment
CN115426064B (en) Audio data synchronization method and equipment
EP3982641A1 (en) Screen projection method and device
WO2020249098A1 (en) Bluetooth communication method, tws bluetooth headset, and terminal
WO2020244623A1 (en) Air-mouse mode implementation method and related device
EP4199422A1 (en) Cross-device audio playing method, mobile terminal, electronic device and storage medium
WO2021027623A1 (en) Device capability discovery method and p2p device
WO2022199613A1 (en) Method and apparatus for synchronous playback
WO2022111712A1 (en) Audio and video synchronization method and device
WO2021043250A1 (en) Bluetooth communication method, and related device
CN115694598A (en) Multiframe fusion transmission method and related device in Beidou communication system
EP4195659A1 (en) Screen sharing method, electronic device and system
WO2022206771A1 (en) Screen projection method, electronic device, and system
WO2021114950A1 (en) Multipath http channel multiplexing method and terminal
WO2023061217A1 (en) Data transmission method and apparatus
WO2022228213A1 (en) Data tracking method and related apparatus
WO2022174664A1 (en) Livestreaming method, apparatus and system
WO2024055881A1 (en) Clock synchronization method, electronic device, system, and storage medium
WO2023165513A1 (en) Communication method, electronic device, and apparatus
CN116981108B (en) Wireless screen-throwing connection method, mobile terminal and computer readable storage medium
WO2022095581A1 (en) Data transmission method and terminal device
WO2022152323A1 (en) Data transmission method, chip, terminal and storage medium
CN117729588A (en) Cache queue adjusting method and electronic equipment
CN113934388A (en) Synchronous display method, terminal and storage medium

Legal Events

121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 21897212
    Country of ref document: EP
    Kind code of ref document: A1

NENP Non-entry into the national phase
    Ref country code: DE

122 Ep: pct application non-entry in european phase
    Ref document number: 21897212
    Country of ref document: EP
    Kind code of ref document: A1