WO2017185995A1 - Audio and video conversion method and device - Google Patents

Publication number
WO2017185995A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
audio
preset
quality
packet loss
Prior art date
Application number
PCT/CN2017/080416
Other languages
French (fr)
Chinese (zh)
Inventor
陈�峰
杨伯辉
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2017185995A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0876 Network utilisation, e.g. volume of load or congestion level
    • H04L43/0894 Packet rate
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/16 Threshold monitoring
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/436 Interfacing a local distribution network, e.g. communicating with another STB or one or more peripheral devices inside the home
    • H04N21/4363 Adapting the video stream to a specific local network, e.g. a Bluetooth® network
    • H04N21/43637 Adapting the video stream to a specific local network, e.g. a Bluetooth® network involving a wireless protocol, e.g. Bluetooth, RF or wireless LAN [IEEE 802.11]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N5/268 Signal distribution or switching
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/01 Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/141 Systems for two-way working between two video terminals, e.g. videophone

Definitions

  • The present invention relates to the field of communications, and in particular to an audio and video conversion method and apparatus.
  • The current main approach is to enable forward error correction (FEC) compensation and IP speed-up to adapt to packet loss and other anomalies on various networks.
  • This approach is usually carried out manually, which is inconvenient.
  • With this approach, data cannot be decoded immediately and a delay is introduced, which increases bandwidth usage and lowers execution efficiency. Therefore, the existing approach of enabling FEC compensation and IP speed-up cannot effectively guarantee the user experience in video conferencing.
  • Embodiments of the invention provide an audio and video conversion method and device, so as to at least solve the problem in the related art that network congestion, delay, or jitter degrades video quality and the user experience.
  • An audio-video conversion method is provided, including: detecting a video quality of a video within a first preset time period; determining whether the video quality is always lower than a preset video quality threshold; and, if the determination result is yes, converting the video to audio.
  • Converting the video into audio comprises: detecting an image quality of the video; determining whether the image quality is lower than a preset image quality threshold; and converting the video into audio if the image quality is lower than the preset image quality threshold.
  • Determining whether the video quality is always lower than the preset video quality threshold includes: determining whether the video packet loss rate of the video is always not lower than a preset video packet loss rate threshold; and, when the video packet loss rate is always not lower than the preset video packet loss rate threshold, determining that the video quality is always lower than the preset video quality threshold.
  • Before determining whether the video packet loss rate of the video is always not lower than the preset video packet loss rate threshold, the method further includes: establishing a mapping relationship between the packet loss rate and a peak signal-to-noise ratio (PSNR) used to characterize the video quality; and determining, according to the mapping relationship, the preset video packet loss rate threshold corresponding to a preset PSNR threshold.
  • The video packet loss rate is determined by: acquiring first information carried in a Real-time Transport Control Protocol (RTCP) data packet of the video, where the first information includes information on the input messages and output messages of the video; and calculating the video packet loss rate from the input messages and output messages of the video.
  • After converting the video into audio, the method further includes: detecting an audio quality of the audio within a second preset time period; determining whether the audio quality is always not lower than a preset audio quality threshold; and, when the audio quality is always not lower than the preset audio quality threshold, converting the audio to video.
  • Determining whether the audio quality is always not lower than the preset audio quality threshold includes: determining whether the audio packet loss rate of the audio is always lower than a preset audio packet loss rate threshold; and, when the audio packet loss rate is always lower than the preset audio packet loss rate threshold, determining that the audio quality is always not lower than the preset audio quality threshold.
  • The audio packet loss rate is determined by: acquiring second information carried in an RTCP data packet of the audio, where the second information includes information on the input messages and output messages of the audio; and calculating the audio packet loss rate from the input messages and output messages of the audio.
  • An audio-video conversion apparatus is provided, including: a detecting module configured to detect the video quality of the video within the first preset time period; a determining module configured to determine whether the video quality is always lower than the preset video quality threshold; and a conversion module configured to convert the video to audio if the determination result is yes.
  • The conversion module includes: a detecting unit configured to detect an image quality of the video; a determining unit configured to determine whether the image quality is lower than a preset image quality threshold; and a converting unit configured to convert the video to audio when the image quality is lower than the preset image quality threshold.
  • A storage medium is also provided. The storage medium is configured to store program code for performing the following steps: detecting a video quality of the video within a first preset time period; determining whether the video quality is always lower than a preset video quality threshold; and, if the determination result is yes, converting the video to audio.
  • The storage medium is further configured to store program code for performing the following steps: detecting an image quality of the video; determining whether the image quality is lower than a preset image quality threshold; and converting the video to audio if the image quality is lower than the preset image quality threshold.
  • The storage medium is further configured to store program code for performing the following steps: determining whether the video packet loss rate of the video is always not lower than a preset video packet loss rate threshold; and, if so, determining that the video quality is always lower than the preset video quality threshold.
  • The storage medium is further configured to store program code for performing the following steps: before determining whether the video packet loss rate of the video is always not lower than the preset video packet loss rate threshold, establishing a mapping relationship between the packet loss rate and the peak signal-to-noise ratio (PSNR) used to characterize the video quality; and determining, according to the mapping relationship, the preset video packet loss rate threshold corresponding to the preset PSNR threshold.
  • The storage medium is further configured to store program code for performing the following steps: acquiring the first information carried in the RTCP data packet of the video, where the first information includes information on the input messages and output messages of the video; and calculating the video packet loss rate from the input messages and output messages of the video.
  • With the invention, video is switched to audio when the video quality stays below the preset video quality threshold, which solves the problem in the related art of poor video quality and reduced user experience due to network congestion, delay, or jitter.
  • The invention achieves audio and video switching without enabling FEC coding and decoding or IP speed-up, and effectively improves the user experience in video conference calls.
  • FIG. 1 is a flowchart of an audio-video conversion method according to an embodiment of the present invention.
  • FIG. 2 is a flowchart of an optional method of converting video to audio according to an embodiment of the present invention.
  • FIG. 3 is a flowchart of an optional method of converting audio to video according to an embodiment of the present invention.
  • FIG. 4 is a block diagram showing the structure of an audio-video conversion apparatus according to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of an optional audio and video conversion apparatus according to an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of an optional network information feedback module according to an embodiment of the invention.
  • FIG. 1 is a flowchart of an audio and video conversion method according to an embodiment of the present invention. As shown in FIG. 1, the process includes the following steps:
  • Step S102: detecting the video quality of the video within the first preset time period.
  • Step S104: determining whether the video quality is always lower than a preset video quality threshold.
  • Step S106: if the determination result is yes, converting the video into audio.
  • The execution body of the foregoing steps may be a processor, in particular a processor used in a video conference, but is not limited thereto.
  • The audio-video conversion method of this embodiment may be performed in a mobile terminal, a computer terminal, or the like.
  • Step S102 may include: detecting the video quality once every preset time interval within the preset time period.
  • For example, the preset time period runs for 10 seconds from the current time T0 of the video conference, and within those 10 seconds the video quality of the video conference is detected at 1-second intervals.
  • Step S104 may include: determining whether each video quality value detected during the time period is lower than the preset video quality threshold. For example, the determination result is yes when the video quality detected at each of the time points T0, T0+1, T0+2, ..., T0+9 is lower than the preset video quality threshold.
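The sampled check in steps S102 and S104 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `always_below`, the sample values, and the threshold of 25.0 are all assumptions chosen for the example.

```python
from typing import Sequence


def always_below(samples: Sequence[float], threshold: float) -> bool:
    """Return True only if every sampled quality value is below the threshold."""
    # An empty window carries no evidence, so it never triggers a switch.
    return len(samples) > 0 and all(q < threshold for q in samples)


# Ten hypothetical quality scores sampled at T0, T0+1, ..., T0+9 (made-up values).
window = [22.0, 21.5, 23.1, 20.4, 19.8, 22.7, 21.0, 20.2, 23.9, 22.3]

# Step S106 would convert video to audio only when this is True.
convert_to_audio = always_below(window, threshold=25.0)
```

A single sample at or above the threshold makes the result False, which matches the "always lower" wording of step S104.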
  • The preset video quality threshold may be an empirical value or may be set according to user requirements.
  • The video quality described above can be obtained from other parameters that characterize video quality. For example, the peak signal-to-noise ratio (PSNR), the structural similarity index (SSIM), the signal-to-noise ratio (SNR), delay, jitter, packet loss rate, or a combination thereof can be used, but the characterization is not limited thereto.
  • Through the above steps, the problem in the related art of poor video quality and reduced user experience due to network congestion, delay, or jitter is solved. The invention achieves audio and video switching without enabling FEC coding and decoding or IP speed-up, and effectively improves the user experience in video conference calls.
  • converting the video into audio comprises: detecting an image quality of the video; determining whether the image quality is lower than a preset image quality threshold; and converting the video into audio if the image quality is lower than a preset image quality threshold.
  • The image quality of the video can be obtained with a no-reference objective quality evaluation method.
  • For example, the image sharpness algorithm can be based on the Canny operator edge detection algorithm, an optimal step-edge detection algorithm whose optimal edge detection filter removes noise while preserving edge characteristics and which uses a first-order differential filter.
  • The first-order directional derivative of a two-dimensional Gaussian function in any direction is used as the noise filter and applied to the image by convolution; local maxima of the image gradient are then searched for in the filtered image to determine the image edges, which yields the optimal approximation operator.
  • The sharpness score ranges from 0 to 100. Contours are extracted from the image, Gaussian filtering is applied, and the contours are calculated a second time; the pixel difference between the two contours characterizes the image sharpness.
  • Appropriate upper and lower thresholds may be set. When the value is lower than the lower threshold, the sharpness score is 0; between the lower and upper thresholds, the sharpness score is between 0 and 100.
  • Because the Canny operator edge detection algorithm predicts image quality well, the video conference can be converted to an audio conference when the detected sharpness score of the video is relatively low, which makes video-to-audio switching more accurate and effective.
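The mapping from a contour pixel difference to a 0 to 100 sharpness score can be sketched as below. The text only states that the score is 0 below the lower threshold and between 0 and 100 between the two thresholds; the linear interpolation and the clamp to 100 at or above the upper threshold are assumptions of this sketch, as are the function and parameter names.

```python
def sharpness_score(contour_diff: float, lower: float, upper: float) -> float:
    """Map a contour pixel difference to a sharpness score in [0, 100]."""
    if upper <= lower:
        raise ValueError("upper threshold must be greater than lower threshold")
    if contour_diff <= lower:
        return 0.0  # below the lower threshold the score is 0
    if contour_diff >= upper:
        return 100.0  # assumption: clamp to the maximum score
    # Assumption: interpolate linearly between the two thresholds.
    return 100.0 * (contour_diff - lower) / (upper - lower)
```

For instance, with hypothetical thresholds of 10 and 50, a contour difference of 30 lands halfway between them.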
  • Determining whether the video quality is always lower than the preset video quality threshold includes: determining whether the video packet loss rate of the video is always not lower than a preset video packet loss rate threshold; and, when the video packet loss rate is always not lower than the preset video packet loss rate threshold, determining that the video quality is always lower than the preset video quality threshold.
  • The packet loss rate (loss tolerance) is the ratio of the number of lost packets to the number of packets sent. Generally, it is calculated as [(input messages - output messages) / input messages] * 100%. The packet loss rate is related to the packet length and the packet transmission frequency. The corresponding information (input messages and output messages) can be obtained through RTCP messages. In general, network congestion, insufficient bandwidth, delay, or jitter can cause network packet loss.
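The formula above can be written directly as a small function. This is a sketch; the name `packet_loss_rate` and the input validation are additions for illustration and are not described in the text.

```python
def packet_loss_rate(input_messages: int, output_messages: int) -> float:
    """Packet loss rate in percent: (input - output) / input * 100."""
    if input_messages <= 0:
        raise ValueError("input_messages must be positive")
    if not 0 <= output_messages <= input_messages:
        raise ValueError("output_messages must be between 0 and input_messages")
    return 100.0 * (input_messages - output_messages) / input_messages
```

With 1000 input messages and 985 output messages, 15 messages are lost, which sits exactly at the 1.5% threshold used later in the text.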
  • The network indicator information related to the video conference call is extracted, and the current video packet loss rate is calculated from it. If the current packet loss rate is greater than the packet loss rate threshold of 1.5%, the current time point T0 is recorded; otherwise, detection of the packet loss rate continues. Starting from T0, packet loss is tracked continuously for a duration t (t may be about 10 seconds). If the packet loss rate is always greater than 1.5% during that time, the video-to-audio conversion is performed. In addition, when the packet loss rate is judged to be always greater than 1.5%, the image can be examined before the conversion, and the conversion is performed if the image sharpness is poor.
  • In this embodiment, the video quality is characterized by the packet loss rate. Because obtaining the packet loss rate does not consume many system resources, this method helps save system resources.
  • Before determining whether the video packet loss rate of the video is always not lower than the preset video packet loss rate threshold, the method further includes: establishing a mapping relationship between the packet loss rate and the peak signal-to-noise ratio (PSNR) used to characterize the video quality; and determining, according to the mapping relationship, the preset video packet loss rate threshold corresponding to the preset PSNR threshold.
  • The image quality of the video can be evaluated with the SSIM, SNR, and PSNR algorithms.
  • For example, PSNR can be used to evaluate the image quality of the video (as the video quality).
  • Table 1 provides an optional relationship between PSNR values and MOS (Mean Opinion Score), where MOS is an indicator for measuring the audio and video quality of a communication system.
  • The preset video packet loss rate threshold corresponding to the preset PSNR threshold may be obtained according to the mapping relationship.
  • As the packet loss rate increases, the PSNR value of the video stream drops rapidly. When the packet loss rate reaches 1.5%, the PSNR value drops below 25 and the quality of the video stream reaches the poor level, so the packet loss rate threshold can be set to 1.5%.
  • The video stream is converted into an audio stream for transmission when the network condition has deteriorated to the point that it can no longer support the video call, thereby effectively improving the user experience.
  • The mapping relationship between the packet loss rate and the PSNR is obtained in advance, and the packet loss rate threshold is obtained through the mapping relationship, so that the detection efficiency of the video quality (or audio quality) can be effectively improved.
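Deriving a packet loss rate threshold from a pre-measured mapping can be sketched as follows. The mapping values below are placeholders invented for this example and are not measurements from the source; the function name and the "first loss rate whose PSNR falls below the PSNR threshold" rule are likewise assumptions about how such a lookup might be done.

```python
from typing import Optional, Sequence, Tuple


def loss_threshold_for_psnr(
    mapping: Sequence[Tuple[float, float]], psnr_threshold: float
) -> Optional[float]:
    """Return the smallest measured loss rate whose PSNR is below psnr_threshold."""
    for loss_rate, psnr in sorted(mapping):
        if psnr < psnr_threshold:
            return loss_rate
    return None  # PSNR never fell below the threshold in the measured range


# Placeholder (loss rate in %, PSNR in dB) pairs; NOT data from the source.
placeholder_mapping = [(0.0, 38.0), (0.5, 33.0), (1.0, 28.0), (1.5, 24.0), (2.0, 20.0)]

threshold = loss_threshold_for_psnr(placeholder_mapping, psnr_threshold=25.0)
```

Once the threshold is known, run-time monitoring only needs the cheap packet loss rate rather than a per-frame PSNR computation, which is the efficiency argument made above.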
  • If PSNR were used directly to predict the quality of video or audio, a large amount of resources would be consumed, which would seriously affect execution efficiency.
  • In this embodiment, the video signal quality is characterized by the more accurate peak signal-to-noise ratio (PSNR), and by establishing the mapping relationship between the packet loss rate and the PSNR, the preset video packet loss rate threshold is obtained more conveniently.
  • The video packet loss rate is determined by: acquiring first information carried in a Real-time Transport Control Protocol (RTCP) data packet of the video, where the first information includes information on the input messages and output messages of the video; and calculating the video packet loss rate from the input messages and output messages of the video.
  • Network delay and network jitter can cause packets to arrive late and be discarded, which increases the media packet loss rate and degrades the video image quality.
  • The related information fed back in the RTCP packets, such as the input messages and the output messages, may be collected, and the indicator of the current network, the video packet loss rate, is calculated from the feedback information.
  • The delay, jitter, bandwidth, and so on of the current network can also be determined from the related information fed back in the RTCP packets. Jitter and available bandwidth can be used as auxiliary parameters for terminal information statistics. This embodiment occupies few resources and is computationally efficient.
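As one concrete illustration of RTCP feedback, the sketch below decodes a single 24-byte reception report block as laid out in RFC 3550, which carries a fraction-lost field, a cumulative loss count, and interarrival jitter. The source speaks of input and output messages and does not say which RTCP fields it reads, so treating the report block as the data source is an assumption, and the function name is hypothetical.

```python
import struct


def parse_report_block(block: bytes) -> dict:
    """Decode one RTCP reception report block (24 bytes, network byte order)."""
    if len(block) != 24:
        raise ValueError("a reception report block is exactly 24 bytes")
    ssrc, lost_word, ext_seq, jitter, lsr, dlsr = struct.unpack("!IIIIII", block)
    # Top 8 bits: fraction lost as a fixed-point number with denominator 256.
    fraction_lost = (lost_word >> 24) / 256.0
    # Low 24 bits: cumulative packets lost, a signed 24-bit integer.
    cumulative_lost = lost_word & 0xFFFFFF
    if cumulative_lost & 0x800000:
        cumulative_lost -= 1 << 24
    return {
        "ssrc": ssrc,
        "loss_percent": fraction_lost * 100.0,
        "cumulative_lost": cumulative_lost,
        "highest_seq": ext_seq,
        "jitter": jitter,
        "last_sr": lsr,
        "delay_since_last_sr": dlsr,
    }
```

The `loss_percent` value could then be compared against the preset packet loss rate threshold, while `jitter` could serve as one of the auxiliary parameters mentioned above.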
  • After converting the video into audio, the method further includes: detecting an audio quality of the audio within a second preset time period; determining whether the audio quality is always not lower than a preset audio quality threshold; and, when the audio quality is always not lower than the preset audio quality threshold, converting the audio to video.
  • In this way, the audio is restored to video at an appropriate time, so that the user experience is better.
  • Determining whether the audio quality is always not lower than the preset audio quality threshold includes: determining whether the audio packet loss rate of the audio is always lower than a preset audio packet loss rate threshold; and, when the audio packet loss rate is always lower than the preset audio packet loss rate threshold, determining that the audio quality is always not lower than the preset audio quality threshold.
  • The relevant network indicator information in the audio conference can be extracted, and the current audio packet loss rate is calculated from it. If the current packet loss rate is less than the packet loss rate threshold of 1.5%, the current time point T0 is recorded; otherwise, detection of the packet loss rate continues. Starting from T0, packet loss is tracked continuously for a duration t (up to 10 seconds). If the packet loss rate is always less than the packet loss rate threshold during that time, the audio is restored to video.
  • This embodiment characterizes the audio quality by the packet loss rate. Because obtaining the packet loss rate does not consume many system resources, this method helps save system resources.
  • The audio packet loss rate is determined by: acquiring second information carried in an RTCP data packet of the audio, where the second information includes information on the input messages and output messages of the audio; and calculating the audio packet loss rate from the input messages and output messages of the audio.
  • Network delay and network jitter can cause packets to arrive late and be discarded, which increases the media packet loss rate and degrades the quality of the media stream.
  • The related information fed back in the RTCP packets, such as the input messages and the output messages, may be collected, and the indicator of the current network, the audio packet loss rate, is calculated from the feedback information.
  • The delay, jitter, bandwidth, and so on of the current network can also be determined from the related information fed back in the RTCP packets. This embodiment occupies few resources and is computationally efficient.
  • An optional method for converting video into audio is provided. As shown in FIG. 2, the method includes:
  • Step S202: calculating the current packet loss rate of the video.
  • This packet loss rate corresponds to the video packet loss rate described above.
  • Step S204: determining whether the current packet loss rate is greater than the packet loss rate threshold of 1.5%. If the determination result is yes, step S206 is performed; if the determination result is no, step S202 is performed.
  • The packet loss rate threshold of 1.5% is one optional implementation of the preset video packet loss rate threshold.
  • Step S206: recording the current time point T0.
  • Step S208: detecting the continuous packet loss condition during the time from T0 to t, where t is greater than T0 and both t and T0 are positive numbers.
  • The time from T0 to t corresponds to the first preset time period described above.
  • Step S210: when the packet loss rate is detected to remain greater than 1.5%, predicting the image quality with the Canny operator edge detection algorithm.
  • Step S212: determining whether the image quality reaches the preset image quality threshold at which video must be switched to audio. If the determination result is yes, step S214 is performed; if the determination result is no, step S204 is performed.
  • Step S214: switching the video to audio.
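The decision part of the FIG. 2 flow (steps S204 to S212) can be sketched as one function. This is an illustration under stated assumptions: the function name, the callable used to obtain the sharpness score lazily, and the sharpness threshold of 40.0 are hypothetical; only the 1.5% loss threshold comes from the text.

```python
from typing import Callable, Iterable


def should_switch_to_audio(
    loss_rates: Iterable[float],
    sharpness: Callable[[], float],
    loss_threshold: float = 1.5,
    sharpness_threshold: float = 40.0,  # hypothetical image quality threshold
) -> bool:
    """Decide whether video should be switched to audio for one monitoring window."""
    samples = list(loss_rates)
    if not samples:
        return False
    # S204 to S208: the loss rate must stay above the threshold for the whole window.
    if any(rate <= loss_threshold for rate in samples):
        return False
    # S210 to S212: only then run the costlier image quality check.
    return sharpness() < sharpness_threshold
```

Passing the sharpness check as a callable keeps the image analysis from running at all while the network is healthy, in line with the resource-saving aim stated for the packet loss approach.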
  • This embodiment proposes an adaptive strategy based on image quality prediction in video conferencing.
  • The method implements a network service quality feedback mechanism, packet loss rate prediction, and image quality detection without enabling FEC compensation or IP speed-up. It can assess the quality of network video images in real time from the packet loss rate and, by predicting the image quality, choose an appropriate strategy to improve the user experience, which effectively addresses the poor user experience caused by network status changes during video communication over IP networks.
  • An optional method for converting audio into video is provided. As shown in FIG. 3, the method includes:
  • Step S302: calculating the current packet loss rate of the audio.
  • This packet loss rate corresponds to the audio packet loss rate described above.
  • Step S304: determining whether the packet loss rate is less than the packet loss rate threshold of 1.5%. If the determination result is yes, step S306 is performed; if the determination result is no, step S302 is performed.
  • The packet loss rate threshold of 1.5% is one optional implementation of the preset audio packet loss rate threshold.
  • Step S306: recording the current time point T1.
  • Step S308: detecting the continuous packet loss condition during the time from T1 to t1.
  • The time from T1 to t1 corresponds to the second preset time period described above.
  • The period from T1 to t1 and the period from T0 to t described above may have the same duration.
  • This packet loss rate threshold and the packet loss rate threshold in FIG. 2 may both be 1.5%, or may be set to different values.
  • Step S310: restoring the audio to video when the packet loss rate is detected to remain lower than 1.5%.
  • After the video is restored, the packet loss rate of the video may again be calculated as in step S202, so that video-to-audio switching can be performed at an appropriate time.
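Taken together, FIG. 2 and FIG. 3 describe a two-state loop that can be sketched as a small state machine. This is a simplified illustration: the class name `MediaSwitcher` is hypothetical, the window is counted in consecutive samples rather than seconds, both directions share one threshold, and the image quality check of step S212 is deliberately left out.

```python
class MediaSwitcher:
    """Toggle between "video" and "audio" after a sustained run of loss samples."""

    def __init__(self, threshold: float = 1.5, window: int = 10) -> None:
        self.mode = "video"
        self.threshold = threshold  # packet loss rate threshold in percent
        self.window = window        # consecutive samples required to switch
        self._streak = 0

    def observe(self, loss_rate: float) -> str:
        """Feed one packet loss sample and return the mode after processing it."""
        if self.mode == "video":
            qualifies = loss_rate > self.threshold   # FIG. 2: sustained high loss
        else:
            qualifies = loss_rate < self.threshold   # FIG. 3: sustained low loss
        # Any non-qualifying sample resets the window, as in returning to S202/S302.
        self._streak = self._streak + 1 if qualifies else 0
        if self._streak >= self.window:
            self.mode = "audio" if self.mode == "video" else "video"
            self._streak = 0
        return self.mode
```

Requiring a full window of qualifying samples in each direction keeps the call from flapping between audio and video on a single good or bad measurement.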
  • FIG. 2 and FIG. 3 above take full account of the series of defects (resource consumption, low efficiency, and so on) of enabling FEC packet loss compensation or IP speed-up in the related art. Considering the packet loss caused by network congestion, the packet loss rate is proposed as a parameter for characterizing video or audio quality, and the audio and video switching strategy is executed at an appropriate time. This reduces how often FEC packet loss compensation or IP speed-up is needed and effectively keeps conversations such as video conferences smooth.
  • In this embodiment, a complete solution is proposed for the impact of network status changes on the multimedia call user experience. First, the minimum tolerable packet loss rate threshold in video transmission is determined; then the network information feedback module is used to predict the packet loss rate in the network and to predict the image quality; finally, a specific audio and video switching method selects an adaptive strategy for audio and video transmission under different network conditions.
  • The multimedia adaptation method proposed in this embodiment adapts to the network condition and determines, according to the packet loss rate in the network, the medium (audio or video) most suitable for the user's multimedia call. This method is of great significance for improving the user experience of multimedia calls.
  • The method according to the above embodiment can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, but in many cases the former is the better implementation.
  • The technical solution of the present invention, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and including a number of instructions for causing a terminal device (which may be a cell phone, a computer, a server, a network device, or the like) to perform the methods described in the various embodiments of the present invention.
  • An audio-video conversion device is also provided, which is used to implement the above embodiments and preferred embodiments; what has already been described is not repeated.
  • As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function.
  • Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
  • FIG. 4 is a structural block diagram of an audio-video conversion apparatus according to an embodiment of the present invention. As shown in FIG. 4, the apparatus includes:
  • The detecting module 40 is configured to detect a video quality of the video during the first preset time period.
  • The determining module 42 is configured to determine whether the video quality is always lower than a preset video quality threshold.
  • The conversion module 44 is configured to convert the video into audio if the determination result is yes.
  • The detecting module 40 continuously detects the video quality over a period of time, and when the determining module 42 determines that the video quality is always lower than the preset video quality threshold, the conversion module 44 performs video-to-audio switching. This solves the problem in the related art of poor video quality and reduced user experience due to network congestion, delay, or jitter.
  • The invention achieves audio and video switching without enabling FEC coding and decoding or IP speed-up, and effectively improves the user experience in video conference calls.
  • The conversion module 44 includes: a detecting unit configured to detect an image quality of the video; a determining unit configured to determine whether the image quality is lower than a preset image quality threshold; and a converting unit configured to convert the video to audio when the image quality is lower than the preset image quality threshold.
  • An optional audio and video conversion device is provided. As shown in FIG. 5, the device includes:
  • The VCSP module can serve as a platform-based service management center responsible for protocol handling, media processing, and communication.
  • The module encapsulates the video stream into RTP (Real-time Transport Protocol) packets, then UDP (User Datagram Protocol) packets, then IP packets, and transmits the encapsulated IP packets over the Internet.
  • The receiving end receives the IP packets and, according to the sequence numbers in the RTP headers, passes the video stream data to the DSP codec module for decoding.
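The role of the RTP sequence number can be illustrated with a short sketch that builds minimal 12-byte RTP headers (the fixed header defined in RFC 3550) and reorders received packets before decoding. The helper names are hypothetical, payload type 96 is an arbitrary dynamic value chosen for the example, and 16-bit sequence number wraparound is ignored for simplicity.

```python
import struct
from typing import List


def build_rtp_packet(seq: int, timestamp: int, ssrc: int, payload: bytes,
                     payload_type: int = 96) -> bytes:
    """Prefix a payload with a fixed 12-byte RTP header."""
    first_byte = 2 << 6                # version 2, no padding, no extension, no CSRCs
    second_byte = payload_type & 0x7F  # marker bit cleared
    header = struct.pack("!BBHII", first_byte, second_byte,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


def sequence_number(packet: bytes) -> int:
    """Read the 16-bit sequence number from bytes 2 and 3 of the RTP header."""
    if len(packet) < 12:
        raise ValueError("packet is shorter than the fixed RTP header")
    return struct.unpack("!H", packet[2:4])[0]


def reorder(packets: List[bytes]) -> List[bytes]:
    """Sort packets by sequence number (wraparound is not handled in this sketch)."""
    return sorted(packets, key=sequence_number)
```

Gaps in the reordered sequence numbers are also what a receiver would count as lost packets when it reports back over RTCP.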
  • the network information feedback module is configured to collect the relevant information fed back in RTCP packets and to calculate the current network indicators (packet loss rate, delay, jitter, bandwidth, etc.) from that feedback, as shown in FIG.
  • the jitter, delay and available bandwidth can be used as auxiliary parameters for terminal information statistics.
  • the current network identifier can be reported to the audio and video switching module through the VCSP module.
  • the network information feedback module can implement the functions of the foregoing detection module 40.
  • the audio and video switching module is configured to predict the packet loss rate according to the network condition, report the prediction result to the image quality detecting module, accelerate the image quality prediction through the hardware of the DSP codec module, and make a suitable decision according to the prediction result; wherein, the audio call is made in the audio call
  • the audio and video switching module continuously tracks the current network condition, predicts the network service quality, and selects an appropriate time to perform audio/video switching according to the network condition (including switching video to audio and automatically restoring the original video conference at a suitable time thereafter).
  • the audio and video switching module can implement the functions of the determining module 42 and the converting module 44.
  • the embodiment makes full use of RTCP data packets to monitor network service quality in real time, can obtain the correspondence between the predicted network packet loss rate and image quality, and makes full use of the DSP codec module to perform all no-reference image quality calculation and prediction, which provides reliable parameter metrics for the adaptive strategy.
  • through the adaptive strategy, the embodiment switches between audio and video without enabling FEC encoding/decoding or IP rate up/down adjustment, thereby effectively keeping the call smooth during a video conference call.
  • each of the above modules may be implemented by software or hardware.
  • for the latter, this may be achieved in, but is not limited to, the following ways: the foregoing modules are all located in the same processor; or the modules are located in different processors in any combination.
  • Embodiments of the present invention also provide a storage medium.
  • the foregoing storage medium may be configured to store program code for performing the following steps:
  • the foregoing storage medium may include, but not limited to, a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a mobile hard disk, and a magnetic memory.
  • the modules or steps of the present invention described above can be implemented by a general-purpose computing device; they can be centralized on a single computing device or distributed across a network of multiple computing devices. Alternatively, they may be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from the one herein, or they may be fabricated separately as individual integrated circuit modules, or several of the modules or steps may be fabricated as a single integrated circuit module.
  • the invention is not limited to any specific combination of hardware and software.
  • video-to-audio switching is performed, which solves the problem in the related art of poor video quality and reduced user experience caused by network congestion, delay, or jitter.
  • the invention achieves audio/video switching without enabling FEC encoding/decoding or IP rate up/down adjustment, and effectively improves the user experience in video conference calls.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Environmental & Geological Engineering (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Telephonic Communication Services (AREA)

Abstract

An audio and video conversion method and device. The method comprises: detecting the video quality of a video in a first preset time period; determining whether the video quality is lower than a preset video quality threshold; and, if the determination result is yes, converting the video into audio. This resolves the problem in the related art of poor video quality and reduced user experience caused by network congestion, delay, or jitter.

Description

Audio and video conversion method and device
Technical field
The present invention relates to the field of communications, and in particular to an audio and video conversion method and device.
Background
With the rapid development of video conferencing technology, multimedia applications on the network have become increasingly rich, and users place higher demands on multimedia services. The system is required not only to support a rich set of media types but also to guarantee QoS (Quality of Service), FEC (Forward Error Correction), and IP rate up/down adjustment during multimedia calls. However, although the IP network delivers on a best-effort basis, the unreliability of UDP (User Datagram Protocol) often degrades video or audio quality, which has become an obstacle to enjoying high-quality video and audio.
To address the poor video quality and reduced user experience caused by network congestion, delay, or jitter during a video conference, the main current approach is to enable FEC compensation and IP rate up/down adjustment to cope with packet loss and anomalies on various networks. However, this approach is usually carried out manually and is inconvenient; moreover, it does not allow immediate decoding, introduces delay, increases bandwidth usage, and executes inefficiently. Therefore, the existing approach of enabling FEC compensation and IP rate adjustment cannot effectively safeguard the user experience in video conferencing.
No effective solution has yet been proposed for the problem in the related art of poor video quality and reduced user experience caused by network congestion, delay, or jitter.
Summary of the invention
Embodiments of the present invention provide an audio and video conversion method and device, to at least solve the problem in the related art of poor video quality and reduced user experience caused by network congestion, delay, or jitter.
According to one embodiment of the present invention, an audio and video conversion method is provided, including: detecting the video quality of a video within a first preset time period; determining whether the video quality is always lower than a preset video quality threshold; and, if the determination result is yes, converting the video into audio.
Optionally, converting the video into audio includes: detecting the image quality of the video; determining whether the image quality is lower than a preset image quality threshold; and converting the video into audio if the image quality is lower than the preset image quality threshold.
Optionally, determining whether the video quality is always lower than the preset video quality threshold includes: determining whether the video packet loss rate of the video is always not lower than a preset video packet loss rate threshold; and, if so, determining that the video quality is always lower than the preset video quality threshold.
Optionally, before determining whether the video packet loss rate of the video is always not lower than the preset video packet loss rate threshold, the preset video packet loss rate threshold is determined as follows: establishing a mapping relationship between the packet loss rate and the peak signal-to-noise ratio (PSNR) used to characterize video quality; and determining, according to the mapping relationship, the preset video packet loss rate threshold corresponding to a preset PSNR threshold.
Optionally, the video packet loss rate is determined as follows: acquiring first information carried in Real-time Transport Control Protocol (RTCP) packets of the video, where the first information includes information on the input packets and output packets of the video; and calculating the video packet loss rate from the input packets and output packets of the video.
Optionally, after converting the video into audio, the method further includes: detecting the audio quality of the audio within a second preset time period; determining whether the audio quality is always not lower than a preset audio quality threshold; and, if so, converting the audio into video.
Optionally, determining whether the audio quality is always not lower than the preset audio quality threshold includes: determining whether the audio packet loss rate of the audio is always lower than a preset audio packet loss rate threshold; and, if so, determining that the audio quality is always not lower than the preset audio quality threshold.
Optionally, the audio packet loss rate is determined as follows: acquiring second information in RTCP packets of the audio, where the second information includes information on the input packets and output packets of the audio; and calculating the audio packet loss rate from the input packets and output packets of the audio.
According to another embodiment of the present invention, an audio and video conversion device is provided, including: a detection module, configured to detect the video quality of a video within a first preset time period; a determining module, configured to determine whether the video quality is always lower than a preset video quality threshold; and a conversion module, configured to convert the video into audio if the determination result is yes.
Optionally, the conversion module includes: a detecting unit, configured to detect the image quality of the video; a determining unit, configured to determine whether the image quality is lower than a preset image quality threshold; and a converting unit, configured to convert the video into audio if the image quality is lower than the preset image quality threshold.
According to yet another embodiment of the present invention, a storage medium is also provided. The storage medium is configured to store program code for performing the following steps: detecting the video quality of a video within a first preset time period; determining whether the video quality is always lower than a preset video quality threshold; and, if the determination result is yes, converting the video into audio.
According to yet another embodiment of the present invention, a storage medium is also provided. The storage medium is configured to store program code for performing the following steps:
detecting the video quality of a video within a first preset time period; determining whether the video quality is always lower than a preset video quality threshold; and, if the determination result is yes, converting the video into audio.
Optionally, the storage medium is further configured to store program code for performing the following steps:
detecting the image quality of the video; determining whether the image quality is lower than a preset image quality threshold; and converting the video into the audio if the image quality is lower than the preset image quality threshold.
Optionally, the storage medium is further configured to store program code for performing the following steps:
determining whether the video packet loss rate of the video is always not lower than a preset video packet loss rate threshold; and, if so, determining that the video quality is always lower than the preset video quality threshold.
Optionally, the storage medium is further configured to store program code for performing the following steps:
before determining whether the video packet loss rate of the video is always not lower than the preset video packet loss rate threshold, establishing a mapping relationship between the packet loss rate and the peak signal-to-noise ratio (PSNR) used to characterize video quality; and determining, according to the mapping relationship, the preset video packet loss rate threshold corresponding to a preset PSNR threshold.
Optionally, the storage medium is further configured to store program code for performing the following steps:
acquiring first information carried in RTCP packets of the video, where the first information includes information on the input packets and output packets of the video; and calculating the video packet loss rate from the input packets and output packets of the video.
In the embodiments of the present invention, the video quality is monitored continuously over a period of time to check whether it stays below the preset video quality threshold, and video-to-audio switching is performed when it does. This solves the problem in the related art of poor video quality and reduced user experience caused by network congestion, delay, or jitter. The invention achieves audio/video switching without enabling FEC encoding/decoding or IP rate up/down adjustment, and effectively improves the user experience in video conference calls.
Brief description of the drawings
The drawings described here are provided for further understanding of the present invention and form a part of this application. The exemplary embodiments of the present invention and their description are used to explain the invention and do not unduly limit it. In the drawings:
FIG. 1 is a flowchart of an audio and video conversion method according to an embodiment of the present invention;
FIG. 2 is a flowchart of an optional method for converting video into audio according to an embodiment of the present invention;
FIG. 3 is a flowchart of an optional method for converting audio into video according to an embodiment of the present invention;
FIG. 4 is a structural block diagram of an audio and video conversion device according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an optional audio and video conversion device according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an optional network information feedback module according to an embodiment of the present invention.
Detailed description
The present invention is described in detail below with reference to the drawings and in conjunction with the embodiments. It should be noted that, provided there is no conflict, the embodiments in this application and the features in the embodiments may be combined with each other.
It should be noted that the terms "first", "second", and so on in the specification, the claims, and the above drawings are used to distinguish similar objects and do not necessarily describe a particular order or sequence.
This embodiment provides an audio and video conversion method. FIG. 1 is a flowchart of the audio and video conversion method according to an embodiment of the present invention. As shown in FIG. 1, the process includes the following steps:
Step S102: detect the video quality of the video within a first preset time period;
Step S104: determine whether the video quality is always lower than a preset video quality threshold;
Step S106: if the determination result is yes, convert the video into audio.
Optionally, the above steps may be executed by a processor, in particular a processor used in video conferencing, but are not limited thereto.
Optionally, the audio and video conversion method of this embodiment may be executed in a mobile terminal, a computer terminal, or a similar computing device.
Optionally, step S102 may include: within the preset time period, detecting the video quality once per preset interval. For example, the preset time period runs for 10 seconds from the current time T0 of the video conference, and within those 10 seconds the video quality of the conference is detected once every second.
Optionally, step S104 may include: determining whether every video quality value detected within the time period is lower than the preset video quality threshold, for example whether the video quality detected at each of the time points T0, T0+1, T0+2, ..., T0+9 is lower than the preset video quality threshold.
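The sampling and "always below" check of steps S102 and S104 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the requirement of a full window of 10 samples, and the numeric quality scale are assumptions made for the example.

```python
from typing import Sequence


def quality_always_below(samples: Sequence[float], threshold: float) -> bool:
    """Return True only if every sample in the window is below the threshold."""
    # An empty window carries no evidence, so it never counts as "always below".
    return len(samples) > 0 and all(q < threshold for q in samples)


def should_switch_to_audio(samples: Sequence[float],
                           threshold: float,
                           expected_samples: int = 10) -> bool:
    """Decide the S104 outcome for one window (e.g. T0 .. T0+9 at 1 s spacing)."""
    # Assumption: a decision is made only once the full window has been sampled.
    if len(samples) < expected_samples:
        return False
    return quality_always_below(samples, threshold)
```

A single sample at or above the threshold inside the window is enough to prevent the switch, which is what distinguishes "always lower" from an average-based rule.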
It should be noted that the preset video quality threshold may be an empirical value chosen by the user or may be set according to user requirements.
The video quality above can be obtained through other parameters that characterize it, for example the peak signal-to-noise ratio (PSNR), the structural similarity index (SSIM), the signal-to-noise ratio (SNR), delay, jitter, packet loss rate, or a combination of these, but is not limited thereto.
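For reference, PSNR is conventionally defined as 10·log10(MAX² / MSE), where MSE is the mean squared error between a reference frame and the received frame. The sketch below computes it over flat lists of 8-bit pixel values; the function name and the list-based frame representation are assumptions for illustration only.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], distorted: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    if len(reference) != len(distorted) or len(reference) == 0:
        raise ValueError("frames must be non-empty and of equal size")
    # Mean squared error over all pixels.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Note that PSNR needs the original frame, which is generally unavailable at the receiving end of a live conference.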
In the embodiment of the present invention, the video quality is monitored continuously over a period of time to check whether it stays below the preset video quality threshold, and video-to-audio switching is performed when it does. This solves the problem in the related art of poor video quality and reduced user experience caused by network congestion, delay, or jitter. The invention achieves audio/video switching without enabling FEC encoding/decoding or IP rate up/down adjustment, and effectively improves the user experience in video conference calls.
Optionally, converting the video into audio includes: detecting the image quality of the video; determining whether the image quality is lower than a preset image quality threshold; and converting the video into audio if the image quality is lower than the preset image quality threshold.
Because the video stream transmitted in a video conference is real-time and the original video is not readily available, this embodiment may preferentially use a no-reference objective quality assessment method to obtain the image quality of the video. The image sharpness algorithm may be based on the Canny edge detection algorithm, an optimal step-edge detection algorithm whose edge detection filter removes noise while preserving edge characteristics and uses a first-order differential filter. Specifically, it uses the first-order directional derivative of a two-dimensional Gaussian function in an arbitrary direction as the noise filter, filters the image by convolution, and then finds the local maxima of the image gradient in the filtered image to determine the image edges; the optimal approximation operator is obtained by measuring the product of signal-to-noise ratio and localization. The sharpness score lies in the range 0 to 100: contours are extracted from an image, the image is Gaussian filtered and the contours are computed again, and the difference in contour pixels between the two passes characterizes the image sharpness. Optionally, suitable upper and lower thresholds may be set: when the value is below the lower threshold or above the upper threshold the sharpness score is 0, and when it lies between the two thresholds the score is between 0 and 100.
In this embodiment, using the Canny edge detection algorithm to predict image quality gives better accuracy; when the detected sharpness score of the video is low, the video conference can be converted to an audio conference, making video-to-audio switching more accurate and effective.
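The re-blur idea described above (extract contours, Gaussian-filter the image, extract contours again, compare) can be sketched in plain Python. This is only an illustration under stated assumptions: a thresholded Sobel gradient stands in for a full Canny detector, the upper/lower score thresholds are omitted, and the names `gaussian_blur3`, `edge_map`, `sharpness_score` and the gradient threshold of 800 are hypothetical, not taken from the patent.

```python
def gaussian_blur3(img):
    """3x3 Gaussian blur (kernel 1-2-1 / 2-4-2 / 1-2-1, divided by 16), borders replicated."""
    h, w = len(img), len(img[0])
    kernel = ((1, 2, 1), (2, 4, 2), (1, 2, 1))
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += kernel[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = acc / 16.0
    return out


def edge_map(img, threshold):
    """Mark interior pixels whose Sobel gradient magnitude exceeds the threshold."""
    h, w = len(img), len(img[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = ((img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1])
                  - (img[y - 1][x - 1] + 2 * img[y][x - 1] + img[y + 1][x - 1]))
            gy = ((img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1])
                  - (img[y - 1][x - 1] + 2 * img[y - 1][x] + img[y - 1][x + 1]))
            edges[y][x] = (gx * gx + gy * gy) ** 0.5 > threshold
    return edges


def sharpness_score(img, threshold=800.0):
    """Score in 0..100: share of the original edge pixels that vanish after blurring."""
    before = edge_map(img, threshold)
    after = edge_map(gaussian_blur3(img), threshold)
    n_before = sum(row.count(True) for row in before)
    if n_before == 0:
        return 0.0  # no strong edges at all: treated as not sharp
    lost = sum(1 for rb, ra in zip(before, after)
               for b, a in zip(rb, ra) if b and not a)
    return 100.0 * lost / n_before
```

The intuition is that a sharp image loses many strong edges when blurred, while an already blurry image has few strong edges to lose. A production system would more likely call an existing Canny implementation and calibrate the thresholds on real frames.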
Optionally, determining whether the video quality is always lower than the preset video quality threshold includes: determining whether the video packet loss rate of the video is always not lower than a preset video packet loss rate threshold; and, if so, determining that the video quality is always lower than the preset video quality threshold.
The packet loss rate (Loss Tolerance or Packet Loss Rate) is the ratio of the number of lost packets to the number of packets sent. It is typically calculated as [(input packets - output packets) / input packets] × 100%. The packet loss rate is related to the packet length and the packet sending frequency. The required information (input packets and output packets) can be obtained from RTCP packets. Network congestion, insufficient bandwidth, delay, or jitter generally cause network packet loss.
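The loss-rate formula above translates directly into code. The function name and the input validation are assumptions added for the example.

```python
def packet_loss_rate(input_packets: int, output_packets: int) -> float:
    """Packet loss rate in percent: (input - output) / input * 100."""
    if input_packets <= 0:
        raise ValueError("input_packets must be positive")
    if not 0 <= output_packets <= input_packets:
        raise ValueError("output_packets must lie between 0 and input_packets")
    return (input_packets - output_packets) / input_packets * 100.0
```

For instance, 1000 packets sent and 985 delivered gives a loss rate of about 1.5%.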
For example, network indicator information related to the video conference call is extracted, and the current video packet loss rate is calculated from it. If the current packet loss rate is greater than the threshold of 1.5%, the loss rate at the current time T0 is recorded; otherwise, packet loss rate detection continues. Starting from T0, packet loss is tracked over a continuous period t (estimated at about 10 seconds); if the packet loss rate stays above 1.5% throughout, video-to-audio conversion is performed. In addition, when the packet loss rate is found to stay above 1.5%, the image may also be checked before the conversion, and the conversion is confirmed if the image sharpness is poor.
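The T0-based procedure in this example can be sketched as a small streaming monitor. The class name `LossMonitor` and its interface are hypothetical; the 1.5% threshold and the 10-second hold time come from the example above, and the optional image sharpness check is left out.

```python
class LossMonitor:
    """Track how long the packet loss rate has stayed above a threshold."""

    def __init__(self, threshold_pct: float = 1.5, hold_seconds: float = 10.0):
        self.threshold_pct = threshold_pct
        self.hold_seconds = hold_seconds
        self.t0 = None  # time at which the loss rate first exceeded the threshold

    def update(self, now: float, loss_pct: float) -> bool:
        """Feed one measurement; return True when video should switch to audio."""
        if loss_pct <= self.threshold_pct:
            # Loss recovered: forget T0 and keep detecting.
            self.t0 = None
            return False
        if self.t0 is None:
            self.t0 = now  # record T0 on the first sample above the threshold
        # Switch only after the loss has stayed above the threshold for t seconds.
        return now - self.t0 >= self.hold_seconds
```

Any sample at or below the threshold resets T0, so a brief recovery restarts the observation period.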
This embodiment characterizes video quality by the packet loss rate. Because obtaining the packet loss rate does not consume many system resources, the method helps save system resources.
Optionally, before determining whether the video packet loss rate of the video is always not lower than the preset video packet loss rate threshold, the preset video packet loss rate threshold is determined as follows: establishing a mapping relationship between the packet loss rate and the peak signal-to-noise ratio (PSNR) used to characterize video quality; and determining, according to the mapping relationship, the preset video packet loss rate threshold corresponding to a preset PSNR threshold.
In this embodiment, research shows that video quality degradation is in fact caused by packets being delayed and discarded under severe delay and jitter, which increases the media packet loss rate and thus degrades video quality. By establishing a mapping relationship between the packet loss rate and video quality levels, the effect of packet loss on video quality can be learned, and hence the packet loss rate at which video quality deteriorates to the point where it can no longer support a video call. Optionally, the image quality of the video may be evaluated using SSIM, SNR, PSNR, or similar algorithms. Preferably, PSNR is used to evaluate the image quality of the video (as the video quality). Table 1 gives an optional correspondence between PSNR values and MOS (Mean Opinion Score), where MOS is an indicator for measuring the audio and video quality of a communication system.
Table 1
PSNR (dB)    MOS
> 37         5 (excellent)
31 to 37     4 (good)
25 to 31     3 (fair)
20 to 25     2 (poor)
< 20         1 (bad)
As Table 1 shows, there is likewise a mapping between PSNR and video quality level. For example, if video quality is considered good when the MOS level is above 2, the corresponding preset video quality threshold is 3, and according to Table 1 the preset PSNR threshold is 25.
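The correspondence in Table 1 can be written as a lookup function. The table does not say which level a boundary value (31, 25, 20) belongs to, so assigning each lower bound to the higher level is an assumption of this sketch, as is the function name.

```python
def psnr_to_mos(psnr_db: float) -> int:
    """Map a PSNR value in dB to a MOS level following Table 1."""
    if psnr_db > 37:
        return 5  # excellent
    if psnr_db >= 31:
        return 4  # good
    if psnr_db >= 25:
        return 3  # fair
    if psnr_db >= 20:
        return 2  # poor
    return 1      # bad
```

With this convention a PSNR of exactly 25 still maps to MOS 3, so a preset PSNR threshold of 25 marks the point below which quality falls to the poor level.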
Once the mapping relationship between the video packet loss rate and PSNR has been established and the preset PSNR threshold has been determined, the preset video packet loss rate threshold corresponding to the preset PSNR threshold can be obtained from the mapping relationship.
For example, for a video stream in H.246 or H.264 HP format, the PSNR falls quickly once packet loss occurs; when the packet loss rate reaches about 1.5%, the PSNR drops below 25 and the stream quality reaches the poor level. The packet loss rate threshold can therefore be set to 1.5%.
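One way to read a loss-rate threshold off such a mapping is to interpolate a calibration curve of (loss rate, PSNR) pairs. The sketch below assumes the curve is sorted by increasing loss rate with non-increasing PSNR; the function name is hypothetical, and the sample points in the usage note are invented for illustration, not measurements from the patent.

```python
from typing import List, Tuple


def loss_threshold_for_psnr(curve: List[Tuple[float, float]],
                            psnr_threshold: float) -> float:
    """Return the loss rate (percent) at which PSNR falls to psnr_threshold.

    curve: (loss_pct, psnr_db) pairs, sorted by increasing loss_pct,
           with PSNR assumed to be non-increasing as loss grows.
    """
    for (l0, p0), (l1, p1) in zip(curve, curve[1:]):
        if p0 >= psnr_threshold >= p1:
            if p0 == p1:
                return l0
            # Linear interpolation inside the segment that crosses the threshold.
            return l0 + (p0 - psnr_threshold) / (p0 - p1) * (l1 - l0)
    raise ValueError("PSNR threshold is outside the calibrated range")
```

With a hypothetical curve such as `[(0.0, 38.0), (0.5, 33.0), (1.0, 29.0), (1.5, 25.0), (2.0, 22.0)]` and a PSNR threshold of 25, the function returns 1.5, matching the threshold quoted in the example above. Real calibration data would have to be measured per codec and bit rate.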
This embodiment ensures that, when the network condition deteriorates to the point where it can no longer support a video call, the video stream is converted into an audio stream for transmission, which effectively improves the user experience.
It should be noted that, by obtaining the mapping relationship between the packet loss rate and PSNR in advance and deriving the packet loss rate threshold from it, this embodiment can effectively improve the efficiency of detecting video quality (or audio quality). For example, if PSNR were used directly to predict video or audio quality, every PSNR measurement would consume a large amount of resources and seriously affect execution efficiency.
This embodiment characterizes video quality with the more accurate PSNR and establishes a mapping between packet loss rate and PSNR, which makes obtaining the preset video packet loss rate threshold more convenient.
Optionally, the video packet loss rate is determined as follows: obtain first information carried in the Real-time Transport Control Protocol (RTCP) packets of the video, where the first information includes information about the input packets and output packets of the video; then calculate the video packet loss rate from the input packets and output packets of the video.
Severe network delay and jitter cause packets to arrive late and be discarded, which raises the media packet loss rate and degrades video image quality. In embodiments of the present invention, the relevant information fed back in RTCP (Real-time Transport Control Protocol) packets, such as input packets and output packets, can be collected, and the indicator of the current network, namely the video packet loss rate, can be calculated from this feedback. The delay, jitter, bandwidth and other properties of the current network can also be determined from the information fed back in the RTCP packets. Jitter and available bandwidth can serve as auxiliary parameters for terminal information statistics. This embodiment occupies few resources and is computationally efficient.
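As a rough illustration of the calculation, the sketch below computes a loss rate from two packet counters for one reporting interval. Mapping the text's "output packets" and "input packets" to a sent count and a received count is an assumption, and the function name `packet_loss_rate` is hypothetical; the text does not specify which RTCP fields are read.

```python
def packet_loss_rate(packets_sent, packets_received):
    """Packet loss rate in percent for one reporting interval.

    packets_sent / packets_received are assumed to be counters derived
    from RTCP feedback (an assumption, not a detail given in the text).
    """
    if packets_sent <= 0:
        return 0.0  # nothing was sent, so nothing can be lost
    # Duplicated packets can make received exceed sent; clamp at zero.
    lost = max(packets_sent - packets_received, 0)
    return 100.0 * lost / packets_sent
```

With 1000 packets sent and 985 received, the function returns 1.5, which is exactly the example threshold used in this description.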
Optionally, after the video is converted into audio, the method further includes: detecting the audio quality of the audio within a second preset time period; determining whether the audio quality is always not lower than a preset audio quality threshold; and, if the audio quality is always not lower than the preset audio quality threshold, converting the audio into video. In this embodiment, after the video has been converted into audio, the audio is restored to video at a suitable time, which gives a better user experience.
Optionally, determining whether the audio quality is always not lower than the preset audio quality threshold includes: determining whether the audio packet loss rate of the audio is always lower than a preset audio packet loss rate threshold; and, if the audio packet loss rate of the audio is always lower than the preset audio packet loss rate threshold, determining that the audio quality is always not lower than the preset audio quality threshold.
For example, after image quality has degraded for network or other reasons and the video has been automatically switched to audio, the relevant network indicator information of the audio conference can be extracted and the current audio packet loss rate calculated from it. If the current packet loss rate is below the 1.5% packet loss threshold, the packet loss rate at the current time T0 is recorded; otherwise, detection of the packet loss rate continues. Starting from T0, packet loss is tracked over a continuous period t (which may be estimated at about 10 seconds); if the packet loss rate stays below the threshold throughout, the audio is restored to video.
This embodiment characterizes audio quality by the packet loss rate. Because obtaining the packet loss rate does not consume many system resources, this method helps save system resources.
Optionally, the audio packet loss rate is determined as follows: obtain second information in the Real-time Transport Control Protocol (RTCP) packets of the audio, where the second information includes information about the input packets and output packets of the audio; then calculate the audio packet loss rate from the input packets and output packets of the audio.
Severe network delay and jitter cause packets to arrive late and be discarded, which raises the media packet loss rate and degrades video image quality. In embodiments of the present invention, the relevant information fed back in RTCP (Real-time Transport Control Protocol) packets, such as input packets and output packets, can be collected, and the indicator of the current network, namely the audio packet loss rate, can be calculated from this feedback. The delay, jitter, bandwidth and other properties of the current network can also be determined from the information fed back in the RTCP packets. This embodiment occupies few resources and is computationally efficient.
An optional method for converting video into audio according to an embodiment of the present invention is described below. As shown in FIG. 2, the method includes:
Step S202: calculate the current packet loss rate of the video.
This packet loss rate is equivalent to the video packet loss rate described above.
Step S204: determine whether the current packet loss rate is greater than the packet loss rate threshold of 1.5%. If so, perform step S206; if not, perform step S202.
The 1.5% packet loss rate threshold can serve as one optional implementation of the preset video packet loss rate threshold described above.
Step S206: record the current time T0.
Step S208: detect the sustained packet loss over the period from T0 to t, where t is greater than T0 and both t and T0 are positive numbers.
The period from T0 to t corresponds to the first preset time period described above.
Step S210: if the packet loss rate is detected to stay above 1.5%, predict the image quality using the Canny edge detection algorithm.
Step S212: determine whether the image quality has reached the preset image quality threshold at which video needs to be switched to audio. If so, perform step S214; if not, perform step S204.
Step S214: switch the video to audio.
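Steps S202 to S214 can be sketched as a small loop over timestamped loss measurements. This is only an illustration under stated assumptions: the samples arrive as (time in seconds, loss rate in percent) pairs, the first preset time period is taken as 10 seconds, and the Canny-based image quality prediction of steps S210 and S212 is replaced by a caller-supplied callable `image_quality_is_poor`, which is hypothetical.

```python
def should_switch_to_audio(samples, image_quality_is_poor,
                           loss_threshold=1.5, window=10.0):
    """Decide whether to switch video to audio (sketch of FIG. 2).

    samples: iterable of (time_seconds, loss_rate_percent), in time order.
    image_quality_is_poor: zero-argument callable standing in for the
        image quality check of steps S210/S212 (an assumption).
    """
    t0 = None
    for now, loss in samples:            # S202: current loss rate
        if loss <= loss_threshold:       # S204: not above threshold
            t0 = None                    # start over
            continue
        if t0 is None:                   # S206: record T0
            t0 = now
        if now - t0 >= window:           # S208: loss stayed high from T0 to t
            if image_quality_is_poor():  # S210/S212: image quality check
                return True              # S214: switch video to audio
            t0 = None                    # quality still acceptable: back to S204
    return False
```

Resetting `t0` whenever a sample is at or below the threshold is what enforces the "sustained" condition: a single good measurement restarts the observation window.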
This embodiment proposes an adaptation strategy based on image quality prediction in video conferencing. Without enabling FEC compensation or IP rate adjustment (speeding up or slowing down), the method implements a network quality-of-service feedback mechanism, packet loss rate prediction and image quality detection. It can assess the image quality of network video in real time from the packet loss rate and, by predicting image quality, choose an adaptation strategy that improves the user experience. This effectively addresses the poor user experience caused by changing network conditions during video communication over IP networks and the like.
An optional method for converting audio into video according to an embodiment of the present invention is described below. As shown in FIG. 3, the method includes:
Step S302: calculate the current packet loss rate of the audio.
This packet loss rate is equivalent to the audio packet loss rate described above.
Step S304: determine whether the packet loss rate is greater than the packet loss rate threshold of 1.5%. If so, perform step S306; if not, perform step S302.
The 1.5% packet loss rate threshold can serve as one optional implementation of the preset audio packet loss rate threshold described above.
Step S306: record the current time T1.
Step S308: detect the sustained packet loss over the period from T1 to t1.
The period from T1 to t1 corresponds to the second preset time period described above.
It should be noted that the period from T1 to t1 and the period from T0 to t described above may be of the same length. This packet loss rate threshold and the packet loss rate threshold in FIG. 2 may both be 1.5%, or they may be set to different values.
Step S310: if the packet loss rate is detected to stay below 1.5%, restore the audio to video. For the restored video, the packet loss rate of the video may continue to be calculated according to step S302, so that video can be switched to audio again at a suitable time.
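The recovery condition can be sketched in the same style. The sketch follows the condition stated in step S310 and in the earlier example (the audio packet loss rate stays below the threshold for the whole second preset time period); the 10-second window, the sample format and the function name `should_restore_video` are assumptions made for illustration.

```python
def should_restore_video(samples, loss_threshold=1.5, window=10.0):
    """Decide whether to restore audio to video (sketch of FIG. 3).

    samples: iterable of (time_seconds, audio_loss_rate_percent),
        in time order.
    """
    t1 = None
    for now, loss in samples:
        if loss >= loss_threshold:   # loss not below threshold: start over
            t1 = None
            continue
        if t1 is None:               # record T1, start of the quiet period
            t1 = now
        if now - t1 >= window:       # loss stayed low from T1 to t1
            return True              # S310: restore audio to video
    return False
```

Using a separate threshold argument here reflects the note above that the two thresholds may be set to different values.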
The embodiments described above with reference to FIG. 2 and FIG. 3 take full account of the drawbacks of approaches used in the related art, such as enabling FEC packet loss compensation or IP rate adjustment (resource consumption, low efficiency and so on). Building on the observation that network congestion leads to packet loss, they use the packet loss rate as a parameter that characterizes video or audio quality. By checking whether the packet loss rate reaches a preset threshold over a period of time, they decide whether to convert between video and audio and whether to restore audio to video. The audio/video switching strategy is thus executed at a suitable time chosen according to network conditions, without affecting audio and video quality, which reduces how often FEC packet loss compensation or IP rate adjustment is used and effectively keeps video conferences and similar calls running smoothly.
A complete solution is given for the impact of changing network conditions on the user experience of multimedia calls: first, the minimum tolerable packet loss rate threshold for video transmission is determined; then the packet loss rate in the network and the image quality are predicted using the network information feedback module; finally, a specific audio/video switching method selects the adaptation strategy for audio and video transmission under different network conditions. The multimedia adaptation method proposed in this embodiment adapts to network conditions and, based on the packet loss rate in the network, determines the medium (audio or video) best suited to the user's multimedia call, which is of great significance for improving the user experience of multimedia calls.
From the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disc) and includes a number of instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device or the like) to perform the methods described in the embodiments of the present invention.
This embodiment also provides an audio and video conversion device, which is used to implement the above embodiments and preferred implementations; what has already been explained is not repeated. As used below, the term "module" may refer to a combination of software and/or hardware that implements a predetermined function. Although the devices described in the following embodiments are preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
FIG. 4 is a structural block diagram of an audio and video conversion device according to an embodiment of the present invention. As shown in FIG. 4, the device includes:
a detection module 40, configured to detect the video quality of the video within a first preset time period;
a judgment module 42, configured to determine whether the video quality is always lower than a preset video quality threshold;
a conversion module 44, configured to convert the video into audio if the determination result is yes.
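The division of work among the three modules can be illustrated with a minimal software sketch. The class names mirror modules 40, 42 and 44, but the method names, the sample format and the use of a single numeric quality score (for example a PSNR value) are assumptions made for illustration; the text only defines what each module is configured to do.

```python
from dataclasses import dataclass


@dataclass
class DetectionModule:
    """Module 40: collects quality scores over the first preset period."""
    window: float = 10.0  # length of the period in seconds (assumed)

    def detect(self, samples):
        # samples: list of (time_seconds, quality_score) in time order.
        if not samples:
            return []
        start = samples[0][0]
        return [q for t, q in samples if t - start <= self.window]


@dataclass
class JudgmentModule:
    """Module 42: checks whether quality is always below the threshold."""
    threshold: float

    def always_below(self, qualities):
        return bool(qualities) and all(q < self.threshold for q in qualities)


class ConversionModule:
    """Module 44: switches the call from video to audio when told to."""

    def __init__(self):
        self.mode = "video"

    def convert(self, judged_below):
        if judged_below:
            self.mode = "audio"
        return self.mode
```

A typical call chain would pass the output of `detect` to `always_below` and the resulting boolean to `convert`, matching the order detection, judgment, conversion in the list above.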
In this embodiment of the present invention, the detection module 40 continuously detects the video quality over a period of time, the judgment module 42 determines whether that quality is always lower than the preset video quality threshold, and, when it is, the conversion module 44 switches from video to audio. This solves the problem in the related art of poor video quality and a degraded user experience caused by network congestion, delay or jitter. The present invention achieves audio/video switching without enabling FEC encoding/decoding or IP rate adjustment, and effectively improves the user experience in video conference calls.
Optionally, the conversion module 44 includes: a detection unit, configured to detect the image quality of the video; a judgment unit, configured to determine whether the image quality is lower than a preset image quality threshold; and a conversion unit, configured to convert the video into audio if the image quality is lower than the preset image quality threshold.
An optional audio and video conversion device according to an embodiment of the present invention is described below. As shown in FIG. 5, the device includes:
A VCSP module, which can serve as a platform-style service management center responsible for protocols, media processing, communication and so on. This module encapsulates the video stream into RTP (Real-time Transport Protocol) packets, UDP (User Datagram Protocol) packets and IP packets, and then sends the encapsulated IP packets to the receiving end over the Internet. The receiving end receives the IP packets and, according to the sequence numbers in the RTP headers, feeds the video stream data into the DSP codec module (codec) for decoding.
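The receiver-side ordering step can be sketched as follows. The sketch relies on the fixed 12-byte RTP header layout (first byte with version and flags, second byte with marker and payload type, then a 16-bit sequence number, a 32-bit timestamp and a 32-bit SSRC, all in network byte order). The function names are hypothetical, and sequence number wraparound at 65535 is deliberately ignored, so this is an illustration rather than a complete jitter buffer.

```python
import struct

RTP_HEADER_LEN = 12  # fixed RTP header, without CSRC entries or extensions


def rtp_sequence_number(packet: bytes) -> int:
    """Extract the 16-bit sequence number from an RTP packet."""
    if len(packet) < RTP_HEADER_LEN:
        raise ValueError("packet shorter than the fixed RTP header")
    # !  network byte order; B B  flag bytes; H  sequence number;
    # I  timestamp; I  SSRC
    _flags, _pt, seq, _ts, _ssrc = struct.unpack(
        "!BBHII", packet[:RTP_HEADER_LEN])
    return seq


def order_for_decoder(packets):
    """Return packets sorted by RTP sequence number (no wraparound handling)."""
    return sorted(packets, key=rtp_sequence_number)
```

In a real receiver the sorted payloads would then be handed to the decoder; that hand-off is not shown because the text does not describe the codec interface.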
A network information feedback module, configured to collect the relevant information fed back in RTCP packets and to calculate the current network indicators (packet loss rate, delay, jitter, bandwidth and so on) from it, as shown in FIG. 6. The measured jitter, delay and available bandwidth can serve as auxiliary parameters for terminal information statistics. The current network indicators can be reported to the audio/video switching module through the VCSP module.
The network information feedback module can implement the functions of the detection module 40 described above.
An audio/video switching module, configured to predict the packet loss rate from network conditions, report the prediction to the image quality detection module, perform image quality prediction with hardware acceleration from the DSP codec module, and make an adaptation decision based on the prediction. During an audio call, the audio/video switching module keeps tracking current network conditions, predicts network quality of service, and chooses a suitable time according to network conditions to perform audio/video switching (including switching video to audio and automatically restoring the original video conference at a suitable later time).
The audio/video switching module can implement the functions of the judgment module 42 and the conversion module 44.
This embodiment makes full use of RTCP packets to monitor network quality of service in real time and can obtain the correspondence between the predicted network packet loss rate and image quality. By making full use of the DSP codec module to perform fully no-reference image quality computation and prediction, it provides reliable parameters for the adaptation strategy. Through the adaptation strategy, this embodiment switches between audio and video without enabling FEC encoding/decoding or IP rate adjustment, which effectively keeps video conference calls running smoothly.
It should be noted that each of the above modules can be implemented in software or hardware. For the latter, this can be done in the following ways, without being limited to them: the above modules are all located in the same processor; or the above modules are located in different processors in any combination.
An embodiment of the present invention also provides a storage medium. Optionally, in this embodiment, the storage medium may be configured to store program code for performing the following steps:
S1: detect the video quality of the video within a first preset time period;
S2: determine whether the video quality is always lower than a preset video quality threshold;
S3: if the determination result is yes, convert the video into audio.
Optionally, in this embodiment, the storage medium may include, but is not limited to, a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, or any other medium that can store program code.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the above embodiments and optional implementations, which are not repeated here.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with general-purpose computing devices. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases the steps shown or described can be performed in an order different from the one given here, or they can each be made into individual integrated circuit modules, or several of the modules or steps can be made into a single integrated circuit module. The present invention is therefore not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present invention and are not intended to limit it. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement or the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Industrial Applicability
In the audio and video conversion process of the embodiments of the present invention, the video quality is detected continuously over a period of time to determine whether it is always lower than the preset video quality threshold, and video is switched to audio when it is. This solves the problem in the related art of poor video quality and a degraded user experience caused by network congestion, delay or jitter. The present invention achieves audio/video switching without enabling FEC encoding/decoding or IP rate adjustment, and effectively improves the user experience in video conference calls.

Claims (10)

  1. An audio and video conversion method, comprising:
    detecting the video quality of a video within a first preset time period;
    determining whether the video quality is always lower than a preset video quality threshold;
    if the determination result is yes, converting the video into audio.
  2. The method according to claim 1, wherein converting the video into the audio comprises:
    detecting the image quality of the video;
    determining whether the image quality is lower than a preset image quality threshold;
    if the image quality is lower than the preset image quality threshold, converting the video into the audio.
  3. The method according to claim 1, wherein determining whether the video quality is always lower than the preset video quality threshold comprises:
    determining whether the video packet loss rate of the video is always not lower than a preset video packet loss rate threshold;
    if the video packet loss rate of the video is always not lower than the preset video packet loss rate threshold, determining that the video quality is always lower than the preset video quality threshold.
  4. The method according to claim 3, wherein, before determining whether the video packet loss rate of the video is always not lower than the preset video packet loss rate threshold, the preset video packet loss rate threshold is determined by:
    establishing a mapping between packet loss rate and the peak signal-to-noise ratio (PSNR) used to characterize video quality;
    determining, according to the mapping, the preset video packet loss rate threshold corresponding to a preset PSNR threshold.
  5. The method according to claim 3, wherein the video packet loss rate is determined by:
    obtaining first information carried in Real-time Transport Control Protocol (RTCP) packets of the video, wherein the first information includes information about the input packets and output packets of the video;
    calculating the video packet loss rate from the input packets and output packets of the video.
  6. The method according to claim 1, further comprising, after converting the video into audio:
    detecting the audio quality of the audio within a second preset time period;
    determining whether the audio quality is always not lower than a preset audio quality threshold;
    if the audio quality is always not lower than the preset audio quality threshold, converting the audio into the video.
  7. The method according to claim 6, wherein determining whether the audio quality is always not lower than the preset audio quality threshold comprises:
    determining whether the audio packet loss rate of the audio is always lower than a preset audio packet loss rate threshold;
    if the audio packet loss rate of the audio is always lower than the preset audio packet loss rate threshold, determining that the audio quality is always not lower than the preset audio quality threshold.
  8. The method according to claim 7, wherein the audio packet loss rate is determined by:
    obtaining second information in Real-time Transport Control Protocol (RTCP) packets of the audio, wherein the second information includes information about the input packets and output packets of the audio;
    calculating the audio packet loss rate from the input packets and output packets of the audio.
  9. An audio and video conversion device, comprising:
    a detection module, configured to detect the video quality of a video within a first preset time period;
    a judgment module, configured to determine whether the video quality is always lower than a preset video quality threshold;
    a conversion module, configured to convert the video into audio if the determination result is yes.
  10. The device according to claim 9, wherein the conversion module comprises:
    a detection unit, configured to detect the image quality of the video;
    a judgment unit, configured to determine whether the image quality is lower than a preset image quality threshold;
    a conversion unit, configured to convert the video into the audio if the image quality is lower than the preset image quality threshold.
PCT/CN2017/080416 2016-04-28 2017-04-13 Audio and video conversion method and device WO2017185995A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610284697.9 2016-04-28
CN201610284697.9A CN107333091A (en) 2016-04-28 2016-04-28 Audio-video conversion method and device

Publications (1)

Publication Number Publication Date
WO2017185995A1 true WO2017185995A1 (en) 2017-11-02

Family

ID=60161855

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/080416 WO2017185995A1 (en) 2016-04-28 2017-04-13 Audio and video conversion method and device

Country Status (2)

Country Link
CN (1) CN107333091A (en)
WO (1) WO2017185995A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112653865A (en) * 2020-12-24 2021-04-13 维沃移动通信有限公司 Video call processing method and device and electronic equipment
CN114885391A (en) * 2022-06-16 2022-08-09 锐迪科微电子科技(天津)有限公司 Network call control method and device
CN115250375A (en) * 2021-04-26 2022-10-28 北京中关村科金技术有限公司 Method and device for detecting audio and video content compliance based on fixed telephone technology

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110035251B (en) * 2019-04-18 2021-04-06 电科云(北京)科技有限公司 Method for realizing code rate control processing based on video conference server
CN112908346B (en) * 2019-11-19 2023-04-25 ***通信集团山东有限公司 Packet loss recovery method and device, electronic equipment and computer readable storage medium
CN111083572A (en) * 2019-11-28 2020-04-28 武汉兴图新科电子股份有限公司 Method for dynamically adjusting media transmission based on multi-platform network monitoring
CN110996103A (en) * 2019-12-12 2020-04-10 杭州叙简科技股份有限公司 Method for adjusting video coding rate according to network condition
CN110944013A (en) * 2019-12-17 2020-03-31 腾讯科技(深圳)有限公司 Network session switching method and device, computer equipment and storage medium
CN111391784B (en) * 2020-03-13 2022-05-17 Oppo广东移动通信有限公司 Information prompting method and device, storage medium and related equipment
CN115842919B (en) * 2023-02-21 2023-05-09 四川九强通信科技有限公司 Video low-delay transmission method based on hardware acceleration

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101668310A (en) * 2009-09-24 2010-03-10 中兴通讯股份有限公司 Method for broadcasting stream media and device therefor
CN101834879A (en) * 2010-02-09 2010-09-15 北京中科大洋科技发展股份有限公司 Intelligent efficient video/audio data transmission method adapted to different network environments
CN103118238A (en) * 2011-11-17 2013-05-22 中国电信股份有限公司 Controlling method of video conference and video conference system
WO2014194622A1 (en) * 2013-06-04 2014-12-11 Tencent Technology (Shenzhen) Company Limited System and method for data transmission
CN105323529A (en) * 2015-10-19 2016-02-10 掌赢信息科技(上海)有限公司 Method for switching between audio call and video call, and electronic equipment

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112653865A (en) * 2020-12-24 2021-04-13 维沃移动通信有限公司 Video call processing method and device and electronic equipment
WO2022135291A1 (en) * 2020-12-24 2022-06-30 维沃移动通信有限公司 Video call processing method and apparatus, and electronic device
CN115250375A (en) * 2021-04-26 2022-10-28 北京中关村科金技术有限公司 Method and device for detecting audio and video content compliance based on fixed telephone technology
CN115250375B (en) * 2021-04-26 2024-01-26 北京中关村科金技术有限公司 Audio and video content compliance detection method and device based on fixed telephone technology
CN114885391A (en) * 2022-06-16 2022-08-09 锐迪科微电子科技(天津)有限公司 Network call control method and device

Also Published As

Publication number Publication date
CN107333091A (en) 2017-11-07

Similar Documents

Publication Publication Date Title
WO2017185995A1 (en) Audio and video conversion method and device
CN108141443B (en) User equipment, media stream transmission network auxiliary node and media stream transmission method
US7965650B2 (en) Method and system for quality monitoring of media over internet protocol (MOIP)
JP4986243B2 (en) Transmitting apparatus, method and program for controlling number of layers of media stream
US11223669B2 (en) In-service quality monitoring system with intelligent retransmission and interpolation
US9602376B2 (en) Detection of periodic impairments in media streams
US10652138B2 (en) Link decision-making method and decision-making device
EP1892920B1 (en) Monitoring system and method for trunk gateway
EP3429124B1 (en) Optimizing video-call quality of service
US9600355B2 (en) Redundant encoding
US9571424B2 (en) Method and apparatus for compensating for voice packet loss
WO2014029291A1 (en) Method, network element and system for evaluating voice quality
US8873590B2 (en) Apparatus and method for correcting jitter
US8184546B2 (en) Endpoint device configured to permit user reporting of quality problems in a communication network
US10440087B2 (en) Estimation of losses in a video stream
CN108269589B (en) Voice quality evaluation method and device for call
JP2011228823A (en) Packet loss rate estimating device, packet loss rate estimating method, packet loss rate estimating program, and communication system
US7848243B2 (en) Method and system for estimating modem and fax performance over packet networks
WO2021047763A1 (en) Transmission of a representation of a speech signal
US20240214844A1 (en) Wireless Communication Network Voice Quality Monitoring
JP2008167223A (en) Communication quality control method and packet communication system
CN109076400A (en) Method and apparatus for determining the encoding/decoding mode collection of service communication
WO2022238729A1 (en) Wireless communication network voice quality monitoring
EP3182647A1 (en) A method to perform an out-of-call network quality conditions estimation and computer programs products thereof

Legal Events

Date Code Title Description
NENP Non-entry into the national phase (Ref country code: DE)
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 17788647; Country of ref document: EP; Kind code of ref document: A1)
122 EP: PCT application non-entry in European phase (Ref document number: 17788647; Country of ref document: EP; Kind code of ref document: A1)