CN113613025A - Method and device for real-time voice conversion subtitle data synchronous processing and picture synthesis live broadcasting - Google Patents

Publication number
CN113613025A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010369348.3A
Other languages
Chinese (zh)
Inventor
高爱平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Wenhui Technology Co ltd
Original Assignee
Anhui Wenhui Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-05-05
Filing date
2020-05-05
Publication date
2021-11-05

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21 Server components or server architectures
    • H04N21/218 Source of audio or video content, e.g. local disk arrays
    • H04N21/2187 Live feed
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302 Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402 Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/441 Acquiring end-user identification, e.g. using personal code sent by the remote control or by inserting a card
    • H04N21/4415 Acquiring end-user identification using biometric characteristics of the user, e.g. by voice recognition or fingerprint scanning
    • H04N21/47 End-user applications
    • H04N21/485 End-user interface for client configuration
    • H04N21/4856 End-user interface for client configuration for language selection, e.g. for the menu or subtitles
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 Assembly of content; Generation of multimedia applications
    • H04N21/854 Content authoring
    • H04N21/8547 Content authoring involving timestamps for synchronizing content

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Computer Security & Cryptography (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention provides a method for real-time voice conversion subtitle data synchronous processing and picture synthesis live broadcasting. It belongs to the technical field of image processing and solves the problem of outputting sound, picture, and subtitles in sync. The method comprises the following steps: a first sound pickup collects on-site real-time sound and transmits it to a first camera and a data processing host; the first camera synchronously combines the real-time image data it collects with the sound transmitted by the pickup into real-time live streaming media data and transmits that data to the data processing host; the data processing host separates the audio and video of the received streaming media data, and recognizes the real-time sound data collected on site by the first sound pickup and converts it into text subtitles in real time; the data processing host then synchronously superimposes the text subtitles on the live picture and outputs the synchronized live picture containing the text subtitles as HDMI/VGA/SDI signals, RTMP streaming media, and the like.

Description

Method and device for real-time voice conversion subtitle data synchronous processing and picture synthesis live broadcasting
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a method and a device for real-time voice conversion subtitle data synchronous processing and picture synthesis live broadcasting.
Background
With the rise of online live broadcast formats such as live news programs, webcasts, and live open courses, the speech of the on-site lecturer needs to be converted in real time into displayable subtitles and superimposed on the live picture, so that deaf-mute viewers, foreign guests, and other participants can follow what the lecturer is saying more directly.
To achieve this live broadcast effect, the sound and the picture must be transmitted in real time from the sound pickup and the camera to the data processing host device. The data processing host synchronizes the sound and the picture and converts the sound into text, then superimposes the text subtitles on the synchronized picture to output the final live video containing the subtitles in sync.
Disclosure of Invention
The invention provides a method and a device for real-time voice conversion subtitle data synchronous processing and picture synthesis live broadcasting, which aim to solve the above problem.
The invention provides a method for real-time voice conversion subtitle data synchronous processing and picture synthesis live broadcasting. The method comprises the following steps: a first sound pickup collects on-site real-time sound and transmits it to a first camera and to a data processing host, and the first camera collects on-site real-time image data and receives the sound transmitted by the pickup; the sound and the picture are synchronously processed and combined, by video encoding, into real-time live streaming media data, which is transmitted to the data processing host over a network or over a physical cable such as HDMI/SDI/VGA; on receiving the data, the data processing host decodes the streaming media data and separates its audio from its video, feeds the real-time sound data collected on site by the first sound pickup to a speech recognition engine in real time, and recognizes and converts that sound data into text subtitles, in any of several styles, for on-screen display; the data processing host compares the audio separated from the streaming media data with the on-site real-time sound collected by the first sound pickup in order to synchronize the two, and uses the resulting timestamps to synchronize the video data with the preprocessed text subtitle information; the data processing host outputs the picture synchronized with the text subtitles, and the preprocessed text subtitles are superimposed on that picture to compose a live picture that contains the subtitles in sync; the composed picture is then re-encoded into streaming media data and output as HDMI/VGA/SDI signals, RTMP streaming media, and the like.
The invention also provides a device for real-time voice conversion subtitle data synchronous processing and picture synthesis live broadcasting. The device comprises a sound pickup, a camera, and a data processing host device; the sound pickup is connected to the data processing host device, and the camera is connected to the data processing host device. The sound pickup collects on-site real-time sound; the camera collects on-site real-time images; the data processing host device performs decoding, comparison and synchronization, and composition, encoding, and output processing on the data transmitted by the sound pickup and the camera. When the on-site pickup collects on-site real-time sound and transmits it both to the data processing host device and to the camera, and the camera collects on-site real-time images, combines them with the sound, and transmits the result to the data processing host device, the data processing host device converts the sound into text subtitles, synchronously superimposes them on the live picture, and outputs a video picture with synchronized subtitles; the picture can be output as HDMI/VGA/SDI signals or as an RTMP streaming media signal.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flowchart of a method for real-time voice conversion subtitle data synchronous processing and picture synthesis live broadcasting according to a preferred embodiment of the present invention;
FIG. 2 is a block diagram of a device for real-time voice conversion subtitle data synchronous processing and picture synthesis live broadcasting according to a preferred embodiment of the present invention.
Detailed Description
The invention will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
Fig. 1 is a flowchart illustrating a method for real-time voice conversion subtitle data synchronization processing and picture composition live broadcasting according to a preferred embodiment of the present invention. As shown in fig. 1, the method for real-time voice conversion subtitle data synchronization processing and picture composition live broadcasting according to the preferred embodiment of the present invention includes steps 101-109.
Step 101: the first sound pickup collects on-site real-time sound;
The first sound pickup is fixed in the on-site area and collects the real-time sound there.
Step 102: the sound is transmitted to the first camera and the data processing host;
The real-time sound collected by the first sound pickup in the on-site area is transmitted to the first camera and to the data processing host.
Step 103: the first camera collects on-site real-time image data and receives the sound transmitted by the first sound pickup;
While the camera collects real-time image data of the on-site area, it also receives the real-time sound transmitted by the first sound pickup.
Step 104: the sound and the picture are synchronously processed and combined into real-time live streaming media data by video encoding, and the data is transmitted to the data processing host;
Once the first camera has both the real-time image data and the real-time sound data, it video-encodes them into a video with audio, synchronously processes that video into streaming media data, and transmits it to the data processing host.
Step 105: the data processing host separates the sound and the video using the corresponding decoding method;
When the data processing host receives the video streaming media data from the first camera, it decodes the data with the decoding method that matches the stream and separates the sound from the video.
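The separation in step 105 can be pictured with a minimal Python sketch. The patent does not name a container format or decoder, so the `Packet` type, its `kind` tag, and the millisecond timestamps below are hypothetical stand-ins for whatever the real demultiplexer produces.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Packet:
    """Hypothetical demultiplexed packet; not a format named in the patent."""
    kind: str      # "audio" or "video"
    pts_ms: int    # presentation timestamp in milliseconds
    payload: bytes


def demux(packets: Iterable[Packet]) -> Tuple[List[Packet], List[Packet]]:
    """Split an interleaved stream into (audio, video), preserving order."""
    audio: List[Packet] = []
    video: List[Packet] = []
    for packet in packets:
        if packet.kind == "audio":
            audio.append(packet)
        elif packet.kind == "video":
            video.append(packet)
        # Packets of any other kind are ignored in this sketch.
    return audio, video


stream = [
    Packet("video", 0, b"f0"),
    Packet("audio", 0, b"a0"),
    Packet("audio", 20, b"a1"),
    Packet("video", 40, b"f1"),
]
audio, video = demux(stream)
```

Keeping the timestamp on every packet is what lets the later steps line the two streams up again after they have been processed separately.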
Step 106: the real-time sound data collected on site by the first sound pickup is fed to a speech recognition engine in real time, and is recognized and converted into text subtitles, in any of several styles, for on-screen display;
The on-site real-time sound data collected by the first sound pickup is fed to the speech recognition engine in real time; the engine recognizes the sound data and converts it into text, and the text is refreshed in real time and shown on the screen as subtitles.
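The real-time refresh in step 106 can be sketched as a small caption buffer. The patent does not describe the recognition engine's interface, so the sketch assumes (hypothetically) that the engine delivers partial results that are later replaced by a final result; `CaptionBuffer`, `update`, and `render` are illustrative names only.

```python
import textwrap
from typing import List


class CaptionBuffer:
    """Keeps recognized text and renders the last few wrapped lines."""

    def __init__(self, width: int = 20, max_lines: int = 2) -> None:
        self.width = width          # maximum characters per subtitle line
        self.max_lines = max_lines  # number of lines visible on screen
        self._final: List[str] = []
        self._partial = ""

    def update(self, text: str, is_final: bool) -> None:
        """Store a recognizer result; a final result replaces the partial one."""
        if is_final:
            self._final.append(text)
            self._partial = ""
        else:
            self._partial = text

    def render(self) -> List[str]:
        """Return the lines to draw, newest text last."""
        pieces = self._final + ([self._partial] if self._partial else [])
        lines = textwrap.wrap(" ".join(pieces), width=self.width)
        return lines[-self.max_lines:]


buf = CaptionBuffer(width=12, max_lines=2)
buf.update("hello", is_final=False)
first = buf.render()
buf.update("hello everyone", is_final=True)
buf.update("welcome", is_final=False)
second = buf.render()
```

Rendering only the last `max_lines` lines gives the scrolling behavior viewers expect from live captions, where older text drops off as new text arrives.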
Step 107: the data processing host uses the processed timestamps to synchronize the video data with the preprocessed text subtitle information;
After the data processing host has separated the sound and the video of the video streaming media data, it uses timestamps to synchronize the video data with the text obtained by recognizing and converting the sound data, that is, the preprocessed text.
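One way to read the timestamp synchronization in step 107 is as a lookup: for each video frame, find the subtitle whose time range covers the frame's timestamp. The sketch below assumes subtitles are stored as non-overlapping cues sorted by start time; the `Cue` type and `active_cue` function are hypothetical and are not taken from the patent.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cue:
    """Hypothetical subtitle cue covering the half-open range [start_ms, end_ms)."""
    start_ms: int
    end_ms: int
    text: str


def active_cue(cues: List[Cue], frame_pts_ms: int) -> Optional[Cue]:
    """Return the cue to show on a frame, or None if no cue covers it.

    Assumes `cues` is sorted by start_ms and the cues do not overlap.
    """
    starts = [cue.start_ms for cue in cues]
    index = bisect_right(starts, frame_pts_ms) - 1  # last cue starting at or before the frame
    if index >= 0 and frame_pts_ms < cues[index].end_ms:
        return cues[index]
    return None


cues = [Cue(0, 1000, "Hello"), Cue(1500, 2500, "World")]
shown_at_40 = active_cue(cues, 40)
shown_at_1200 = active_cue(cues, 1200)
```

A binary search keeps the lookup cheap even when a long broadcast has accumulated many cues.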
Step 108: the data processing host superimposes the preprocessed text subtitles on the output picture synchronized with them, composing a live picture that contains the text subtitles in sync;
After the data processing host has synchronized the video data with the text, it superimposes the text on the synchronized video picture. The superposition can be done as a layer overlay displayed in real time, or by compositing through a player's layer mode; alternatively, the data processing host can produce the composite by encoding the data together for output.
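The layer overlay in step 108 can be illustrated with per-pixel alpha blending. This is a toy sketch under stated assumptions: frames are small grayscale grids held in nested lists, and the subtitle layer comes with an opacity mask. The patent does not specify a blending formula, so the one used here is only the common "over" blend.

```python
from typing import List

Frame = List[List[int]]  # grayscale pixel rows, values 0..255


def overlay(frame: Frame, layer: Frame, alpha: List[List[float]],
            top: int, left: int) -> Frame:
    """Blend `layer` onto a copy of `frame` with its top-left corner at (top, left).

    alpha[y][x] is the layer's opacity at that pixel: 1.0 replaces the
    frame pixel, 0.0 leaves it untouched.
    """
    out = [row[:] for row in frame]  # never modify the caller's frame
    for y, (layer_row, alpha_row) in enumerate(zip(layer, alpha)):
        for x, (value, a) in enumerate(zip(layer_row, alpha_row)):
            fy, fx = top + y, left + x
            if 0 <= fy < len(out) and 0 <= fx < len(out[fy]):
                out[fy][fx] = round(a * value + (1.0 - a) * out[fy][fx])
    return out


frame = [[100] * 4 for _ in range(3)]
subtitle_layer = [[200, 200]]
subtitle_alpha = [[1.0, 0.5]]
composited = overlay(frame, subtitle_layer, subtitle_alpha, top=2, left=1)
```

Pixels of the layer that fall outside the frame are skipped, so a subtitle band placed near the edge is clipped rather than raising an error.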
Step 109: the data processing host encodes and outputs the composed picture;
After composition is complete, the data processing host encodes the processed video together with the superimposed subtitles. The encoded video can be pushed to an RTMP server and played by a streaming media player at a remote network endpoint, or it can be output and displayed directly through the HDMI, VGA, or SDI ports of the data processing host.
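As a rough illustration of the RTMP output path in step 109, the sketch below builds a command line for the FFmpeg tool. The patent does not say which encoder or tool is used, so FFmpeg, the H.264/AAC codec choice, and the example URL are all assumptions; the function only assembles the argument list and does not run anything.

```python
from typing import List


def rtmp_push_command(source: str, rtmp_url: str) -> List[str]:
    """Build an ffmpeg argument list that re-encodes `source` and pushes it to RTMP.

    Assumes an ffmpeg build that includes the libx264 encoder.
    """
    return [
        "ffmpeg",
        "-re",              # read the input at its native frame rate
        "-i", source,       # composited video with subtitles already burned in
        "-c:v", "libx264",  # H.264 video
        "-c:a", "aac",      # AAC audio
        "-f", "flv",        # RTMP carries FLV-packaged streams
        rtmp_url,
    ]


# Hypothetical file name and server address, for illustration only.
command = rtmp_push_command("composited.mp4", "rtmp://example.invalid/live/stream")
```

The list form can be handed to `subprocess.run` on a machine where FFmpeg is installed; the HDMI/VGA/SDI outputs mentioned in the step are hardware paths and are not covered by this sketch.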
The following examples are given for illustrative purposes:
In a live broadcast event, a data processing host 206, a camera 202, and a sound pickup 201 are deployed on site. The sound pickup 201 collects on-site real-time sound and transmits it synchronously to the camera 202. The camera 202 collects on-site real-time image data while receiving the sound from the pickup, synchronously processes and combines the two into live streaming media data, and transmits that data to the data processing host 206. The data processing host 206 decodes the data and separates the sound from the video. The on-site real-time sound data collected by the sound pickup 201 is fed to a speech recognition engine in real time, and is recognized and converted into text subtitles, in any of several styles, for on-screen display. The data processing host 206 compares the sound separated from the streaming media data with the on-site real-time sound collected by the sound pickup 201 in order to synchronize the two, and uses the resulting timestamps to synchronize the video data with the preprocessed text subtitle information. The data processing host 206 outputs the picture synchronized with the text subtitles, and the preprocessed text subtitles are superimposed on that picture to compose a live picture that contains the subtitles in sync. The composed picture is re-encoded into streaming media data and output as HDMI/VGA/SDI signals, RTMP streaming media, and the like.
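The comparison between the sound separated from the stream and the sound taken directly from the pickup is not spelled out in the patent. One plausible reading is that the host estimates how far the streamed audio lags the direct audio. The sketch below does this with a brute-force cross-correlation over raw samples, which is a technique chosen here for illustration and not the method the patent states; `estimate_lag` and the sample values are hypothetical.

```python
from typing import Sequence


def estimate_lag(reference: Sequence[float], delayed: Sequence[float],
                 max_lag: int) -> int:
    """Return the lag in samples (0..max_lag) at which `delayed` best matches `reference`.

    A result of k means delayed[n + k] lines up with reference[n].
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        overlap = min(len(reference), len(delayed) - lag)
        # Correlation score over the overlapping samples at this lag.
        score = sum(reference[i] * delayed[i + lag] for i in range(overlap))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Direct pickup audio, and the same audio arriving 3 samples later via the camera stream.
pickup_audio = [0, 0, 1, 3, -2, 4, 0, 0, 0, 0]
stream_audio = [0, 0, 0] + pickup_audio[:7]
lag_samples = estimate_lag(pickup_audio, stream_audio, max_lag=5)
```

Dividing the lag by the sample rate gives a time offset, which could then be added to the subtitle timestamps so that text recognized from the pickup audio lands on the matching video frames.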
The invention also provides a device for real-time voice conversion subtitle data synchronous processing and picture synthesis live broadcasting. The device comprises a sound pickup 201, a camera 202, and a data processing host 206; the sound pickup 201 is connected to the data processing host 206, and the camera 202 is connected to the data processing host 206. The sound pickup 201 collects on-site real-time sound; the camera 202 collects on-site real-time images; the data processing host 206 performs decoding, comparison and synchronization, and composition, encoding, and output processing on the data transmitted by the sound pickup 201 and the camera 202, that is, on the pictures and sounds 203 acquired in real time. When the on-site sound pickup 201 collects on-site real-time sound and transmits it to the data processing host 206, and the camera 202 collects on-site real-time images and sound, combines them, and transmits the result to the data processing host 206, the data processing host 206 converts the sound into text subtitles, synchronously superimposes them on the live picture, and outputs a video picture with synchronized subtitles, namely the composed output pictures and sounds 204. The composed output pictures and sounds 204 can be sent to the display 207 or to other display and playback devices 208 as HDMI/VGA/SDI signals or as an RTMP streaming media signal.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (6)

1. A method for real-time voice conversion subtitle data synchronous processing and picture synthesis live broadcasting, characterized by comprising the following steps:
a first sound pickup collects on-site real-time sound and transmits it to a first camera and to a data processing host, and the first camera collects on-site real-time image data and receives the sound transmitted by the pickup;
the sound and the picture are synchronously processed and combined, by video encoding, into real-time live streaming media data, which is transmitted to the data processing host over a network or over a physical cable such as HDMI/SDI/VGA; on receiving the data, the data processing host decodes the streaming media data and separates its audio from its video, feeds the real-time sound data collected on site by the first sound pickup to a speech recognition engine in real time, and recognizes and converts that sound data into text subtitles, in any of several styles, for on-screen display; the data processing host compares the audio separated from the streaming media data with the on-site real-time sound collected by the first sound pickup in order to synchronize the two, and uses the resulting timestamps to synchronize the video data with the preprocessed text subtitle information; the data processing host outputs the picture synchronized with the text subtitles, and the preprocessed text subtitles are superimposed on that picture to compose a live picture that contains the subtitles in sync; the composed picture is re-encoded into streaming media data and output as HDMI/VGA/SDI signals, RTMP streaming media, and the like.
2. The method of claim 1, wherein the data processing host separates the audio and video of the acquired streaming media data, compares the separated audio in real time with the on-site real-time sound data collected by the first sound pickup to synchronize their timestamps, and synchronizes, by timestamp comparison, the on-site real-time sound data collected by the first sound pickup with the text produced from it by the speech recognition engine.
3. The method of claim 1, wherein after the on-site sound and picture are input in sync, the preprocessed text subtitles are superimposed on the output picture synchronized with them, and the data processing host captures the desktop showing the superimposed picture by means of encoding software and composes it into the live picture that contains the text subtitles in sync.
4. The method of claim 1, wherein after the on-site sound and picture are input in sync, the preprocessed text subtitles are superimposed on the output picture synchronized with them, and the data processing host captures the desktop showing the superimposed picture by means of a hardware capture card and composes it into the live picture that contains the text subtitles in sync.
5. The method of claim 1, wherein after the on-site sound and picture are input in sync, the preprocessed text subtitles are superimposed on the output picture synchronized with them, and the data processing host captures the desktop showing the superimposed picture by means of a hardware capture encoder and composes it into the live picture that contains the text subtitles in sync.
6. A device for real-time voice conversion subtitle data synchronous processing and picture synthesis live broadcasting, comprising: a sound pickup, a camera, and a data processing host device, wherein the sound pickup is connected to the data processing host device and the camera is connected to the data processing host device; the sound pickup collects on-site real-time sound, the camera collects on-site real-time images, and the data processing host device performs decoding, comparison and synchronization, and composition, encoding, and output processing on the data transmitted by the sound pickup and the camera; when the on-site pickup collects on-site real-time sound and transmits it both to the data processing host device and to the camera, and the camera collects on-site real-time images together with the sound, the data processing host device converts the sound into text subtitles, synchronously superimposes them on the live picture, and outputs a video picture with synchronized subtitles, which can be output as HDMI/VGA/SDI signals or as an RTMP streaming media signal.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202010369348.3A (published as CN113613025A) | 2020-05-05 | 2020-05-05 | Method and device for real-time voice conversion subtitle data synchronous processing and picture synthesis live broadcasting

Publications (1)

Publication Number | Publication Date
CN113613025A | 2021-11-05

Family

ID=78303134

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20211105