CN114125331A - Subtitle adding system - Google Patents


Info

Publication number
CN114125331A
Authority
CN
China
Prior art keywords
server
subtitle
video stream
live video
subtitles
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111331955.1A
Other languages
Chinese (zh)
Inventor
刘坚
李秋平
王明轩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Youzhuju Network Technology Co Ltd
Original Assignee
Beijing Youzhuju Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Youzhuju Network Technology Co Ltd filed Critical Beijing Youzhuju Network Technology Co Ltd
Priority to CN202111331955.1A priority Critical patent/CN114125331A/en
Publication of CN114125331A publication Critical patent/CN114125331A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21Server components or server architectures
    • H04N21/218Source of audio or video content, e.g. local disk arrays
    • H04N21/2187Live feed
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234336Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by media transcoding, e.g. video is transformed into a slideshow of still pictures or audio is converted into text
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/236Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data, e.g. inserting a URL [Uniform Resource Locator] into a video stream, multiplexing software data into a video stream; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream, e.g. to obtain a constant bit-rate; Assembling of a packetised elementary stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/488Data services, e.g. news ticker
    • H04N21/4884Data services, e.g. news ticker for displaying subtitles
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/854Content authoring
    • H04N21/8547Content authoring involving timestamps for synchronizing content
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/278Subtitling

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The embodiments of the disclosure disclose a subtitle adding system comprising a first server, a second server, and a main control end. The first server is communicatively connected to the main control end and is configured to acquire a live video stream. The main control end is communicatively connected to the second server and is configured to acquire the audio stream of the live video stream from the first server, perform speech recognition on the audio stream to obtain a plurality of first subtitles, and send the plurality of first subtitles to the second server. The second server is communicatively connected to the first server and is configured to acquire the live video stream from the first server and, based on the timestamps of the first subtitles, add the first subtitles to the corresponding video frames in the live video stream, obtaining a live video stream that includes the first subtitles. The aim of adding subtitles to a live video stream is thereby achieved.

Description

Subtitle adding system
Technical Field
The disclosure relates to the technical field of information, in particular to a subtitle adding system.
Background
With the continuous development of video live broadcast technology, the demand of users for live video streaming is also increasing.
To improve user experience, subtitles can be added to the live video stream, and the subtitled live video stream is then sent to a user terminal for playback.
Disclosure of Invention
In order to solve the technical problem or at least partially solve the technical problem, embodiments of the present disclosure provide a subtitle adding system, which achieves the purpose of adding subtitles in a live video stream.
The embodiment of the present disclosure provides a subtitle adding system, including:
the system comprises a first service end, a second service end and a main control end;
the first server is in communication connection with the main control end and used for acquiring a live video stream;
the main control end is in communication connection with the second server end and is used for acquiring an audio stream in the live video stream from the first server end, performing voice recognition on the audio stream, acquiring a plurality of first subtitles and sending the plurality of first subtitles to the second server end;
the second server is in communication connection with the first server, and is configured to acquire the live video stream from the first server, and add the plurality of first subtitles to corresponding video frames in the live video stream based on timestamps of the first subtitles, so as to acquire a live video stream including the first subtitles.
An embodiment of the present disclosure further provides an electronic device, which includes:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement a subtitle adding method.
The disclosed embodiments also provide a computer-readable storage medium having stored thereon a computer program that, when executed by a processor, implements a subtitle adding method.
Compared with the prior art, the technical scheme provided by the embodiment of the disclosure has at least the following advantages:
the title adding system provided by the embodiment of the disclosure acquires and stores a live video stream through a first server, extracts an audio stream in the live video stream from the first server through a main control end, performs voice recognition on the audio stream, acquires a plurality of first titles, the main control end further sends the first titles to a second server, and the second server is responsible for adding the first titles to the live video stream, thereby achieving the purpose of adding titles in the live video stream, and through respectively setting the main control end, the first server and the second server, the coupling between the system interiors can be reduced, the main control end can be enabled to be dedicated to audio stream extraction and voice recognition, the second server is added with titles, and higher real-time performance is ensured while stability is ensured.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.
Fig. 1 is a schematic structural diagram of a subtitle adding system in an embodiment of the present disclosure;
fig. 2 is a schematic structural diagram of another subtitle adding system in an embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of another subtitle adding system in an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of another subtitle adding system in an embodiment of the present disclosure;
fig. 5 is a schematic diagram of a playing frame of a live video stream added with subtitles according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of another subtitle adding system in an embodiment of the present disclosure;
fig. 7 is a schematic flowchart of a subtitle adding method in an embodiment of the present disclosure;
fig. 8 is a schematic structural diagram of an electronic device in an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that the modifiers "a", "an", and "the" in this disclosure are illustrative rather than restrictive; those skilled in the art will understand them as meaning "one or more" unless the context clearly indicates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
Before explaining the subtitle adding scheme provided by the embodiment of the present disclosure, hardware devices and application scenarios related to the subtitle adding scheme are briefly introduced to facilitate better understanding of the subtitle adding scheme provided by the embodiment of the present disclosure.
Simultaneous-interpretation proofreading means adding subtitles to highly real-time audio and video content before sending it to the user side, so that users see the audio and video pictures together with subtitles. During subtitle addition, a machine first performs speech recognition on the original audio to obtain a first subtitle to be proofread, and then machine-translates the first subtitle to obtain a second subtitle to be proofread (for example, the first subtitle is Chinese and the second subtitle is the corresponding English). An original-text proofreader proofreads the first subtitle and manually corrects any errors found; a translation proofreader proofreads the second subtitle and likewise manually corrects any errors found. The original-text proofreader and the translation proofreader may be the same person or different persons; usually, to reduce workload and improve efficiency and proofreading accuracy, they are different persons.
The simultaneous-interpretation proofreading process is as follows: the interpretation hardware pulls the original video stream from a server or client and processes it (collecting the audio in the original video stream, performing speech recognition on the audio to obtain the first subtitle to be proofread, and translating the first subtitle to obtain the second subtitle to be proofread); the first subtitle and the second subtitle are then shown on a display interface, where the original-text proofreader corrects any errors found in the first subtitle and the translation proofreader corrects any errors found in the second subtitle.
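The proofreading flow described above can be sketched as a small pipeline. This is an illustrative sketch only, not the patent's implementation: `recognize` and `translate` are hypothetical stand-ins for real speech-recognition and machine-translation engines, and a proofreader's correction is modeled as an optional replacement string.

```python
def recognize(audio_chunk):
    # Placeholder for a real speech-recognition engine.
    return "recognized sentence"

def translate(first_subtitle):
    # Placeholder for a real machine-translation engine.
    return f"translated({first_subtitle})"

def proofread(text, correction=None):
    """A human proofreader replaces the text only when an error was found."""
    return correction if correction is not None else text

def caption_pipeline(audio_chunk, first_fix=None, second_fix=None):
    # First subtitle: ASR output, checked by the original-text proofreader.
    first = proofread(recognize(audio_chunk), first_fix)
    # Second subtitle: machine translation of the (proofread) first subtitle,
    # checked by the translation proofreader.
    second = proofread(translate(first), second_fix)
    return first, second
```

Note that the translation step consumes the already-proofread first subtitle, so an original-text correction propagates into the second subtitle before the translation proofreader sees it.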
Fig. 1 is a schematic structural diagram of a subtitle adding system in an embodiment of the present disclosure, where the system is applied to a live scene and is used to add subtitles to a live video stream.
As shown in fig. 1, the system specifically includes: a first service end 110, a second service end 120 and a master control end 130.
The first server 110 is communicatively connected to the main control end 130 and is configured to obtain a live video stream. Specifically, the live video stream may be pulled from another server that stores it, received as a push from such a server, or obtained directly from a live broadcast terminal (such as a streamer's mobile phone or tablet computer).
The main control end 130 is communicatively connected to the second service end 120, and is configured to obtain an audio stream in the live video stream from the first service end 110, perform voice recognition on the audio stream, obtain a plurality of first subtitles, and send the plurality of first subtitles to the second service end 120.
The second server 120 is communicatively connected to the first server 110, and is configured to obtain the live video stream from the first server and add the plurality of first subtitles to the corresponding video frames in the live video stream based on the timestamps of the first subtitles, obtaining a live video stream including the first subtitles. In this way, cinema-style subtitles displayed sentence by sentence are added to the live video stream: the full subtitle of a sentence is displayed from the moment its first character is played and disappears once its last character has been played. The timestamp of a first subtitle refers to the timestamp of the audio frames containing that subtitle.
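The timestamp-based matching can be illustrated with a minimal sketch (names and the millisecond representation are assumptions, not from the patent): each recognized sentence carries the timestamps of its first and last audio frames, and a video frame shows the sentence whose span covers the frame's own timestamp.

```python
from dataclasses import dataclass

@dataclass
class Subtitle:
    start_ms: int  # timestamp of the sentence's first audio frame
    end_ms: int    # timestamp of the sentence's last audio frame
    text: str

def subtitle_for_frame(frame_ts_ms, subtitles):
    """Return the sentence whose audio span covers this video frame, if any.

    Mirrors the cinema-style behaviour: the full sentence is shown from its
    first character's timestamp until its last character's timestamp.
    """
    for sub in subtitles:
        if sub.start_ms <= frame_ts_ms <= sub.end_ms:
            return sub.text
    return None

subs = [Subtitle(0, 1800, "Hello everyone"),
        Subtitle(2000, 3500, "Welcome to the stream")]
```

A frame whose timestamp falls in the gap between two sentences (e.g. 1900 ms here) simply carries no subtitle.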
The subtitle adding system provided by the embodiments of the disclosure acquires and stores a live video stream through the first server; the main control end extracts the audio stream of the live video stream from the first server, performs speech recognition on it, and obtains a plurality of first subtitles; the main control end then sends the first subtitles to the second server, which is responsible for adding them to the live video stream, thereby achieving the purpose of adding subtitles to a live video stream. Moreover, by providing the main control end, the first server, and the second server as separate components, coupling within the system is reduced: the main control end can be dedicated to audio extraction and speech recognition while the second server is dedicated to subtitle addition, ensuring high real-time performance as well as stability.
On the basis of the foregoing embodiments, referring to the schematic structural diagram of a subtitle adding system as shown in fig. 2, the system includes a first service end 110, a second service end 120, a main control end 130, and a screen projection end 210.
The first server 110 is in communication connection with the main control terminal 130, and is configured to obtain a live video stream.
The main control end 130 is further communicatively connected to the screen projecting end 210, and the main control end 130 is configured to obtain an audio stream in the live video stream from the first service end 110, perform voice recognition on the audio stream, obtain a plurality of first subtitles, and synchronize the plurality of first subtitles to the screen projecting end 210 for display.
The screen projection end 210 is configured to display the first subtitle and send a display picture including the first subtitle to the second server 120.
The second service end 120 is configured to obtain the live video stream from the first service end 110, and add the plurality of first subtitles to corresponding video frames in the live video stream based on timestamps of the first subtitles, to obtain a live video stream including the first subtitles. Specifically, the second server 120 is configured to perform compression synthesis on the display picture including the first subtitle and the corresponding video frame, so that the display picture and the corresponding video frame are played simultaneously, that is, when the video frame is played, the first subtitle corresponding to the video frame is synchronously displayed.
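The compress-and-synthesize step can be pictured with a toy overlay, purely for illustration: here a "frame" is a few rows of text rather than pixels, and compositing places the captured subtitle picture on the frame's bottom row so the two are played simultaneously. A real second server would blend pixel buffers and re-encode the stream.

```python
def composite(video_frame, subtitle_picture):
    """Overlay the subtitle picture onto the bottom row of the frame,
    a stand-in for compositing the display picture with the video frame."""
    out = list(video_frame)
    out[-1] = subtitle_picture.center(len(out[-1]), ".")
    return out

frame = ["................"] * 3      # a 3-row stand-in for a video frame
composited = composite(frame, "Hello")
```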
By adding the screen projection end 210, which is responsible for displaying the first subtitle and sending the display picture including the first subtitle to the second server 120, the performance requirements on the main control end 130 are reduced compared with having the main control end 130 send that display picture directly. The reason is that a video capture card is needed to capture the display picture containing the first subtitle: if the main control end 130 sent the picture to the second server 120 itself, the capture card would have to be integrated into the main control end 130, and capturing the picture would consume system resources and degrade the speech recognition the main control end 130 performs concurrently. Dedicating the screen projection end 210 to displaying the first subtitle and capturing the display picture therefore lowers the demands on the main control end 130, letting it focus on audio extraction and speech recognition, which helps ensure recognition accuracy and accurate first subtitles.
On the other hand, if the main control end 130 sent the display picture including the first subtitle directly to the second server 120, the two could not be located far apart, otherwise real-time performance would suffer greatly, which is unacceptable in a live broadcast scenario with high real-time requirements. With the screen projection end 210 in place, only a network connection is needed between the main control end 130 and the screen projection end 210; since the main control end 130 sends only the text of the first subtitle to the screen projection end 210, and text is far easier to transmit than video pictures, high real-time performance is preserved. The distance between the main control end 130 and the second server 120 is therefore no longer constrained, achieving a higher degree of decoupling.
The subtitle adding system provided by the embodiments of the disclosure acquires and stores a live video stream through the first server; the main control end extracts the audio stream of the live video stream from the first server and performs speech recognition on it to obtain a plurality of first subtitles; the main control end sends the first subtitles to the screen projection end, which displays them, captures the display picture including the first subtitles, and sends that picture to the second server; the second server compresses and synthesizes the display picture with the corresponding video frames to obtain a live video stream including the first subtitles, so that the first subtitles are displayed when the stream is played. The purpose of adding subtitles to a live video stream is thereby achieved, and by providing the main control end, screen projection end, first server, and second server as separate components, coupling within the system is reduced: the main control end focuses on audio extraction and speech recognition, the screen projection end on picture capture, and the second server on subtitle addition, ensuring high real-time performance as well as stability.
On the basis of the above embodiments, referring to the schematic structural diagram of a subtitle adding system as shown in fig. 3, the system includes a first service end 110, a second service end 120, a main control end 130, a screen projection end 210, and a first display terminal 310.
The first server 110 is in communication connection with the main control terminal 130, and is configured to obtain a live video stream.
The main control end 130 is further communicatively connected to the screen projecting end 210, and the main control end 130 is configured to obtain an audio stream in the live video stream from the first service end 110, perform voice recognition on the audio stream, obtain a plurality of first subtitles, and synchronize the plurality of first subtitles to the screen projecting end 210 for display.
The screen projection end 210 is configured to display the first subtitle and send a display picture including the first subtitle to the second server 120.
The second service end 120 is configured to obtain the live video stream from the first service end 110, and add the plurality of first subtitles to corresponding video frames in the live video stream based on timestamps of the first subtitles, to obtain a live video stream including the first subtitles. Specifically, the second server 120 is configured to perform compression synthesis on the display picture including the first subtitle and the corresponding video frame, so that the display picture and the corresponding video frame are played simultaneously, that is, when the video frame is played, the first subtitle corresponding to the video frame is synchronously displayed.
The first display terminal 310 is communicatively connected to the main control terminal 130, and the first display terminal 310 is configured to: display a first user interface including the plurality of first subtitles; and, in response to a first subtitle modification instruction, modify the first subtitle to which the instruction points. Specifically, the first display terminal 310 corresponds to the terminal device of the original-text proofreader, who proofreads the first subtitle through the first display terminal 310. Since the first subtitle is obtained by the main control end 130 through speech recognition, its accuracy leaves room for improvement; for example, the recognizer may transcribe a spoken name as a homophone with the wrong characters. To improve the accuracy of the first subtitle, it is therefore manually proofread after being obtained, and any errors found are corrected in time, which guarantees that the first subtitle ultimately seen by the user at the viewing end is correct and thus improves user experience.
Further, the main control end 130 obtains the audio stream of the live video stream from the first server 110 through an audio capture card. Using an audio capture card for external loop-back audio pickup allows the quality and volume of the input audio to be monitored in real time with local hardware; compared with software-based pickup, the audio fidelity is higher and fewer system resources are consumed, which helps improve the stability of the main control end 130. Meanwhile, the original-text proofreader can listen to the audio collected by the capture card while proofreading the first subtitles, improving proofreading efficiency and accuracy.
It can be understood that the first subtitle displayed at the screen projection end 210 and the first subtitle displayed at the first display terminal 310 are updated synchronously: if the original-text proofreader modifies a first subtitle, the version displayed at the screen projection end 210 is the modified one. In other words, the first subtitle shown at the screen projection end 210 is always the proofread first subtitle, which guarantees that the subtitle the screen projection end 210 sends to the second server 120 has been proofread.
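One way to picture this synchronized update is a shared store that both ends read from, so a proofreader's correction is automatically what the screen-projection end renders. This is a hypothetical sketch; the class and method names are illustrative, not from the patent.

```python
class SubtitleStore:
    """Single source of truth shared by the proofreading terminal and the
    screen-projection end (illustrative model of their synchronization)."""
    def __init__(self):
        self._subs = {}

    def publish(self, sub_id, text):
        # The main control end pushes the raw speech-recognition result.
        self._subs[sub_id] = text

    def correct(self, sub_id, text):
        # The original-text proofreader's modification overwrites it.
        self._subs[sub_id] = text

    def render(self, sub_id):
        # What the screen-projection end displays (and forwards to the
        # second server as part of the display picture).
        return self._subs[sub_id]

store = SubtitleStore()
store.publish(1, "speech recongition output")   # raw ASR text with a typo
store.correct(1, "speech recognition output")   # proofreader fixes it
```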
Further, the plurality of first subtitles are displayed in the first user interface in context. Displaying them in context lets the original-text proofreader proofread each first subtitle with the help of the surrounding sentences, improving both proofreading accuracy and efficiency.
In one embodiment, the language corresponding to the first subtitle is the same as the language corresponding to the audio stream. For example, if the language corresponding to the audio stream is chinese, the first subtitle is a chinese text, and if the language corresponding to the audio stream is english, the first subtitle is an english text.
In one embodiment, the language corresponding to the first subtitle is different from the language corresponding to the audio stream. For example, if the language corresponding to the audio stream is chinese, the first subtitle is an english text, and if the language corresponding to the audio stream is english, the first subtitle is a chinese text.
On the basis of the foregoing embodiments, referring to the schematic structural diagram of a subtitle adding system as shown in fig. 4, the system includes a first service end 110, a second service end 120, a main control end 130, a screen projection end 210, a first display terminal 310, and a second display terminal 410.
The first server 110 is in communication connection with the main control terminal 130, and is configured to obtain a live video stream.
The main control end 130 is further communicatively connected to the screen projecting end 210, and the main control end 130 is configured to obtain an audio stream in the live video stream from the first service end 110, perform voice recognition on the audio stream, obtain a plurality of first subtitles, and synchronize the plurality of first subtitles to the screen projecting end 210 for display.
The screen projection end 210 is configured to display the first subtitle and send a display picture including the first subtitle to the second server 120.
The second service end 120 is configured to obtain the live video stream from the first service end 110, and add the plurality of first subtitles to corresponding video frames in the live video stream based on timestamps of the first subtitles, to obtain a live video stream including the first subtitles. Specifically, the second server 120 is configured to perform compression synthesis on the display picture including the first subtitle and the corresponding video frame, so that the display picture and the corresponding video frame are played simultaneously, that is, when the video frame is played, the first subtitle corresponding to the video frame is synchronously displayed.
The first display terminal 310 is communicatively connected to the main control terminal 130, and the first display terminal 310 is configured to: displaying a first user interface, the first user interface including the plurality of first subtitles; and responding to a first subtitle modification instruction, and modifying the first subtitle pointed by the first subtitle modification instruction.
The second display terminal 410 is communicatively connected to the main control terminal 130, and the second display terminal 410 is configured to: display a second user interface, where the second user interface includes the first subtitle and a second subtitle corresponding to the first subtitle, the second subtitle being obtained by the main control end 130 through machine translation based on the first subtitle; and, in response to a second subtitle modification instruction, modify the second subtitle pointed to by the second subtitle modification instruction. The second display terminal corresponds to the terminal device of the translation proofreader, making it convenient for the proofreader to check the second subtitles. Because the second subtitle is produced by machine translation at the main control terminal 130, its accuracy can still be improved. To improve it, the second subtitle is manually proofread after it is generated, and any error found is corrected promptly, ensuring that the second subtitle displayed at the viewing end is correct and thereby improving the user experience.
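The disclosure does not specify the format of a subtitle modification instruction; as an illustration only, an instruction that targets a subtitle by index and replaces its text could be applied as follows:

```python
def apply_modification(subtitles, instruction):
    """Apply a hypothetical modification instruction {'index': i, 'text': t}
    to a list of subtitle strings, returning a new list (the input is left
    unchanged so the terminal can keep an undo history)."""
    out = list(subtitles)
    out[instruction['index']] = instruction['text']
    return out
```

Both the dict-based instruction shape and the copy-on-write design are assumptions made for this sketch.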
In some embodiments, to improve the efficiency with which the translation proofreader checks the second subtitle, the second subtitle and the first subtitle are displayed side by side on the second user interface, so that the proofreader can check the second subtitle against the first subtitle.
Further, the language of the second caption is different from the language of the first caption, and the language of the first caption is the same as the language corresponding to the audio stream. For example, the first caption is Chinese and the second caption is English.
Illustratively, referring to the schematic diagram of a playback picture of a live video stream with subtitles added, as shown in fig. 5, the first subtitle and the second subtitle are displayed in synchronization while the video picture is played.
On the basis of the foregoing embodiments, referring to the schematic structural diagram of a subtitle adding system as shown in fig. 6, the system includes a first service end 110, a second service end 120, a main control end 130, a screen projection end 210, a first display terminal 310, a second display terminal 410, a third service end 610, a fourth service end 620, and a fifth service end 630.
The first server 110 is in communication connection with the main control terminal 130, and is configured to obtain a live video stream.
The main control end 130 is further communicatively connected to the screen projecting end 210, and the main control end 130 is configured to obtain an audio stream in the live video stream from the first service end 110, perform voice recognition on the audio stream, obtain a plurality of first subtitles, and synchronize the plurality of first subtitles to the screen projecting end 210 for display.
The screen projection end 210 is configured to display the first subtitle and to send a display picture including the first subtitle to the second server 120 after a preset delay time.
The second server 120 is communicatively connected to the first server 110 through the fourth server 620 and the third server 610: the second server 120 obtains the live video stream from the fourth server 620, the fourth server 620 obtains it from the third server 610, and the third server 610 obtains it from the first server 110. The second server 120 adds the plurality of first subtitles to the corresponding video frames in the live video stream based on the timestamps of the first subtitles, obtaining a live video stream that includes the first subtitles. Specifically, the second server 120 compresses and composites the display picture containing the first subtitle with the corresponding video frame, so that the two are played simultaneously: when a video frame is played, the first subtitle corresponding to that frame is displayed in synchronization.
The first display terminal 310 is communicatively connected to the main control terminal 130, and the first display terminal 310 is configured to: displaying a first user interface, the first user interface including the plurality of first subtitles; and responding to a first subtitle modification instruction, and modifying the first subtitle pointed by the first subtitle modification instruction.
The second display terminal 410 is communicatively connected to the main control terminal 130, and the second display terminal 410 is configured to: displaying a second user interface, where the second user interface includes the first subtitle and a second subtitle corresponding to the first subtitle, and the second subtitle is obtained by performing machine translation on the main control end 130 based on the first subtitle; and responding to a second subtitle modification instruction, and modifying the second subtitle pointed by the second subtitle modification instruction.
The third server 610 is communicatively connected to the first server 110 and the fourth server 620, and is configured to obtain the live video stream from the first server 110 and to push the live video stream to the fourth server 620 after a preset delay time.
The fifth server 630 is communicatively connected to the second server 120, and is configured to store the live video stream including the first subtitle pushed by the second server 120 and to push that live video stream to the viewing end, so that the viewing end displays the first subtitle when playing the live video stream.
Specifically, speech recognition, machine translation, proofreading of the first subtitle by the source-language proofreader, and proofreading of the second subtitle by the translation proofreader all take some time. To ensure that subtitles are added to the live video stream stably, that the added subtitles do not jump, and that viewers at the playback end can read them smoothly, the embodiments of the present disclosure set a preset delay time. That is, the time difference between receiving the live ingest stream and pushing the output stream is used for stable subtitle generation and for correction and optimization by human translators, which ensures high-quality subtitle output. Specifically, after acquiring the live video stream from the first server 110, the third server 610 waits for the preset delay time before pushing the stream to the fourth server 620; meanwhile, upon receiving the first subtitle and the second subtitle, the screen projection terminal 210 displays them after the preset delay time, or sends the picture including the first subtitle and the second subtitle to the second server 120 after the preset delay time. The second server 120 then compresses and composites the received live video stream with the subtitle picture to obtain a live video stream including the subtitles.
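A minimal sketch of the pure-software delay described here, using abstract time units in place of a real clock and a queue of frames in place of a real stream, might be:

```python
from collections import deque

class DelayBuffer:
    """Pure-software delay line: frames pushed in are released only after
    `delay` time units have elapsed, emulating the preset delay between
    ingesting the source stream and pushing it downstream."""
    def __init__(self, delay):
        self.delay = delay
        self.buf = deque()  # holds (arrival_time, frame) in arrival order

    def push(self, now, frame):
        self.buf.append((now, frame))

    def pop_ready(self, now):
        """Release every frame whose holding time has reached the delay."""
        out = []
        while self.buf and now - self.buf[0][0] >= self.delay:
            out.append(self.buf.popleft()[1])
        return out
```

In a real deployment the delay would be driven by wall-clock or stream timestamps rather than the integer ticks assumed here.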
According to the technical solution of the embodiments of the present disclosure, decoupling the main control end, the translator ends, and the screen projection end makes the speech recognition and machine translation services run more stably. The main control end focuses on sound collection and speech recognition. The source-language proofreader monitors the audio through an external sound-card loopback and corrects the first subtitle, which stabilizes audio transmission and recognition. The translator accesses the main control terminal over the network with a conference code from the second display terminal and proofreads the second subtitle against the corrected first subtitle. The screen projection end displays the finally corrected first subtitle and second subtitle, and is likewise connected to the main control end via a conference code. The main control end and the translator ends (the first display terminal and the second display terminal) are independent terminal devices, each coupled to the live source stream through its own push/pull stream, which avoids mutual interference. Using a pure-software scheme (pushing the stream after the preset delay time) to achieve precise delay of the source stream and to merge and push it with the subtitles avoids the access, transport, and stability problems of excessive offline hardware.
In summary, referring to the schematic flowchart of a subtitle adding method shown in fig. 7, the method includes: acquiring a live video stream; at the main control end, extracting the audio stream from the live video stream and performing speech recognition on it to obtain the first subtitles; delaying according to the subtitle timestamps generated during speech recognition (one timestamp corresponds to each word) and displaying the subtitles at the screen projection end after the preset delay time; pushing the live video stream after the same preset delay time; and compressing and merging the subtitle picture with the live video stream at the specific time point and pushing the result to the next node, for example the fifth server.
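Assuming both the delayed video frames and the subtitle pictures retain their original timestamps (an assumption; the disclosure only states that merging happens "at a specific time point"), the compress-and-merge step summarized above can be sketched as pairing by timestamp, with tuple pairing standing in for the actual picture compositing:

```python
def merge_at(video_frames, subtitle_frames, preset_delay_ms):
    """Pair delayed video frames with subtitle pictures by original
    timestamp. `video_frames` and `subtitle_frames` are lists of
    (timestamp_ms, payload); output timestamps are shifted by the
    shared preset delay before being pushed downstream."""
    sub_by_ts = {ts: pic for ts, pic in subtitle_frames}
    return [(ts + preset_delay_ms, frame, sub_by_ts.get(ts))
            for ts, frame in video_frames]
```

Frames with no matching subtitle picture pass through with `None`, standing in for video frames that are pushed without an overlay.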
Fig. 8 is a schematic structural diagram of an electronic device 500 suitable for implementing embodiments of the present disclosure. The electronic device 500 in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle-mounted terminal (e.g., a car navigation terminal), and a wearable electronic device, and fixed terminals such as a digital TV, a desktop computer, and a smart home device. The electronic device shown in fig. 8 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in fig. 8, an electronic device 500 may include a processing means (e.g., central processing unit, graphics processor, etc.) 501 that may perform various appropriate actions and processes to implement methods according to embodiments described in this disclosure in accordance with a program stored in a Read Only Memory (ROM)502 or a program loaded from a storage means 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the electronic apparatus 500 are also stored. The processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
Generally, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 507 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, and the like; storage devices 508 including, for example, magnetic tape, hard disk, etc.; and a communication device 509. The communication means 509 may allow the electronic device 500 to communicate with other devices wirelessly or by wire to exchange data. While fig. 8 illustrates an electronic device 500 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart, thereby implementing the method as described above. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or installed from the storage means 508, or installed from the ROM 502. The computer program performs the above-described functions defined in the methods of the embodiments of the present disclosure when executed by the processing device 501.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may interconnect with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement a subtitle addition procedure.
Optionally, when the one or more programs are executed by the electronic device, the electronic device may further perform other steps described in the above embodiments.
Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. In some cases, the name of a unit does not constitute a limitation on the unit itself.
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, there is provided a subtitle adding system including: the system comprises a first service end, a second service end and a main control end; the first server is in communication connection with the main control end and used for acquiring a live video stream; the main control end is in communication connection with the second server end and is used for acquiring an audio stream in the live video stream from the first server end, performing voice recognition on the audio stream, acquiring a plurality of first subtitles and sending the plurality of first subtitles to the second server end; the second server is in communication connection with the first server, and is configured to acquire the live video stream from the first server, and add the plurality of first subtitles to corresponding video frames in the live video stream based on timestamps of the first subtitles, so as to acquire a live video stream including the first subtitles.
According to one or more embodiments of the present disclosure, in the system provided by the present disclosure, optionally, the system further includes: the screen projecting end is in communication connection with the main control end and the second server end respectively; the main control end is in communication connection with the second server end, and sends the first subtitles to the second server end, including: the main control end is in communication connection with the second server end through the screen projection end and sends the first subtitles to the second server end through the screen projection end; the screen projection end is used for displaying the first caption and sending a display picture comprising the first caption to the second server end.
According to one or more embodiments of the present disclosure, in the system provided by the present disclosure, optionally, the system further includes a first display terminal in communication connection with the main control terminal, where the first display terminal is configured to: displaying a first user interface, the first user interface including the plurality of first subtitles; and responding to a first subtitle modification instruction, and modifying the first subtitle pointed by the first subtitle modification instruction.
According to one or more embodiments of the present disclosure, in the system provided by the present disclosure, optionally, the plurality of first subtitles are arranged in a contextual manner at the first user interface.
According to one or more embodiments of the present disclosure, in the subtitle adding system provided by the present disclosure, the main control end obtains an audio stream in the live video stream from the first service end through an audio acquisition card.
According to one or more embodiments of the present disclosure, in a subtitle adding system provided by the present disclosure, the subtitle adding system further includes a second display terminal in communication connection with the main control terminal, where the second display terminal is configured to: displaying a second user interface, wherein the second user interface comprises the first subtitle and a second subtitle corresponding to the first subtitle, and the second subtitle is obtained by the main control end through machine translation based on the first subtitle; and responding to a second subtitle modification instruction, and modifying the second subtitle pointed by the second subtitle modification instruction.
According to one or more embodiments of the present disclosure, in a subtitle adding system provided by the present disclosure, the second subtitle and the first subtitle are displayed in a side-by-side contrast relationship on the second user interface; the language of the second caption is different from the language of the first caption; and the language of the first caption is the same as the language corresponding to the audio stream.
According to one or more embodiments of the present disclosure, in the subtitle adding system provided by the present disclosure, the method further includes: a third server and a fourth server; the third server is in communication connection with the first server and the fourth server respectively, and is configured to acquire the live video stream from the first server and push the live video stream to the fourth server according to a preset delay time; the second server is in communication connection with the first server, and is configured to obtain the live video stream from the first server, including: the second server is in communication connection with the first server through the fourth server and the third server, and the second server obtains the live video stream from the fourth server.
According to one or more embodiments of the present disclosure, in the subtitle adding system provided by the present disclosure, the screen projecting end is configured to send a display picture including a first subtitle to the second server according to the preset delay time.
According to one or more embodiments of the present disclosure, the subtitle adding system provided by the present disclosure further includes a fifth server, which is communicatively connected to the second server and is configured to store the live video stream including the first subtitle pushed by the second server and to push the live video stream including the first subtitle to the viewing end, so that the first subtitle is displayed when the viewing end plays the live video stream.
In accordance with one or more embodiments of the present disclosure, there is provided an electronic device including:
one or more processors;
a memory for storing one or more programs;
when executed by the one or more processors, cause the one or more processors to implement a method as provided by the present disclosure.
According to one or more embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method as provided by the present disclosure.
The disclosed embodiments also provide a computer program product comprising a computer program or instructions which, when executed by a processor, implement the method in the disclosed embodiments.
The foregoing description is merely a description of the preferred embodiments of the present disclosure and of the technical principles employed. Those skilled in the art will appreciate that the scope of the disclosure is not limited to technical solutions formed by the particular combination of features described above, but also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the concept of the disclosure, for example, technical solutions formed by substituting the above features with (but not limited to) features having similar functions disclosed in this disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (10)

1. A subtitle adding system, comprising: the system comprises a first service end, a second service end and a main control end;
the first server is in communication connection with the main control end and used for acquiring a live video stream;
the main control end is in communication connection with the second server end and is used for acquiring an audio stream in the live video stream from the first server end, performing voice recognition on the audio stream, acquiring a plurality of first subtitles and sending the plurality of first subtitles to the second server end;
the second server is in communication connection with the first server, and is configured to acquire the live video stream from the first server, and add the plurality of first subtitles to corresponding video frames in the live video stream based on timestamps of the first subtitles, so as to acquire a live video stream including the first subtitles.
2. The system of claim 1, further comprising: the screen projecting end is in communication connection with the main control end and the second server end respectively;
the main control end is in communication connection with the second server end, and sends the first subtitles to the second server end, including:
the main control end is in communication connection with the second server end through the screen projection end and sends the first subtitles to the second server end through the screen projection end;
the screen projection end is used for displaying the first caption and sending a display picture comprising the first caption to the second server end.
3. The system of claim 2, further comprising a first display terminal communicatively connected to the main control terminal, wherein the first display terminal is configured to:
displaying a first user interface, the first user interface including the plurality of first subtitles;
and responding to a first subtitle modification instruction, and modifying the first subtitle pointed by the first subtitle modification instruction.
4. The system of claim 3, wherein the plurality of first subtitles are arranged in a contextual manner at the first user interface.
5. The system according to claim 3, wherein the master control end obtains an audio stream in the live video stream from the first server end through an audio capture card.
6. The system of claim 2, further comprising a second display terminal communicatively connected to the main control terminal, the second display terminal being configured to:
displaying a second user interface, wherein the second user interface comprises the first subtitle and a second subtitle corresponding to the first subtitle, and the second subtitle is obtained by the main control end through machine translation based on the first subtitle;
and responding to a second subtitle modification instruction, and modifying the second subtitle pointed by the second subtitle modification instruction.
7. The system of claim 6, wherein the second caption and the first caption are displayed in a side-by-side contrast relationship at the second user interface;
the language of the second caption is different from the language of the first caption;
and the language of the first caption is the same as the language corresponding to the audio stream.
8. The system of claim 2, further comprising a third server and a fourth server;
wherein the third server is communicatively connected to the first server and the fourth server, and is configured to obtain the live video stream from the first server and push it to the fourth server after a preset delay time;
and wherein the second server being communicatively connected to the first server and configured to obtain the live video stream from the first server comprises:
the second server being communicatively connected to the first server through the fourth server and the third server, and obtaining the live video stream from the fourth server.
9. The system of claim 8, wherein the screen projecting end is configured to send display frames including the first subtitle to the second server in accordance with the preset delay time.
10. The system of claim 8, further comprising a fifth server communicatively connected to the second server, the fifth server being configured to store the live video stream including the first subtitle pushed by the second server, and to push the live video stream including the first subtitle to a viewing end, so that the first subtitle is displayed when the viewing end plays the live video stream.
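The delayed relay of claims 8 and 9 can be sketched as a buffer: the third server holds frames pulled from the first server and releases them to the fourth server only after a fixed delay, which gives the subtitle pipeline time to produce matching display frames. The buffer length and names below are illustrative assumptions, not part of the claims.

```python
from collections import deque

def delayed_relay(source_frames, delay=3):
    """Third server: buffer frames from the first server and push each
    one to the fourth server only `delay` arrivals later, so subtitles
    generated for a frame can catch up with it."""
    buffer = deque()
    pushed = []  # (arrival step at which the push happened, frame)
    for step, frame in enumerate(source_frames):
        buffer.append(frame)
        if len(buffer) > delay:
            pushed.append((step, buffer.popleft()))
    return pushed, list(buffer)

pushed, pending = delayed_relay([1, 2, 3, 4, 5])
# frame 1 is only pushed at arrival step 3, i.e. 3 frames late;
# frames 3-5 are still buffered when the input ends
```

The screen projecting end of claim 9 would apply the same delay to its subtitle display frames, so that subtitle and video arrive at the second server aligned.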
CN202111331955.1A 2021-11-11 2021-11-11 Subtitle adding system Pending CN114125331A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111331955.1A CN114125331A (en) 2021-11-11 2021-11-11 Subtitle adding system


Publications (1)

Publication Number Publication Date
CN114125331A (en) 2022-03-01

Family

ID=80378359

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111331955.1A Pending CN114125331A (en) 2021-11-11 2021-11-11 Subtitle adding system

Country Status (1)

Country Link
CN (1) CN114125331A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107222792A (en) * 2017-07-11 2017-09-29 成都德芯数字科技股份有限公司 Caption superposition method and device
CN108259971A (en) * 2018-01-31 2018-07-06 百度在线网络技术(北京)有限公司 Subtitle adding method, device, server and storage medium
CN108401192A (en) * 2018-04-25 2018-08-14 腾讯科技(深圳)有限公司 Video stream processing method, device, computer equipment and storage medium
US20200314460A1 (en) * 2018-04-24 2020-10-01 Tencent Technology (Shenzhen) Company Limited Video stream processing method, computer device, and storage medium


Similar Documents

Publication Publication Date Title
WO2019205872A1 (en) Video stream processing method and apparatus, computer device and storage medium
US11252444B2 (en) Video stream processing method, computer device, and storage medium
WO2019205886A1 (en) Method and apparatus for pushing subtitle data, subtitle display method and apparatus, device and medium
CN111970524B (en) Control method, device, system, equipment and medium for interactive live broadcast and microphone connection
CN112601101B (en) Subtitle display method and device, electronic equipment and storage medium
CN111064987B (en) Information display method and device and electronic equipment
CN112616062B (en) Subtitle display method and device, electronic equipment and storage medium
CN112492357A (en) Method, device, medium and electronic equipment for processing multiple video streams
CN114205665B (en) Information processing method, device, electronic equipment and storage medium
CN112291502B (en) Information interaction method, device and system and electronic equipment
CN112437337A (en) Method, system and equipment for realizing live broadcast real-time subtitles
CN113891132B (en) Audio and video synchronous monitoring method and device, electronic equipment and storage medium
CN114095671A (en) Cloud conference live broadcast system, method, device, equipment and medium
CN111818383B (en) Video data generation method, system, device, electronic equipment and storage medium
CN114125358A (en) Cloud conference subtitle display method, system, device, electronic equipment and storage medium
CN113992926B (en) Interface display method, device, electronic equipment and storage medium
CN114567812A (en) Audio playing method, device, system, electronic equipment and storage medium
CN113891168A (en) Subtitle processing method, subtitle processing device, electronic equipment and storage medium
CN114331828A (en) Method, device and equipment for converting picture into video and storage medium
CN116708892A (en) Sound and picture synchronous detection method, device, equipment and storage medium
CN111757187A (en) Multi-language subtitle display method, device, terminal equipment and storage medium
CN114125331A (en) Subtitle adding system
CN113891108A (en) Subtitle optimization method and device, electronic equipment and storage medium
CN113923530B (en) Interactive information display method and device, electronic equipment and storage medium
CN115150631A (en) Subtitle processing method, subtitle processing device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination