WO2021062757A1 - Simultaneous interpretation method and apparatus, and server and storage medium - Google Patents
- Publication number: WO2021062757A1
- Application number: PCT/CN2019/109677
- Authority: WIPO (PCT)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
Definitions
- This application relates to simultaneous interpretation technology, and in particular to a simultaneous interpretation method, apparatus, server, and storage medium.
- Machine simultaneous interpretation technology is a speech translation product for conference scenarios that has emerged in recent years. It combines Automatic Speech Recognition (ASR) technology and Machine Translation (MT) technology to provide a multilingual subtitle display for the speech content of conference speakers, replacing manual simultaneous interpretation services.
- In practice, the speech content is usually translated and displayed as text, but the displayed content alone cannot enable users to truly understand the speech.
- embodiments of the present application provide a simultaneous interpretation method, device, server and storage medium.
- An embodiment of the application provides a simultaneous interpretation method, applied to a server, which includes: obtaining first to-be-processed data; translating first voice data in the first to-be-processed data to obtain a first translated text; generating second voice data according to the first translated text; and performing at least one of the following: obtaining a typeset document according to the first voice data and the first translated text; performing image word processing on first image data in the first to-be-processed data to obtain an image processing result.
- the first image data includes at least a display document corresponding to the first voice data;
- The language corresponding to the first voice data is different from the language corresponding to the typeset document; the language corresponding to the first voice data is different from the language corresponding to the first translated text; the language corresponding to the first voice data is different from the language corresponding to the second voice data; the language of the text displayed in the first image data is different from the language of the text included in the image processing result. The first translated text, the typeset document, the second voice data, and the image processing result are used for presentation on the client when the first voice data is played.
- the embodiment of the present application also provides a simultaneous interpretation device, including:
- an obtaining unit configured to obtain first to-be-processed data;
- a first processing unit configured to translate first voice data in the first to-be-processed data to obtain a first translated text;
- a second processing unit configured to generate second voice data according to the first translated text;
- a third processing unit configured to perform at least one of the following: obtain a typeset document according to the first voice data and the first translated text; perform image word processing on first image data in the first to-be-processed data to obtain an image processing result.
- the first image data includes at least a display document corresponding to the first voice data;
- The language corresponding to the first voice data is different from the language corresponding to the typeset document; the language corresponding to the first voice data is different from the language corresponding to the first translated text; the language corresponding to the first voice data is different from the language corresponding to the second voice data; the language of the text displayed in the first image data is different from the language of the text included in the image processing result. The first translated text, the typeset document, the second voice data, and the image processing result are used for presentation on the client when the first voice data is played.
- The embodiment of the present application further provides a server, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor; the processor implements the steps of any of the above simultaneous interpretation methods when executing the program.
- the embodiment of the present application also provides a storage medium on which computer instructions are stored, and when the instructions are executed by a processor, the steps of any of the aforementioned simultaneous interpretation methods are implemented.
- The simultaneous interpretation method, apparatus, server, and storage medium provided in the embodiments of this application obtain first to-be-processed data; translate first voice data in the first to-be-processed data to obtain a first translated text; generate second voice data according to the first translated text; and perform at least one of the following: obtain a typeset document according to the first voice data and the first translated text; perform image word processing on first image data in the first to-be-processed data to obtain an image processing result.
- The first image data includes at least a display document corresponding to the first voice data. The language corresponding to the first voice data is different from the language corresponding to the typeset document; the language corresponding to the first voice data is different from the language corresponding to the first translated text; the language corresponding to the first voice data is different from the language corresponding to the second voice data; the language of the text displayed in the first image data is different from the language of the text included in the image processing result.
- The first translated text, the typeset document, the second voice data, and the image processing result are used for presentation on the client when the first voice data is played.
- Figure 1 is a schematic diagram of the system architecture of the application of simultaneous interpretation methods in related technologies
- FIG. 2 is a schematic flowchart of a simultaneous interpretation method according to an embodiment of the application
- FIG. 3 is a schematic diagram of a system architecture applied by the simultaneous interpretation method according to an embodiment of the application
- FIG. 4 is a schematic diagram of the composition structure of the simultaneous interpretation device according to an embodiment of the application.
- FIG. 5 is a schematic diagram of the composition structure of a server in an embodiment of the application.
- Figure 1 is a schematic diagram of the system architecture to which the simultaneous interpretation method in the related technology is applied. As shown in Figure 1, the system may include: a machine simultaneous interpretation server, a speech recognition server, a translation server, a mobile terminal delivery server, a viewer mobile terminal, a PC (Personal Computer) client, and a display screen.
- The lecturer can give a conference lecture through the PC client and project the displayed documents, such as presentation (PPT, PowerPoint) documents, onto the display screen, which shows them to the users.
- the PC client collects the speaker’s audio and sends the collected audio to the machine simultaneous interpretation server.
- The machine simultaneous interpretation server recognizes the audio data through the speech recognition server to obtain recognized text, and then translates the recognized text through the translation server to obtain a translation result. The machine simultaneous interpretation server sends the translation result to the PC client, and sends the translation result to the viewer's mobile terminal through the mobile terminal delivery server to display the translation for the user.
- the speech content of the speaker can be translated into the language required by the user and displayed.
- The solutions in the related technologies can display speech content (i.e., translation results) in different languages, but only perform simultaneous interpretation for the speaker's spoken content without translating the document presented by the speaker, making it difficult for users of different languages to understand the document content.
- In addition, current machine simultaneous interpretation technology is mostly a visual display of text content. During the speaker's delivery, an excess of displayed text does not help the user understand the speech well. The above problems lead to a poor sensory experience for the user.
- In the embodiments of this application, the speech content is translated to obtain a translation result (which may include translated speech and text); the translation result is organized (for example, by abstract extraction and typesetting) to obtain a typeset document and an abstract document; and the displayed document is translated. The translation result, the organized documents, and the translated display document are sent to the audience's mobile terminal for display, to help users understand the speech content and to facilitate users' later summarization of the speech content.
- FIG. 2 is a schematic flowchart of a simultaneous interpretation method according to an embodiment of the application; as shown in FIG. 2, the method includes:
- Step 201 Obtain first to-be-processed data.
- the first data to be processed includes: first voice data and first image data.
- the first image data includes at least a display document corresponding to the first voice data.
- the display document may be a Word document, a PPT document or a document in other forms, which is not limited here.
- the first voice data and first image data may be collected by the first terminal and sent to the server.
- The first terminal may be a terminal device such as a PC or a tablet computer.
- the first terminal may be provided with or connected to a voice collection module, such as a microphone, through which voice collection is performed to obtain first voice data.
- The first terminal may be provided with or connected to an image acquisition module (which can be implemented by a stereo camera, a binocular camera, or a structured light camera), and the image acquisition module can photograph the displayed document to obtain the first image data.
- the first terminal may have a screenshot function, and the first terminal may take a screenshot of a document displayed on its display screen, and use the screenshot result as the first image data.
- For example, when the speaker is giving a speech, the first terminal (such as a PC) uses the voice collection module to collect the speech content to obtain the first voice data; when the speaker displays a document related to the speech content (such as a PPT document), the first terminal uses the image acquisition module to photograph the displayed PPT document, or takes a screenshot of the PPT document on its own display screen, to obtain the first image data.
- a communication connection is established between the first terminal and the server.
- the first terminal sends the acquired first voice data and first image data as the first to-be-processed data to the server, and the server can acquire the first to-be-processed data.
- Step 202 Translate the first voice data in the first to-be-processed data to obtain a first translated text.
- the translating the first voice data in the first to-be-processed data to obtain the first translated text includes:
- the server may use voice recognition technology to perform voice recognition on the first voice data to obtain recognized text.
- the server may use a preset translation model to translate the recognized text to obtain the first translated text.
- the translation model is used to translate a text in a first language into at least one text in a second language; the first language is different from the second language.
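The two-stage flow of Step 202 can be sketched as follows. The recognizer and translation model here are hypothetical lookup-table stand-ins (not real ASR/MT engines, and not the patent's own models); only the pipeline shape — voice data to recognized text to first translated text — follows the description above.

```python
# Toy sketch of Step 202: speech recognition followed by text translation.
# Both stages are placeholder lookup tables for illustration only.

def recognize(voice_data: bytes) -> str:
    """Stand-in ASR: map raw audio bytes to recognized text."""
    fake_asr = {b"\x01\x02": "hello everyone"}
    return fake_asr.get(voice_data, "")

def translate(text: str, target_lang: str) -> str:
    """Stand-in translation model: first language -> second language."""
    fake_mt = {("hello everyone", "zh"): "\u5927\u5bb6\u597d"}
    return fake_mt.get((text, target_lang), text)

def speech_to_translated_text(voice_data: bytes, target_lang: str) -> str:
    recognized = recognize(voice_data)         # first voice data -> recognized text
    return translate(recognized, target_lang)  # recognized text -> first translated text
```

A real deployment would replace both stand-ins with calls to the speech recognition server and the translation server described in the system architecture.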
- Step 203: Generate second voice data according to the first translated text, and perform at least one of the following: obtain a typeset document according to the first voice data and the first translated text; perform image word processing on the first image data in the first to-be-processed data to obtain an image processing result.
- The language corresponding to the first voice data is different from the language corresponding to the typeset document; the language corresponding to the first voice data is different from the language corresponding to the first translated text; the language corresponding to the first voice data is different from the language corresponding to the second voice data; the language of the text displayed in the first image data is different from the language of the text included in the image processing result.
- the first translated text, typeset document, second voice data, and image processing result are used for presentation on the client when the first voice data is played.
- the first translated text, the typeset document, and the second voice data are used to send to the client, so as to display the content corresponding to the first voice data on the client when the first voice data is played.
- the image processing result is used to send to the client to display the content corresponding to the display document included in the first image data on the client when the first voice data is played.
- In addition to the above method, the server may also use a preset voice translation model to translate the first voice data to obtain the second voice data corresponding to the first voice data, and then perform voice recognition on the second voice data to obtain the first translated text.
- Typesetting can be performed on the content of the first voice data to obtain a typeset document. A concise and clear document layout helps users read and understand intuitively, and also facilitates users' later summarization and organization of the content of the first voice data.
- Determining the typeset document according to the first voice data and the first translated text includes: performing Voice Activity Detection (VAD) on the first voice data to determine mute points; segmenting the first translated text according to the mute points to obtain at least one paragraph; and typesetting the at least one paragraph to obtain the typeset document.
- In practical applications, the server may perform voice activity detection on the first voice data, determine silence periods in the first voice data, and record the silence duration of each period. When the silence duration meets a condition (for example, the silence duration exceeds a preset duration), the determined silence period is used as a mute point in the first voice data.
- According to the mute points, the server can pre-segment the first translated text corresponding to the first voice data to obtain at least one pre-segmented paragraph; the context corresponding to each mute point in the first translated text can then be obtained, Natural Language Processing (NLP) technology is used to perform semantic analysis on the context, and whether to segment at the pre-segmented paragraph is determined according to the semantic analysis result.
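A minimal sketch of the mute-point idea, under strong simplifying assumptions: silence is detected from per-frame energy values against a preset threshold, a sufficiently long silence run marks a mute point, and the translated text is pre-segmented wherever a mute point falls. The frame energies and the word-to-frame alignment are illustrative placeholders, not a real VAD front end.

```python
# Energy-threshold VAD: a "mute point" is the frame index where speech
# resumes after at least `min_silent_frames` consecutive quiet frames.

def find_mute_points(frame_energies, silence_threshold=0.1, min_silent_frames=3):
    mute_points, run = [], 0
    for i, energy in enumerate(frame_energies):
        if energy < silence_threshold:
            run += 1
        else:
            if run >= min_silent_frames:
                mute_points.append(i)  # speech resumes here: segment boundary
            run = 0
    return mute_points

def pre_segment(words, word_start_frames, mute_points):
    """Split the word sequence wherever a mute point precedes a word."""
    segments, current, pi = [], [], 0
    points = sorted(mute_points)
    for word, start in zip(words, word_start_frames):
        while pi < len(points) and points[pi] <= start:
            if current:
                segments.append(" ".join(current))
                current = []
            pi += 1
        current.append(word)
    if current:
        segments.append(" ".join(current))
    return segments
```

Real systems would feed the resulting pre-segments to the semantic analysis step described above before committing to a paragraph break.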
- Generating the second voice data according to the first translated text includes: segmenting the first translated text to obtain at least one segment, and converting each segment into speech using Text To Speech (TTS) technology to obtain the second voice data.
- segmenting the first translated text may include: performing semantic recognition on the first translated text, and segmenting the first translated text according to the semantic recognition result.
- In other embodiments, a combination of voice activity detection technology and semantic recognition technology can also be used for segmentation. The details have been described above in determining the typeset document based on the first voice data and the first translated text, and are not repeated here.
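The per-segment voice generation can be sketched as below. The sentence split is a naive stand-in for the semantic recognition mentioned above, and `synthesize` is a placeholder rather than a real TTS engine; only the segment-then-synthesize flow follows the description.

```python
import re

def segment_text(translated_text):
    """Naive stand-in for semantic segmentation: split on sentence enders."""
    parts = re.split(r"(?<=[.!?])\s+", translated_text.strip())
    return [p for p in parts if p]

def synthesize(segment):
    """Placeholder TTS: returns a fake waveform tagged with its text."""
    return ("audio", segment)

def generate_second_voice_data(translated_text):
    # One segmented voice per text segment, preserving order.
    return [synthesize(s) for s in segment_text(translated_text)]
```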
- In addition, abstract extraction can be performed, which can help the user summarize the content of the first voice data so that the user can better understand the first voice data.
- the method may further include:
- Abstract extraction is performed on the first translated text to obtain a summary document for the first translated text; the summary document is used for presentation on the client when the first voice data is played.
- Specifically, NLP technology is used to perform Automatic Summarization on the first translated text to obtain a summary document for the first translated text.
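A toy frequency-based extractive summarizer, offered only as an illustration of automatic summary extraction on the first translated text; the scoring scheme and sentence splitting are simplifications, and real systems would use far richer NLP models.

```python
import re
from collections import Counter

def summarize(text, max_sentences=1):
    """Pick the highest-scoring sentences by word-frequency score."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    freq = Counter(re.findall(r"\w+", text.lower()))
    # Score each sentence by the total corpus frequency of its words.
    scored = sorted(
        sentences,
        key=lambda s: sum(freq[w] for w in re.findall(r"\w+", s.lower())),
        reverse=True,
    )
    top = set(scored[:max_sentences])
    # Preserve original order of the selected sentences.
    return " ".join(s for s in sentences if s in top)
```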
- Performing image word processing on the first image data to obtain an image processing result includes: performing Optical Character Recognition (OCR) on the first image data to extract the text in it; translating the extracted text; and generating the image processing result according to the translated text.
- OCR technology performs character recognition on an image so as to convert the characters in the image into text.
- Interface positioning technology can be used to determine the position corresponding to the text.
- the translating the extracted text includes: using a preset translation model to translate the text.
- the translation model is used to translate text in a first language into at least one text in a second language; the first language is different from the second language.
- Generating the image processing result according to the translated text includes at least one of the following: generating second image data according to the translated text and the first image data, and using the second image data as the image processing result; generating a second translated text according to the translated text, and using the second translated text as the image processing result.
- That is, the image processing result may include at least one of the following: second image data and second translated text.
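The two forms of image processing result can be sketched as below. The OCR output, the translation table, and the "image" representation are all hypothetical placeholders; only the flow (extract text with positions, translate, then emit a second translated text and/or second image data) follows the description above.

```python
def translate_ocr_results(ocr_results, dictionary):
    """ocr_results: list of (text, bounding_box) pairs from a hypothetical OCR step."""
    return [(dictionary.get(text, text), box) for text, box in ocr_results]

def to_second_translated_text(translated_results):
    """Image processing result as text only (the second translated text)."""
    return "\n".join(text for text, _ in translated_results)

def to_second_image_data(translated_results):
    """Image processing result as 'image' data: here, just the layout
    description a renderer would draw back onto the slide."""
    return [{"text": text, "box": box} for text, box in translated_results]

ocr = [("\u4f60\u597d", (0, 0, 10, 4)), ("\u4e16\u754c", (0, 5, 10, 9))]
table = {"\u4f60\u597d": "hello", "\u4e16\u754c": "world"}
translated = translate_ocr_results(ocr, table)
```

The bounding boxes carried through `translate_ocr_results` correspond to the interface positioning step mentioned above.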
- the simultaneous interpretation data obtained by using the first to-be-processed data corresponds to at least one language; the method may further include:
- the simultaneous interpretation data corresponding to at least one language is stored in different databases according to the language.
- the simultaneous interpretation data includes: a first translated text, a second voice data, and also includes at least one of a typeset document, an image processing result, and an abstract document.
- In practical applications, the simultaneous interpretation data corresponding to at least one language can be stored in different databases according to language: the first translated text and second voice data of the same language, together with at least one of the corresponding typeset document, image processing result, and summary document, are stored in the same database, and the database corresponds to a language identifier.
- If the simultaneous interpretation data were looked up on demand, sending it to each client would execute as a serial service; in order to ensure the timeliness of sending simultaneous interpretation data to multiple clients at the same time, the data can be cached. The server then directly obtains the corresponding result from the cache, which ensures the high timeliness of simultaneous interpretation data delivery and also protects the server's computing resources.
- the method may further include:
- the simultaneous interpretation data corresponding to at least one language is classified and cached according to the language.
- the server may predetermine the preset language of each client in at least one client, and obtain the simultaneous interpretation data corresponding to the preset language from the database for caching.
- the simultaneous interpretation data of the corresponding language can be directly obtained from the cache, thereby improving the timeliness and the protection of computing resources.
- If a client selects another language different from its preset language, the simultaneous interpretation data of that language may not yet be cached.
- When the server determines that a client has sent an acquisition request selecting a language different from its preset language, it may also cache the simultaneous interpretation data of the requested language; when another client selects the same language, the corresponding simultaneous interpretation data can be obtained directly from the cache, thereby improving timeliness and protecting computing resources.
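A minimal sketch of this caching strategy: simultaneous interpretation data is keyed by language, preset languages are cached up front, and a request for an uncached language populates the cache so that later clients requesting the same language hit it directly. The "database" here is a plain dict standing in for the per-language databases described above.

```python
class InterpretationCache:
    def __init__(self, database, preset_languages):
        self.database = database       # language id -> interpretation data
        self.cache = {}
        for lang in preset_languages:  # warm the cache for preset client languages
            if lang in database:
                self.cache[lang] = database[lang]

    def get(self, language):
        if language in self.cache:     # cache hit: no database access needed
            return self.cache[language]
        data = self.database.get(language)
        if data is not None:           # cache on first request for a new language
            self.cache[language] = data
        return data
```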
- the simultaneous interpretation data corresponding to the target language can be obtained according to the acquisition request sent by the user through the client.
- the method may further include:
- the client may be provided with a human-computer interaction interface through which the user can select a language.
- The client generates an acquisition request containing the target language according to the user's selection and sends it to the server, so that the server receives the acquisition request.
- The client can be installed on a mobile phone. Considering that most users carry their mobile phones with them, sending the simultaneous interpretation data to a client installed on the mobile phone avoids adding other devices to receive and display the data, which saves costs and is easy to operate.
- The first to-be-processed data corresponds to simultaneous interpretation data in at least one language. The simultaneous interpretation data includes the first translated text and the second voice data, and also includes at least one of the following: the typeset document, the image processing result, and the summary document. That is, the first to-be-processed data corresponds to a first translated text in at least one language, second voice data in at least one language, and at least one of the following: a typeset document in at least one language, an image processing result in at least one language, and a summary document in at least one language.
- the corresponding simultaneous interpretation data can be obtained according to the target time sent by the client.
- the acquisition request may include a target time; when the simultaneous interpretation data corresponding to the target language is acquired from the cached content, the method further includes:
- According to a time correspondence, the simultaneous interpretation data corresponding to the target time is obtained from the cache; the time correspondence represents the time relationship between the various data in the simultaneous interpretation data.
- the user can also select the time through the human-computer interaction interface, and the client generates an acquisition request containing the target time according to the user's selection.
- the simultaneous interpretation method is applied to a meeting; the user selects a time point in the meeting as the target time.
- The time relationship between the data in the simultaneous interpretation data refers to the time relationship between the first translated text, the second voice data, and at least one of the typeset document, the image processing result, and the summary document.
- the time correspondence is generated in advance according to the time axis of the first voice data and the time point when the first image data is acquired.
- The scheme in which the acquisition request contains the target language and the scheme in which it contains the target time can be implemented separately, in which case the target language can be preset by the client. They can also be implemented in the same scheme (that is, the acquisition request includes both the target language and the target time, and the server obtains the simultaneous interpretation data of the target time corresponding to the target language).
- A correspondence between the data in the simultaneous interpretation data can be generated in advance. Based on the correspondence, when one piece of the simultaneous interpretation data is acquired, the corresponding other data can be obtained at the same time. For example, when the first translated text is obtained, the second voice data, summary document, and typeset document corresponding to the first translated text, as well as the image processing result corresponding to the display document, can be obtained correspondingly.
- the method further includes:
- the respective data in the simultaneous interpretation data are correspondingly saved using the time correspondence relationship.
- In one embodiment, when the server receives the first voice data, it determines the receiving time, determines the end time according to the duration of the first voice data, and generates a first time axis for the first voice data according to the receiving time and the end time. In another embodiment, the first terminal determines the start time and duration of the first voice data when collecting it and sends them to the server, and the server determines the first time axis of the first voice data accordingly.
- the corresponding time point when the server obtains the first image data may be used as the time point for obtaining the first image data.
- Alternatively, the first terminal determines the corresponding time point when capturing the first image data and sends the determined time point together with the first image data to the server; the server receives the time point and the first image data, and uses the time point as the time point for acquiring the first image data.
- In this way, the time relationship between the first voice data and the first image data can be determined. The first translated text, the second voice data, the typeset document, and the summary document in the simultaneous interpretation data are all generated on the basis of the first voice data, so the time relationship between each of them and the first voice data can be determined. Based on this, the time correspondence between the data in the simultaneous interpretation data can be generated.
- The time correspondence may be embodied in the form of a time axis; that is, a second time axis is generated. The second time axis may be based on the time axis of the second voice data, with the start time point and end time point of each segmented voice in the second voice data marked on it.
- The time corresponding to each paragraph in the first translated text is marked on the second time axis; the time may specifically be the time point, on the second time axis, of the segmented voice in the second voice data corresponding to each paragraph.
- the time corresponding to the typeset document is marked on the second time axis.
- the time point of the segmented voice in the second voice data corresponding to the typeset document on the second time axis may be used.
- the time corresponding to the summary document is marked on the second time axis.
- the time point of the segmented voice in the second voice data corresponding to the summary document on the second time axis may be used.
- the time corresponding to the image processing result is marked on the second time axis.
- the relationship between the time corresponding to the image processing result and the second time axis may be determined according to the relationship between the first time axis and the time point of the first image data.
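A hedged sketch of the second time axis: each segmented voice in the second voice data occupies a (start, end) interval, and a target time from an acquisition request can be resolved to the segment (and hence the paragraph, typeset section, or summary keyed to that segment) whose interval contains it. All timings are illustrative.

```python
def build_second_time_axis(segment_durations):
    """Lay the segmented voices end to end; return (start, end) per segment."""
    axis, t = [], 0.0
    for duration in segment_durations:
        axis.append((t, t + duration))
        t += duration
    return axis

def segment_at(axis, target_time):
    """Index of the segment whose interval contains target_time, or None."""
    for i, (start, end) in enumerate(axis):
        if start <= target_time < end:
            return i
    return None
```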
- sending the first translated text and the second voice data in the simultaneous interpretation data to the client includes:
- At least one paragraph in the first translated text, together with the segmented speech corresponding to the paragraph, is sent to the client; the segmented speech is used for playing when the client displays the paragraph corresponding to the segmented speech.
- the paragraph and the segmented voice corresponding to the paragraph are sent to the client together, and when the paragraph is displayed by the client, the client can play the segmented voice corresponding to the paragraph at the same time.
- the method may further include: generating a target document in a preset format according to the typeset document and the summary document; the target document is used for presenting on the client when the first voice data is played.
- According to the typeset document and the summary document, the server generates a target document containing the content of both, and the target document can display the extracted summary and the typeset content together.
- the method provided in the embodiments of this application can be applied to simultaneous interpretation scenarios, such as simultaneous interpretation in conferences.
- Through the translation of conference presentation documents, users can understand the speaker's speech content more clearly in combination with the presentation documents; through typesetting and abstract extraction of the speech content (that is, the first voice data), users are helped to better summarize and retrieve; and by sending at least one paragraph in the first translated text to the client together with the voice corresponding to that paragraph, users are helped to better absorb dense translated text content.
- In summary, the simultaneous interpretation method obtains the first to-be-processed data; translates the first voice data in the first to-be-processed data to obtain the first translated text; generates the second voice data according to the first translated text; and performs at least one of the following: obtains a typeset document according to the first voice data and the first translated text; performs image word processing on the first image data in the first to-be-processed data to obtain an image processing result.
- The first image data includes at least a display document corresponding to the first voice data. The language corresponding to the first voice data is different from the language corresponding to the typeset document; the language corresponding to the first voice data is different from the language corresponding to the first translated text; the language corresponding to the first voice data is different from the language corresponding to the second voice data; the language of the text displayed in the first image data is different from the language of the text included in the image processing result.
- The first translated text, the typeset document, the second voice data, and the image processing result are presented on the client when the first voice data is played, providing the user with the text translation result and voice translation result related to the first voice data, the typeset document corresponding to the text translation result, and the translation result related to the display document corresponding to the first voice data.
- In this way, the content of the first voice data can be displayed more intuitively and comprehensively, so that users can understand the summary of the speech content and the content of the displayed document through the client; this helps users better absorb the speech content, enhances the user experience, and facilitates the user's subsequent summarization of the speech content.
- FIG. 3 is a schematic diagram of the system architecture to which the simultaneous interpretation method of the embodiment of the application is applied.
- As shown in FIG. 3, the system is applied to conference simultaneous interpretation and includes: a machine simultaneous interpretation server, a speech recognition server, a translation server, audience mobile terminals, a PC client, a display screen, conference management equipment, a TTS server, an OCR server, and an NLP server.
- Here, TTS refers to Text To Speech, OCR refers to Optical Character Recognition, and NLP refers to Natural Language Processing.
- In practical applications, each function can be implemented on multiple servers; that is, it can be implemented separately in the speech recognition server, the translation server, the TTS server, the OCR server, the NLP server, the conference management equipment, and so on, so as to improve the efficiency of simultaneous interpretation and ensure high timeliness.
- The PC client is used to collect the audio of the lecturer's speech in the conference, that is, to collect the first voice data; to project the document to be displayed onto the display screen, which presents the document to the other users attending the conference; and to collect first image data for the document.
- the document may be a PPT document, a Word document, and so on.
- the PC client is also used to send the collected first voice data and first image data to the machine simultaneous interpretation server.
- The PC client may also have a screenshot function, so that the document currently displayed by the speaker can be captured in real time through a screenshot operation on the screen; that is, the first image data is collected. The time corresponding to the first image data can be recorded accordingly, and the first image data together with the corresponding time can be sent to the machine simultaneous interpretation server.
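The capture step described above pairs each screenshot with the moment it was taken so that the server can later align the slide with the speech time axis. A minimal Python sketch of this pairing (illustrative only, not part of the disclosure; `screenshot_fn` is a hypothetical stand-in for the real screen-capture routine):

```python
import time

def capture_slide(screenshot_fn, clock=time.time):
    """Pair a screen capture with the time it was taken, so the server
    can later align the slide with the first voice data's time axis.
    `screenshot_fn` is a hypothetical callable returning image bytes."""
    image_data = screenshot_fn()
    return {"image": image_data, "timestamp": clock()}

# Example with stand-ins for the real screenshot routine and clock:
record = capture_slide(lambda: b"PNG-bytes", clock=lambda: 12.5)
```

The record (image plus timestamp) is what would be sent to the machine simultaneous interpretation server.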
- The machine simultaneous interpretation server is used to send the first voice data to the speech recognition server; the speech recognition server recognizes the first voice data using speech recognition technology to obtain the recognized text and sends it to the machine simultaneous interpretation server. The translation server then translates the recognized text using a preset translation model to obtain the first translated text and sends it to the machine simultaneous interpretation server.
- the machine simultaneous interpretation server is also used to send the first translated text and the first image data to the conference management device.
- The first translated text and the first image data each carry their corresponding time information, where the time information corresponding to the translation result may include the time information of each paragraph in the translation result corresponding to each segment of the first voice data.
- The conference management device is configured to: receive the first translated text and the first image data; send the first translated text to the NLP server, which obtains at least one of a typeset document and an abstract document according to the first translated text; send the first image data to the OCR server, which extracts the text in the first image data, determines the position of the text, and returns the extracted text and its position to the conference management device; send the extracted text to the translation server, receive the translation result returned by the translation server, and generate an image processing result according to the translation result and the position of the extracted text; and send the first translated text to the TTS server, which generates second voice data according to the first translated text and sends the second voice data back to the conference management device.
- the conference management device is also used to send the first translated text, the second voice data, the typeset document, the summary document, and the image processing result to the mobile terminal of the audience.
- The OCR server is used to obtain, through OCR technology and interface positioning technology, the extractable text in the display document corresponding to the first image data and the interface positioning information corresponding to the text; the extracted content is translated into multiple languages; according to the interface positioning information, the translated content of each language is merged back into the picture to obtain the image processing result; and the image processing result is stored in the corresponding server according to language.
- the NLP server is configured to generate at least one of a typeset document and an abstract document according to the first translated text.
- Specifically, the NLP server uses NLP technology and VAD technology to generate a typeset document based on the first voice data and the first translated text, and generates a summary document based on the first translated text.
- Here, VAD refers to Voice Activity Detection.
- Specifically, the first translated text corresponding to the first voice data can be pre-segmented to obtain at least one pre-segmented paragraph; then semantic analysis is performed on the context using NLP technology to decide whether each pre-segmented paragraph should be further segmented, and at least one paragraph is finally determined.
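The two-stage segmentation described above (pre-segment at VAD silence points, then refine the result with semantic analysis of the context) can be sketched as follows. This is an illustrative approximation, not the disclosed implementation: `is_continuation` stands in for the NLP semantic judgment, which the disclosure does not specify.

```python
def segment_translation(sentences, silence_after, is_continuation):
    """Segment translated sentences into paragraphs: first pre-segment at
    the VAD silence points, then merge adjacent pre-segments whose context
    `is_continuation` (a stand-in for the NLP semantic analysis) judges to
    be one continuous unit."""
    # Pre-segmentation: cut after every sentence followed by a silence point.
    pre, cur = [], []
    for i, sentence in enumerate(sentences):
        cur.append(sentence)
        if i in silence_after:
            pre.append(cur)
            cur = []
    if cur:
        pre.append(cur)
    # Semantic pass: merge pre-segments whose context continues.
    paragraphs = pre[:1]
    for seg in pre[1:]:
        if is_continuation(paragraphs[-1][-1], seg[0]):
            paragraphs[-1].extend(seg)  # context continues: keep one paragraph
        else:
            paragraphs.append(seg)      # context changes: start a new paragraph
    return [" ".join(p) for p in paragraphs]

# Toy example: a lowercase opening word marks a sentence that continues the
# previous context (a crude stand-in for real semantic analysis).
paragraphs = segment_translation(
    ["The model was trained.", "and it converged quickly.", "Next topic."],
    silence_after={0, 1},
    is_continuation=lambda prev, nxt: nxt[:1].islower(),
)
```

Here the second sentence is merged back into the first paragraph despite the silence point between them, while the topic change starts a new paragraph.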
- The NLP server is also used to organize the abstract content of the first translated text into an abstract document by using NLP abstract extraction technology.
- the conference management device is also used to fill in preset forms according to typeset documents in different languages and abstract documents in different languages to generate a target form.
- The preset table may adopt the format of Table 1 below and may include: meeting name, meeting time, meeting topic name, topic time, speaker, recognized content, translated content of language A, translated content of language B, abstract content of language A, and abstract content of language B.
- the simultaneous interpretation account, meeting name, meeting time, meeting topic name, topic time, and speaker can be filled in by the user in advance according to the actual situation.
- For typeset documents in at least one language, the conference management device correspondingly fills in the translated content of language one, the translated content of language two, ..., the translated content of language N in Table 1 according to the language; and for abstract documents in at least one language, it fills in the abstract content of language one, the abstract content of language two, ..., the abstract content of language N, so as to realize the sorting and summarization of the conference content.
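The form-filling step can be sketched as a simple merge of the user-supplied meeting fields with the per-language content. The field names below are illustrative assumptions, not taken verbatim from Table 1:

```python
def fill_target_form(meta, translations, abstracts):
    """Fill the preset conference form from the user-supplied meeting
    fields plus per-language translated content and abstract content.
    Field names are illustrative, not quoted from the disclosure."""
    form = dict(meta)  # meeting name, meeting time, topic, speaker, ...
    for lang, text in translations.items():
        form["translated content ({})".format(lang)] = text
    for lang, text in abstracts.items():
        form["abstract content ({})".format(lang)] = text
    return form

form = fill_target_form(
    meta={"meeting name": "Q3 review", "speaker": "A. Zhang"},
    translations={"en": "Hello everyone ...", "fr": "Bonjour à tous ..."},
    abstracts={"en": "Quarterly results.", "fr": "Résultats trimestriels."},
)
```

Each language contributes one translated-content column and one abstract-content column, mirroring the "language one ... language N" layout of the table.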
- The TTS server provides a TTS simultaneous interpretation service; specifically, the TTS server is used to receive the first translated text in different languages, call the TTS service, and synthesize audio content in the different languages, that is, to obtain the second voice data.
- The conference management device is also used to store the first translated text, the second voice data, and at least one of the typeset document, the abstract document, and the image processing result in the database of the corresponding language according to the time correspondence relationship.
- the time correspondence relationship can be implemented using a time axis, and the specific implementation method has been described in the method shown in FIG. 1 and will not be repeated here.
- When the mobile terminal pulls the corresponding first translated text according to the time axis, it can obtain the corresponding second voice data together with it; it can also obtain at least one of the corresponding typeset document, abstract document, and image processing result.
- In the above scheme, simultaneous interpretation of PPT documents is added through the OCR server, typesetting and summarization of the meeting content are provided through the NLP server, and a "listening" service for machine simultaneous interpretation is added through the TTS server. This improves the user's sensory experience in the meeting, helps users better understand the content of speeches and documents, and also makes it easier for the audience to summarize and organize the content of the meeting.
- Fig. 4 is a schematic diagram of the composition structure of the simultaneous interpretation device according to the embodiment of the application; as shown in Fig. 4, the simultaneous interpretation device includes:
- the obtaining unit 41 is configured to obtain the first to-be-processed data
- the first processing unit 42 is configured to translate the first voice data in the first to-be-processed data to obtain a first translated text
- the second processing unit 43 is configured to generate second voice data according to the first translated text
- The third processing unit 44 is configured to perform at least one of the following: obtain a typeset document according to the first voice data and the first translated text; perform image word processing on the first image data in the first to-be-processed data to obtain an image processing result; the first image data includes at least a display document corresponding to the first voice data.
- Here, the language corresponding to the first voice data is different from the language corresponding to the typeset document; the language corresponding to the first voice data is different from the language corresponding to the first translated text; the language corresponding to the first voice data is different from the language corresponding to the second voice data; the language of the text displayed in the first image data is different from the language of the text included in the image processing result; and the first translated text, the typeset document, the second voice data, and the image processing result are used for presentation on the client when the first voice data is played.
- Specifically, the third processing unit 44 is configured to: perform voice activity detection on the first voice data and determine the silence points in the first voice data; acquire the context corresponding to the silence points in the first translated text; segment the first translated text according to the silence points and the semantics of the context to obtain at least one paragraph; and typeset the at least one paragraph to obtain the typeset document.
- the second processing unit 43 is configured to segment the first translated text to obtain at least one paragraph in the first translated text
- When the second processing unit 43 segments the first translated text, the same segmentation method as that used by the first processing unit 42 can be adopted.
- The third processing unit 44 is configured to extract a summary of the first translated text to obtain a summary document for the first translated text; the summary document is used for presentation at the client when the first voice data is played.
- In this way, the user can summarize the content of the first voice data and better absorb it.
- The third processing unit 44 is configured to: determine the text in the first image data and the position corresponding to the text; extract the text in the first image data and translate the extracted text; and generate the image processing result according to the translated text.
- the image processing result may include at least one of the following: second image data and second translated text.
- The third processing unit 44 is configured to execute at least one of the following to generate the image processing result: replace, according to the translated text, the text at the corresponding position in the first image data to obtain second image data, and use the second image data as the image processing result; generate a second translated text by using the translated text, and use the second translated text as the image processing result.
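The second of these options (generating a second translated text from the extracted text and its positions) can be sketched as follows. The region layout and the toy glossary standing in for the translation server are assumptions made for illustration:

```python
def build_second_translated_text(regions, translate):
    """Produce the 'second translated text' form of the image processing
    result: translate every text region extracted by OCR and emit the
    translations in reading order derived from each region's position
    (top-to-bottom, then left-to-right)."""
    ordered = sorted(regions, key=lambda r: (r["y"], r["x"]))
    return "\n".join(translate(r["text"]) for r in ordered)

# Toy glossary standing in for the translation server.
glossary = {"标题": "Title", "要点一": "Point one", "要点二": "Point two"}
result = build_second_translated_text(
    [{"text": "要点二", "x": 10, "y": 120},
     {"text": "标题", "x": 40, "y": 10},
     {"text": "要点一", "x": 10, "y": 60}],
    translate=glossary.get,
)
```

Sorting on position is what keeps the translated lines in the same order as the text on the displayed slide, even though OCR may return regions in an arbitrary order.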
- the simultaneous interpretation data obtained by using the first to-be-processed data corresponds to at least one language
- the device further includes: a storage unit; the storage unit is configured to classify and cache the simultaneous interpretation data corresponding to at least one language type according to the language type.
- The device further includes a communication unit; the communication unit is configured to receive an acquisition request sent by the client; the acquisition request is used to acquire simultaneous interpretation data and includes at least a target language.
- the acquisition request further includes a target time
- the communication unit is further configured to, when obtaining the simultaneous interpretation data corresponding to the target language from the cached content, obtain the simultaneous interpretation data corresponding to the target time from the cache according to a preset time correspondence relationship;
- The time correspondence relationship represents the time relationship between the data in the simultaneous interpretation data; the time correspondence relationship is generated in advance according to the time axis of the first voice data and the time point at which the first image data is acquired.
- The storage unit is further configured to: determine the first time axis corresponding to the first voice data and the time point at which the first image data is acquired; generate, according to the first time axis and the time point, the time correspondence relationship between the data in the simultaneous interpretation data; and save the respective data in the simultaneous interpretation data in correspondence by using the time correspondence relationship.
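The language- and time-keyed storage and lookup described here can be sketched with a small cache class. The payload layout and the lookup rule (return the latest segment starting at or before the target time) are illustrative assumptions, not details given in the disclosure:

```python
import bisect

class InterpretationCache:
    """Per-language cache of simultaneous interpretation data, keyed by
    each segment's start time on the first voice data's time axis, so a
    client request (target language + target time) can be served from
    the cache."""

    def __init__(self):
        self._by_lang = {}  # language -> ([sorted start times], [payloads])

    def put(self, lang, start_time, payload):
        times, payloads = self._by_lang.setdefault(lang, ([], []))
        i = bisect.bisect(times, start_time)
        times.insert(i, start_time)
        payloads.insert(i, payload)

    def get(self, lang, target_time):
        """Return the data of the latest segment starting at or before
        `target_time`, or None if nothing is cached for that language."""
        times, payloads = self._by_lang.get(lang, ([], []))
        i = bisect.bisect(times, target_time) - 1
        return payloads[i] if i >= 0 else None

cache = InterpretationCache()
cache.put("en", 0.0, {"text": "Hello everyone.", "audio": b"seg-0"})
cache.put("en", 10.0, {"text": "Today's topic ...", "audio": b"seg-1"})
```

With this layout, a request for ("en", 12.0) would fetch the segment that started at 10.0, which is how the translated text, voice, and documents stay aligned with the playback position.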
- the communication unit is further configured to send at least one paragraph in the first translated text and the segmented speech corresponding to the paragraph to the client; when the paragraph is displayed by the client, The segmented voice corresponding to the paragraph is played by the client.
- In practical applications, the acquisition unit 41 can be implemented through a communication interface; the first processing unit 42, the second processing unit 43, and the third processing unit 44 can all be implemented by a processor in the server, such as a Central Processing Unit (CPU), a Digital Signal Processor (DSP), a Microcontroller Unit (MCU), or a Field-Programmable Gate Array (FPGA);
- the communication unit can be implemented by a communication interface in the server.
- It should be noted that when the device provided in the above embodiment performs simultaneous interpretation, only the division of the above-mentioned program modules is used as an example for illustration. In practical applications, the above-mentioned processing can be allocated to different program modules as needed; that is, the internal structure of the device is divided into different program modules to complete all or part of the processing described above.
- the device provided in the foregoing embodiment and the embodiment of the simultaneous interpretation method belong to the same concept. For the specific implementation process, please refer to the method embodiment, which will not be repeated here.
- FIG. 5 is a schematic diagram of the hardware composition structure of the server according to an embodiment of the present application.
- As shown in FIG. 5, the server 50 includes a memory 53, a processor 52, and a computer program stored on the memory 53 and runnable on the processor 52; when the processor 52 of the server executes the program, the method provided by one or more of the server-side technical solutions is implemented.
- Specifically, when the processor 52 of the server 50 executes the program, it realizes: obtaining the first to-be-processed data; translating the first voice data in the first to-be-processed data to obtain the first translated text; generating second voice data according to the first translated text; and performing at least one of the following: obtaining a typeset document according to the first voice data and the first translated text; performing image word processing on the first image data in the first to-be-processed data to obtain an image processing result, the first image data including at least a display document corresponding to the first voice data. The first translated text, the typeset document, the second voice data, and the image processing result are used for presentation on the client when the first voice data is played.
- the server further includes a communication interface 51; various components in the server are coupled together through the bus system 54.
- the bus system 54 is configured to implement connection and communication between these components.
- In addition to the data bus, the bus system 54 also includes a power bus, a control bus, and a status signal bus.
- the memory 53 in this embodiment may be a volatile memory or a non-volatile memory, and may also include both volatile and non-volatile memory.
- The non-volatile memory can be a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a ferromagnetic random access memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM); the magnetic surface memory can be a magnetic disk memory or a magnetic tape memory.
- the volatile memory may be a random access memory (RAM, Random Access Memory), which is used as an external cache.
- By way of exemplary but not restrictive description, many forms of RAM are available, such as Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDRSDRAM), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), SyncLink Dynamic Random Access Memory (SLDRAM), and Direct Rambus Random Access Memory (DRRAM).
- the memories described in the embodiments of the present application are intended to include, but are not limited to, these and any other suitable types of memories.
- the method disclosed in the foregoing embodiments of the present application may be applied to the processor 52 or implemented by the processor 52.
- the processor 52 may be an integrated circuit chip with signal processing capabilities. In the implementation process, the steps of the foregoing method may be completed by an integrated logic circuit of hardware in the processor 52 or instructions in the form of software.
- the aforementioned processor 52 may be a general-purpose processor, a DSP, or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components, and the like.
- the processor 52 may implement or execute various methods, steps, and logical block diagrams disclosed in the embodiments of the present application.
- the general-purpose processor may be a microprocessor or any conventional processor or the like.
- the steps of the method disclosed in the embodiments of the present application can be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
- the software module may be located in a storage medium, and the storage medium is located in a memory.
- the processor 52 reads the information in the memory and completes the steps of the foregoing method in combination with its hardware.
- the embodiments of the present application also provide a storage medium, which is specifically a computer storage medium, and more specifically, a computer-readable storage medium.
- Computer instructions, that is, a computer program, are stored thereon; when the computer instructions are executed by a processor, the method provided by one or more of the server-side technical solutions is implemented.
- the disclosed method and smart device can be implemented in other ways.
- the device embodiments described above are merely illustrative.
- In addition, the division of the units is only a logical function division, and there may be other division manners in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented.
- The coupling, direct coupling, or communication connection between the components shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.
- the units described above as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units, that is, they may be located in one place or distributed on multiple network units; Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
- In addition, the functional units in the embodiments of the present application may all be integrated into one processing unit, or each unit may be used individually as a unit, or two or more units may be integrated into one unit;
- the above-mentioned integrated unit may be implemented in the form of hardware, or may be implemented in the form of hardware plus software functional units.
- The foregoing program can be stored in a computer-readable storage medium; when the program is executed, the steps of the foregoing method embodiments are performed. The foregoing storage medium includes various media that can store program code, such as a removable storage device, a ROM, a RAM, a magnetic disk, or an optical disk.
- If the above-mentioned integrated unit of the present application is implemented in the form of a software function module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
- Based on this understanding, the computer software product is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the methods described in the various embodiments of the present application.
- the aforementioned storage media include: removable storage devices, ROM, RAM, magnetic disks, or optical disks and other media that can store program codes.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
Description
Claims (14)
- A simultaneous interpretation method, applied to a server, comprising: obtaining first to-be-processed data; translating first voice data in the first to-be-processed data to obtain a first translated text; generating second voice data according to the first translated text; and performing at least one of the following: obtaining a typeset document according to the first voice data and the first translated text; performing image word processing on first image data in the first to-be-processed data to obtain an image processing result, the first image data comprising at least a display document corresponding to the first voice data; wherein the language corresponding to the first voice data is different from the language corresponding to the typeset document; the language corresponding to the first voice data is different from the language corresponding to the first translated text; the language corresponding to the first voice data is different from the language corresponding to the second voice data; the language of the text displayed in the first image data is different from the language of the text comprised in the image processing result; and the first translated text, the typeset document, the second voice data, and the image processing result are used for presentation on a client when the first voice data is played.
- The method according to claim 1, wherein the determining a typeset document according to the first voice data and the first translated text comprises: performing voice activity detection on the first voice data, and determining silence points in the first voice data; acquiring the context corresponding to the silence points in the first translated text; segmenting the first translated text according to the silence points and the semantics of the context to obtain at least one paragraph; and typesetting the at least one paragraph to obtain the typeset document.
- The method according to claim 1, wherein the generating second voice data according to the first translated text comprises: segmenting the first translated text to obtain at least one paragraph in the first translated text; generating at least one segmented speech according to the at least one paragraph in the first translated text; and synthesizing, by using the at least one segmented speech, the second voice data corresponding to the first translated text.
- The method according to claim 1, wherein the method further comprises: performing abstract extraction on the first translated text to obtain a summary document for the first translated text, the summary document being used for presentation on the client when the first voice data is played.
- The method according to claim 1, wherein the performing image word processing on the first image data to obtain an image processing result comprises: determining the text in the first image data and the position corresponding to the text; extracting the text in the first image data, and translating the extracted text; and generating the image processing result according to the translated text.
- The method according to claim 5, wherein the generating the image processing result according to the translated text comprises at least one of the following: replacing, according to the translated text, the text corresponding to the position in the first image data to obtain second image data, and using the second image data as the image processing result; generating a second translated text by using the translated text, and using the second translated text as the image processing result.
- The method according to any one of claims 1 to 6, wherein the simultaneous interpretation data obtained by using the first to-be-processed data corresponds to at least one language; and the method further comprises: classifying and caching, according to language, the simultaneous interpretation data corresponding to the at least one language.
- The method according to claim 7, wherein the method further comprises: receiving an acquisition request sent by a client, the acquisition request being used to acquire simultaneous interpretation data and comprising at least a target language; acquiring the simultaneous interpretation data corresponding to the target language from the cached simultaneous interpretation data; and sending the acquired simultaneous interpretation data corresponding to the target language to the client.
- The method according to claim 8, wherein the acquisition request further comprises a target time; and when the simultaneous interpretation data corresponding to the target language is acquired from the cached content, the method further comprises: acquiring, from the cache according to a preset time correspondence relationship, the simultaneous interpretation data corresponding to the target time; wherein the time correspondence relationship represents the time relationship between the data in the simultaneous interpretation data, and is generated in advance according to the time axis of the first voice data and the time point at which the first image data is acquired.
- The method according to claim 9, wherein the method further comprises: determining a first time axis corresponding to the first voice data and the time point at which the first image data is acquired; generating, according to the first time axis and the time point, the time correspondence relationship between the data in the simultaneous interpretation data; and saving the respective data in the simultaneous interpretation data in correspondence by using the time correspondence relationship.
- The method according to claim 8, wherein the sending the first translated text and the second voice data in the simultaneous interpretation data to the client comprises: sending at least one paragraph in the first translated text and the segmented speech corresponding to the paragraph to the client, wherein when the paragraph is displayed by the client, the segmented speech corresponding to the paragraph is played by the client.
- A simultaneous interpretation apparatus, comprising: an obtaining unit configured to obtain first to-be-processed data; a first processing unit configured to translate first voice data in the first to-be-processed data to obtain a first translated text; a second processing unit configured to generate second voice data according to the first translated text; and a third processing unit configured to perform at least one of the following: obtaining a typeset document according to the first voice data and the first translated text; performing image word processing on first image data in the first to-be-processed data to obtain an image processing result, the first image data comprising at least a display document corresponding to the first voice data; wherein the language corresponding to the first voice data is different from the language corresponding to the typeset document; the language corresponding to the first voice data is different from the language corresponding to the first translated text; the language corresponding to the first voice data is different from the language corresponding to the second voice data; the language of the text displayed in the first image data is different from the language of the text comprised in the image processing result; and the first translated text, the typeset document, the second voice data, and the image processing result are used for presentation on a client when the first voice data is played.
- A server, comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor, wherein the processor implements the steps of the method according to any one of claims 1 to 11 when executing the program.
- A storage medium having computer instructions stored thereon, wherein the instructions, when executed by a processor, implement the steps of the method according to any one of claims 1 to 11.
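The claims above describe a pipeline of cooperating units: speech recognition and translation, target-language speech synthesis, typesetting, and image word processing on the presenter's slides. A minimal sketch of how such units could be wired together is given below; the names (`SimultaneousInterpreter`, `asr`, `translate`, `tts`, `ocr`) are illustrative assumptions rather than the patent's implementation, and toy stubs stand in for real ASR/MT/TTS/OCR engines.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class InterpretationResult:
    translated_text: str        # first translated text
    synthesized_audio: bytes    # second voice data (target-language TTS)
    typeset_document: str       # source text paired with its translation
    image_text: Optional[str]   # image word-processing result, if any

class SimultaneousInterpreter:
    """Hypothetical pipeline mirroring the claimed units:
    ASR -> machine translation -> TTS, plus optional typesetting
    and image word processing (OCR followed by translation)."""

    def __init__(self,
                 asr: Callable[[bytes], str],
                 translate: Callable[[str], str],
                 tts: Callable[[str], bytes],
                 ocr: Optional[Callable[[bytes], str]] = None):
        self.asr = asr
        self.translate = translate
        self.tts = tts
        self.ocr = ocr

    def process(self, voice_data: bytes,
                image_data: Optional[bytes] = None) -> InterpretationResult:
        source_text = self.asr(voice_data)        # first voice data -> text
        translated = self.translate(source_text)  # first translated text
        audio = self.tts(translated)              # second voice data
        # Typeset document: source sentence alongside its translation.
        typeset = f"{source_text}\n{translated}"
        image_text = None
        if self.ocr is not None and image_data is not None:
            # Image word processing: recognize slide text, then translate it.
            image_text = self.translate(self.ocr(image_data))
        return InterpretationResult(translated, audio, typeset, image_text)

# Toy stubs standing in for real ASR/MT/TTS/OCR engines.
interp = SimultaneousInterpreter(
    asr=lambda audio: "你好",
    translate=lambda text: {"你好": "hello", "欢迎": "welcome"}.get(text, text),
    tts=lambda text: text.encode("utf-8"),
    ocr=lambda image: "欢迎",
)
result = interp.process(b"<pcm audio>", image_data=b"<slide png>")
print(result.translated_text)  # hello
print(result.image_text)       # welcome
```

Each stage is injected as a plain callable, so any concrete engine can be dropped in, and the four outputs match the four artifacts the claims say are presented on the client while the first voice data plays.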
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2019/109677 WO2021062757A1 (en) | 2019-09-30 | 2019-09-30 | Simultaneous interpretation method and apparatus, and server and storage medium |
CN201980099995.2A CN114341866A (en) | 2019-09-30 | 2019-09-30 | Simultaneous interpretation method, device, server and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2019/109677 WO2021062757A1 (en) | 2019-09-30 | 2019-09-30 | Simultaneous interpretation method and apparatus, and server and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021062757A1 true WO2021062757A1 (en) | 2021-04-08 |
Family
ID=75336728
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/109677 WO2021062757A1 (en) | 2019-09-30 | 2019-09-30 | Simultaneous interpretation method and apparatus, and server and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN114341866A (en) |
WO (1) | WO2021062757A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114023317A (en) * | 2021-11-04 | 2022-02-08 | 五华县昊天电子科技有限公司 | Voice translation system based on cloud platform |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114818747A (en) * | 2022-04-21 | 2022-07-29 | 语联网(武汉)信息技术有限公司 | Computer-aided translation method and system of voice sequence and visual terminal |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060143681A1 (en) * | 2004-12-29 | 2006-06-29 | Delta Electronics, Inc. | Interactive entertainment center |
CN101714140A (en) * | 2008-10-07 | 2010-05-26 | 英业达股份有限公司 | Instant translation system with multimedia display and method thereof |
CN109614628A (en) * | 2018-11-16 | 2019-04-12 | 广州市讯飞樽鸿信息技术有限公司 | A kind of interpretation method and translation system based on Intelligent hardware |
CN109696748A (en) * | 2019-02-14 | 2019-04-30 | 郑州诚优成电子科技有限公司 | A kind of augmented reality subtitle glasses for synchronous translation |
CN110121097A (en) * | 2019-05-13 | 2019-08-13 | 深圳市亿联智能有限公司 | Multimedia playing apparatus and method with accessible function |
Application Events (2)
- 2019-09-30: CN application CN201980099995.2A (publication CN114341866A) filed; status: active, pending
- 2019-09-30: PCT application PCT/CN2019/109677 (publication WO2021062757A1) filed; status: active, application filing
Also Published As
Publication number | Publication date |
---|---|
CN114341866A (en) | 2022-04-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111883123B (en) | Conference summary generation method, device, equipment and medium based on AI identification | |
US10282162B2 (en) | Audio book smart pause | |
CN108683937B (en) | Voice interaction feedback method and system for smart television and computer readable medium | |
WO2021109678A1 (en) | Video generation method and apparatus, electronic device, and storage medium | |
CN111050201B (en) | Data processing method and device, electronic equipment and storage medium | |
CN108012173B (en) | Content identification method, device, equipment and computer storage medium | |
CN104735468A (en) | Method and system for synthesizing images into new video based on semantic analysis | |
CN112653902A (en) | Speaker recognition method and device and electronic equipment | |
GB2535861A (en) | Data lookup and operator for excluding unwanted speech search results | |
WO2021062757A1 (en) | Simultaneous interpretation method and apparatus, and server and storage medium | |
CN104994404A (en) | Method and device for obtaining keywords for video | |
WO2021087665A1 (en) | Data processing method and apparatus, server, and storage medium | |
CN112581965A (en) | Transcription method, device, recording pen and storage medium | |
CN110992960A (en) | Control method, control device, electronic equipment and storage medium | |
WO2021102754A1 (en) | Data processing method and device and storage medium | |
KR20220130863A (en) | Apparatus for Providing Multimedia Conversion Content Creation Service Based on Voice-Text Conversion Video Resource Matching | |
US20230300429A1 (en) | Multimedia content sharing method and apparatus, device, and medium | |
CN111161710A (en) | Simultaneous interpretation method and device, electronic equipment and storage medium | |
US11874867B2 (en) | Speech to text (STT) and natural language processing (NLP) based video bookmarking and classification system | |
CN112818708B (en) | System and method for processing voice translation of multi-terminal multi-language video conference in real time | |
CN111580766B (en) | Information display method and device and information display system | |
WO2021120174A1 (en) | Data processing method, apparatus, electronic device, and storage medium | |
CN114503546A (en) | Subtitle display method, device, electronic equipment and storage medium | |
CN111161737A (en) | Data processing method and device, electronic equipment and storage medium | |
WO2023273667A1 (en) | Data processing method and apparatus, server, client, medium, and product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19947810 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 19947810 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as the address of the addressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 071022) |
|