WO2023098412A1 - Subtitle control method, electronic device, and computer-readable storage medium

Subtitle control method, electronic device, and computer-readable storage medium

Info

Publication number
WO2023098412A1
Authority
WO
WIPO (PCT)
Prior art keywords
subtitle
software
media
result
display
Prior art date
Application number
PCT/CN2022/130303
Other languages
English (en)
French (fr)
Inventor
Liu Chang (刘畅)
Yao Wang (姚望)
Zhang Suiyun (张穗云)
Wang Xiao (王笑)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2023098412A1 publication Critical patent/WO2023098412A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/14 Digital output to display device; Cooperation and interconnection of the display device with other functional units
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/258 Heading extraction; Automatic titling; Numbering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/488 Data services, e.g. news ticker
    • H04N21/4884 Data services, e.g. news ticker, for displaying subtitles

Definitions

  • The present application relates to the field of software control, and in particular to a subtitle control method, an electronic device, and a computer-readable storage medium.
  • An important application scenario of speech recognition technology is to recognize the audio/video being played in real time, convert the speech signal in the audio/video being played into subtitles in the corresponding language, and display the subtitles to the user.
  • Embodiments of the present application provide a subtitle control method, an electronic device, and a computer-readable storage medium, which can solve the problems of high delay and poor user experience in existing subtitle recognition solutions to a certain extent.
  • In a first aspect, an embodiment of the present application provides a subtitle control method, including:
  • the second subtitle software is used to recognize, while the first media software plays the media file, the voice signal of the media content played by the first media software as a corresponding subtitle result and display it;
  • a first subtitle result corresponding to the first media content is acquired from historically stored subtitle results for display.
  • the electronic device may be installed with the first media software and the second subtitle software.
  • the above-mentioned first media software can be used to play various media files, and the media files can include audio files, video files and other files with voice signals.
  • the above-mentioned first media software may be a system service of the electronic device, or may also be an application program.
  • the above-mentioned second subtitle software can be used to recognize and display the voice signal of the media content played by the first media software as a corresponding subtitle result when the first media software plays the media file.
  • the above-mentioned second subtitle software may be a system service of the electronic device, or may also be an application program.
  • the above-mentioned media content refers to the content displayed by the first media software when the first media software plays the media file.
  • When the media file is a video file, the above-mentioned media content refers to the video picture played by the first media software and the voice signal corresponding to the video picture; when the media file is an audio file, the above-mentioned media content refers to the voice signal played by the first media software.
  • the electronic device may detect the first media content currently played by the first media software through the second subtitle software.
  • the electronic device may obtain the first subtitle result corresponding to the first media content from the subtitle results saved in history for display.
  • Before the electronic device recognizes the first voice signal of the first media content as a subtitle result, it can directly display the first subtitle result corresponding to the first media content, saving the time consumed in recognizing the first voice signal and reducing the delay in displaying subtitles by the second subtitle software.
  • When the electronic device displays the first subtitle result, it can display the first subtitle result corresponding to a whole sentence of the first voice signal before that whole sentence has been collected, which is convenient for the user to check the first subtitle result and improves the user experience.
  • In some embodiments, after the second subtitle software detects the first media content currently played by the first media software, the method further includes:
  • if it is determined that the first media content is unrecognized media content, recognizing the first voice signal of the first media content as a second subtitle result and displaying it.
  • the electronic device may recognize the first voice signal of the first media content as the second subtitle result and display it.
  • After the electronic device recognizes the second subtitle result, it can store the second subtitle result so that it can be called directly when needed.
  • the detecting the first media content currently played by the first media software includes:
  • the second subtitle software may obtain the media parameters of the first media software.
  • When the electronic device detects the first media content currently played by the first media software through the second subtitle software, it may acquire the media parameters corresponding to the first media content.
  • the foregoing media parameters may include a time stamp corresponding to the first media content.
  • The electronic device may search the historically saved subtitle results for the first subtitle result corresponding to the media parameters.
  • If the first subtitle result corresponding to the media parameters is found, the electronic device may determine that the first media content is identified media content.
  • Otherwise, the electronic device may determine that the first media content is unrecognized media content.
  • In some embodiments, displaying the first subtitle result includes:
  • if the first subtitle result corresponding to the media parameters is found, displaying the first subtitle result.
  • the electronic device may determine that the first media content is the identified media content, and display the found first subtitle result.
  • When playing identified media content, the electronic device can quickly find the corresponding first subtitle result according to the media parameters corresponding to the first media content, without needing to recognize the voice signal of the first media content. This saves the time of speech recognition, improves the speed of subtitle display, and reduces the delay of subtitle display, which can effectively improve the user experience.
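  • As a minimal, illustrative sketch of the timestamp-based lookup described above (not the patent's implementation; all names below are assumptions), the historically saved subtitle results can be pictured as a cache keyed by media parameters:

```python
# Hypothetical sketch of a subtitle cache keyed by media parameters.
# Class and method names are illustrative assumptions, not the patent's API.

class SubtitleCache:
    """Stores recognized subtitle results keyed by (file identifier, timestamp bucket)."""

    def __init__(self, bucket_ms: int = 500):
        self.bucket_ms = bucket_ms  # tolerance when matching timestamps
        self._store: dict[tuple[str, int], str] = {}

    def _key(self, file_id: str, timestamp_ms: int) -> tuple[str, int]:
        # Quantize the timestamp so a query slightly before or after the
        # stored playback position still lands in the same bucket.
        return (file_id, timestamp_ms // self.bucket_ms)

    def save(self, file_id: str, timestamp_ms: int, subtitle: str) -> None:
        self._store[self._key(file_id, timestamp_ms)] = subtitle

    def lookup(self, file_id: str, timestamp_ms: int) -> str | None:
        # A hit means the content was already recognized (display directly);
        # None means unrecognized media content (fall back to live recognition).
        return self._store.get(self._key(file_id, timestamp_ms))
```

  • On a cache hit the first subtitle result would be displayed immediately; on a miss the voice signal would be recognized in real time and the new result saved for later replays.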
  • the detecting the first media content currently played by the first media software includes:
  • the first reference data includes one or more of the voice features of the first voice signal, the recognized text obtained by performing speech recognition on the first voice signal, and the translated text of the target language type corresponding to the recognized text;
  • the second subtitle software cannot obtain the media parameters of the first media software.
  • the electronic device may recognize the first voice signal of the first media content to obtain the first reference data.
  • the first reference data may include one or more of the speech features of the first speech signal, the recognition text obtained by performing speech recognition on the first speech signal, and the translation text of the target language type corresponding to the recognition text.
  • The above speech features may include one or more of features such as Mel-frequency cepstral coefficients, linear predictive cepstral coefficients, and phonemes.
  • the above-mentioned target language type can be set by default by the second subtitle software.
  • For example, the second subtitle software can set the corresponding target language type by default according to the region where the electronic device is located.
  • the electronic device may search for second reference data that matches the first reference data in historically stored reference data.
  • the second reference data refers to reference data corresponding to subtitle results stored in history.
  • the second reference data may include one or more of historically stored speech features, historically stored recognized texts, and historically stored translated texts.
  • If the electronic device can find second reference data that matches the first reference data, the electronic device can determine that the above-mentioned first media content is identified media content.
  • Otherwise, the electronic device may determine that the first media content is unrecognized media content.
  • In some embodiments, displaying the first subtitle result includes:
  • the first subtitle result corresponding to the second reference data is obtained from historically saved subtitle results for display.
  • If the electronic device finds second reference data that matches the first reference data and thus determines that the first media content is identified media content, it may acquire the first subtitle result corresponding to the second reference data from the historically saved subtitle results for display.
  • In this way, the second subtitle software can identify whether the first media content is identified media content even when the media parameters of the first media software cannot be obtained.
  • For identified media content, the electronic device can start from the first subtitle result corresponding to the first media content and display the historically stored subtitle results in sequence, saving the time consumed by speech recognition, improving the speed of subtitle display, and reducing the delay of subtitle display, which can effectively improve the user experience.
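  • The matching of first reference data against historically stored reference data could, for recognized text, look like the following sketch (a hypothetical illustration using plain text similarity; a real system might match on speech features instead):

```python
import difflib

def find_matching_subtitle(recognized_text: str, history_texts: list[str],
                           threshold: float = 0.8) -> int | None:
    """Returns the index of the historically stored recognized text that best
    matches the freshly recognized text, or None if nothing matches well
    enough (i.e. the media content is treated as unrecognized)."""
    best_index, best_ratio = None, threshold
    for i, stored in enumerate(history_texts):
        ratio = difflib.SequenceMatcher(None, recognized_text, stored).ratio()
        if ratio >= best_ratio:
            best_index, best_ratio = i, ratio
    return best_index
```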
  • the obtaining and displaying the first subtitle result corresponding to the second reference data from the subtitle results saved in history includes:
  • the sentence segmentation result is used to indicate that the currently played sentence of the first voice signal has been completely collected, or that it has not yet been completely collected;
  • according to the sentence segmentation result, starting from the first subtitle result, each time a whole sentence of the first voice signal is collected, the next historically saved subtitle result is displayed.
  • the electronic device may sequentially display historically stored subtitle results starting from the first subtitle result corresponding to the first media content.
  • When the electronic device displays the historically stored subtitle results, in order to ensure that the speed at which the second subtitle software displays them is consistent with the speed at which the first media software plays the media content, the electronic device can select an appropriate speed adjustment method according to actual needs to control the display speed of the historically saved subtitle results.
  • the electronic device may acquire a sentence segmentation result of the first voice signal.
  • The above sentence segmentation result can be used to indicate whether the currently played sentence of the first voice signal has been completely collected.
  • If the currently played sentence has not been completely collected, the electronic device may temporarily refrain from displaying the next historically stored subtitle result.
  • If the currently played sentence has been completely collected, the electronic device may display the next historically stored subtitle result.
  • In this way, each time a whole sentence is collected, the second subtitle software displays one historically saved subtitle result, so that the speed at which the electronic device displays the historically saved subtitle results remains consistent with the speed at which the first media software plays the media content.
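  • A sketch of this sentence-paced display, assuming a hypothetical stream of sentence-boundary events (e.g. from a voice activity detector) and a hypothetical `show` callback:

```python
def display_paced_by_sentences(history_subtitles, sentence_end_events,
                               start_index, show):
    """Starting from the matched subtitle result, reveal one historically
    saved subtitle each time a whole sentence of the live voice signal has
    been completely collected. `sentence_end_events` yields once per detected
    sentence boundary; `show` renders a subtitle (both are assumed callbacks)."""
    index = start_index
    show(history_subtitles[index])       # the first subtitle result, shown immediately
    for _ in sentence_end_events:        # one event == one whole sentence collected
        index += 1
        if index >= len(history_subtitles):
            break                        # ran out of history; fall back to live recognition
        show(history_subtitles[index])
```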
  • the obtaining and displaying the first subtitle result corresponding to the second reference data from the subtitle results saved in history includes:
  • the electronic device may also adjust the speed at which the second subtitle software displays the subtitle result by comparing with reference data.
  • When the electronic device displays the historically stored subtitle results, it can continue to recognize the first reference data corresponding to the first voice signal and compare the first reference data with the reference data of the whole-sentence subtitle result being displayed.
  • When the first reference data is consistent with the reference data of the whole-sentence subtitle result being displayed, the electronic device can display the next historically saved subtitle result.
  • In this way, each time a whole sentence is played, the second subtitle software displays one historically stored subtitle result, so that the speed at which the electronic device displays the historically stored subtitle results remains consistent with the speed at which the first media software plays the media content.
  • the obtaining and displaying the first subtitle result corresponding to the second reference data from the subtitle results saved in history includes:
  • each historically stored subtitle result is sequentially displayed according to the actual display duration corresponding to each historically stored subtitle result.
  • When storing a subtitle result, the electronic device may also store the listening duration and display duration corresponding to the subtitle result.
  • the electronic device may also acquire the first listening duration corresponding to the first subtitle result.
  • the electronic device may determine the second listening duration corresponding to the first subtitle result according to the first voice signal.
  • the electronic device may determine the speed regulation parameter according to the first listening duration and the second listening duration.
  • If the first listening duration is longer than the second listening duration, it indicates that the display speed of the historically saved subtitle results should be increased.
  • If the first listening duration is shorter than the second listening duration, it indicates that the display speed of the historically saved subtitle results should be reduced.
  • If the first listening duration is equal to the second listening duration, there is no need to adjust the display speed of the historically saved subtitle results.
  • The electronic device can adjust each historically saved display duration according to the speed regulation parameter to obtain the actual display duration corresponding to each historically saved subtitle result.
  • Starting from the first subtitle result, the electronic device can then sequentially display each historically saved subtitle result according to its actual display duration, thereby ensuring that the speed at which the second subtitle software displays subtitle results remains consistent with the speed at which the first media software plays the media content.
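  • The arithmetic can be illustrated with a small sketch (hypothetical variable names, not the patent's formula). For example, if a sentence was originally listened to for 2000 ms but now takes only 1000 ms, the factor is 0.5, each stored display duration is halved, and the display speed doubles:

```python
def actual_display_durations(stored_display_ms: list[float],
                             first_listen_ms: float,
                             second_listen_ms: float) -> list[float]:
    """Scales the historically saved display durations by the ratio of the
    current (second) listening duration to the stored (first) one.

    first > second  -> factor < 1 -> shorter durations -> faster display
    first < second  -> factor > 1 -> longer durations  -> slower display
    first == second -> factor = 1 -> no adjustment"""
    factor = second_listen_ms / first_listen_ms   # the speed regulation parameter
    return [d * factor for d in stored_display_ms]
```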
  • a media playback interface is displayed, and the media playback interface is used to play media files;
  • a subtitle display frame is displayed, and the subtitle display frame is used to display the subtitle result recognized by the second subtitle software;
  • the subtitle display frame is displayed in a stacked manner with the media playback interface, and the subtitle display frame is located on an upper layer of the media playback interface.
  • the electronic device may display a media playing interface on the display screen, and the media playing interface may be used to play media files.
  • After the electronic device starts the second subtitle software, it can display a subtitle display frame on the display screen, and the subtitle display frame is used to display the subtitle results recognized by the second subtitle software.
  • The subtitle results can include the historically saved subtitle results and the second subtitle result.
  • The above subtitle display frame and the above media playback interface can be displayed in a stacked manner, with the subtitle display frame located on the upper layer of the media playback interface, so as to prevent the subtitle result displayed in the subtitle display frame from being blocked by the media playback interface and allow the user to view it completely.
  • When the subtitle display frame is displayed along the long side of the display screen, the width of the subtitle display frame is a first width; when it is displayed along the short side, the width of the subtitle display frame is a second width; the first width is greater than or equal to the second width.
  • the display direction of the subtitle display frame can follow the display direction of the media playback interface, ensuring that the user can view the media playback interface and the subtitle display frame in the same direction.
  • the electronic device can correspondingly adjust the display direction and width of the subtitle display frame.
  • the electronic device can adapt the width of the subtitle display frame to the long side of the display screen, and adjust the width of the subtitle display frame to the first width.
  • the electronic device can adapt the width of the subtitle display frame to the short side of the display screen, and adjust the width of the subtitle display frame to the second width.
  • the first width is greater than the second width.
  • the electronic device can increase the width of the subtitle display frame, so that more text can be displayed in a line of display area in the subtitle display frame.
  • the electronic device can reduce the width of the subtitle display frame to prevent the subtitle display frame from exceeding the display area of the display screen.
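  • A toy sketch of this width adjustment (illustrative only; a real implementation would also account for margins and layout constraints):

```python
def subtitle_frame_width(screen_w: int, screen_h: int, landscape: bool) -> int:
    """Adapts the subtitle display frame to the long side of the screen in
    landscape mode (the first width) and to the short side in portrait mode
    (the second width), so the first width >= the second width."""
    long_side = max(screen_w, screen_h)
    short_side = min(screen_w, screen_h)
    return long_side if landscape else short_side
```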
  • In a second aspect, an embodiment of the present application provides a subtitle control device, including:
  • the first software module is used to start the first media software, and the first media software is used to play the media file;
  • the second software module is used to start the second subtitle software, where the second subtitle software is used to recognize, while the first media software plays the media file, the voice signal of the media content played by the first media software as a corresponding subtitle result and display it;
  • a media playing module configured to play a first media file through the first media software
  • a content detection module configured to detect the first media content currently played by the first media software through the second subtitle software
  • the first subtitle module is configured to acquire a first subtitle result corresponding to the first media content from historically stored subtitle results for display if it is determined that the first media content is identified media content.
  • the device further includes:
  • the second subtitle module is configured to recognize and display the first voice signal of the first media content as a second subtitle result if it is determined that the first media content is unrecognized media content.
  • the content detection module includes:
  • a media parameter submodule configured to acquire a media parameter corresponding to the first media content, where the media parameter includes a timestamp corresponding to the first media content;
  • the first search submodule is used to search for the first subtitle result corresponding to the media parameter from the subtitle results saved in history.
  • the first subtitle module includes:
  • the first display submodule is configured to display the first subtitle result if the first subtitle result corresponding to the media parameter is found.
  • the content detection module includes:
  • the reference data sub-module is configured to identify the first voice signal of the first media content to obtain first reference data;
  • the first reference data includes one or more of the voice features of the first voice signal, the recognized text obtained by performing speech recognition on the first voice signal, and the translated text of the target language type corresponding to the recognized text;
  • the second search submodule is used to search for the second reference data matching the first reference data from the reference data saved in history.
  • the first subtitle module includes:
  • the second display submodule is configured to obtain the first subtitle result corresponding to the second reference data from historically saved subtitle results and display it if the second reference data matching the first reference data is found.
  • the second display submodule includes:
  • the sentence segmentation result submodule is used to obtain the sentence segmentation result of the first voice signal, where the sentence segmentation result is used to indicate that the currently played sentence of the first voice signal has been completely collected, or that it has not yet been completely collected;
  • the sentence segmentation display submodule is used to display, according to the sentence segmentation result and starting from the first subtitle result, the next historically saved subtitle result each time a whole sentence of the first voice signal is collected.
  • the second display submodule includes:
  • the reference comparison submodule is used to compare, starting from the first subtitle result, the first reference data with the reference data of the whole-sentence subtitle result being displayed;
  • the reference display submodule is used to display the next historically saved subtitle result when the first reference data is consistent with the reference data of the whole-sentence subtitle result being displayed.
  • the second display submodule includes:
  • the first listening submodule is used to obtain the first listening duration corresponding to the first subtitle result from the historically saved listening durations;
  • the second listening submodule is used to determine the second listening duration corresponding to the first subtitle result according to the first voice signal;
  • the speed regulation parameter submodule is used to determine the speed regulation parameter according to the first listening duration and the second listening duration;
  • the actual display submodule is used to adjust each historically saved display duration according to the speed regulation parameter to obtain the actual display duration corresponding to each historically saved subtitle result;
  • the speed-regulated display submodule is used to sequentially display, starting from the first subtitle result, each historically stored subtitle result according to its corresponding actual display duration.
  • the first software module is further configured to display a media playback interface after the first media software is started, and the media playback interface is used to play media files;
  • the second software module is further configured to display a subtitle display frame after the second subtitle software is started, and the subtitle display frame is used to display the subtitle results recognized by the second subtitle software;
  • the subtitle display frame is displayed in a stacked manner with the media playback interface, and the subtitle display frame is located on an upper layer of the media playback interface.
  • When the subtitle display frame is displayed along the long side of the display screen, the width of the subtitle display frame is a first width; when it is displayed along the short side, the width of the subtitle display frame is a second width; the first width is greater than or equal to the second width.
  • An embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and operable on the processor, where the electronic device, when executing the computer program, implements the method described in any one of the first aspect and the possible implementations of the first aspect.
  • An embodiment of the present application provides a computer-readable storage medium configured to store a computer program, where the computer program, when executed by a processor, implements the method described in any one of the first aspect and the possible implementations of the first aspect.
  • An embodiment of the present application provides a computer program product configured to, when running on an electronic device, make the electronic device execute the method described in any one of the first aspect and the possible implementations of the first aspect.
  • An embodiment of the present application provides a chip system, where the chip system includes a memory and a processor, and the processor is configured to execute the computer program stored in the memory, so as to implement the method described in any one of the first aspect and the possible implementations of the first aspect.
  • In the embodiment of the present application, after starting the first media software and the second subtitle software, the electronic device can play the first media file through the first media software and detect, through the second subtitle software, the first media content currently being played.
  • If it is determined that the first media content is identified media content, the electronic device can directly obtain the first subtitle result corresponding to the first media content from the historically stored subtitle results for display, which reduces the time consumed by subtitle recognition, reduces the delay of subtitle display, and has strong ease of use and practicality.
  • FIG. 1 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of a subtitle control method provided in an embodiment of the present application
  • FIG. 3 is a schematic diagram of a scene provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of another scenario provided by the embodiment of the present application.
  • FIG. 5 is a schematic diagram of another scenario provided by the embodiment of the present application.
  • FIG. 6 is a schematic diagram of another scenario provided by the embodiment of the present application.
  • FIG. 7 is a schematic diagram of another scenario provided by the embodiment of the present application.
  • FIG. 8 is a schematic diagram of another scenario provided by the embodiment of the present application.
  • FIG. 9 is a schematic diagram of another scenario provided by the embodiment of the present application.
  • FIG. 10 is a schematic diagram of another scenario provided by the embodiment of the present application.
  • FIG. 11 is a schematic diagram of another scenario provided by the embodiment of the present application.
  • FIG. 12 is a schematic diagram of a subtitle file provided by an embodiment of the present application.
  • FIG. 13 is a schematic diagram of another scenario provided by the embodiment of the present application.
  • FIG. 14 is a schematic flowchart of another subtitle control method provided in the embodiment of the present application.
  • FIG. 15 is a schematic diagram of another scenario provided by the embodiment of the present application.
  • FIG. 16 is a schematic diagram of another scenario provided by the embodiment of the present application.
  • FIG. 17 is a schematic diagram of another scenario provided by the embodiment of the present application.
  • FIG. 18 is a schematic diagram of another scenario provided by the embodiment of the present application.
  • FIG. 19 is a schematic diagram of another scenario provided by the embodiment of the present application.
  • FIG. 20 is a schematic diagram of another subtitle file provided by the embodiment of the present application.
  • FIG. 21 is a schematic diagram of another scenario provided by the embodiment of the present application.
  • FIG. 22 is a schematic diagram of another scenario provided by the embodiment of the present application.
  • FIG. 23 is a schematic diagram of another scenario provided by the embodiment of the present application.
  • FIG. 24 is a schematic diagram of another scenario provided by the embodiment of the present application.
  • the term "if" may be construed, depending on the context, as "when" or "once" or "in response to determining" or "in response to detecting". Similarly, the phrases "if determined" or "if [the described condition or event] is detected" may be construed, depending on the context, to mean "once determined" or "in response to determining" or "once [the described condition or event] is detected" or "in response to detecting [the described condition or event]".
  • references to "one embodiment” or “some embodiments” or the like in the specification of the present application means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application.
  • appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," etc. in various places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments," unless specifically stated otherwise.
  • the terms “including”, “comprising”, “having” and variations thereof mean “including but not limited to”, unless specifically stated otherwise.
  • Speech recognition technology refers to the technology of recognizing speech signals and converting speech signals into texts in corresponding languages.
  • An important application scenario of speech recognition technology is to recognize the audio/video being played in real time, convert the speech signal in the audio/video being played into subtitles in the corresponding language, and display the subtitles to the user.
  • For example, for some English videos without subtitles, when users watch the video, they can use subtitle generation software to recognize the voice signal in the video in real time and translate it into Chinese text, which is convenient for users to watch.
  • In existing solutions, the electronic device usually only recognizes the voice signal acquired in real time and pays no attention to the playback state of the audio/video.
  • As a result, even when media content that has already been recognized is played again, the electronic device still only recognizes the voice signal obtained in real time and cannot reuse the subtitles recognized in history.
  • the embodiment of the present application provides a subtitle control method.
  • When the electronic device plays identified media content, it can directly display the historical subtitle results corresponding to the media content, reducing the delay of subtitle display and improving the user's viewing experience, with strong ease of use and practicality.
  • The subtitle control method provided by the embodiment of the present application can be applied to an electronic device, and the electronic device can be a mobile phone, a tablet computer, a wearable device, a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a personal digital assistant (PDA), a netbook, or another electronic device with a display screen; the embodiment of this application does not limit the specific type of the electronic device.
  • FIG. 1 exemplarily shows a schematic structural diagram of an electronic device 100 provided by an embodiment of the present application.
  • the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (universal serial bus, USB) interface 130, a charging management module 140, a power management module 141, and a battery 142 , antenna 1, antenna 2, mobile communication module 150, wireless communication module 160, audio module 170, speaker 170A, receiver 170B, microphone 170C, earphone jack 170D, camera 180, display screen 181, and subscriber identification module (subscriber identification module, SIM) card interface 182 etc.
  • The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Different processing units may be independent devices, or may be integrated in one or more processors.
  • the controller can generate an operation control signal according to the instruction opcode and timing signal, and complete the control of fetching and executing the instruction.
  • a memory may also be provided in the processor 110 for storing instructions and data.
  • the memory in processor 110 is a cache memory.
  • The memory may hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, it can call them directly from the memory, which avoids repeated access and reduces the waiting time of the processor 110, thereby improving the efficiency of the system.
  • processor 110 may include one or more interfaces.
  • The interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
  • the interface connection relationship between the modules shown in the embodiment of the present invention is only a schematic illustration, and does not constitute a structural limitation of the electronic device 100 .
  • the electronic device 100 may also adopt different interface connection manners in the foregoing embodiments, or a combination of multiple interface connection manners.
  • the charging management module 140 is configured to receive a charging input from a charger.
  • the charger may be a wireless charger or a wired charger.
  • the charging management module 140 can receive charging input from the wired charger through the USB interface 130 .
  • the charging management module 140 may receive a wireless charging input through a wireless charging coil of the electronic device 100 . While the charging management module 140 is charging the battery 142 , it can also supply power to the electronic device through the power management module 141 .
  • the power management module 141 is used for connecting the battery 142 , the charging management module 140 and the processor 110 .
  • the power management module 141 receives the input from the battery 142 and/or the charging management module 140 to provide power for the processor 110 , the internal memory 121 , the display screen 181 , the camera 180 , and the wireless communication module 160 .
  • the power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle times, and battery health status (leakage, impedance).
  • the power management module 141 may also be disposed in the processor 110 .
  • the power management module 141 and the charging management module 140 may also be set in the same device.
  • the wireless communication function of the electronic device 100 can be realized by the antenna 1 , the antenna 2 , the mobile communication module 150 , the wireless communication module 160 , a modem processor, a baseband processor, and the like.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in electronic device 100 may be used to cover single or multiple communication frequency bands. Different antennas can also be multiplexed to improve the utilization of the antennas.
  • Antenna 1 can be multiplexed as a diversity antenna of a wireless local area network.
  • the antenna may be used in conjunction with a tuning switch.
  • the mobile communication module 150 can provide wireless communication solutions including 2G/3G/4G/5G applied on the electronic device 100 .
  • the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA) and the like.
  • the mobile communication module 150 can receive electromagnetic waves through the antenna 1, filter and amplify the received electromagnetic waves, and send them to the modem processor for demodulation.
  • the mobile communication module 150 can also amplify the signals modulated by the modem processor, and convert them into electromagnetic waves and radiate them through the antenna 1 .
  • at least part of the functional modules of the mobile communication module 150 may be set in the processor 110 .
  • at least part of the functional modules of the mobile communication module 150 and at least part of the modules of the processor 110 may be set in the same device.
  • a modem processor may include a modulator and a demodulator.
  • the modulator is used for modulating the low-frequency baseband signal to be transmitted into a medium-high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low frequency baseband signal. Then the demodulator sends the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the low-frequency baseband signal is passed to the application processor after being processed by the baseband processor.
  • the application processor outputs sound signals through audio equipment (not limited to speaker 170A, receiver 170B, etc.), or displays images or videos through display screen 181 .
  • the modem processor may be a stand-alone device.
  • the modem processor may be independent from the processor 110, and be set in the same device as the mobile communication module 150 or other functional modules.
  • The wireless communication module 160 can provide wireless communication solutions applied on the electronic device 100, including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), the global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR) technology, and the like.
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2 , frequency-modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110 .
  • the wireless communication module 160 can also receive the signal to be sent from the processor 110 , frequency-modulate it, amplify it, and convert it into electromagnetic waves through the antenna 2 for radiation.
  • the antenna 1 of the electronic device 100 is coupled to the mobile communication module 150, and the antenna 2 is coupled to the wireless communication module 160, so that the electronic device 100 can communicate with the network and other devices through wireless communication technology.
  • The wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies, etc.
  • The GNSS may include a global positioning system (GPS), a global navigation satellite system (GLONASS), a BeiDou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS), and/or satellite-based augmentation systems (SBAS).
  • the electronic device 100 implements a display function through a GPU, a display screen 181 , and an application processor.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 181 and the application processor. GPUs are used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
  • the display screen 181 is used to display images, videos and the like.
  • the display screen 181 includes a display panel.
  • The display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, quantum dot light emitting diodes (QLED), or the like.
  • the electronic device 100 may include 1 or N display screens 181 , where N is a positive integer greater than 1.
  • Digital signal processors are used to process digital signals. In addition to digital image signals, they can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform Fourier transform on the energy of the frequency point.
  • Video codecs are used to compress or decompress digital video.
  • the electronic device 100 may support one or more video codecs.
  • The electronic device 100 can play or record videos in various encoding formats, for example: moving picture experts group (MPEG)-1, MPEG-2, MPEG-3, MPEG-4, and so on.
  • the NPU is a neural-network (NN) computing processor.
  • Applications such as intelligent cognition of the electronic device 100 can be realized through the NPU, such as image recognition, face recognition, speech recognition, text understanding, and the like.
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, so as to expand the storage capacity of the electronic device 100.
  • the external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function. Such as saving music, video and other media files in the external memory card.
  • the internal memory 121 may be used to store computer-executable program codes including instructions.
  • the internal memory 121 may include an area for storing programs and an area for storing data.
  • the stored program area can store an operating system, at least one application program required by a function (such as a sound playing function, an image playing function, etc.) and the like.
  • the data storage area can store data (such as audio data, subtitle files, etc.) created during the use of the electronic device 100 .
  • the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device, universal flash storage (universal flash storage, UFS) and the like.
  • the processor 110 executes various functional applications and data processing of the electronic device 100 by executing instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.
  • the electronic device 100 can implement audio functions through the audio module 170 , the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor. Such as music playback, recording, etc.
  • the audio module 170 is used to convert digital audio information into analog audio signal output, and is also used to convert analog audio input into digital audio signal.
  • the audio module 170 may also be used to encode and decode audio signals.
  • the audio module 170 may be set in the processor 110 , or some functional modules of the audio module 170 may be set in the processor 110 .
  • The speaker 170A, also referred to as a "horn", is used to convert audio electrical signals into sound signals.
  • Electronic device 100 can listen to music through speaker 170A, or listen to hands-free calls.
  • The receiver 170B, also called an "earpiece", is used to convert audio electrical signals into sound signals.
  • the receiver 170B can be placed close to the human ear to receive the voice.
  • The microphone 170C, also called a "mike" or "sound transmitter", is used to convert sound signals into electrical signals. When making a phone call or sending a voice message, the user can put the mouth close to the microphone 170C and make a sound, so that the sound signal is input to the microphone 170C.
  • the electronic device 100 may be provided with at least one microphone 170C. In some other embodiments, the electronic device 100 may be provided with two microphones 170C, which may also implement a noise reduction function in addition to collecting sound signals. In some other embodiments, the electronic device 100 can also be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and realize directional recording functions, etc.
  • the earphone interface 170D is used for connecting wired earphones.
  • the earphone interface 170D can be a USB interface 130, or a 3.5mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
  • the SIM card interface 182 is used for connecting a SIM card.
  • the SIM card can be connected and separated from the electronic device 100 by inserting it into the SIM card interface 182 or pulling it out from the SIM card interface 182 .
  • the electronic device 100 may support 1 or N SIM card interfaces, where N is a positive integer greater than 1.
  • the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the electronic device 100 .
  • the electronic device 100 may include more or fewer components than shown in the figure, or combine certain components, or separate certain components, or arrange different components.
  • the illustrated components can be realized in hardware, software or a combination of software and hardware.
  • Step S101: Search for a historical subtitle result corresponding to the media content being played; if it is not found, execute step S102, and if it is found, execute step S103.
  • the electronic device may be installed with media player software and subtitle software.
  • the media player software can be used to play various media files, and the media files can include audio files, video files and other files with voice signals.
  • the above-mentioned media player software can be understood as the above-mentioned first media software.
  • the above-mentioned media player software may be a system application program of the electronic device, such as a video application program that comes with the system of the electronic device; or, the above-mentioned media player software may also be an application program specially developed by a third-party manufacturer for playing media files, such as Tencent Video, iQiyi, etc.; or, the above-mentioned media player software can also be a comprehensive application program with a media player function, such as WeChat.
  • The subtitle software can be used to manage the process of recognizing subtitle results and displaying subtitle results.
  • The above-mentioned subtitle software can be understood as the above-mentioned second subtitle software.
  • the above-mentioned subtitle software may be a system service of the electronic device; or, the above-mentioned subtitle software may also be a system application program of the electronic device; or, the above-mentioned subtitle software may also be an application program specially developed by a third-party manufacturer for managing and displaying subtitles; Alternatively, the above-mentioned subtitle software may also be a comprehensive application program with a subtitle recognition function.
  • the electronic device may be configured with multiple trigger modes for starting the subtitle software.
  • the electronic device can start the subtitle software in response to the user's click operation on the icon of the subtitle software;
  • In other embodiments, the electronic device can start the subtitle software in response to a voice command; for example, the user can say "Xiaoyi Xiaoyi, turn on the AI subtitle" to the electronic device, and the voice assistant of the electronic device can recognize the user's voice command and start the subtitle software. In still other embodiments, the electronic device can also start the subtitle software in response to other triggers.
  • the embodiment of the present application does not limit the triggering manner for the electronic device to start the subtitle software.
  • the above-mentioned media player software and the above-mentioned subtitle software can communicate with each other, and can obtain parameters of each other.
  • the above-mentioned subtitle software can also acquire parameters of the media player software in one direction.
  • After the subtitle software and the media player software are enabled on the electronic device, if the media player software plays a media file with a voice signal (i.e., the above-mentioned first media file), the subtitle software can periodically or aperiodically obtain the media parameters of the media player software.
  • the above media parameters may include any one or more of parameters such as the file identifier of the media file being played, the time stamp corresponding to the media content being played (that is, the first media content) and the playing speed.
  • the above file identifiers are used to distinguish different media files.
  • the above-mentioned file identification may be represented by any one or a combination of multiple presentation elements such as numbers, characters, and punctuation marks.
  • For example, the electronic device may use the number "124234421" as the file identifier of a certain media file; in other examples, the electronic device may use the characters "test file" as the file identifier of a certain media file; in some examples, the electronic device may use "a-12" as the file identifier of a certain media file; in other examples, the electronic device may also represent the file identifier of the media file through other presentation elements and combinations.
  • the embodiments of the present application do not limit the specific expression forms of the above-mentioned file identifiers.
  • the aforementioned media content refers to the content displayed by the media player software when the media player software plays the aforementioned media files.
  • When the above-mentioned media file is a video file, the media content being played refers to the video picture being played by the media player software and the voice signal corresponding to the video picture; when the media file is an audio file, the media content being played refers to the voice signal being played by the media player software.
  • In some embodiments, the subtitle software periodically acquires the above-mentioned media parameters according to a preset collection period.
  • the above-mentioned preset collection period can be set according to actual needs.
  • For example, the preset collection period can be set to 0.1 seconds, so that the subtitle software queries the media parameters of the media player software 10 times per second; in other embodiments, the preset collection period can be set to 0.2 seconds, so that the subtitle software queries the media parameters 5 times per second; in other embodiments, the preset collection period can be set to 1 second, so that the subtitle software queries the media parameters once per second; in other embodiments, the preset collection period may also be set to other values, and the embodiment of the present application does not limit its specific value.
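  • A minimal polling sketch, assuming hypothetical callbacks for querying the media player software and consuming the parameters:

```python
import time

def poll_media_parameters(get_media_params, on_params, period_s: float = 0.1):
    """Queries the media player software once per preset collection period;
    with period_s = 0.1 that is 10 queries per second. `get_media_params`
    returns e.g. {"file_id": ..., "timestamp_ms": ..., "speed": ...} or None,
    and `on_params` consumes the result (both are assumed callbacks)."""
    while True:
        params = get_media_params()
        if params is not None:
            on_params(params)
        time.sleep(period_s)
```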
  • the subtitle software may search the memory for historical subtitle results corresponding to the above-mentioned media parameters.
  • the subtitle software can search for the corresponding historical subtitle result according to the file identifier in the above-mentioned media parameters and the time stamp of the media content being played.
  • the subtitle software can search for the corresponding historical subtitle result according to the time stamp corresponding to the media content being played in the above media parameters.
  • the aforementioned storage may include any one or more of storages provided inside the electronic device, external storages connected to the electronic device, cloud storages and other storages.
  • the aforementioned historical subtitle results refer to subtitle results previously stored by the electronic device.
  • if the subtitle software cannot find a historical subtitle result (that is, the first subtitle result) corresponding to the above-mentioned media parameters in the memory, the media content being played is unrecognized media content, and the subtitle software can execute step S102.
  • if the subtitle software can find the historical subtitle result corresponding to the above-mentioned media parameters in the memory, the media content being played is recognized media content, and the subtitle software can execute step S103.
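  • a minimal sketch of this step S101 decision, assuming historical subtitle results are cached in a dictionary keyed by (file identifier, timestamp); the two helper functions are illustrative placeholders for steps S102 and S103:

      subtitle_cache = {}  # (file_id, timestamp) -> historical subtitle result

      def recognize_and_display(params):
          # placeholder for step S102: real-time recognition and display
          print("recognize in real time:", params)

      def display_historical(result):
          # placeholder for step S103: show the cached historical result
          print("show historical subtitle:", result)

      def handle_media_parameters(params):
          key = (params["file_id"], params["timestamp"])
          cached = subtitle_cache.get(key)
          if cached is None:
              recognize_and_display(params)   # unrecognized media content
          else:
              display_historical(cached)      # recognized media content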
  • Step S102 Recognize the voice signal of the media content being played, obtain and display the real-time subtitle result.
  • the subtitle software can display the subtitle result recognized in real time (hereinafter referred to as the real-time subtitle result, that is, the above-mentioned second subtitle result).
  • the subtitle software can acquire the voice signal being played (that is, the first voice signal) in real time.
  • the manner in which the subtitle software acquires the voice signal can be set according to actual requirements.
  • for example, the subtitle software can directly obtain the above-mentioned voice signal from the above-mentioned media player software; in other embodiments, the subtitle software can obtain, in real time through a speaker interface, the voice signal being played by the speaker; in still other embodiments, the subtitle software can record, in real time through a microphone, the voice signal played by the loudspeaker; in some other embodiments, the subtitle software can also obtain the above-mentioned voice signal by other means, and the embodiment of the present application does not limit the acquisition manner.
  • the subtitle software may perform preprocessing on the above-mentioned voice signal to obtain the voice features corresponding to the above-mentioned voice signal.
  • for example, the subtitle software can preprocess the above-mentioned voice signal through the Mel-frequency cepstral coefficient (Mel Frequency Cepstrum Coefficient, MFCC) algorithm to obtain the MFCC features corresponding to the voice signal; in other embodiments, the subtitle software can preprocess the voice signal through the linear prediction cepstral coefficient (Linear Prediction Cepstral Coefficients, LPCC) algorithm to obtain the LPCC features corresponding to the voice signal; in some other embodiments, the subtitle software can also perform phoneme processing on the voice signal to obtain the phonemes corresponding to the speech signal.
  • that is, the above speech features may include any one or more of MFCC features, LPCC features, phonemes and other types of speech features.
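  • for instance, the MFCC preprocessing mentioned above could be sketched with the librosa library as follows; the audio file name and the sampling rate are assumptions for illustration:

      import librosa

      # Load the captured voice signal as a mono waveform at 16 kHz.
      y, sr = librosa.load("captured_audio.wav", sr=16000, mono=True)

      # Extract 13 MFCC coefficients per frame; result shape is (13, n_frames).
      mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
      print(mfcc.shape)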
  • the subtitle software can recognize the above-mentioned speech features through the Automatic Speech Recognition (ASR) model, and obtain the recognized text corresponding to the above-mentioned speech features.
  • the type of the above ASR model can be set according to actual requirements.
  • for example, the above ASR model may include any one or more of the Gaussian Mixture Model/Hidden Markov Model (GMM/HMM), the Connectionist Temporal Classification (CTC) model, the Transducer model, the Attention model and other models.
  • the subtitle software may determine the recognized text as a real-time subtitle result.
  • the subtitle software may translate the identified text to obtain the translated text of the target language type, and determine the translated text as the real-time subtitle result.
  • the above-mentioned target language type can be set by default by the subtitle software.
  • the subtitle software can set the corresponding target language type by default according to the region where the electronic device is located; or, the above-mentioned target language type can also be actively set by the user on the subtitle software.
  • the subtitle software can directly use the English text "Good morning, sir" output by the ASR model as the real-time subtitle result.
  • alternatively, the subtitle software can translate the English text "Good morning, sir" output by the ASR model into the Chinese text "Sir, good morning", and take that Chinese text as the real-time subtitle result.
  • the subtitle software can perform feature extraction on the above-mentioned recognized text to obtain text features corresponding to the above-mentioned recognized text.
  • for example, the subtitle software can perform vectorization processing on the above-mentioned recognized text to convert it into word vectors; in other embodiments, the subtitle software can also perform feature extraction on the recognized text in other ways to obtain text features of a corresponding type.
  • the embodiment of the present application does not limit the feature types of the aforementioned text features and the specific manner of extracting the aforementioned features.
  • the subtitle software can process the above text features through the text translation model to obtain the translated text of the target language type, and determine the translated text as the real-time subtitle result.
  • the types of the above text translation models can be set according to actual needs.
  • the above-mentioned text translation model can include any one or more of models such as the multi-task deep neural network (Multi-Task Learning in Deep Neural Networks, MT-DNN) model and the sequence-to-sequence (seq2seq) model.
  • the embodiment of the present application does not limit the specific type of the above-mentioned text translation model.
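  • putting the above together, a hedged sketch of producing the real-time subtitle result, where the ASR model and the text translation model are stand-in stubs rather than the actual models described above:

      class StubASR:
          # stand-in for the ASR model: speech features -> recognized text
          def transcribe(self, speech_features):
              return "Good morning, sir"

      def translate(text, target_lang):
          # stand-in for the text translation model (tiny lookup table)
          return {"zh": "先生，早上好"}.get(target_lang, text)

      def realtime_subtitle(speech_features, target_lang, source_lang="en"):
          recognized = StubASR().transcribe(speech_features)
          if target_lang == source_lang:
              return recognized                       # use the recognized text directly
          return translate(recognized, target_lang)   # use the translated text

      print(realtime_subtitle(None, "zh"))  # -> "先生，早上好"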
  • the subtitle software can gradually display corresponding real-time subtitle results.
  • for example, as shown in scene (a) in Figure 4, when the voice signal obtained by the subtitle software is "How", the subtitle software can display the real-time subtitle result "how"; as shown in scene (b) in Figure 4, when the voice signal obtained by the subtitle software is "How are you?", the subtitle software can display the real-time subtitle result "How are you?"; as shown in scene (c) in Figure 4, when the subtitle software obtains the next speech signal "I'm", the subtitle software can display the next real-time subtitle result "I"; as shown in scene (d) in Figure 4, when the subtitle software obtains the speech signal "I'm fine", the subtitle software can display the real-time subtitle result "I'm fine".
  • the subtitle software may also save the above-mentioned real-time subtitle result in a memory, and establish an association relationship between the above-mentioned real-time subtitle result and the above-mentioned media parameter.
  • the subtitle software can display real-time subtitle results to meet the user's basic needs for viewing subtitles, and facilitate the user to understand the meaning expressed by the above media content.
  • the subtitle software may also save the above-mentioned real-time subtitle result, and establish an association relationship between the real-time subtitle result and the above-mentioned media parameter, so as to prepare for subsequent call requirements.
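  • the association between a real-time subtitle result and its media parameters can be sketched as a simple keyed store; the key layout below is an assumption for illustration:

      def save_subtitle(cache, file_id, timestamp, subtitle_text):
          # Associate the real-time subtitle result with its media parameters
          # so it can later be reused as a historical subtitle result.
          cache[(file_id, timestamp)] = subtitle_text

      cache = {}
      save_subtitle(cache, "124234421", "00:01", "Good morning, Tom")
      save_subtitle(cache, "124234421", "00:03", "Good morning, Jack")
      print(cache[("124234421", "00:01")])  # -> "Good morning, Tom"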
  • Step S103 showing historical subtitle results corresponding to the media content being played.
  • the subtitle software can directly display the historical subtitle results corresponding to the media content.
  • for example, the subtitle software can find the corresponding subtitle file according to the file identifier of the video being played, and find, in that subtitle file, the historical subtitle result "How are you?" corresponding to the timestamp "13:01".
  • the subtitle software can directly call and display the historical subtitle results corresponding to the media content, saving the time consumed by real-time recognition of voice signals and reducing the delay of subtitle display.
  • when the subtitle software displays the historical subtitle results, it can directly display the complete historical subtitle result corresponding to a sentence of the voice signal before the entire sentence has been received, which greatly improves the user's viewing experience.
  • when the subtitle software displays historical subtitle results, it can choose an appropriate speed adjustment method according to actual needs.
  • the subtitle software can obtain the time stamp of the media player software in real time, and display corresponding historical subtitle results in real time according to the time stamp of the media player software.
  • the subtitle software can acquire the time stamp corresponding to the media content being played in real time.
  • for example, when the time stamp of the media content being played is "00:01", the subtitle software displays the historical subtitle result corresponding to the time stamp "00:01"; when the time stamp of the media content being played is "00:02", the subtitle software displays the historical subtitle result corresponding to "00:02"; when the time stamp of the media content being played is "01:00", the subtitle software displays the historical subtitle result corresponding to "01:00".
  • the playback speed of the historical subtitle result can always follow the playback speed of the media player software.
  • when the playback speed of the media player software is accelerated, the change speed of the above timestamp is accelerated, and the playback speed of the historical subtitle results is accelerated accordingly; when the playback speed of the media player software is slowed down, the change speed of the timestamp slows down, and the playback speed of the historical subtitle results is also slowed down accordingly.
  • the subtitle software can also obtain the playback speed of the media player software, start from the historical subtitle result corresponding to the timestamp of the media content currently being played, and display subsequent historical subtitle results in sequence according to the playback speed of the media player software.
  • the subtitle software can adjust the playback speed of historical subtitle results according to the changed playback speed of the media player software.
  • the time stamp of the media content being played is "03:00", and the playing speed is 1x.
  • after the subtitle software obtains the timestamp "03:00" and the playback speed "1x speed" corresponding to the media content being played, it can start from the historical subtitle result corresponding to the timestamp "03:00" and display subsequent historical subtitle results in sequence at the "1x speed".
  • for example, if the time stamp corresponding to the next historical subtitle result is 2 seconds later, the subtitle software can display the next historical subtitle result after 2 seconds.
  • the media playback software can transmit the new playback speed to the subtitle software.
  • the subtitle software can adjust the playback speed of the historical subtitle results to the new playback speed.
  • for example, suppose the subtitle software is displaying the historical subtitle result corresponding to the time stamp "04:56", the subtitle software acquires that the playback speed of the media player software has been adjusted to "0.5x speed", and the time stamp corresponding to the next historical subtitle result is "04:58".
  • in this case, the subtitle software can adjust the playback speed of the historical subtitle results to "0.5x speed", and display the next historical subtitle result after 4 seconds.
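  • the speed adjustment rule above amounts to dividing the timestamp gap between adjacent historical subtitle results by the playback speed; a minimal sketch using the 04:56/04:58 example:

      def display_delay(ts_current_sec, ts_next_sec, playback_speed):
          # Seconds to wait before showing the next historical subtitle result.
          return (ts_next_sec - ts_current_sec) / playback_speed

      print(display_delay(296, 298, 1.0))  # 04:56 -> 04:58 at 1x speed: 2.0 s
      print(display_delay(296, 298, 0.5))  # same gap at 0.5x speed: 4.0 s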
  • the subtitle software can acquire the playback speed of the media player software, and display historical subtitle results according to the playback speed of the media player software.
  • when the playback speed of the media player software is accelerated, the playback speed of the historical subtitle results is also accelerated; when the playback speed of the media player software is slowed down, the playback speed of the historical subtitle results is also slowed down. Therefore, the historical subtitle results displayed by the subtitle software can match the media content being played by the media player software, avoiding disconnection between the historical subtitle results and the media content.
  • the subtitle software may also control the playback speed of historical subtitle results in other ways.
  • the embodiment of the present application does not limit the specific speed adjustment mode of the subtitle software.
  • after step S102 or step S103, the electronic device may also execute step S104.
  • Step S104 when the playback progress of the media player software rolls back, find and display the historical subtitle results corresponding to the rolled back media content.
  • in some embodiments, when the playback progress of the media player software rolls back, the subtitle software can search for the historical subtitle results corresponding to the rolled-back media content according to the media parameters of the media player software.
  • the subtitle software can find and display the historical subtitle results corresponding to the rolled back media content.
  • in the process of viewing the subtitle results, the user may want to review the subtitle results displayed some time ago, and may perform a rollback operation on the subtitle software.
  • the subtitle software may return to the previously displayed subtitle result in response to the user's return operation.
  • the form of the above rollback operation may be determined according to actual scenarios.
  • the above-mentioned rollback operation may be a user's downward sliding operation on the subtitle display frame.
  • when the subtitle software detects that the user slides down on the subtitle display frame, as shown in scene (b) in Figure 7, the subtitle software can control the subtitle display frame to fall back to the previously displayed subtitle result "Good morning, Tom".
  • the above-mentioned rollback operation may be a user's dragging operation on the progress bar of the subtitle display frame.
  • for example, suppose the real-time subtitle results displayed in the subtitle display frame are "I'm so happy today"-"We should go home"-"You're right"-"Let's go".
  • in response to the user's dragging operation on the progress bar, the subtitle software can control the subtitle display frame to roll back to the previously displayed subtitle results "Good morning, Tom"-"Good morning, Jack"-"The weather is so nice today"-"Let's hang out together", and prompt the user that the subtitles have been rolled back by 10 seconds.
  • in other embodiments, the rollback operation may also take other forms.
  • the embodiment of the present application does not limit the specific form of the rollback operation.
  • after the rollback, the subtitle software can control the subtitle display box to maintain the current subtitle display interface; or, the subtitle software can control the subtitle display box to display subsequent subtitle results in sequence at a preset scrolling speed until returning to the latest subtitle result.
  • the aforementioned preset scrolling speed can be set to a specific value according to actual needs, or can also be set to a specific double speed.
  • for example, the above-mentioned preset scrolling speed can be set to 1 line/second; in other embodiments, it can be set to 2 lines/second; in other embodiments, it can be set to 5 lines/second; in other embodiments, the preset scrolling speed can also be set to 1.5x speed; in still other embodiments, it can also be set to 2x speed.
  • in other embodiments, the aforementioned preset scrolling speed may also be set to other values or multiples, and the embodiment of the present application does not limit the specific setting method of the preset scrolling speed.
  • the aforementioned preset scrolling speed may be preset by the staff of the manufacturer, or the aforementioned preset scrolling speed may also be actively set by the user.
  • the embodiment of the present application does not limit the source of the preset scrolling speed.
  • when the media player software and the subtitle software can communicate with each other, the media player software can follow the operation of the subtitle software and roll back the playback progress of the media file accordingly.
  • the subtitle software responds to the user's operation and displays the subtitle result corresponding to the time stamp "10:54".
  • the subtitle software may send a rollback notification to the media player software, and the rollback notification may include the timestamp "10:54" corresponding to the subtitle result being displayed by the subtitle software.
  • the media player software may play back the media content corresponding to the timestamp "10:54" according to the timestamp "10:54".
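  • the rollback notification can be sketched as a small message passed from the subtitle software to the media player software; the message format and the send_to_media_player transport below are assumptions, not a defined protocol:

      import json

      def notify_rollback(timestamp, send_to_media_player):
          # Tell the media player software which timestamp the subtitle
          # software has rolled back to, so the player can seek accordingly.
          message = json.dumps({"type": "rollback", "timestamp": timestamp})
          send_to_media_player(message)

      # usage sketch: notify_rollback("10:54", send_fn)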
  • the media player software may not follow the operation of the subtitle software, but continue to play the media file according to the current playback progress.
  • for example, the media player software may not track the operation of the subtitle software, and may continue to play the media content after the timestamp "25:01".
  • during the process of subtitle rollback, the subtitle software can stop identifying the subtitle results corresponding to the media content played in real time, or it can continue to identify the subtitle results corresponding to the media content played in real time.
  • after returning to the latest subtitle result, the subtitle software can display the subtitle results identified during the subtitle rollback, so as to avoid gaps in the subtitle results.
  • the subtitle software can continue to identify the media content played in real time, and get the subtitle result "I'm home” - "see you tomorrow".
  • the subtitle software may respond to the user's operation and display "You are right”-"Let's go”-"I'm home"-"See you tomorrow" in the subtitle display box.
  • the subtitle software can return to the previously displayed subtitle result in response to the user's return operation, which is convenient for the user to review the previously played media content.
  • the subtitle software can continue to identify the media content being played, and obtain the corresponding subtitle result.
  • the subtitle software can directly display the subtitle result recognized during the rollback to the user, so as to avoid dislocation of the subtitle result.
  • the electronic device is a tablet computer 1 on which media player software and subtitle software are installed.
  • the user can start the subtitle software installed on the tablet computer 1 .
  • a subtitle display frame 11 can be displayed on the display screen of the tablet computer 1.
  • the subtitle display frame 11 is used to display the subtitle results recognized by the subtitle software.
  • the subtitle results can include real-time subtitle results and historical subtitle results.
  • the media player software can display its software interface (i.e. the above-mentioned media player interface), play the video picture and the voice signal of the above-mentioned English video in that interface, and provide a progress bar 12, which can be used to control the playback progress of the English video.
  • the subtitle display frame 11 can be stacked and displayed on the software interface of the media playing software, and the subtitle display frame 11 is located on the upper layer of the software interface of the media playing software.
  • the subtitle software can also adjust the shape of the subtitle display frame 11 according to the horizontal screen playback mode and the vertical screen playback mode of the media player software.
  • the subtitle software can periodically acquire media parameters from the media player software with a collection cycle of 0.5 seconds, and the media parameters include the file identifier of the above-mentioned English video and the time stamp of the real-time playback progress.
  • the subtitle software can search the memory of the tablet computer 1 for historical subtitle results corresponding to the above-mentioned file identifier and the above-mentioned time stamp.
  • if the subtitle software can find the historical subtitle result corresponding to the above-mentioned file identifier and time stamp in the memory of the tablet computer 1, the subtitle software can display the historical subtitle result in the subtitle display frame 11.
  • otherwise, the subtitle software can recognize the voice signal in real time and display the real-time subtitle result in the subtitle display frame 11.
  • if the subtitle software cannot find the historical subtitle results corresponding to the above-mentioned file identifier and time stamp in the memory, the subtitle software can follow the voice signal played by the media player software and continuously display the corresponding real-time subtitle results.
  • for example, when the voice signal acquired by the subtitle software is "Good", the subtitle display frame 11 can display "Good"; when the voice signal acquired by the subtitle software is "Good morning, Tom", the subtitle display frame 11 can display "Good morning, Tom"; likewise, for the next sentence, the subtitle display frame 11 can first display "Good" and then display "Good morning, Jack".
  • the subtitle software can also create a subtitle file A corresponding to the file identifier of the above-mentioned English video, and record the real-time subtitle result and the timestamp corresponding to the real-time subtitle result in the subtitle file A.
  • the subtitle software may record "00:00-00:01 Good morning, Tom” and "00:02-00:03 Good morning, Jack" in the subtitle file A.
  • the subtitle software can find the subtitle file A according to the file identifier of the above-mentioned English video, and find in the subtitle file A the historical subtitle result "Good morning, Tom" corresponding to the time stamp "00:00".
  • the subtitle software can display the complete historical subtitle result "Good morning, Tom" in the subtitle display frame 11 .
  • the subtitle software can display the next historical subtitle result "Good morning, Jack” in the subtitle display frame 11.
  • the subtitle software can sequentially display the corresponding historical subtitle results according to the time stamp of the media player software.
  • the subtitle software can display the last historical subtitle result "see you tomorrow" in the subtitle file A.
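  • a minimal sketch of how subtitle file A's entries might be stored and queried by timestamp, mirroring the records above; the in-memory layout is an assumption for illustration:

      # Each record: (start, end, subtitle text); "MM:SS" strings of equal
      # width compare correctly as plain strings.
      subtitle_file_a = [
          ("00:00", "00:01", "Good morning, Tom"),
          ("00:02", "00:03", "Good morning, Jack"),
      ]

      def lookup(entries, timestamp):
          # Return the historical subtitle whose time range covers `timestamp`.
          for start, end, text in entries:
              if start <= timestamp <= end:
                  return text
          return None

      print(lookup(subtitle_file_a, "00:00"))  # -> "Good morning, Tom"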
  • in some embodiments, the subtitle file may be embedded into the media file played by the media player software, so that the media player software can uniformly manage the played media content and the corresponding subtitle results.
  • in the above embodiments, the electronic device can detect, in real time and through the subtitle software independent of the media player software, the media content being played by the media player software, confirm whether the media content being played is recognized media content, and thereby determine whether to recognize the media content being played in real time or to display the historical subtitle results corresponding to the media content being played.
  • the subtitle software can recognize the voice signal obtained in real time, obtain and display the real-time subtitle result, and meet the basic needs of users to view subtitles.
  • the subtitle software can directly display the historical subtitle results corresponding to the media content, saving the time consumed in real-time subtitle recognition and reducing the delay of subtitle display.
  • when the subtitle software displays the historical subtitle results, it can directly display the complete historical subtitle result corresponding to a sentence of the voice signal before the entire sentence has been received, which greatly improves the user's viewing experience.
  • when the subtitle software detects the user's rollback operation, it can flexibly display the corresponding subtitle results in response to the rollback operation to meet the user's review needs.
  • the media playing software described in the above embodiments is not limited to a certain media playing software.
  • the above-mentioned media playing software may be one piece of media playing software, or may be multiple pieces of media playing software.
  • the above-mentioned subtitle software can recognize the video content played by the "Tencent Video” application program as a corresponding subtitle result and display it.
  • the above-mentioned subtitle software can recognize the video content played by the "Youtube” application program as the corresponding subtitle result and display it, and is not limited to identifying the video content played by the "Tencent Video” application program.
  • the steps in the above embodiments are not all necessary in every embodiment.
  • the subtitle control method implemented by the electronic device may have more or fewer steps than the subtitle control method described above.
  • the serial numbers of the steps in the above embodiments do not imply an order of execution; the execution order of each process should be determined by its function, internal logic and actual application scenarios, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • the subtitle software can implement the method described in the above steps S101 to S103 when the media player software plays the media file.
  • in other embodiments, the subtitle software may implement the method described in step S102 when the media player software plays the media file.
  • in still other embodiments, the subtitle software may further implement the method described in step S104.
  • the subtitle software can obtain the media parameters of the media player software, and determine the media file being played and the playback progress according to the media parameters.
  • the subtitle software and the media player software may be independent modules, and the data may not communicate with each other. At this time, the subtitle software may not be able to obtain the media parameters of the media player software, and it is difficult to apply the above subtitle control method.
  • FIG. 14 exemplarily shows a flow chart of another subtitle control method provided by an embodiment of the present application.
  • another subtitle control method includes:
  • the electronic device may be installed with media player software and subtitle software.
  • the subtitle software can perform step S201.
  • Step S201 Identify the voice signal being played, display the real-time subtitle result corresponding to the voice signal, and acquire the first reference data corresponding to the real-time subtitle result.
  • the first reference data may include information related to the voice signal, and/or information related to the real-time subtitle result.
  • the subtitle software can acquire the voice signal of the media content being played during the process of the media player software playing the media file, identify the voice signal, and obtain the real-time subtitle result.
  • the method for the subtitle software to identify the real-time subtitle result can refer to the content described in step S102 in the previous embodiment, and will not be repeated here.
  • after the subtitle software recognizes the real-time subtitle result, it can display the real-time subtitle result and acquire the first reference data corresponding to the real-time subtitle result.
  • the content contained in the above-mentioned first reference data may be set according to actual requirements.
  • for example, the above-mentioned first reference data may include the speech features corresponding to the above-mentioned speech signal; in other embodiments, the first reference data may include the recognized text obtained by performing speech recognition on the speech signal; in some other embodiments, the first reference data may include the translated text of the target language type corresponding to the recognized text; in some other embodiments, the first reference data may also include other content, and the embodiment of the present application does not limit the specific content included in the first reference data.
  • Step S202 Search for the second reference data matching the first reference data. If not found, execute step S203. If found, execute step S204.
  • the subtitle software may search the memory for second reference data matching the first reference data.
  • the second reference data is reference data corresponding to historical subtitle results.
  • if the subtitle software cannot find the second reference data matching the first reference data in the memory, the media content currently played by the media player software is unrecognized media content, and the subtitle software can execute step S203.
  • if the subtitle software can find the second reference data matching the first reference data in the memory, the media content currently played by the media player software is recognized media content, and the subtitle software can execute step S204.
  • Step S203 associating and storing the above-mentioned real-time subtitle result and the first reference data corresponding to the above-mentioned real-time subtitle result in a memory, and returning to step S201.
  • in step S203, the subtitle software may associate and store the above-mentioned real-time subtitle result and its corresponding first reference data in the memory, so as to prepare for subsequent call requirements.
  • the subtitle software may return to step S201 to continue identifying and displaying real-time subtitle results.
  • Step S204 starting from the historical subtitle result corresponding to the above-mentioned second reference data, and sequentially displaying subsequent historical subtitle results.
  • the subtitle software may start from the historical subtitle results corresponding to the second reference data, and display subsequent historical subtitle results in sequence.
  • the electronic device is a mobile phone 2
  • a subtitle display frame 21 is set on the display interface of the mobile phone 2
  • a memory 22 is set inside the mobile phone 2 .
  • the media player software of mobile phone 2 is playing a video file, and the real-time subtitle result recognized by the subtitle software is "good morning, Tom"-"good morning, Jack"-"the weather is so nice today".
  • the subtitle software may use the real-time subtitle result as the first reference data, and search the memory 22 for historical subtitle results that match the real-time subtitle result.
  • if the subtitle software finds, in the memory 22, historical subtitle results matching the above real-time subtitle results, the subtitle software can determine that the media content being played by the media player software is recognized media content, and can start from the historical subtitle results "Good morning, Tom"-"Good morning, Jack"-"The weather is so nice today" and display the subsequent historical subtitle results "Let's hang out together"-"Sounds good"-"Let's go" in sequence.
  • the first reference data may be the first reference data corresponding to the latest real-time subtitle result, or may be the first reference data corresponding to multiple recently recognized real-time subtitle results.
  • when the first reference data is the first reference data corresponding to multiple recently recognized real-time subtitle results, the possibility of wrong matching can be reduced, and the occurrence of subtitle errors can be reduced.
  • if the subtitle software performs matching based on a single real-time subtitle result, it may find multiple historical subtitle results in the memory that match that real-time subtitle result, making it more likely that the subtitle software matches a wrong historical subtitle result.
  • if the subtitle software performs matching according to, for example, the above three real-time subtitle results, the possibility of matching errors is greatly reduced, and the possibility of subtitle errors is greatly reduced.
  • in the above manner, the subtitle software can find the matching second reference data according to the first reference data, start from the historical subtitle result corresponding to the second reference data, and display subsequent historical subtitle results in sequence.
  • the subtitle software can directly display the corresponding historical subtitle results, saving the time consumed in real-time recognition of voice signals and reducing the delay of subtitle display.
  • when the subtitle software displays the historical subtitle results, it can directly display the complete historical subtitle result corresponding to the voice signal before the entire sentence has been received, which is convenient for users to watch and greatly improves the user's viewing experience.
  • when the subtitle software displays historical subtitle results, it should ensure that the playback speed of the historical subtitle results is consistent with that of the media player software, so as to avoid disconnection between the historical subtitle results displayed in real time and the media content played in real time, which would affect the user's viewing experience.
  • the subtitle software cannot obtain the media parameters of the media player software, so the subtitle software cannot directly obtain the playback speed of the media player software.
  • the subtitle software can select a suitable speed regulation mode, identify the playback speed of the media player software and adjust the playback speed of the historical subtitle results, so that the playback speed of the historical subtitle results is consistent with the playback speed of the media player software.
  • the subtitle software can adjust the playback speed of the historical subtitle results according to the sentence segmentation results of the voice signal.
  • while the subtitle software displays the historical subtitle results, it can continue to obtain the voice signal played in real time, and recognize the real-time subtitle result according to the voice signal.
  • the subtitle software can not only convert the speech signal into recognized text through the ASR model, but also segment the above speech signal through the ASR model.
  • the sentence segmentation of the voice signal is consistent with the sentence segmentation of the historical subtitle results: one sentence of the speech signal corresponds to one sentence of the historical subtitle results.
  • therefore, the subtitle software can judge, based on the sentence segmentation results fed back by the ASR model, whether the voice signal corresponding to the currently displayed historical subtitle result has been fully received.
  • if the voice signal has not been fully received, the subtitle software may temporarily not display the next historical subtitle result.
  • if the voice signal has been fully received, the subtitle software can display the next historical subtitle result.
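  • a sketch of pacing historical subtitle results by sentence segmentation, where asr_sentence_complete() is an assumed stand-in for the ASR model's sentence segmentation feedback:

      def next_subtitle_index(asr_sentence_complete, audio_so_far,
                              history, index):
          # Advance to the next historical subtitle result only once the ASR
          # model reports that the current sentence has been fully received.
          if asr_sentence_complete(audio_so_far):
              return min(index + 1, len(history) - 1)
          return index  # keep showing the current historical result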
  • the electronic device is a tablet computer 31
  • a subtitle display frame 32 is set on the display interface of the tablet computer 31 .
  • the tablet's media player software is playing a video file.
  • the media content being played by the media player software is the identified media content, and the subtitle software is displaying historical subtitle results in sequence.
  • suppose the voice signal currently acquired by the subtitle software is an incomplete fragment such as "The weather is so", and the historical subtitle result displayed by the subtitle software in the subtitle display frame 32 is "The weather is so nice today".
  • the subtitle software can recognize the above-mentioned speech signal through the ASR model, and determine that the speech signal is an incomplete sentence.
  • the subtitle software can judge, according to the sentence segmentation result fed back by the ASR model, that the voice signal corresponding to the currently displayed historical subtitle result has not been fully received, and temporarily does not display the next historical subtitle result.
  • the subtitle software continues to obtain the voice signal played in real time; as shown in Figure 17, when the voice signal acquired by the subtitle software becomes "The weather is so nice today", the subtitle software can recognize the voice signal through the ASR model and determine that "The weather is so nice today" is a complete speech signal.
  • the subtitle software can thus determine, according to the sentence segmentation result fed back by the ASR model, that the voice signal corresponding to the currently displayed historical subtitle result has been fully received.
  • therefore, the subtitle software can display the next historical subtitle result "Let's go climbing the mountain together".
  • the subtitle software can recognize the playback progress of the voice signal through the sentence segmentation result of the ASR model.
  • that is, each time a complete sentence of the voice signal is received, the subtitle software displays one historical subtitle result.
  • in this way, the subtitle software can dynamically adjust the playback speed of the historical subtitle results, ensure that the playback speed of the historical subtitle results is consistent with that of the media player software, and avoid disconnection between the historical subtitle results displayed in real time and the media content played in real time, thereby ensuring the user's viewing experience.
  • the subtitle software may adjust the playback speed of historical subtitle results according to the comparison result between the first reference data and the second reference data.
  • the subtitle software may compare the first reference data corresponding to the real-time subtitle result with the second reference data of the historical subtitle result being displayed.
  • if the first reference data is inconsistent with the second reference data, the subtitle software can temporarily not display the next historical subtitle result.
  • if the first reference data is consistent with the second reference data, the subtitle software can display the next historical subtitle result.
  • for example, suppose the media player software of the electronic device is playing an audio file, the media content being played is recognized media content, and the subtitle software is displaying historical subtitle results in sequence.
  • the real-time subtitle result recognized by the subtitle software is "sounding", and the historical subtitle result being displayed by the subtitle software is "sounds good”.
  • the first reference data corresponding to the real-time subtitle result is "tingqilai”
  • the second reference data corresponding to the historical subtitle result is "tingqilaibucuo”.
  • the subtitle software compares the first reference data "tingqilai” with the second reference data “tingqilaibucuo", and the two are inconsistent. Therefore, the subtitle display module can determine that the voice signal corresponding to the historical subtitle result being displayed has not been received yet, and does not display the next piece of history. Subtitle results.
  • the subtitle software recognizes the real-time subtitle as "sounds good”.
  • the first reference data corresponding to the real-time subtitle result is "tingqilaibucuo”.
  • the subtitle software compares the first reference data "tingqilaibucuo" with the second reference data “tingqilaibucuo”, and they are consistent, so the subtitle display module can determine that the voice signal corresponding to the historical subtitle results being displayed has been received.
  • the subtitle display module can display the next historical subtitle result.
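  • the comparison above reduces to checking whether the accumulating first reference data has caught up with the displayed result's second reference data; a minimal sketch using the pinyin example:

      def voice_signal_fully_received(first_ref, second_ref):
          # The displayed historical result's voice signal counts as fully
          # received only when the two reference strings match exactly.
          return first_ref == second_ref

      print(voice_signal_fully_received("tingqilai", "tingqilaibucuo"))       # False: wait
      print(voice_signal_fully_received("tingqilaibucuo", "tingqilaibucuo"))  # True: advance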
  • the subtitle software can identify the playback progress of the voice signal based on the comparison result between the first reference data and the second reference data.
  • that is, each time the voice signal corresponding to a historical subtitle result has been fully received, the subtitle software displays the next historical subtitle result.
  • in this way, the subtitle software can dynamically adjust the playback speed of the historical subtitle results, ensure that the playback speed of the historical subtitle results is consistent with that of the media player software, and avoid disconnection between the historical subtitle results displayed in real time and the media content played in real time, thereby ensuring the user's viewing experience.
  • the memory also stores the historical listening duration and historical display duration corresponding to the historical subtitle results.
  • the historical listening duration refers to the time span from the time when the subtitle software starts to receive the voice signal corresponding to the historical subtitle result to the time when it finishes receiving the voice signal corresponding to the historical subtitle result in the historical time period.
  • the historical display duration refers to the time span from the time when the historical subtitle result is displayed to the time when the next historical subtitle result is displayed in the historical time period.
  • when the subtitle software displays a historical subtitle result, it can obtain the historical listening duration corresponding to that historical subtitle result, and obtain the real-time listening duration of the sentence corresponding to that historical subtitle result.
  • the above-mentioned real-time listening duration refers to the time span of receiving, in the current playback, the voice signal of the sentence corresponding to the historical subtitle result.
  • the subtitle software can adjust the historical display duration of subsequent historical subtitle results according to the above-mentioned historical listening duration and real-time listening duration.
  • if the real-time listening duration is longer than the historical listening duration, the media player software has reduced the playback speed, and the subtitle software can increase the historical display duration of subsequent historical subtitle results, reducing the playback speed of the historical subtitle results.
  • if the real-time listening duration is shorter than the historical listening duration, the media player software has increased the playback speed, and the subtitle software can reduce the historical display duration of subsequent historical subtitle results, increasing the playback speed of the historical subtitle results.
  • if the real-time listening duration is equal to the historical listening duration, the media player software has not changed the playback speed, and the subtitle software does not need to adjust the historical display duration of subsequent historical subtitle results.
  • specifically, the subtitle software can divide the historical display duration of each subsequent historical subtitle result by the ratio of the above-mentioned historical listening duration to the above-mentioned real-time listening duration, to obtain the actual display duration corresponding to each subsequent historical subtitle result.
  • the subtitle software can sequentially display each subsequent historical subtitle result according to the actual display duration corresponding to each historical subtitle result.
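  • the duration scaling above can be sketched as follows; the numbers follow the 0.1-second / 0.05-second example in the next paragraphs:

      def actual_display_duration(hist_display, hist_listen, realtime_listen):
          # Divide the historical display duration by the ratio of historical
          # listening duration to real-time listening duration.
          return hist_display / (hist_listen / realtime_listen)

      # 0.05 s real-time vs 0.1 s historical listening: playback runs twice
      # as fast, so each display duration is halved.
      print(actual_display_duration(0.18, 0.1, 0.05))  # -> 0.09
      print(actual_display_duration(0.20, 0.1, 0.05))  # -> 0.1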
  • for example, suppose the media player software of the electronic device is playing an audio file, the media content being played is recognized media content, and the subtitle software is sequentially displaying the historical subtitle results "Good morning"-"The weather is so nice today"-"Let's go climbing together".
  • suppose the historical listening duration of the historical subtitle result "Good morning" is 0.1 seconds, the historical display duration of "The weather is so nice today" is 0.18 seconds, and the historical display duration of "Let's go climbing together" is 0.2 seconds.
  • the subtitle software can continuously obtain the real-time playback voice signal, and recognize the real-time subtitle result according to the voice signal.
  • suppose the real-time listening duration corresponding to the real-time subtitle result "Good morning" is 0.05 seconds.
  • the subtitle software can then sequentially display "The weather is so nice today" and "Let's go climbing together" according to their respective actual display durations.
  • that is, the subtitle software displays "Let's go climbing together" 0.09 seconds (0.18 ÷ 2) after displaying "The weather is so nice today", and displays the next historical subtitle result 0.1 seconds (0.2 ÷ 2) after displaying "Let's go climbing together".
  • in the above manner, the subtitle software can determine the playback speed of the media player software according to the historical listening duration and the real-time listening duration corresponding to the historical subtitle result being displayed.
  • the subtitle software can then adjust the historical display duration corresponding to each subsequent historical subtitle result according to the playback speed of the media player software, so that the playback speed of the subsequent historical subtitle results is consistent with the playback speed of the media player software, avoiding disconnection between the historical subtitle results displayed in real time and the media content played in real time, and ensuring the user's viewing experience.
  • while displaying historical subtitle results, the subtitle software can continue to recognize the real-time subtitle results and the first reference data corresponding to the real-time subtitle results, and compare the first reference data with the second reference data of the currently displayed historical subtitle results.
  • if the first reference data is consistent with the above-mentioned second reference data, there is no error in the currently displayed historical subtitle results, and the subtitle software can continue to perform step S204 to display subsequent historical subtitle results.
  • if the first reference data is inconsistent with the second reference data, the subtitle software may stop executing step S204, stop displaying subsequent historical subtitle results, and return to step S201 to identify and display real-time subtitle results, so as to ensure the accuracy of the subtitle results viewed by the user.
  • when the subtitle software compares the first reference data with the second reference data, the comparison may be based on the first reference data corresponding to the latest real-time subtitle result, or based on the first reference data corresponding to multiple recently recognized real-time subtitle results.
  • if the subtitle software performs the comparison according to the first reference data corresponding to multiple recently recognized real-time subtitle results, the possibility of misidentification can be reduced.
  • if the comparison is based only on the latest real-time subtitle result, an occasional recognition error may cause the subtitle software to misidentify the historical subtitle result being displayed as a wrong historical subtitle result, stop displaying subsequent historical subtitle results, and display real-time subtitle results instead.
  • if, instead, the subtitle software compares the latest three real-time subtitle results, the subtitle software can continue to display the subsequent historical subtitle results while continuing to recognize the real-time subtitle results.
  • for example, if the subtitle software determines that only the first historical subtitle result is inconsistent with the first real-time subtitle result, while the subsequent two historical subtitle results are consistent with the subsequent two real-time subtitle results, the subtitle software can determine that the mismatch between the first historical subtitle result and the first real-time subtitle result is not credible, and can continue to display subsequent historical subtitle results.
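  • a sketch of comparing the latest few results rather than only the most recent one; the majority threshold below is an illustrative choice, not mandated by the embodiment:

      def should_stop_history(recent_first_refs, recent_second_refs):
          # Count how many of the recent comparisons disagree; a single stray
          # mismatch is treated as not credible.
          mismatches = sum(a != b for a, b in
                           zip(recent_first_refs, recent_second_refs))
          return mismatches > len(recent_first_refs) // 2

      print(should_stop_history(["a", "b", "c"], ["x", "b", "c"]))  # False: one-off
      print(should_stop_history(["a", "b", "c"], ["x", "y", "z"]))  # True: stop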
  • the subtitle software can compare the second reference data corresponding to the historical subtitle result with the first reference data corresponding to the real-time subtitle result, and identify whether an error occurs in the historical subtitle result being displayed.
  • the comparison may be performed according to the first reference data corresponding to the latest real-time subtitle result, or may be compared according to the first reference data corresponding to multiple recently recognized real-time subtitle results.
  • if the subtitle software compares multiple recently recognized real-time subtitle results, the possibility of misidentification can be reduced and the user's viewing experience can be improved.
  • when an error in the displayed historical subtitle results is identified, the subtitle software may return to step S201 to identify and display the real-time subtitle results, so as to ensure the accuracy of the subtitle results viewed by the user.
  • the electronic device is a tablet computer 4 on which media player software and subtitle software are installed.
  • the user can start the subtitle software installed on the tablet computer 4 .
  • a subtitle display frame 41 can be displayed on the display screen of the tablet computer 4.
  • the subtitle display frame 41 is used to display the subtitle results recognized by the subtitle software.
  • the subtitle results can include real-time subtitle results and historical subtitle results.
  • the media player software can display its software interface, play the video picture and the voice signal of the above-mentioned English video in that interface, and provide a progress bar 42, which can be used to control the playback progress of the English video.
  • the subtitle display frame 41 is stacked with the software interface of the media player software, and the subtitle display frame 41 is located on the upper layer of the software interface of the media player software.
  • the subtitle software can also adjust the shape of the subtitle display frame 41 according to the horizontal screen playback mode and the vertical screen playback mode of the media player software.
  • the subtitle software can acquire the voice signal being played in real time, recognize the voice signal, obtain the real-time subtitle result and display it in the subtitle display frame 41 .
  • the subtitle software may also determine the English text corresponding to the real-time subtitle result as the first reference data, and associate and store the real-time subtitle result and the first reference data in the memory of the tablet computer 4 .
  • for example, when the voice signal acquired by the subtitle software is "Good", the subtitle display frame 41 can display "Good"; when the voice signal acquired by the subtitle software is "Good morning, Tom", the subtitle display frame 41 can display "Good morning, Tom"; likewise, for the next sentence, the subtitle display frame 41 can first display "Good" and then display "Good morning, Jack".
  • at the same time, the subtitle software can store the real-time subtitle result "Good morning, Tom" in association with the first reference data "Good morning, Tom", and store the real-time subtitle result "Good morning, Jack" in association with the first reference data "Good morning, Jack".
  • during the process of identifying and displaying real-time subtitle results, the subtitle software can also match the first reference data of the three most recently recognized real-time subtitle results against the second reference data in the memory.
  • if no matching second reference data is found, the subtitle software can continue to display the real-time subtitle results.
  • if matching second reference data is found, the media content being played by the media player software is recognized media content.
  • the subtitle software may display subsequent historical subtitle results in sequence starting from the historical subtitle results corresponding to the second reference data.
  • before this point, the above-mentioned English video is an unrecognized video; therefore, the subtitle software cannot find second reference data matching the first reference data in the memory, and the subtitle software continuously displays the real-time subtitle results.
  • the user drags the progress bar 42 from "01:45” to "00:00” to replay the English video, and the subtitle software continues to display the real-time subtitle result.
  • the subtitle software detects that the second reference data matching the above three pieces of first reference data is stored in the memory. Therefore, the subtitle software may start from the historical subtitle result corresponding to the above-mentioned second reference data, and sequentially display the subsequent historical subtitle results "Let's hang out together"-"Sounds good"-"Let's go".
  • the subtitle software may continue to identify real-time subtitle results, and compare the first reference data corresponding to the real-time subtitle results with the second reference data of the historical subtitle results being displayed.
  • if the first reference data is consistent with the second reference data, the subtitle software can continue to display the next historical subtitle result.
  • the subtitle software when the subtitle software displays the historical subtitle result "Let's hang out together", the subtitle software can combine the English text recognized in real time with the second reference data "Let's go” corresponding to the historical subtitle result "Let's hang out together”. out and play" for comparison.
  • the subtitle software can display the next historical subtitle result "Let's go”.
  • the subtitle software can display the next historical subtitle result.
  • suppose the user drags the progress bar 42 of the media player software again, returning to an unrecognized video picture, while the subtitle software continues to display historical subtitle results and recognize real-time subtitle results.
  • when the subtitle software recognizes the real-time subtitle result, it finds that the English text recognized in real time is inconsistent with the second reference data of the historical subtitle result being displayed.
  • the subtitle software may stop displaying subsequent historical subtitle results and display real-time subtitle results.
  • for example, suppose the historical subtitle result being displayed by the subtitle software is "The scenery is so beautiful", and the second reference data corresponding to that historical subtitle result is "The scenery is so beautiful".
  • if the English text recognized in real time is inconsistent with that second reference data, the subtitle software can determine that the historical subtitle result "The scenery is so beautiful" being displayed is a wrong subtitle result; therefore, as shown in FIG. 24, the subtitle software may stop displaying subsequent historical subtitle results, and display the real-time subtitle result "It's time to go home".
  • the subtitle software may search the memory for second reference data matching the first reference data according to the first reference data corresponding to the real-time subtitle result.
  • the subtitle software can obtain the first reference data corresponding to the real-time subtitle result, and search the memory for the second reference data matching the first reference data.
  • if the subtitle software cannot find, in the memory, second reference data matching the first reference data, the media content being played by the media player software is unrecognized media content, and the subtitle software can continue to display the real-time subtitle results.
  • if the subtitle software can find, in the memory, second reference data matching the first reference data, the media content being played by the media player software is recognized media content, and the subtitle software can start from the historical subtitle result matched by the second reference data and display subsequent historical subtitle results in sequence.
  • in some embodiments, the subtitle file may be embedded into the media file played by the media player software, so that the media player software can uniformly manage the played media content and the corresponding subtitle results.
  • in the subtitle control method of this embodiment, by contrast, the electronic device can use subtitle software that is independent of the media player software to recognize in real time the voice signal of the media content being played by the media player software, and, based on the recognition result, confirm whether the media content being played has been recognized before, so as to determine whether to display the real-time subtitle result recognized in real time or to display the historical subtitle result corresponding to the media content being played.
  • when the media player software plays unrecognized media content, the subtitle software can continuously recognize the voice signal of the media content being played in real time and display the recognized real-time subtitle results.
  • when the media player software plays recognized media content, the subtitle software can display the historical subtitle results corresponding to the media content being played, saving the time consumed by real-time subtitle recognition and reducing the delay of subtitle display.
  • when the subtitle software displays historical subtitle results, it can display the complete historical subtitle result for a sentence before the entire voice signal of that sentence has been captured, which is convenient for users to read and greatly improves the viewing experience.
  • the subtitle software can determine the playback speed of the media player software through sentence segmentation, reference data comparison, capture duration comparison, and the like, and adjust the playback speed of the historical subtitle results accordingly, so that the playback speed of each historical subtitle result stays consistent with that of the media player software, avoiding any mismatch between the historical subtitle results displayed in real time and the media content played in real time and preserving the user's viewing experience.
  • the subtitle software can also compare the first reference data recognized in real time with the second reference data of the historical subtitle result being displayed; when they are inconsistent, it can stop displaying subsequent historical subtitle results and display the real-time subtitle results, correcting wrong subtitle results in time and preserving the user's viewing experience.
  • in this embodiment, the subtitle software may also support the subtitle replay function; for its implementation, refer to the method described in the previous embodiment, and details are not repeated here.
  • in the embodiments provided in this application, it should be understood that the disclosed apparatus/electronic device and method may be implemented in other ways.
  • the device/electronic device embodiments described above are only illustrative.
  • the division of the above-mentioned modules or units is only a logical function division; in actual implementation there may be other division manners, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed.
  • the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.
  • the units described above as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.
  • if the above-mentioned integrated modules/units are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • based on this understanding, all or part of the processes in the methods of the above embodiments of the present application can also be completed by instructing related hardware through a computer program.
  • the above-mentioned computer program can be stored in a computer-readable storage medium.
  • when the computer program is executed by a processor, the steps of the above method embodiments can be realized.
  • the above-mentioned computer program includes computer program code, and the computer program code may be in the form of source code, object code, executable file or some intermediate form.
  • the computer-readable storage medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like.
  • the content contained in the computer-readable storage medium can be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable storage media exclude electrical carrier signals and telecommunication signals.
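As a minimal illustration of the matching-and-resume loop described in the bullets above, the following Python sketch searches the saved history for a run of second reference data matching the most recent first reference data and resumes the historical display from the entry after the match. The data layout, function names, and sample strings are assumptions for illustration only, not part of the claimed method.

```python
from dataclasses import dataclass

@dataclass
class HistoryEntry:
    reference: str  # second reference data, e.g. the recognized English text
    subtitle: str   # the historical subtitle result shown to the user

def find_resume_index(history, recent_refs):
    """Return the index of the historical entry to resume from, i.e. the entry
    right after a run matching the most recent first reference data, or None."""
    n = len(recent_refs)
    for i in range(len(history) - n + 1):
        if [h.reference for h in history[i:i + n]] == recent_refs:
            return i + n
    return None

history = [
    HistoryEntry("Good morning, Tom", "Good morning, Tom"),
    HistoryEntry("Good morning, Jack", "Good morning, Jack"),
    HistoryEntry("It's a beautiful day", "What a nice day"),
    HistoryEntry("Let's go out and play", "Let's hang out together"),
    HistoryEntry("Sounds good", "Sounds good"),
    HistoryEntry("Let's go", "Let's go"),
]

# Match on the three most recently recognized sentences, as in the example above.
recent = ["Good morning, Tom", "Good morning, Jack", "It's a beautiful day"]
idx = find_resume_index(history, recent)
if idx is None:
    pass  # unrecognized content: keep displaying real-time subtitle results
else:
    for entry in history[idx:]:
        print(entry.subtitle)  # Let's hang out together / Sounds good / Let's go
```

Matching on several recent sentences rather than a single one is what keeps the chance of a false match low, as the description notes.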

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • User Interface Of Digital Computer (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

This application is applicable to the field of software control, and provides a subtitle control method, an electronic device, and a computer-readable storage medium. In the subtitle control method provided by this application, after a first media software and a second subtitle software are started, if the first media software plays a first media file, the electronic device can detect, by means of the second subtitle software, the first media content played by the first media software. If the electronic device detects that the first media content is media content that has been recognized before, the electronic device can obtain, directly from historically saved subtitle results, a first subtitle result corresponding to the first media content for display. By means of the above method, when playing recognized media content, the electronic device can directly present the corresponding first subtitle result, reducing the time consumed by subtitle recognition and lowering the delay of subtitle display, which provides strong usability and practicality.

Description

字幕控制方法、电子设备及计算机可读存储介质
本申请要求于2021年11月30日提交国家知识产权局、申请号为202111447527.5、申请名称为“字幕控制方法、电子设备及计算机可读存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及软件控制领域,尤其涉及一种字幕控制方法、电子设备及计算机可读存储介质。
背景技术
语音识别技术的一个重要应用场景是对正在播放的音频/视频进行实时识别,将正在播放的音频/视频中的语音信号转换成相应语言的字幕,并将字幕展示给用户。
在当前的字幕识别方案中,电子设备通常只对实时获取到的语音信号进行识别。这种方案虽然可以满足用户查看字幕的基本需求,但是,字幕显示的延迟较高,用户体验不佳。
发明内容
本申请实施例提供了一种字幕控制方法、电子设备及计算机可读存储介质,可以在一定程度上解决现有的字幕识别方案延迟高,用户体验不佳的问题。
第一方面,本申请实施例提供了一种字幕控制方法,包括:
启动第一媒体软件,所述第一媒体软件用于播放媒体文件;
启动第二字幕软件,所述第二字幕软件用于在所述第一媒体软件播放媒体文件时,将所述第一媒体软件播放的媒体内容的语音信号识别成相应的字幕结果并进行显示;
通过所述第一媒体软件播放第一媒体文件;
通过所述第二字幕软件检测所述第一媒体软件当前播放的第一媒体内容;
若确定所述第一媒体内容为被识别过的媒体内容,则从历史保存的字幕结果中获取与所述第一媒体内容相对应的第一字幕结果进行显示。
需要说明的是,在本实施例中,电子设备可以安装有第一媒体软件和第二字幕软件。
上述第一媒体软件可以用于播放各类媒体文件,该媒体文件可以包括音频文件、视频文件等带有语音信号的文件。上述第一媒体软件可以是电子设备的***服务,或者,也可以是应用程序。
上述第二字幕软件可以用于在第一媒体软件播放媒体文件时,将第一媒体软件播放的媒体内容的语音信号识别成相应的字幕结果并进行显示。上述第二字幕软件可以是电子设备的***服务,或者,也可以是应用程序。
其中,上述媒体内容是指第一媒体软件播放媒体文件时,第一媒体软件所展示的内容。例如,当第一媒体软件播放视频文件时,上述媒体文件是指该视频文件,上述媒体内容是指第一媒体软件播放的视频画面以及该视频画面对应的语音信号;当第一媒体软件播放音频文件时,上述媒体文件是指该音频文件,上述媒体内容是指第一媒体软件播放的语音信号。
当第一媒体软件播放第一媒体文件时,电子设备可以通过第二字幕软件对第一媒体软件当前播放的第一媒体内容进行检测。
如果电子设备确定上述第一媒体内容为被识别过的媒体内容,则电子设备可以从历史保存的字幕结果中,获取与第一媒体内容相对于的第一字幕结果进行展示。
也即是说,电子设备可以在将第一媒体内容的第一语音信号识别成字幕结果之前,就直接显示第一媒体内容对应的第一字幕结果,节约了识别第一语音信号所消耗的时间,降低了第二字幕软件显示字幕的时延。
此外,电子设备在显示第一字幕结果时,可以在整句第一语音信号收音完成之前,就显示整句第一语音信号对应的第一字幕结果,方便用户查看第一字幕结果,提高了用户的使用体验。
在第一方面的一种可能的实现方式中,在所述通过所述第二字幕软件检测所述第一媒体软件当前播放的第一媒体内容之后,还包括:
若确定所述第一媒体内容为未被识别过的媒体内容,则将所述第一媒体内容的第一语音信号识别成第二字幕结果并进行显示。
需要说明的是,如果电子设备检测到上述第一媒体内容为未被识别过的媒体内容,则电子设备可以将上述第一媒体内容的第一语音信号识别成第二字幕结果并进行展示。
此外,电子设备在识别到第二字幕结果之后,可以存储第二字幕结果,以便在有需要时可以直接调用。
在第一方面的一种可能的实现方式中,所述检测所述第一媒体软件当前播放的第一媒体内容,包括:
获取所述第一媒体内容对应的媒体参数,所述媒体参数包括所述第一媒体内容对应的时间戳;
从历史保存的字幕结果中,查找与所述媒体参数相对应的第一字幕结果。
需要说明的是,在一些实施例中,第二字幕软件可以获取到第一媒体软件的媒体参数。
在这些实施例中,当电子设备通过第二字幕软件检测第一媒体软件当前播放的第一媒体内容时,电子设备可以获取第一媒体内容对应的媒体参数。
上述媒体参数可以包括第一媒体内容对应的时间戳。
电子设备在获取到第一媒体内容对应的媒体参数之后,可以查找与该媒体参数对应的第一字幕结果。
如果电子设备查找到与上述媒体参数对应的第一字幕结果,则电子设备可以确定上述第一媒体内容为被识别过的媒体内容。
如果电子设备未查找到与上述媒体参数对应的第一字幕结果,则电子设备可以确定上述第一媒体内容为未被识别过的媒体内容。
在第一方面的一种可能的实现方式中,所述若确定所述第一媒体内容为被识别过的媒体内容,则从历史保存的字幕结果中获取与所述第一媒体内容相对应的第一字幕结果进行显示,包括:
若查找到与所述媒体参数相对应的第一字幕结果,则显示所述第一字幕结果。
需要说明的是,当电子设备查找到与媒体参数对应的第一字幕结果时,电子设备可 以确定第一媒体内容为被识别过的媒体内容,显示查找到的第一字幕结果。
通过上述方法,电子设备可以在播放被识别过的媒体内容时,根据第一媒体内容对应的媒体参数快速查找到相应的第一字幕结果,无需对第一媒体内容的语音信号进行识别,节约了语音识别的时间,提高了字幕显示的速度,降低了字幕显示的延迟,可以有效提高用户的使用体验。
在第一方面的一种可能的实现方式中,所述检测所述第一媒体软件当前播放的第一媒体内容,包括:
对所述第一媒体内容的第一语音信号进行识别,得到第一参考数据;所述第一参考数据包括所述第一语音信号的语音特征、对所述第一语音信号进行语音识别得到的识别文本和所述识别文本对应的目标语言类型的翻译文本中的一项或多项;
从历史保存的参考数据中,查找与所述第一参考数据匹配的第二参考数据。
需要说明的是,在另一些实施例中,第二字幕软件无法获取到第一媒体软件的媒体参数。
在这些实施例中,电子设备可以对第一媒体内容的第一语音信号进行识别,得到第一参考数据。
该第一参考数据可以包括第一语音信号的语音特征、对第一语音信号进行语音识别得到的识别文本和该识别文本对应的目标语言类型的翻译文本中的一项或多项。
其中,上述语音特征可以包括频率倒谱系数、线性预测倒谱系数、音素等特征中的一种或多种。
上述目标语言类型可以第二字幕软件默认设置的,例如,第二字幕软件可以根据电子设备所在的地区,默认设置相应的目标语言类型;或者,上述目标语言类型也可以是用户在第二字幕软件上主动设置的。
电子设备在获取到第一参考数据之后,可以在历史保存的参考数据中,查找与第一参考数据匹配的第二参考数据。
第二参考数据是指历史保存的字幕结果对应的参考数据。相对应于第一参考数据,第二参考数据可以包括历史存储的语音特征、历史存储的识别文本以及历史存储的翻译文本等数据中的一项或多项。
如果电子设备可以查找到与第一参考数据匹配的第二参考数据,则电子设备可以确定上述第一媒体内容为被识别过的媒体内容。
如果电子设备无法查找到与第一参考数据匹配的第二参考数据,则电子设备可以确定上述第一媒体内容为未被识别过的媒体内容。
在第一方面的一种可能的实现方式中,所述若确定所述第一媒体内容为被识别过的媒体内容,则从历史保存的字幕结果中获取与所述第一媒体内容相对应的第一字幕结果进行显示,包括:
若查找到与所述第一参考数据匹配的第二参考数据,则从历史保存的字幕结果中获取所述第二参考数据对应的第一字幕结果进行显示。
需要说明的是,当电子设备可以查找到与第一参考数据匹配的第二参考数据时,电子设备可以确定上述第一媒体内容为被识别过的媒体内容。
此时,电子设备可以从历史保存的字幕结果中,获取与第二参考数据对应的第一字 幕结果进行显示。
通过上述方法,第二字幕软件可以在无法获取到第一媒体软件的媒体参数的情况下,识别第一媒体内容是否为被识别过的媒体内容。
在识别到第一媒体内容为被识别过的媒体内容时,电子设备可以从第一媒体内容对应的第一字幕结果开始,依次显示历史存储的字幕结果,节约语音识别所消耗的时间,提高了字幕显示的速度,降低了字幕显示的延迟,可以有效提高用户的使用体验。
在第一方面的一种可能的实现方式中,所述从历史保存的字幕结果中获取所述第二参考数据对应的第一字幕结果进行显示,包括:
获取所述第一语音信号的断句结果,所述断句结果用于指示当前播放的一句第一语音信号已收音完成,或者,指示当前播放的一句第一语音信号尚未收音完成;
根据所述断句结果,从所述第一字幕结果开始,每收音完一句第一语音信号,则展示下一条历史保存的字幕结果。
需要说明的是,当电子设备确定第一媒体内容为被识别过的媒体内容时,电子设备可以从第一媒体内容对应的第一字幕结果开始,依次显示历史存储的字幕结果。
电子设备在显示历史存储的字幕结果时,为了确保第二字幕软件显示历史保存的字幕结果的速度与第一媒体软件播放媒体内容的速度一致,电子设备可以根据实际需求,选择合适的调速方式控制历史保存的字幕结果的显示速度。
在一些实施例中,电子设备可以获取第一语音信号的断句结果。
上述断句结果可以用于指示当前播放的一句第一语音信号是否收音完成。
如果上述断句结果指示当前播放的一句第一语音信号尚未收音完成,则电子设备可以暂不显示下一句历史存储的字幕结果。
如果上述断句结果指示当前播放的一句第一语音信号已经收音完成,则电子设备可以显示下一句历史存储的字幕结果。
通过上述调速方式,第一媒体软件每播放一句第一语音信号,第二字幕软件就显示一条历史保存的字幕结果,从而使电子设备显示历史保存的字幕结果的速度可以与第一媒体软件播放媒体内容的速度保持一致。
在第一方面的一种可能的实现方式中,所述从历史保存的字幕结果中获取所述第二参考数据对应的第一字幕结果进行显示,包括:
从所述第一字幕结果开始,将所述第一参考数据与正在展示的整句字幕结果的参考数据进行比对;
当所述第一参考数据与正在展示的整句字幕结果的参考数据一致时,展示下一条历史保存的字幕结果。
需要说明的是,在另一些实施例中,电子设备也可以通过比对参考数据的方式调整第二字幕软件显示字幕结果的速度。
电子设备在显示历史保存的字幕结果时,可以继续识别第一语音信号对应的第一参考数据,并将第一参考数据与正在展示的整句字幕结果的参考数据进行比对。
当第一参考数据与正在展示的整句字幕结果的参考数据一致时,表示本句第一语音信号已经收音完成,电子设备可以展示下一条历史保存的字幕结果。
通过上述调速方式,第一媒体软件每播放一句第一语音信号,第二字幕软件就显示 一条历史存储的字幕结果,从而使电子设备展示历史存储的字幕结果的速度可以与第一媒体软件播放媒体内容的速度保持一致。
在第一方面的一种可能的实现方式中,所述从历史保存的字幕结果中获取所述第二参考数据对应的第一字幕结果进行显示,包括:
从历史保存的收音时长中,获取所述第一字幕结果对应的第一收音时长;
根据所述第一语音信号,确定所述第一字幕结果对应的第二收音时长;
根据所述第一收音时长和所述第二收音时长确定调速参数;
根据所述调速参数调整各历史保存的展示时长,得到各历史保存的字幕结果对应的实际展示时长;
从所述第一字幕结果开始,根据各历史保存的字幕结果对应的实际展示时长,依次展示各历史保存的字幕结果。
需要说明的是,在另一些实施例中,电子设备也可以在存储字幕结果时,存储该字幕结果对应的收音时长和展示时长。
当电子设备获取到第二参考数据对应的第一字幕结果时,电子设备还可以获取第一字幕结果对应的第一收音时长。
以及,电子设备可以根据第一语音信号确定第一字幕结果对应的第二收音时长。
然后,电子设备可以根据第一收音时长和第二收音时长确定调速参数。当第一收音时长大于第二收音时长时,表示应该提高历史保存的字幕结果的显示速度。
当第一收音时长小于第二收音时长时,表示应该降低历史保存的字幕结果的显示速度。
当第一收音时长等于第二收音时长时,表示不需要调整历史保存的字幕结果的显示速度。
所以,在确定调速参数之后,电子设备可以根据调速参数调整各历史保存的展示时长,得到各历史保存的字幕结果对应的实际展示时长。
然后,电子设备可以从第一字幕结果开始,根据各历史保存的字幕结果对应的实际展示时长,依次展示各历史保存的字幕结果,从而确保第二字幕软件显示字幕结果的速度与第一媒体软件播放媒体内容的速度保持一致。
在第一方面的一种可能的实现方式中,在所述第一媒体软件启动后,展示媒体播放界面,所述媒体播放界面用于播放媒体文件;
在所述第二字幕软件启动后,展示字幕展示框,所述字幕展示框用于展示所述第二字幕软件识别到的字幕结果;
所述字幕展示框与所述媒体播放界面层叠显示,且所述字幕展示框位于所述媒体播放界面的上层。
需要说明的是,电子设备在启动了第一媒体软件之后,可以在显示屏上显示媒体播放界面,该媒体播放界面可以用于播放媒体文件。
电子设备在启动了第二字幕软件之后,可以在显示屏上显示字幕展示框,该字幕展示框用于显示第二字幕软件识别到的字幕结果,该字幕结果可以包括历史保存的字幕结果和第二字幕结果。
上述字幕展示框和上述媒体播放界面可以层叠显示,并且,上述字幕展示框可以位 于上述媒体播放界面的上层,避免字幕展示框显示的字幕结果被媒体播放界面遮挡,使得用户可以完整地查看到字幕展示框显示的字幕结果。
在第一方面的一种可能的实现方式中,当所述媒体播放界面处于横屏播放状态时,所述字幕展示框的宽度为第一宽度;
当所述媒体播放界面处于竖屏播放状态时,所述字幕展示框的宽度为第二宽度;所述第一宽度大于或等于所述第二宽度。
需要说明的是,电子设备在显示字幕显示框时,字幕显示框的展示方向可以跟随媒体播放界面的展示方向,确保用户可以在同一方向上查看媒体播放界面和字幕展示框。
所以,如果媒体播放界面发生播放状态变化时,电子设备可以对应调整字幕展示框的展示方向和宽度。
当媒体播放界面处于横屏播放状态时,电子设备可以让字幕展示框的宽度适配显示屏的长边,将字幕展示框的宽度调整为第一宽度。
当媒体播放界面处于竖屏播放状态时,电子设备可以让字幕展示框的宽度适配显示屏的短边,将字幕展示框的宽度调整为第二宽度。第一宽度大于第二宽度。
也即是说,当媒体播放界面处于横屏播放状态时,电子设备可以增加字幕展示框的宽度,使得字幕展示框内的一行展示区域可以展示更多的文字。
当媒体播放界面处于竖屏播放状态时,电子设备可以缩减字幕展示框的宽度,避免字幕展示框超出显示屏的显示区域。
第二方面,本申请实施例提供了一种字幕控制装置,包括:
第一软件模块,用于启动第一媒体软件,所述第一媒体软件用于播放媒体文件;
第二软件模块,用于启动第二字幕软件,所述第二字幕软件用于在所述第一媒体软件播放媒体文件时,将所述第一媒体软件播放的媒体内容的语音信号识别成相应的字幕结果并进行显示;
媒体播放模块,用于通过所述第一媒体软件播放第一媒体文件;
内容检测模块,用于通过所述第二字幕软件检测所述第一媒体软件当前播放的第一媒体内容;
第一字幕模块,用于若确定所述第一媒体内容为被识别过的媒体内容,则从历史保存的字幕结果中获取与所述第一媒体内容相对应的第一字幕结果进行显示。
在第二方面的一种可能的实现方式中,所述装置还包括:
第二字幕模块,用于若确定所述第一媒体内容为未被识别过的媒体内容,则将所述第一媒体内容的第一语音信号识别成第二字幕结果并进行显示。
在第二方面的一种可能的实现方式中,所述内容检测模块,包括:
媒体参数子模块,用于获取所述第一媒体内容对应的媒体参数,所述媒体参数包括所述第一媒体内容对应的时间戳;
第一查找子模块,用于从历史保存的字幕结果中,查找与所述媒体参数相对应的第一字幕结果。
在第二方面的一种可能的实现方式中,所述第一字幕模块,包括:
第一展示子模块,用于若查找到与所述媒体参数相对应的第一字幕结果,则显示所述第一字幕结果。
在第二方面的一种可能的实现方式中,所述内容检测模块,包括:
参考数据子模块,用于对所述第一媒体内容的第一语音信号进行识别,得到第一参考数据;所述第一参考数据包括所述第一语音信号的语音特征、对所述第一语音信号进行语音识别得到的识别文本和所述识别文本对应的目标语言类型的翻译文本中的一项或多项;
第二查找子模块,用于从历史保存的参考数据中,查找与所述第一参考数据匹配的第二参考数据。
在第二方面的一种可能的实现方式中,所述第一字幕模块,包括:
第二展示子模块,用于若查找到与所述第一参考数据匹配的第二参考数据,则从历史保存的字幕结果中获取所述第二参考数据对应的第一字幕结果进行显示。
在第二方面的一种可能的实现方式中,所述第二展示子模块,包括:
断句结果子模块,用于获取所述第一语音信号的断句结果,所述断句结果用于指示当前播放的一句第一语音信号已收音完成,或者,指示当前播放的一句第一语音信号尚未收音完成;
断句展示子模块,用于根据所述断句结果,从所述第一字幕结果开始,每收音完一句第一语音信号,则展示下一条历史保存的字幕结果。
在第二方面的一种可能的实现方式中,所述第二展示子模块,包括:
参考比对子模块,用于从所述第一字幕结果开始,将所述第一参考数据与正在展示的整句字幕结果的参考数据进行比对;
参考展示子模块,用于当所述第一参考数据与正在展示的整句字幕结果的参考数据一致时,展示下一条历史保存的字幕结果。
在第二方面的一种可能的实现方式中,所述第二展示子模块,包括:
第一收音子模块,用于从历史保存的收音时长中,获取所述第一字幕结果对应的第一收音时长;
第二收音子模块,用于根据所述第一语音信号,确定所述第一字幕结果对应的第二收音时长;
调速参数子模块,用于根据所述第一收音时长和所述第二收音时长确定调速参数;
实际展示子模块,用于根据所述调速参数调整各历史保存的展示时长,得到各历史保存的字幕结果对应的实际展示时长;
调速展示子模块,用于从所述第一字幕结果开始,根据各历史保存的字幕结果对应的实际展示时长,依次展示各历史保存的字幕结果。
在第二方面的一种可能的实现方式中,所述第一软件模块,还用于在所述第一媒体软件启动后,展示媒体播放界面,所述媒体播放界面用于播放媒体文件;
所述第一软件模块,还用于在所述第二字幕软件启动后,展示字幕展示框,所述字幕展示框用于展示所述第二字幕软件识别到的字幕结果;
所述字幕展示框与所述媒体播放界面层叠显示,且所述字幕展示框位于所述媒体播放界面的上层。
在第二方面的一种可能的实现方式中,当所述媒体播放界面处于横屏播放状态时,所述字幕展示框的宽度为第一宽度;
当所述媒体播放界面处于竖屏播放状态时,所述字幕展示框的宽度为第二宽度;所述第一宽度大于或等于所述第二宽度。
第三方面,本申请实施例提供了一种电子设备,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机程序,所述电子设备被配置为执行所述计算机程序时实现如第一方面和第一方面可能的实现方式中任一所述的方法。
第四方面,本申请实施例提供了一种计算机可读存储介质,所述计算机可读存储介质被配置为存储有计算机程序,其特征在于,所述计算机程序被处理器执行时实现如第一方面和第一方面可能的实现方式中任一所述的方法。
第五方面,本申请实施例提供了一种计算机程序产品,所述计算机程序产品被配置为在电子设备上运行时,使得电子设备执行如第一方面和第一方面可能的实现方式中任一所述的方法。
第六方面,本申请实施例提供了一种芯片***,所述芯片***包括存储器和处理器,所述处理器被配置为执行所述存储器中存储的计算机程序,以实现如第一方面和第一方面可能的实现方式中任一所述的方法。
本申请实施例与现有技术相比存在的有益效果是:
在本申请实施例提供的字幕控制方法中,在启动第一媒体软件和第二字幕软件之后,电子设备可以通过第一媒体软件播放第一媒体文件,并通过第二字幕软件检测第一媒体文件当前播放的第一媒体内容。
如果电子设备检测到第一媒体内容为被识别过的媒体内容,则电子设备可以从历史保存的字幕结果中,直接获取与上述第一媒体内容对应的第一字幕结果进行显示,减少字幕识别所消耗的时间,降低字幕显示的延迟,具有较强的易用性和实用性。
附图说明
图1为本申请实施例提供的一种电子设备的结构示意图;
图2为本申请实施例提供的一种字幕控制方法的流程示意图;
图3为本申请实施例提供的一种场景示意图;
图4为本申请实施例提供的另一种场景示意图;
图5为本申请实施例提供的另一种场景示意图;
图6为本申请实施例提供的另一种场景示意图;
图7为本申请实施例提供的另一种场景示意图;
图8为本申请实施例提供的另一种场景示意图;
图9为本申请实施例提供的另一种场景示意图;
图10为本申请实施例提供的另一种场景示意图;
图11为本申请实施例提供的另一种场景示意图;
图12为本申请实施例提供的一种字幕文件的示意图;
图13为本申请实施例提供的另一种场景示意图;
图14为本申请实施例提供的另一种字幕控制方法的流程示意图;
图15为本申请实施例提供的另一种场景示意图;
图16为本申请实施例提供的另一种场景示意图;
图17为本申请实施例提供的另一种场景示意图;
图18为本申请实施例提供的另一种场景示意图;
图19为本申请实施例提供的另一种场景示意图;
图20为本申请实施例提供的另一种字幕文件的示意图;
图21为本申请实施例提供的另一种场景示意图;
图22为本申请实施例提供的另一种场景示意图;
图23为本申请实施例提供的另一种场景示意图;
图24为本申请实施例提供的另一种场景示意图。
具体实施方式
以下描述中,为了说明而不是为了限定,提出了诸如特定***结构、技术之类的具体细节,以便透彻理解本申请实施例。然而,本领域的技术人员应当清楚,在没有这些具体细节的其它实施例中也可以实现本申请。在其它情况中,省略对众所周知的***、装置、电路以及方法的详细说明,以免不必要的细节妨碍本申请的描述。
应当理解,当在本申请说明书和所附权利要求书中使用时,术语“包括”指示所描述特征、整体、步骤、操作、元素和/或组件的存在,但并不排除一个或多个其它特征、整体、步骤、操作、元素、组件和/或其集合的存在或添加。
还应当理解,在本申请说明书和所附权利要求书中使用的术语“和/或”是指相关联列出的项中的一个或多个的任何组合以及所有可能组合,并且包括这些组合。
如在本申请说明书和所附权利要求书中所使用的那样,术语“如果”可以依据上下文被解释为“当...时”或“一旦”或“响应于确定”或“响应于检测到”。类似地,短语“如果确定”或“如果检测到[所描述条件或事件]”可以依据上下文被解释为意指
“一旦确定”或“响应于确定”或“一旦检测到[所描述条件或事件]”或“响应于检测到[所描述条件或事件]”。
另外,在本申请说明书和所附权利要求书的描述中,术语“第一”、“第二”、“第三”等仅用于区分描述,而不能理解为指示或暗示相对重要性。
在本申请说明书中描述的参考“一个实施例”或“一些实施例”等意味着在本申请的一个或多个实施例中包括结合该实施例描述的特定特征、结构或特点。由此,在本说明书中的不同之处出现的语句“在一个实施例中”、“在一些实施例中”、“在其他一些实施例中”、“在另外一些实施例中”等不是必然都参考相同的实施例,而是意味着“一个或多个但不是所有的实施例”,除非是以其他方式另外特别强调。术语“包括”、“包含”、“具有”及它们的变形都意味着“包括但不限于”,除非是以其他方式另外特别强调。
语音识别技术是指对语音信号进行识别,将语音信号转化成相应语言的文本的技术。
语音识别技术的一个重要应用场景是对正在播放的音频/视频进行实时识别,将正在播放的音频/视频中的语音信号转换成相应语言的字幕,并将字幕展示给用户。
例如,对于一些没有字幕的英语视频,用户在观看视频时,可以通过字幕生成软件对视频中的语音信号进行实时识别,将语音信号翻译成中文文本,便于用户观看。
在当前的字幕识别方案中,电子设备通常只对实时获取到的语音信号进行识别,不关心音频/视频的播放状态。
也即是说,即使用户播放的音频/视频已经被识别过,电子设备依然只对实时获取到的语音信号进行识别,无法利用历史识别到的字幕。
在这种场景中,上述方案固然可以满足用户查看字幕的基本需求,但是,字幕显示的延迟较高,用户体验不佳。
有鉴于此,本申请实施例提供了一种字幕控制方法,当电子设备播放已经识别过的媒体内容时,电子设备可以直接显示该媒体内容对应的历史字幕结果,减少字幕显示的延迟,提高用户的观看体验,具有较强的易用性和实用性。
本申请实施例提供的字幕控制方法可以应用于电子设备,该电子设备可以为手机、平板电脑、可穿戴设备、车载设备、增强现实(augmented reality,AR)/虚拟现实(virtual reality,VR)设备、笔记本电脑、超级移动个人计算机(ultra-mobile personal computer,UMPC)、个人数字助理(personal digital assistant,PDA)、上网本等具有显示屏的电子设备,本申请实施例对电子设备的具体类型不作任何限制。
参考图1,图1示例性示出了本申请实施例提供的电子设备100的结构示意图。
如图1所示,电子设备100可以包括处理器110,外部存储器接口120,内部存储器121,通用串行总线(universal serial bus,USB)接口130,充电管理模块140,电源管理模块141,电池142,天线1,天线2,移动通信模块150,无线通信模块160,音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,摄像头180,显示屏181,以及用户标识模块(subscriber identification module,SIM)卡接口182等。
处理器110可以包括一个或多个处理单元,例如:处理器110可以包括应用处理器(application processor,AP),调制解调处理器,图形处理器(graphics processing unit,GPU),图像信号处理器(image signal processor,ISP),控制器,视频编解码器,数字信号处理器(digital signal processor,DSP),基带处理器,和/或神经神经网络处理器(neural-network processing unit,NPU)等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。
控制器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。
处理器110中还可以设置存储器,用于存储指令和数据。在一些实施例中,处理器110中的存储器为高速缓冲存储器。该存储器可以保存处理器110刚用过或循环使用的指令或数据。如果处理器110需要再次使用该指令或数据,可从所述存储器中直接调用。避免了重复存取,减少了处理器110的等待时间,因而提高了***的效率。
在一些实施例中,处理器110可以包括一个或多个接口。接口可以包括集成电路(inter-integrated circuit,I1C)接口,集成电路内置音频(inter-integrated circuit sound,I1S)接口,脉冲编码调制(pulse code modulation,PCM)接口,通用异步收发传输器(universal asynchronous receiver/transmitter,UART)接口,移动产业处理器接口(mobile industry processor interface,MIPI),通用输入输出(general-purpose input/output,GPIO)接口,用户标识模块(subscriber identity module,SIM)接口,和/或通用串行总线(universal serial bus,USB)接口等。
可以理解的是,本发明实施例示意的各模块间的接口连接关系,只是示意性说明,并不构成对电子设备100的结构限定。在本申请另一些实施例中,电子设备100也可以采用上述实施例中不同的接口连接方式,或多种接口连接方式的组合。
充电管理模块140用于从充电器接收充电输入。其中,充电器可以是无线充电器, 也可以是有线充电器。在一些有线充电的实施例中,充电管理模块140可以通过USB接口130接收有线充电器的充电输入。在一些无线充电的实施例中,充电管理模块140可以通过电子设备100的无线充电线圈接收无线充电输入。充电管理模块140为电池142充电的同时,还可以通过电源管理模块141为电子设备供电。
电源管理模块141用于连接电池142,充电管理模块140与处理器110。电源管理模块141接收电池142和/或充电管理模块140的输入,为处理器110,内部存储器121,显示屏181,摄像头180,和无线通信模块160等供电。电源管理模块141还可以用于监测电池容量,电池循环次数,电池健康状态(漏电,阻抗)等参数。在其他一些实施例中,电源管理模块141也可以设置于处理器110中。在另一些实施例中,电源管理模块141和充电管理模块140也可以设置于同一个器件中。
电子设备100的无线通信功能可以通过天线1,天线2,移动通信模块150,无线通信模块160,调制解调处理器以及基带处理器等实现。
天线1和天线2用于发射和接收电磁波信号。电子设备100中的每个天线可用于覆盖单个或多个通信频带。不同的天线还可以复用,以提高天线的利用率。例如:可以将天线1复用为无线局域网的分集天线。在另外一些实施例中,天线可以和调谐开关结合使用。
移动通信模块150可以提供应用在电子设备100上的包括2G/3G/4G/5G等无线通信的解决方案。移动通信模块150可以包括至少一个滤波器,开关,功率放大器,低噪声放大器(low noise amplifier,LNA)等。移动通信模块150可以由天线1接收电磁波,并对接收的电磁波进行滤波,放大等处理,传送至调制解调处理器进行解调。移动通信模块150还可以对经调制解调处理器调制后的信号放大,经天线1转为电磁波辐射出去。在一些实施例中,移动通信模块150的至少部分功能模块可以被设置于处理器110中。在一些实施例中,移动通信模块150的至少部分功能模块可以与处理器110的至少部分模块被设置在同一个器件中。
调制解调处理器可以包括调制器和解调器。其中,调制器用于将待发送的低频基带信号调制成中高频信号。解调器用于将接收的电磁波信号解调为低频基带信号。随后解调器将解调得到的低频基带信号传送至基带处理器处理。低频基带信号经基带处理器处理后,被传递给应用处理器。应用处理器通过音频设备(不限于扬声器170A,受话器170B等)输出声音信号,或通过显示屏181显示图像或视频。在一些实施例中,调制解调处理器可以是独立的器件。在另一些实施例中,调制解调处理器可以独立于处理器110,与移动通信模块150或其他功能模块设置在同一个器件中。
无线通信模块160可以提供应用在电子设备100上的包括无线局域网(wireless local area networks,WLAN)(如无线保真(wireless fidelity,Wi-Fi)网络),蓝牙(bluetooth,BT),全球导航卫星***(global navigation satellite system,GNSS),调频(frequency modulation,FM),近距离无线通信技术(near field communication,NFC),红外技术(infrared,IR)等无线通信的解决方案。无线通信模块160可以是集成至少一个通信处理模块的一个或多个器件。无线通信模块160经由天线2接收电磁波,将电磁波信号调频以及滤波处理,将处理后的信号发送到处理器110。无线通信模块160还可以从处理器110接收待发送的信号,对其进行调频,放大,经天线2转为电磁波辐射出去。
在一些实施例中,电子设备100的天线1和移动通信模块150耦合,天线2和无线通信模块160耦合,使得电子设备100可以通过无线通信技术与网络以及其他设备通信。所述无线通信技术可以包括全球移动通讯***(global system for mobile communications,GSM),通用分组无线服务(general packet radio service,GPRS),码分多址接入(code division multiple access,CDMA),宽带码分多址(wideband code division multiple access,WCDMA),时分码分多址(time-division code division multiple access,TD-SCDMA),长期演进(long term evolution,LTE),BT,GNSS,WLAN,NFC,FM,和/或IR技术等。所述GNSS可以包括全球卫星定位***(global positioning system,GPS),全球导航卫星***(global navigation satellite system,GLONASS),北斗卫星导航***(beidou navigation satellite system,BDS),准天顶卫星***(quasi-zenith satellite system,QZSS)和/或星基增强***(satellite based augmentation systems,SBAS)。
电子设备100通过GPU,显示屏181,以及应用处理器等实现显示功能。GPU为图像处理的微处理器,连接显示屏181和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器110可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。
显示屏181用于显示图像,视频等。显示屏181包括显示面板。显示面板可以采用液晶显示屏(liquid crystal display,LCD),或者采用有机发光二极管(organic light-emitting diode,OLED),有源矩阵有机发光二极体或主动矩阵有机发光二极体(active-matrix organic light emitting diode的,AMOLED),柔性发光二极管(flex light-emitting diode,FLED),Miniled,MicroLed,Micro-oLed,量子点发光二极管(quantum dot light emitting diodes,QLED)等材料制成。在一些实施例中,电子设备100可以包括1个或N个显示屏181,N为大于1的正整数。
数字信号处理器用于处理数字信号,除了可以处理数字图像信号,还可以处理其他数字信号。例如,当电子设备100在频点选择时,数字信号处理器用于对频点能量进行傅里叶变换等。
视频编解码器用于对数字视频压缩或解压缩。电子设备100可以支持一种或多种视频编解码器。这样,电子设备100可以播放或录制多种编码格式的视频,例如:动态图像专家组(moving picture experts group,MPEG)1,MPEG1,MPEG3,MPEG4等。
NPU为神经网络(neural-network,NN)计算处理器,通过借鉴生物神经网络结构,例如借鉴人脑神经元之间传递模式,对输入信息快速处理,还可以不断的自学习。通过NPU可以实现电子设备100的智能认知等应用,例如:图像识别,人脸识别,语音识别,文本理解等。
外部存储器接口120可以用于连接外部存储卡,例如Micro SD卡,实现扩展电子设备100的存储能力。外部存储卡通过外部存储器接口120与处理器110通信,实现数据存储功能。例如将音乐,视频等媒体文件保存在外部存储卡中。
内部存储器121可以用于存储计算机可执行程序代码,所述可执行程序代码包括指令。内部存储器121可以包括存储程序区和存储数据区。其中,存储程序区可存储操作***,至少一个功能所需的应用程序(比如声音播放功能,图像播放功能等)等。存储数据区可存储电子设备100使用过程中所创建的数据(比如音频数据,字幕文件等)等。此 外,内部存储器121可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件,闪存器件,通用闪存存储器(universal flash storage,UFS)等。处理器110通过运行存储在内部存储器121的指令,和/或存储在设置于处理器中的存储器的指令,执行电子设备100的各种功能应用以及数据处理。
电子设备100可以通过音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,以及应用处理器等实现音频功能。例如音乐播放,录音等。
音频模块170用于将数字音频信息转换成模拟音频信号输出,也用于将模拟音频输入转换为数字音频信号。音频模块170还可以用于对音频信号编码和解码。在一些实施例中,音频模块170可以设置于处理器110中,或将音频模块170的部分功能模块设置于处理器110中。
扬声器170A,也称“喇叭”,用于将音频电信号转换为声音信号。电子设备100可以通过扬声器170A收听音乐,或收听免提通话。
受话器170B,也称“听筒”,用于将音频电信号转换成声音信号。当电子设备100接听电话或语音信息时,可以通过将受话器170B靠近人耳接听语音。
麦克风170C,也称“话筒”,“传声器”,用于将声音信号转换为电信号。当拨打电话或发送语音信息时,用户可以通过人嘴靠近麦克风170C发声,将声音信号输入到麦克风170C。电子设备100可以设置至少一个麦克风170C。在另一些实施例中,电子设备100可以设置两个麦克风170C,除了采集声音信号,还可以实现降噪功能。在另一些实施例中,电子设备100还可以设置三个,四个或更多麦克风170C,实现采集声音信号,降噪,还可以识别声音来源,实现定向录音功能等。
耳机接口170D用于连接有线耳机。耳机接口170D可以是USB接口130,也可以是3.5mm的开放移动电子设备平台(open mobile terminal platform,OMTP)标准接口,美国蜂窝电信工业协会(cellular telecommunications industry association of the USA,CTIA)标准接口。
SIM卡接口182用于连接SIM卡。SIM卡可以通过***SIM卡接口182,或从SIM卡接口182拔出,实现和电子设备100的接触和分离。电子设备100可以支持1个或N个SIM卡接口,N为大于1的正整数。
可以理解的是,本申请实施例示意的结构并不构成对电子设备100的具体限定。在本申请另一些实施例中,电子设备100可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。
以下,将根据图1所示的电子设备和图2所示的字幕控制方法的流程示意图,对本申请实施例提供的字幕控制方法进行详细说明。
步骤S101:查找与正在播放的媒体内容对应的历史字幕结果,若未查找到,则执行步骤S102,若查找到,则执行步骤S103。
在本申请实施例中,电子设备可以安装有媒体播放软件和字幕软件。
其中,媒体播放软件可以用于播放各类媒体文件,该媒体文件可以包括音频文件、视频文件等带有语音信号的文件。上述媒体播放软件可以理解为上述第一媒体软件。
上述媒体播放软件可以是电子设备的***应用程序,比如电子设备的***自带的视 频应用程序;或者,上述媒体播放软件也可以是第三方厂商开发的专门用于播放媒体文件的应用程序,比如腾讯视频、爱奇艺等;或者,上述媒体播放软件也可以是具有媒体播放功能的综合性应用程序,比如微信等。
字幕软件可以用于管理识别字幕结果以及展示字幕结果的过程。上述字幕软件可以理解为上述第一字幕软件。
上述字幕软件可以是电子设备的***服务;或者,上述字幕软件也可以是电子设备的***应用程序;或者,上述字幕软件也可以是第三方厂商开发的专门用于管理和展示字幕的应用程序;或者,上述字幕软件也可以是具有字幕识别功能的综合性应用程序。
在实际的应用场景中,电子设备可以设置有多种启动字幕软件的触发方式。例如,在一些实施例中,电子设备可以响应于用户对字幕软件的图标的点击操作,启动字幕软件;在另一些实施例中,电子设备上安装有语音助手,用户可以对电子设备下达启动字幕软件的语音指令,比如,用户可以对电子设备说“小艺小艺,打开AI字幕”,电子设备的语音助手可以识别用户的语音指令,启动字幕软件;在另一些实施例,电子设备也可以响应于其他触发方式,启动字幕软件。本申请实施例对电子设备启动字幕软件的触发方式不予限制。
在本申请实施例中,上述媒体播放软件和上述字幕软件可以数据互通,可以互相获取到对方的参数。或者,上述字幕软件也可以单向获取媒体播放软件的参数。
在电子设备启用了字幕软件和媒体播放软件之后,如果媒体播放软件播放带有语音信号的媒体文件(即上述第一媒体文件),则字幕软件可以周期性或非周期性地获取媒体播放软件的媒体参数。
上述媒体参数可以包括正在播放的媒体文件的文件标识、正在播放的媒体内容(即上述第一媒体内容)对应的时间戳和播放速度等参数中的任意一种或多种。
上述文件标识用于区分不同的媒体文件。上述文件标识可以用数字、字符、标点符号等类型的表现元素中的任意一种或多种的组合进行表示。
例如,在一些示例中,电子设备可以将数字“124234421”作为某一个媒体文件的文件标识;在另一些示例中,电子设备可以将字符“测试文件”作为某一个媒体文件的文件标识;在另一些示例中,电子设备可以将“a-12”作为某一个媒体文件的文件标识;在另一些示例中,电子设备也可以通过其他表现元素及组合表示媒体文件的文件标识。本申请实施例对上述文件标识的具体表现形式不予限制。
上述媒体内容是指媒体播放软件播放上述媒体文件时,媒体播放软件所展示的内容。
例如,当媒体播放软件播放视频文件时,上述媒体文件是指该视频文件,上述正在播放的媒体内容是指媒体播放软件正在播放的视频画面以及该视频画面对应的语音信号;
当媒体播放软件播放音频文件时,上述媒体文件是指该音频文件,上述正在播放的媒体内容是指媒体播放软件正在播放的语音信号。
此外,当字幕软件根据预设采集周期,周期性地获取上述媒体参数时,上述预设采集周期可以根据实际需求进行设置。
例如,在一些实施例中,预设采集周期可以被设置为0.1秒,字幕软件可以在1秒内查询10次媒体播放软件的媒体参数;在另一些实施例中,预设采集周期可以被设置为0.2秒,字幕软件可以在1秒内查询5次媒体播放软件的媒体参数;在另一些实施例 中,预设采集周期可以被设置为1秒,字幕软件可以每秒查询1次媒体播放软件的媒体参数;在另一些实施例中,预设采集周期也可以被设置为其他数值,本申请实施例对预设采集周期的具体取值不予限制。
在获取到上述媒体参数之后,字幕软件可以在存储器中查找与上述媒体参数对应的历史字幕结果。
示例性地,当存储器中存储有多个字幕文件,不同的字幕文件对应不同的文件标识时,字幕软件可以根据上述媒体参数中的文件标识和正在播放的媒体内容的时间戳查找相应的历史字幕结果。
当存储器中仅存储有当前播放的媒体文件对应的字幕文件时,字幕软件可以根据上述媒体参数中正在播放的媒体内容对应的时间戳查找相应的历史字幕结果。
上述存储器可以包括设置于电子设备内部的存储器、与电子设备连接的外部存储器、云存储器等存储器中的任意一种或多种。
上述历史字幕结果是指电子设备过往存储的字幕结果。
如果字幕软件不能在存储器查找到与上述媒体参数对应的历史字幕结果(即上述第一字幕结果),则表示正在播放的媒体内容为未被识别过的媒体内容,字幕软件可以执行步骤S102。
如果字幕软件可以在存储器上查找到上述媒体参数对应的历史字幕结果,则表示正在播放的媒体内容为被识别过的媒体内容,字幕软件可以执行步骤S103。
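A minimal sketch of the lookup in step S101, in Python; the store layout (file identifier mapped to timestamp-sorted subtitle entries) and the sample identifier "a-12" follow the examples in this description but are otherwise assumptions for illustration:

```python
from bisect import bisect_right

# Hypothetical store: file identifier -> (timestamp in seconds, subtitle) pairs,
# sorted by timestamp, as saved by the subtitle software in earlier sessions.
subtitle_store = {
    "a-12": [(0.0, "早上好，汤姆"), (2.0, "早上好，杰克")],
}

def lookup_history(file_id, timestamp):
    """Step S101: return the historical subtitle covering `timestamp`,
    or None when this media content has not been recognized before."""
    entries = subtitle_store.get(file_id)
    if not entries:
        return None  # no saved subtitles for this media file -> step S102
    times = [t for t, _ in entries]
    i = bisect_right(times, timestamp) - 1
    return entries[i][1] if i >= 0 else None

print(lookup_history("a-12", 2.5))  # 早上好，杰克 -> step S103
print(lookup_history("b-34", 2.5))  # None -> step S102
```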
步骤S102、对正在播放的媒体内容的语音信号进行识别,得到实时字幕结果并进行展示。
当媒体播放软件播放未被识别过的媒体内容时,字幕软件可以展示实时识别到的字幕结果(以下简称实时字幕结果,即上述第二字幕结果)。
具体地,字幕软件在识别实时字幕结果的过程中,可以实时获取正在播放的语音信号(即上述第一语音信号)。
其中,字幕软件获取语音信号的方式可以根据实际需求进行设置。例如,在一些实施例中,字幕软件可以直接从上述媒体播放软件中获取到上述语音信号;在另一些实施例中,字幕软件也可以通过扬声器接口,获取到扬声器实时播放的语音信号;在另一些实施例中,字幕软件也可以通过麦克风实时录制扬声器播放的语音信号;在另一些实施例中,字幕软件也可以通过其他方式获取上述语音信号,本申请实施例对字幕软件获取语音信号的具体方式不予限制。
然后,字幕软件可以对上述语音信号进行预处理,得到上述语音信号对应的语音特征。
预处理的过程可以根据实际需求进行设置。例如,在一些实施例中,字幕软件可以通过频率倒谱系数(Mel Frequency Cepstrum Coefficient,MFCC)算法对上述语音信号进行预处理,得到上述语音信号对应的MFCC特征;在另一些实施例中,字幕软件可以通过线性预测倒谱系数(Linear Prediction Cepstral Coefficients,LPCC)算法对上述语音信号进行预处理,得到上述语音信号对应的LPCC特征;在另一些实施例中,字幕软件也可以对上述语音信号进行音素处理,得到上述语音信号对应的音素。
相应的,上述语音特征可以包括MFCC特征、LPCC特征、音素等类型的语音特征 中的任意一种或多种。
之后,字幕软件可以通过自动语音识别(Automatic Speech Recognition,ASR)模型对上述语音特征进行识别,得到上述语音特征对应的识别文本。
上述ASR模型的类型可以根据实际需求进行设置。例如,上述ASR模型可以包括高斯混合/隐马尔科夫模型(Gaussian Mixed Model/Hidden Markov Model,GMM/HMM)、连接时序分类(Connectionist Temporal Classification,CTC)模型、变换器(Transducer)模型、注意力(Attention)模型等模型中的任意一种或多种。本申请实施例对上述ASR模型的具体类型不予限制。
当上述识别文本的语言类型与目标语言类型相同时,字幕软件可以将上述识别文本确定为实时字幕结果。
当上述识别文本的语言类型与目标语言类型不同时,字幕软件可以对上述识别文本进行翻译,得到目标语言类型的翻译文本,并将该翻译文本确定为实时字幕结果。
上述目标语言类型可以字幕软件默认设置的,例如,字幕软件可以根据电子设备所在的地区,默认设置相应的目标语言类型;或者,上述目标语言类型也可以是用户在字幕软件上主动设置的。
示意性地,请参阅图3,假设媒体播放软件正在播放的音频为英语音频。
此时,如图3中的(a)场景所示,如果用户指定的目标语言类型为英语,则字幕软件可以直接将ASR模型输出的英语文本“Good morning,sir”作为实时字幕结果。
如图3中的(b)场景所示,如果用户指定的目标语言类型为中文,则字幕软件可以将ASR模型输出的英语文本“Good morning,sir”翻译成中文文本“先生,早上好”,将该中文文本“先生,早上好”作为实时字幕结果。
字幕软件在进行翻译时,可以对上述识别文本进行特征提取,得到上述识别文本对应的文本特征。
上述文本特征的类型和上述特征提取的方式可以根据实际需求进行设置。例如,在一些实施例中,字幕软件可以对上述识别文本进行向量化处理,将上述识别文本转化成词向量;在另一些实施例中,字幕软件也可以通过其他方式对上述识别文本进行特征提取,得到相应类型的文本特征。本申请实施例对上述文本特征的特征类型和上述特征提取的具体方式不予限制。
然后,字幕软件可以通过文本翻译模型对上述文本特征进行处理,得到目标语言类型的翻译文本,并将该翻译文本确定为实时字幕结果。
上述文本翻译模型的类型可以根据实际需求进行设置。例如,上述文本翻译模型可以包括多任务深度神经网络(Multi-Task Learning in Deep Neural Networks,MT-DNN)模型、序列到序列(sequence-to-sequence,seq2seq)模型等模型中的任意一种或多种。本申请实施例对上述文本翻译模型的具体类型不予限制。
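The recognition pipeline of step S102 (feature extraction, then ASR, then translation only when the language types differ) can be sketched as follows; the stub classes stand in for the GMM/HMM, CTC, Transducer, Attention, MT-DNN, or seq2seq models named above and are not a real speech or translation API:

```python
class StubAsrModel:
    """Stand-in for an ASR model (GMM/HMM, CTC, Transducer, Attention, ...)."""
    def transcribe(self, features):
        return "Good morning, sir", "en"  # (recognized text, its language type)

class StubTranslator:
    """Stand-in for a text translation model (MT-DNN, seq2seq, ...)."""
    def translate(self, text, target_lang):
        return "先生，早上好"

def extract_features(voice_signal):
    # Placeholder for the MFCC/LPCC/phoneme preprocessing described above.
    return voice_signal

def recognize_realtime_subtitle(voice_signal, asr, translator, target_lang="zh"):
    text, lang = asr.transcribe(extract_features(voice_signal))
    if lang == target_lang:
        return text  # scenario (a) of FIG. 3: show the recognized text directly
    return translator.translate(text, target_lang)  # scenario (b): translate first

print(recognize_realtime_subtitle([0.0], StubAsrModel(), StubTranslator()))
```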
当媒体播放软件正在播放的媒体内容为未被识别过的媒体内容时,字幕软件可逐渐展示相应的实时字幕结果。
示例性地,如图4中的(a)场景所示,当字幕软件获取到的语音信号为“How”时,字幕软件可以展示实时字幕结果“怎么”;如图4中的(b)场景所示,当字幕软件获取到的语音信号为“How are you?”时,字幕软件可以展示实时字幕结果“你好 吗?”;如图4中的(c)场景所示,当字幕软件获取到下一句语音信号“I'm”时,字幕软件可以展示下一句实时字幕结果“我”;如图4中的(d)场景所示,当字幕软件获取到语音信号“I'm fine”时,字幕软件可以展示实时字幕结果“我很好”。
此外,字幕软件还可以将上述实时字幕结果保存在存储器中,建立上述实时字幕结果与上述媒体参数的关联关系。
通过上述方法,当媒体播放软件正在播放的媒体内容为未被识别过的媒体内容时,字幕软件可以展示实时字幕结果,满足用户查看字幕的基本需求,便于用户了解上述媒体内容所表达的含义。
在一些实施例中,字幕软件还可以保存上述实时字幕结果,建立该实时字幕结果与上述媒体参数的关联关系,以备后续的调用需求。
步骤S103、展示与正在播放的媒体内容对应的历史字幕结果。
当媒体播放软件播放被识别过的媒体内容时,字幕软件可以直接展示与该媒体内容对应的历史字幕结果。
示例性的,如图5所示,当媒体播放软件回放一段被识别过的视频内容时,字幕软件可以根据正在播放的视频的文件标识查找到相应的字幕文件,并在字幕文件中查找到与时间戳“13:01”对应的历史字幕结果“你好吗?”。
此时,如图6中的(a)场景和(b)场景所示,即使字幕软件尚未获取到完整的语音信号“How are you?”,字幕软件也可以直接展示完整的历史字幕结果“你好吗?”。
如图6中的(c)场景和(d)场景所示,当媒体播放软件的时间戳为“13:03”时,字幕软件可以展示下一句历史字幕结果“我很好”。
通过上述示例可知,当媒体播放软件正在播放的媒体内容为被识别过的媒体内容时,字幕软件可以直接调用和展示该媒体内容对应的历史字幕结果,节约实时识别语音信号所消耗的时间,降低字幕显示的延迟。
字幕软件在展示历史字幕结果时,可以在获取到整句语音信号之前,直接展示该句语音信号对应的完整的历史字幕结果,极大地提高了用户的观看体验。
此外,字幕软件在展示历史字幕结果时,可以根据实际需求选择合适的调速方式。
在一些实施例中,字幕软件可以实时获取媒体播放软件的时间戳,根据媒体播放软件的时间戳实时展示相应的历史字幕结果。
示例性地,当媒体播放软件播放一个媒体文件时,字幕软件可以实时获取正在播放的媒体内容对应的时间戳。
当正在播放的媒体内容的时间戳为“00:01”时,字幕软件展示时间戳“00:01”对应的历史字幕结果;当正在播放的媒体内容的时间戳为“00:02”时,字幕软件展示时间戳“00:02”对应的历史字幕结果;当正在播放的媒体内容的时间戳为“01:00”时,字幕软件展示时间戳“01:00”对应的历史字幕结果。
在本实施例中,由于字幕软件是根据时间戳播放相应的历史字幕结果,所以,历史字幕结果的播放速度可以始终跟随媒体播放软件的播放速度。当媒体播放软件的播放速度加快时,上述时间戳的变化速度加快,历史字幕结果的播放速度也随之加快;当媒体播放软件的播放速度减慢时,上述时间戳的变化速度减慢,历史字幕结果的播放速度也随之减慢。
在另一些实施例中,字幕软件也可以获取媒体播放软件的播放速度,从当前正在播放的媒体内容的时间戳对应的历史字幕结果开始,根据媒体播放软件的播放速度,依次展示后续的历史字幕结果。
当媒体播放软件的播放速度发生变化时,字幕软件可以根据媒体播放软件变化后的播放速度,调整历史字幕结果的播放速度。
示例性地,当媒体播放软件播放一个媒体文件时,正在播放的媒体内容的时间戳为“03:00”,播放速度为1倍速。
字幕软件在获取到正在播放的媒体内容对应的时间戳“03:00”和播放速度“1倍速”之后,可以从时间戳“03:00”对应的历史字幕开始,按照播放速度“1倍速”依次展示后续的历史字幕结果。
假设下一条历史字幕结果对应的时间戳为“03:02”,则字幕软件可以在2秒后展示下一条历史字幕结果。
当媒体播放软件的播放速度发生变化时,媒体播放软件可以向字幕软件传递新的播放速度。
此时,字幕软件可以将历史字幕结果的播放速度调整为新的播放速度。
假设字幕软件正在展示时间戳“04:56”对应的历史字幕结果,字幕软件获取到媒体播放软件的播放速度调整为“0.5倍速”,下一条历史字幕结果对应的时间戳为“04:58”。
此时,字幕软件可以将历史字幕结果的播放速度调整为“0.5倍速”,在4秒后展示下一条历史字幕结果。
在本实施例中,字幕软件可以获取到媒体播放软件的播放速度,并按照媒体播放软件的播放速度展示历史字幕结果。当媒体播放软件的播放速度加快时,历史字幕结果的播放速度也随之加快;当媒体播放软件的播放速度减慢时,历史字幕结果的播放速度也随之减慢。所以,字幕软件展示的历史字幕结果可以与媒体播放软件正在播放的媒体内容匹配,避免出现历史字幕结果与媒体内容脱节的情况。
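A small sketch of this timestamp-driven pacing, reusing the numbers from the two examples above; the function name is an assumption for illustration:

```python
def seconds_until_next(current_ts, next_ts, playback_speed):
    """Wall-clock wait before showing the next historical subtitle: the
    media-time gap divided by the media software's playback speed."""
    return (next_ts - current_ts) / playback_speed

print(seconds_until_next(180.0, 182.0, 1.0))  # 03:00 -> 03:02 at 1x: 2.0 s
print(seconds_until_next(296.0, 298.0, 0.5))  # 04:56 -> 04:58 at 0.5x: 4.0 s
```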
在另一些实施例中,字幕软件也可以通过其他方式控制历史字幕结果的播放速度。本申请实施例对字幕软件的具体调速方式不予限制。
在步骤S102或步骤S103之后,电子设备还可以执行步骤S104。
步骤S104、当媒体播放软件的播放进度发生回退时,查找到回退后的媒体内容对应的历史字幕结果并进行展示。
用户在观看媒体播放软件播放的媒体内容时,有可能会对媒体播放软件执行后退、跳转等操作,使得媒体播放软件的播放进度回退到某个播放进度,从该播放进度开始,播放回退后的媒体内容。
因此,为了使显示的字幕结果与回退后的媒体内容匹配,字幕软件可以在媒体播放软件发生回退时,根据媒体播放软件的媒体参数,查找与回退后的媒体内容对应的历史字幕结果。
由于回退后的媒体内容为被识别过的媒体内容,所以,字幕软件可以查找到回退后的媒体内容对应的历史字幕结果并进行展示。
此外,在一些实施例中,用户在查看字幕结果的过程中,有可能会想回顾之前一段 时间展示过的字幕结果,对字幕软件执行回退操作。
此时,字幕软件可以响应于用户的回退操作,回退到之前展示过的字幕结果。
上述回退操作的形式可以根据实际场景确定。例如,在一些实施例中,上述回退操作可以为用户对字幕展示框的向下滑动操作。
如图7中的(a)场景所示,假设字幕展示框中实时展示的字幕结果为“早上好,杰克”-“今天天气真好”-“我们一起出去玩吧”-“听起来不错”。
当字幕软件检测到用户对字幕展示框的向下滑动操作时,如图7中的(b)场景所示,字幕软件可以控制字幕展示框回退到之前展示过的字幕结果“早上好,汤姆”。
在另一些实施例中,上述回退操作可以为用户对字幕展示框的进度条的拖动操作。
如图8中的(a)场景所示,假设字幕展示框内设置有进度条,字幕展示框中实时展示的字幕结果为“今天真开心”-“我们该回家了”-“你说得对”-“走吧”。
当字幕软件检测到用户对字幕展示框的进度条的拖动操作时,如图8中的(b)场景所示,字幕软件可以控制字幕展示框回退到之前展示过的字幕结果“早上好,汤姆”-“早上好,杰克”-“今天天气真好”-“我们一起出去玩吧”,提示用户回退了10秒。
在另一些实施例中,回退操作也可以表现为其他形式的操作。本申请实施例对回退操作的具体形式不予限制。
在响应了用户的回退操作之后,字幕软件可以控制字幕展示框保持在当前的字幕展示界面;或者,字幕软件也可以控制字幕展示框以预设滚动速度,依次展示后续的字幕结果,直至返回到最新的字幕结果。
上述预设滚动速度可以根据实际需求设置为具体数值,或者,也可以设置为具体倍速。例如,在一些实施例中,上述预设滚动速度可以设置为1行/秒;在另一些实施例中,上述预设滚动速度可以设置为2行/秒;在另一些实施例中,上述预设滚动速度可以设置为5行/秒;在另一些实施例中,上述预设速度也可以是设置为1.5倍速;在另一些实施例中,上述预设速度也可以是设置为2倍速;在另一些实施例中,上述预设滚动速度也可以设置为其他数值或倍速,本申请实施例对预设滚动速度的具体设置方式不予限制。
上述预设滚动速度可以是厂商的工作人员预先设置的,或者,上述预设滚动速度也可以是用户主动设置的。本申请实施例对预设滚动速度的来源不予限制。
在一些实施例中,当媒体播放软件和字幕软件可以数据互通时,媒体播放软件可以跟随字幕软件的操作,回退媒体文件的播放进度。
例如,假设媒体播放软件的播放进度为“25:01”,字幕软件响应于用户的操作,回退展示时间戳“10:54”对应的字幕结果。此时,字幕软件可以向媒体播放软件发送回退通知,该回退通知可以包括字幕软件正在展示的字幕结果对应的时间戳“10:54”。
当媒体播放软件接收到回退通知后,媒体播放软件可以根据时间戳“10:54”,回退播放时间戳“10:54”对应的媒体内容。
在另一些实施例中,媒体播放软件也可以不跟随字幕软件的操作,根据当前的播放进度继续播放媒体文件。
例如,参照上一示例,字幕软件回退展示时间戳“10:54”对应的字幕结果时,媒体播放软件也可以不跟踪字幕软件的操作,继续播放时间戳“25:01”之后的媒体内容。
当媒体播放软件和字幕软件不可以数据互通,或者媒体播放软件不跟随字幕软件回退播放进度时,字幕软件在字幕回退的过程中,可以停止识别实时播放的媒体内容对应的字幕结果,或者,也可以继续识别实时播放的媒体内容对应的字幕结果。
如果字幕软件继续识别实时播放的媒体内容对应的字幕结果,则字幕软件返回显示最新的字幕结果时,字幕软件可以展示字幕回退期间识别到的字幕结果,避免字幕结果发生断层。
例如,参照图8中的(a)场景和(b)场景,假设字幕展示框回退到10秒之前的字幕展示界面,在该字幕展示界面停留了2秒。
在这2秒期间,字幕软件可以继续对实时播放的媒体内容进行识别,得到字幕结果“我到家了”-“明天见”。
在这2秒之后,如图9中的(a)场景和(b)场景所示,用户再次拖动字幕展示框中的进度条,返回查看最新的字幕结果。
此时,字幕软件可以响应于用户的操作,在字幕展示框中展示“你说得对”-“走吧”-“我到家了”-“明天见”。
通过上述方法,字幕软件可以响应于用户的回退操作,回退到之前展示过的字幕结果,便于用户回顾之前播放的媒体内容。
在字幕回退的过程中,字幕软件可以继续对正在播放的媒体内容进行识别,得到相应的字幕结果。
当字幕软件返回展示最新的字幕结果时,字幕软件可以直接向用户展示回退期间识别到的字幕结果,避免字幕结果发生断层。
为了便于理解,以下将通过具体的应用场景对上述字幕控制方法进行详细说明。
请参阅图10,在本示例中,假设电子设备为平板电脑1,平板电脑1上安装有媒体播放软件和字幕软件。
当用户想要启用字幕识别功能时,用户可以启动平板电脑1上安装的字幕软件。字幕软件在启动后,可以在平板电脑1的显示屏上显示字幕展示框11,字幕展示框11用于展示字幕软件识别到的字幕结果,该字幕结果可以包括实时字幕结果和历史字幕结果。
然后,用户使用媒体播放软件打开了一个未识别过的英语视频。
此时,媒体播放软件可以展示媒体播放软件的软件界面(即上述媒体播放界面),在媒体播放软件的软件界面中播放上述英语视频的视频画面和语音信号,并提供进度条12,进度条12可以用于控制上述英语视频的播放进度。
字幕展示框11可以在媒体播放软件的软件界面层叠显示,字幕展示框11位于媒体播放软件的软件界面的上层。此外,字幕软件还可以根据媒体播放软件的横屏播放模式和竖屏播放模式,对应调节字幕展示框11的形态。
字幕软件可以以0.5秒为采集周期,周期性地向媒体播放软件获取媒体参数,该媒体参数包括上述英语视频的文件标识和实时播放进度的时间戳。
在获取到媒体参数之后,字幕软件可以在平板电脑1的存储器中查找与上述文件标识和上述时间戳对应的历史字幕结果。
此时,如果字幕软件可以在平板电脑1的存储器中查找到与上述文件标识和上述时间戳对应的历史字幕结果,则字幕软件可以在字幕展示框11中展示该历史字幕结果。
如果字幕软件在平板电脑1的存储器中未查找到与上述文件标识和上述时间戳对应的历史字幕结果,则字幕软件可以在字幕展示框11中展示实时字幕结果。
如上所述,上述英语视频为未被识别过的视频,因此,字幕软件无法存储器中查找到与上述文件标识和上述时间戳对应的历史字幕结果,字幕软件可以跟随媒体播放软件播放的语音信号,持续地展示相应的实时字幕结果。
示例性地,如图11中的(a)场景所示,当字幕软件获取到的语音信号为“Good”时,字幕展示框11可以展示“好”;如图11中的(b)场景所示,当字幕软件获取到的语音信号为“Good morning,Tom”时,字幕展示框11可以展示“早上好,汤姆”。
如图11中的(c)场景所示,当字幕软件获取到下一句语音信号“Good”时,字幕展示框11可以展示“好”;如图11中的(d)场景所示,当字幕软件获取到的语音信号为“Good morning,Jack”时,字幕展示框11可以展示“早上好,杰克”。
此外,如图12所示,字幕软件还可以创建与上述英语视频的文件标识对应的字幕文件A,在字幕文件A中记录实时字幕结果和实时字幕结果对应的时间戳。
参考前述场景,字幕软件可以在字幕文件A中记录“00:00-00:01早上好,汤姆”和“00:02-00:03早上好,杰克”。
如图13中的(a)场景所示,在英语视频播放完毕,用户拖动进度条12到“00:00”,重新播放该英语视频。
此时,如图13中的(b)场景所示,字幕软件可以根据上述英语视频的文件标识查找到字幕文件A,并在字幕文件A中查找到时间戳“00:00”对应的历史字幕文件“早上好,汤姆”。
所以,即使此时字幕软件只获取到语音信号“Good”,尚未获取到该句完整的语音信号,字幕软件也可以在字幕展示框11中展示完整的历史字幕结果“早上好,汤姆”。
在媒体播放软件的时间戳为“00:02”时,字幕软件可以在字幕展示框11中展示下一句历史字幕结果“早上好,杰克”。
以此类推,字幕软件可以根据媒体播放软件的时间戳,依次展示相应的历史字幕结果。
当媒体播放软件的时间戳为“02:14”时,字幕软件可以展示字幕文件A中的最后一句历史字幕结果“明天见”。
综上所述,在现有的字幕显示方案中,媒体播放软件播放的媒体文件中嵌入有字幕文件,所以,媒体播放软件可以统一管理播放的媒体内容和相应的字幕结果。
在本申请实施例提供的字幕控制方法中,电子设备可以通过独立于媒体播放软件的字幕软件,对媒体播放软件正在播放的媒体内容进行实时检测,确认正在播放的媒体内容是否为被识别过的媒体内容,以确定是对正在播放的媒体内容进行实时识别,或展示与正在播放的媒体内容对应的历史字幕结果。
当媒体播放软件播放未被识别过的媒体内容时,字幕软件可以对实时获取到的语音信号进行识别,得到实时字幕结果并进行展示,满足用户查看字幕的基本需求。
当媒体播放软件播放被识别过的媒体内容时,字幕软件可以直接展示该媒体内容对应的历史字幕结果,节约实时识别字幕所消耗的时间,降低字幕展示的延迟。
此外,字幕软件展示历史字幕结果时,可以在获取到整句语音信号之前,直接展示 该句语音信号对应的完整的历史字幕结果,极大地提高了用户的观看体验。
当字幕软件检测到用户的回退操作时,字幕软件可以响应于该回退操作,灵活展示相应的字幕结果,满足用户的回顾需求。
应理解,上述实施例所描述的媒体播放软件并不限制于某一个媒体播放软件。在实际的应用场景中,上述媒体播放软件可以是一个媒体播放软件,或者,也可以是多个媒体播放软件。
示例性地,在用户启动了“腾讯视频”应用程序和上述字幕软件之后,上述字幕软件可以将“腾讯视频”应用程序播放的视频内容识别成相应的字幕结果并进行显示。
之后,假设用户又启动了“Youtube”应用程序,通过“Youtube”应用程序播放视频内容。此时,上述字幕软件可以将“Youtube”应用程序播放的视频内容识别成相应的字幕结果并进行显示,并不局限于识别“腾讯视频”应用程序播放的视频内容。
上述实施例中的各步骤不是在所有的实施例中都是必须的。在实际的应用场景中,电子设备所实施的字幕控制方法可以拥有比以上描述的字幕控制方法更多或更少的步骤。此外,上述实施例中各步骤的序号并不意味着执行顺序的先后,各过程的执行顺序应以其功能、内在逻辑以及实际的应用场景确定,而不应对本申请实施例的实施过程构成任何限定。
示例性地,在一些实施例中,字幕软件可以在媒体播放软件播放媒体文件时,实施上述步骤S101至步骤S103所描述的方法。在另一些实施例中,字幕软件可以在媒体播放软件播放媒体文件时,实施步骤S102所描述的方法。当媒体播放软件的播放进度发生回退时,字幕软件再实施步骤S104所描述的方法。
在以上实施例所描述的场景中,字幕软件可以获取到媒体播放软件的媒体参数,根据媒体参数确定正在播放的媒体文件和播放进度。
但是,在另一些场景中,字幕软件与媒体播放软件可能是相互独立的模块,数据不互通。此时,字幕软件可能无法获取媒体播放软件的媒体参数,难以应用上述字幕控制方法。
为此,以下将针对字幕软件无法获取媒体参数的场景,详细描述本申请实施例提供的另一种字幕控制方法。
请参阅图14,图14示例性地示出了本申请实施例提供的另一种字幕控制方法的流程示例图。如图14所示,另一种字幕控制方法包括:
在本申请实施例中,电子设备可以安装有媒体播放软件和字幕软件。
在电子设备启用了媒体播放软件和字幕软件之后,如果媒体播放软件播放带有语音信号的媒体文件,则字幕软件可以执行步骤S201。
步骤S201、对正在播放的语音信号进行识别,展示该语音信号对应的实时字幕结果,并获取该实时字幕结果对应的第一参考数据。
上述第一参考数据可以包括与上述语音信号相关的信息,和/或,与上述实时字幕结果相关的信息。
字幕软件可以在媒体播放软件播放媒体文件的过程中,获取正在播放的媒体内容的语音信号,对该语音信号进行识别,得到实时字幕结果。
其中,字幕软件识别实时字幕结果的方法可以参考上一实施例中步骤S102所描述 的内容,在此不重复赘述。
字幕软件在识别到实时字幕结果之后,可以展示该实时字幕结果,并获取该实时字幕结果对应的第一参考数据。
上述第一参考数据所包含的内容可以根据实际需求进行设置。
例如,在一些实施例中,上述第一参考数据可以包括上述语音信号对应的语音特征;在另一些实施例中,上述第一参考数据可以包括对上述语音信号进行语音识别所得到的识别文本;在另一些实施例中,上述第一参考数据可以包括上述识别文本对应的目标语言类型的翻译文本;在另一些实施例中,上述第一参考数据也可以包括其他内容,本申请实施例对上述第一参考数据所包含的具体内容不予限制。
步骤S202、查找与上述第一参考数据匹配的第二参考数据,若未查找到,则执行步骤S203,若查找到,则执行步骤S204。
在获取到第一参考数据之后,字幕软件可以在存储器中查询与上述第一参考数据匹配的第二参考数据。第二参考数据为历史字幕结果对应的参考数据。
如果字幕软件无法在存储器中查询到与上述第一参考数据匹配的第二参考数据,则表示媒体播放软件当前播放的媒体内容为未被识别过的媒体内容,字幕软件可以执行步骤S203。
如果字幕软件可以在存储器中查询到与上述第一参考数据匹配的第二参考数据,则表示媒体播放软件当前播放的媒体内容为被识别过的媒体内容,字幕软件可以执行步骤S204。
步骤S203、将上述实时字幕结果和上述实时字幕结果对应的第一参考数据关联存储在存储器中,并返回执行步骤S201。
如果字幕软件无法在存储器中查询到与上述第一参考数据匹配的第二参考数据,则字幕软件可以将上述实时字幕结果和上述实时字幕结果对应的第一参考数据关联存储在存储器中,以备后续的调用需求。
之后,字幕软件可以返回执行步骤S201,继续识别和展示实时字幕结果。
步骤S204、从上述第二参考数据对应的历史字幕结果开始,依次展示后续的历史字幕结果。
如果字幕软件无法在存储器中查询到与上述第一参考数据匹配的第二参考数据,则字幕软件可以从上述第二参考数据对应的历史字幕结果开始,依次展示后续的历史字幕结果。
例如,如图15所示,在本示例中,电子设备为手机2,手机2的显示界面上设置有字幕展示框21,手机2内部设置有存储器22。
在某一时刻,手机2的媒体播放软件正在播放一个视频文件,字幕软件识别到的实时字幕结果为“早上好,汤姆”-“早上好,杰克”-“今天天气真好”。
此时,字幕软件可以将上述实时字幕结果作为第一参考数据,在存储器22中查询与上述实时字幕结果匹配的历史字幕结果。
假设存储器22中存储的历史字幕结果为“早上好,汤姆”-“早上好,杰克”-“今天天气真好”-“我们一起出去玩吧”-“听起来不错”-“走吧”,字幕软件可以在存储器22中查询到与上述实时字幕结果匹配的历史字幕结果。
所以,字幕软件可以确定媒体播放软件正在播放的媒体内容为被识别过的媒体内容,字幕软件可以从历史字幕结果“早上好,汤姆”-“早上好,杰克”-“今天天气真好”开始,依次展示后续的历史字幕结果“我们一起出去玩吧”-“听起来不错”-“走吧”。
此外,字幕软件根据第一参考数据进行匹配时,该第一参考数据可以是最近一条实时字幕结果对应的第一参考数据,或者,该第一参考数据可以是最近识别到的多条实时字幕结果对应的第一参考数据。
当上述第一参考数据为最近识别到的多条实时字幕结果对应的第一参考数据时,可以降低错误匹配的可能性,减少字幕出错的情况。
例如,假设字幕软件根据最近一条实时字幕结果进行匹配,最近一条实时字幕结果为“早上好”。
此时,字幕软件如果根据该实时字幕结果进行匹配,可能会在存储器中查找到多处与上述实时字幕结果匹配的历史字幕结果,从而导致字幕软件有较大的可能性匹配到错误的历史字幕结果。
假设字幕软件根据最近三条实时字幕结果进行匹配,最近三条实时字幕结果为“早上好”-“今天天气真好”-“我们一起去爬山吧。”
此时,字幕软件根据上述三条实时字幕结果进行匹配,匹配错误的可能性大大降低,极大地减小了字幕出错的可能性。
通过上述方法,当媒体播放软件播放被识别过的媒体内容时,字幕软件可以根据第一参考数据查找到匹配的第二参考数据,并从该第二参考数据对应的历史字幕结果开始,依次展示后续的历史字幕结果。
也即是说,媒体播放软件在播放被识别过的媒体内容时,字幕软件可以直接展示相应的历史字幕结果,节约实时识别语音信号所消耗的时间,降低字幕显示的延迟。
此外,字幕软件在展示历史字幕结果时,可以在整句语音信号收音完成之前,直接展示该句语音信号对应的完整的历史字幕结果,便于用户观看,极大地提高了用户的观看体验。
字幕软件在展示历史字幕结果时,应当确保历史字幕结果的播放速度与媒体播放软件的播放速度一致,避免实时展示的历史字幕结果与实时播放的媒体内容脱节,影响用户的观看体验。
但是,在本实施例中,字幕软件无法获取到媒体播放软件的媒体参数,所以,字幕软件无法直接获取到媒体播放软件的播放速度。
为此,字幕软件可以选择合适的调速方式,识别媒体播放软件的播放速度和调整历史字幕结果的播放速度,使得历史字幕结果的播放速度与媒体播放软件的播放速度保持一致。
在一些可能的实现方式中,字幕软件可以根据语音信号的断句结果,调整历史字幕结果的播放速度。
字幕软件在展示历史字幕结果的时候,可以继续获取实时播放的语音信号,根据该语音信号识别实时字幕结果。
字幕软件在识别实时字幕结果的过程中,除了可以通过ASR模型将语音信号转化成识别文本之外,还可以通过ASR模型对上述语音信号进行断句。
可以理解的是,当正在播放的媒体文件的语言类型与目标语言类型一致,或者,正在播放的媒体文件的语言类型与目标语言类型较为相似时,语音信号的断句方式与历史字幕结果的断句方式一致,一句语音信号对应一句历史字幕结果。
所以,当正在播放的媒体文件的语言类型与目标语言类型一致,或者,正在播放的媒体文件的语言类型与目标语言类型较为相似时,字幕软件可以根据ASR模型反馈的断句结果,判断当前展示的历史字幕结果对应的语音信号是否收音完成。
如果上述断句结果指示当前展示的历史字幕结果对应的语音信号未收音完成,则字幕软件可以暂时不展示下一条历史字幕结果。
如果上述断句结果指示当前展示的历史字幕结果对应的语音信号已收音完成,则字幕软件可以展示下一条历史字幕结果。
例如,请参阅图16,在本示例中,电子设备为平板电脑31,平板电脑31的显示界面上设置有字幕展示框32。
在某一时刻,平板电脑的媒体播放软件正在播放一个视频文件。媒体播放软件正在播放的媒体内容为已被识别过的媒体内容,字幕软件正在依次展示历史字幕结果。
如图16所示,字幕软件当前获取到的语音信号为“今天天”,字幕软件在字幕展示框32中展示的历史字幕结果为“今天天气真好”。
此时,字幕软件可以通过ASR模型对上述语音信号进行识别,确定上述语音信号是不完整的语句。
然后,字幕软件可以根据ASR模型反馈的断句结果,判定当前展示的历史字幕结果对应的语音信号未收音完成,暂时不展示下一条历史字幕结果。
字幕软件继续获取实时播放的语音信号,如图17所示,当字幕软件获取到的语音信号为“今天天气真好我”时,字幕软件可以通过ASR模型对上述语音信号进行识别,确定“今天天气真好”为一句完整的语音信号。
所以,字幕软件可以根据ASR模型反馈的断句结果,判定当前展示的历史字幕结果对应的语音信号已收音完成。
此时,字幕软件可以展示下一条历史字幕结果“我们一起去爬山吧”。
通过上述示例可知,在本实现方式中,字幕软件可以通过ASR模型的断句结果识别语音信号的播放进度。
媒体播放软件每播放一句语音信号,字幕软件就展示一条历史字幕结果。
如果媒体播放软件的播放速度增快,则历史字幕结果的播放速度也随之增快。
如果媒体播放软件的播放速度减缓,则历史字幕结果的播放速度也随之减缓。
所以,通过上述实现方式,字幕软件可以动态调整历史字幕结果的播放速度,确保历史字幕结果的播放速度与媒体播放软件的播放速度保持一致,避免实时展示的历史字幕结果与实时播放的媒体内容脱节,保障用户的观看体验。
在另一些可能的实现方式中,字幕软件可以根据第一参考数据和第二参考数据的比对结果,调整历史字幕结果的播放速度。
字幕软件在识别实时字幕结果的过程中,可以将实时字幕结果对应的第一参考数据与正在展示的历史字幕结果的第二参考数据进行比对。
如果第一参考数据与第二参考数据不一致,则表示正在展示的历史字幕结果对应的 语音信号尚未收音完成,字幕软件可以暂时不展示下一条历史字幕结果。
如果第一参考数据与第二参考数据一致,则表示正在展示的历史字幕结果对应的语音信号已经收音完成,字幕软件可以展示下一条历史字幕结果。
例如,假设参考数据中包括音素,电子设备的媒体播放软件正在播放一个音频文件,媒体播放软件正在播放的媒体内容为已被识别过的媒体内容,字幕软件正在依次展示历史字幕结果。
在某一时刻,字幕软件识别到的实时字幕结果为“听起来”,字幕软件正在展示的历史字幕结果为“听起来不错”。
此时,上述实时字幕结果对应的第一参考数据为“tingqilai”,上述历史字幕结果对应的第二参考数据为“tingqilaibucuo”。
字幕软件将第一参考数据“tingqilai”与第二参考数据“tingqilaibucuo”进行比较,两者不一致,所以字幕展示模块可以判定正在展示的历史字幕结果对应的语音信号尚未收音完成,不展示下一条历史字幕结果。
过一段时间之后,字幕软件识别到的实时字幕结果为“听起来不错”。此时,该实时字幕结果对应的第一参考数据为“tingqilaibucuo”。
字幕软件将第一参考数据“tingqilaibucuo”与第二参考数据“tingqilaibucuo”进行比较,两者一致,所以字幕展示模块可以判定正在展示的历史字幕结果对应的语音信号已收音完成。
此时,字幕展示模块可以展示下一条历史字幕结果。
通过上述示例可知,在本实现方式中,字幕软件可以通过第一参考数据和第二参考数据的比对结果,识别语音信号的播放进度。
媒体播放软件每播放一句语音信号,字幕软件就展示一条历史字幕结果。
如果媒体播放软件的播放速度增快,则历史字幕结果的播放速度也随之增快。
如果媒体播放软件的播放速度减缓,则历史字幕结果的播放速度也随之减缓。
所以,通过上述实现方式,字幕软件可以动态调整历史字幕结果的播放速度,确保历史字幕结果的播放速度与媒体播放软件的播放速度保持一致,避免实时展示的历史字幕结果与实时播放的媒体内容脱节,保障用户的观看体验。
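A minimal Python sketch of this comparison-driven pacing, reusing the phoneme example above; the pinyin reference "zouba" for the following sentence and the display function are assumptions for illustration:

```python
def display(subtitle):
    print(subtitle)  # stand-in for drawing the subtitle display box

def pace_by_reference(history_refs, history_subs, recognized_stream):
    """Advance to the next historical subtitle only when the real-time first
    reference data equals the full second reference data of the sentence
    currently on screen, i.e. when that sentence has been fully captured."""
    i = 0
    display(history_subs[i])
    for first_ref in recognized_stream:
        if first_ref == history_refs[i]:
            i += 1
            if i == len(history_subs):
                break
            display(history_subs[i])

pace_by_reference(
    ["tingqilaibucuo", "zouba"],
    ["听起来不错", "走吧"],
    iter(["tingqilai", "tingqilaibucuo", "zou", "zouba"]),
)
```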
在另一些可能的实现方式中,存储器中还存储有历史字幕结果对应的历史收音时长和历史展示时长。
其中,历史收音时长是指历史时间段中,字幕软件开始接收该历史字幕结果对应的语音信号的时间到结束接收该历史字幕结果对应的语音信号的时间的时间跨度。
历史展示时长是指历史时间段中,开始展示该历史字幕结果的时间到开始展示下一条历史字幕结果的时间的时间跨度。
字幕软件在展示历史字幕结果时,可以获取该历史字幕结果对应的历史收音时长,以及,获取该历史字幕结果对应的语句的实时收音时长。
其中,上述实时收音时长是指与历史字幕结果相对的实时字幕结果的收音时长。
然后,字幕软件可以根据上述历史收音时长和实时收音时长,调整后续各条历史字幕结果的历史展示时长。
当实时收音时长大于历史收音时长时,表示媒体播放软件降低了播放速度,字幕软 件可以增大后续各条历史字幕结果的历史展示时长,降低历史字幕结果的播放速度。
当实时收音时长小于历史收音时长时,表示媒体播放软件提高了播放速度,字幕软件可以减少后续各条历史字幕结果的历史展示时长,提高历史字幕结果的播放速度。
当实时收音时长等于历史收音时长时,表示媒体播放软件没有改变播放速度,字幕软件可以不调整后续各条历史字幕结果的历史展示时长。
具体地,字幕软件可以分别用后续各条历史字幕结果的历史展示时长除以上述历史收音时长与上述实时收音时长的比值,得到后续各条历史字幕结果对应的实际展示时长。
然后,字幕软件可以根据各条历史字幕结果对应的实际展示时长,依次展示后续各条历史字幕结果。
例如,假设电子设备的媒体播放软件正在播放一个音频文件,媒体播放软件正在播放的媒体内容为已被识别过的媒体内容,字幕软件正在依次展示历史字幕结果“早上好”-“今天天气真好”-“我们一起去爬山吧”,历史字幕结果“早上好”的历史收音时间为0.1秒,历史字幕结果“今天天气真好”的历史展示时长为0.18秒,历史字幕结果“我们一起去爬山吧”的历史展示时长为0.2秒。
字幕软件在展示历史字幕结果“早上好”的过程中,可以持续获取实时播放的语音信号,根据该语音信号识别实时字幕结果。
假设字幕软件在识别到实时字幕结果“早上好”后,获取到实时字幕结果“早上好”对应的实时收音时间为0.05秒。
此时,字幕软件可以将上述实时收音时间和上述历史收音时间进行比对,确定上述历史收音时间和上述实时收音时间的比值为0.1/0.05=2,表示媒体播放软件的播放速度已调整为2倍速。
然后,字幕软件可以将“今天天气真好”对应的历史展示时长除以上述比值,得到“今天天气真好”对应的实际展示时长为0.2/2=0.1秒;
以及,字幕软件还可以将“我们一起去爬山吧”对应的历史展示时长除以上述比值,得到“我们一起去爬山吧”对应的实际展示时长为0.18/2=0.09秒。
之后,字幕软件可以根据“今天天气真好”对应的实际展示时长和“我们一起去爬山吧”对应的实际展示时长,依次展示“今天天气真好”和“我们一起去爬山吧”。
字幕软件在展示“今天天气真好”0.1秒之后,展示“我们一起去爬山吧”;字幕软件在展示“我们一起去爬山吧”0.09秒之后,展示“我们一起去爬山吧”的下一条历史字幕结果。
通过上述示例可知,在本实现方式中,字幕软件可以根据正在展示的历史字幕结果对应的历史收音时长和实时收音时长,确定媒体播放软件的播放速度。
然后,字幕软件可以根据媒体播放软件的播放速度,调整后续各条历史字幕结果对应的历史展示时长,使得后续各条历史字幕结果的播放速度与媒体播放软件的播放一致,避免实时展示的历史字幕结果与实时播放的媒体内容脱节,保障用户的观看体验。
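The duration-based adjustment above reduces to one division per subtitle; a minimal sketch with the numbers from this example (function name assumed for illustration):

```python
def adjusted_display_durations(historical_capture, realtime_capture, display_durations):
    """Divide each stored display duration by the speed ratio inferred from one
    sentence's historical vs. real-time capture duration (here 0.1 / 0.05 = 2x)."""
    speed_ratio = historical_capture / realtime_capture
    return [d / speed_ratio for d in display_durations]

# 2x playback halves the stored display durations of the following subtitles:
print(adjusted_display_durations(0.1, 0.05, [0.2, 0.18]))  # [0.1, 0.09]
```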
另外,字幕软件在展示历史字幕结果的过程中,有可能出现历史字幕结果与正在播放的媒体内容不匹配的情况。所以,为了提高字幕软件展示的字幕结果的准确性,字幕软件可以继续根据识别实时字幕结果和实时字幕结果对应的第一参考数据,将第一参考数据与当前展示的历史字幕结果的第二参考数据进行比对。
如果上述第一参考数据与上述第二参考数据保持一致,则表示当前展示的历史字幕结果没有出错,字幕软件可以继续执行步骤S204,继续展示后续的历史字幕结果。
如果上述第一参考数据与上述第二参考数据不一致,则表示当前展示的历史字幕结果发生错误。
此时,字幕软件可以停止执行步骤S204,停止展示后续的历史字幕结果,并返回执行步骤S201,识别和展示实时字幕结果,以确保用户观看到字幕结果的准确性。
其中,字幕软件将第一参考数据与第二参考数据进行比对时,可以是根据最近一条实时字幕结果对应的第一参考数据进行比对,或者,也可以是根据最近识别到的多条实时字幕结果对应的第一参考数据进行比对。
当字幕软件根据最近识别到的多条实时字幕结果对应的第一参考数据进行比对时,可以降低误识别的可能性。
例如,假设字幕软件正在展示的历史字幕结果为“早上好”,最近一条实时字幕结果为“枣上好”。
如果字幕软件直接将上述实时字幕结果与正在展示的历史字幕结果进行比对,则字幕软件可能会将正在展示的历史字幕结果误识别为错误的历史字幕结果,停止展示后续的历史字幕结果,并展示实时字幕结果。
如果字幕软件根据最近三条实时字幕结果进行比对,则字幕软件可以继续展示后面的历史字幕结果,以及,继续识别实时字幕结果。
假设后面两条历史字幕结果为“今天天气真好”和“我们一起去爬山吧”,后面两条实时字幕结果为“今天天气真好”和“我们一起去爬山吧”。
此时,字幕软件可以确定只有第一条历史字幕结果和第一条实时字幕结果不一致,后续两条历史字幕结果和后续两条实时字幕结果一致。
所以,字幕软件可以确定第一条历史字幕结果和第一条实时字幕结果的比对结果不可信,字幕软件可以继续展示后续的历史字幕结果。
通过上述示例可知,字幕软件可以将历史字幕结果对应的第二参考数据和实时字幕结果对应的第一参考数据进行比对,识别正在展示的历史字幕结果是否发生错误。
字幕软件在比对的过程,可以根据最近一条实时字幕结果对应的第一参考数据进行比对,或者,也可以根据最近识别到的多条实时字幕结果对应的第一参考数据进行比对。
当字幕软件根据最近识别到的多条实时字幕结果进行比对时,可以降低误识别的可能性,提高用户的观看体验。
此外,当字幕软件确定当前展示的历史字幕结果发生错误时,字幕软件还可以返回执行步骤S201,识别和展示实时字幕结果,以确保用户观看到的字幕结果的准确性。
为了便于理解,以下将结合具体的应用场景对上述字幕控制方法进行详细描述。
请参阅图18,在本示例中,假设电子设备为平板电脑4,平板电脑4上安装有媒体播放软件和字幕软件。
当用户想要启用字幕识别功能时,用户可以启动平板电脑4上安装的字幕软件。字幕软件在启动后,可以在平板电脑4的显示屏上显示字幕展示框41,字幕展示框41用于展示字幕软件识别到的字幕结果,该字幕结果可以包括实时字幕结果和历史字幕结果。
然后,用户使用媒体播放软件打开了一个未识别过的英语视频。
此时,媒体播放软件可以展示媒体播放软件的软件界面,在媒体播放软件的软件界面中播放上述英语视频的视频画面和语音信号,并提供进度条42,进度条42可以用于控制上述英语视频的播放进度。
字幕展示框41与媒体播放软件的软件界面层叠显示,字幕展示框41位于媒体播放软件的软件界面的上层。此外,字幕软件还可以根据媒体播放软件的横屏播放模式和竖屏播放模式,对应调节字幕展示框41的形态。
在字幕识别的过程中,字幕软件可以实时获取正在播放的语音信号,对该语音信号进行识别,得到实时字幕结果并展示在字幕展示框41中。
此外,字幕软件还可以将实时字幕结果对应的英语文本确定为第一参考数据,将实时字幕结果和该第一参考数据关联存储到平板电脑4的存储器中。
示例性地,如图19中的(a)场景所示,当字幕软件获取到的语音信号为“Good”时,字幕展示框41可以展示“好”;如图19中的(b)场景所示,当字幕软件获取到的语音信号为“Good morning,Tom”时,字幕展示框41可以展示“早上好,汤姆”。
如图19中的(c)场景所示,当字幕软件获取到下一句语音信号“Good”时,字幕展示框41可以展示“好”;如图19中的(d)场景所示,当字幕软件获取到的语音信号为“Good morning,Jack”时,字幕展示框41可以展示“早上好,杰克”。
此时,如图20所示,字幕软件可以将实时字幕结果“早上好,汤姆”与第一参考数据“Good morning,Tom”关联存储,将实时字幕结果“早上好,杰克”与第一参考数据“Good morning,Jack”关联存储。
与此同时,为了合理利用存储器中存储的历史字幕结果,字幕软件在识别和展示实时字幕结果的过程中,还可以将最近识别到的三条实时字幕结果的第一参考数据与存储器中的第二参考数据进行匹配。
如果存储器中不存在与上述第一参考数据对应的第二参考数据,则表示媒体播放软件正在播放的媒体内容为未被识别过的媒体内容,字幕软件可以继续展示实时字幕结果。
如果存储器中存在与上述第一参考数据对应的第二参考数据,则表示媒体播放软件正在播放的媒体内容为被识别过的媒体内容。
此时,字幕软件可以从上述第二参考数据对应的历史字幕结果开始,依次展示后续的历史字幕结果。
如上所述,上述英语视频为未被识别过的视频,因此,字幕软件始终无法在存储器中查找到与第一参考数据匹配的第二参考数据,字幕软件可以持续展示实时字幕结果。
如图21中的场景所示,用户拖动进度条42,从“01:45”拖动到“00:00”,重新播放该英语视频,字幕软件继续展示实时字幕结果。
如图22所示,在媒体播放软件的播放进度为“00:05”时,字幕软件最近三条实时字幕结果为“早上好,汤姆”-“早上好,杰克”-“今天天气真好”,这三条实时字幕结果对应的第一参考数据为“Good morning,Tom”-“Good morning,Jack”-“It's a beautiful day”。
此时,字幕软件检测到存储器中存储有与上述三条第一参考数据匹配的第二参考数据。所以,字幕软件可以从上述第二参考数据对应的历史字幕结果开始,依次展示后续的历史字幕结果“我们一起出去玩吧”-“听起来不错”-“走吧”。
字幕软件在展示历史字幕结果的过程中,可以继续识别实时字幕结果,并将实时字幕结果对应的第一参考数据与正在展示的历史字幕结果的第二参考数据进行比对。
当实时字幕结果对应的第一参考数据与正在展示的历史字幕结果的第二参考数据一致时,字幕软件可以继续展示下一条历史字幕结果。
示例性地,字幕软件在展示历史字幕结果“我们一起出去玩吧”时,字幕软件可以将实时识别到的英语文本与历史字幕结果“我们一起出去玩吧”对应的第二参考数据“Let's go out and play”进行比对。
当实时识别到的英语文本与“Let's go out and play”一致时,表示历史字幕结果“我们一起出去玩吧”对应的语音信号已经收音完成,字幕软件可以展示下一条历史字幕结果“听起来不错”。
以此类推,当实时识别到的英语文本与“Sounds good”一致时,字幕软件可以展示下一条历史字幕结果“走吧”。当实时识别到的英语文本与“Let's go”一致时,字幕软件可以展示下一条历史字幕结果。
如图23所示,用户再次拖动了媒体播放软件的进度条42,返回播放未被识别过的视频画面,字幕软件继续展示历史字幕结果和识别实时字幕结果。
但是,字幕软件在识别实时字幕结果时,发现实时识别到的英文文本与正在展示的历史字幕结果的第二参考数据不一致。
此时,字幕软件可以停止展示后续的历史字幕结果,并展示实时字幕结果。
示例性地,假设字幕软件正在播放的历史字幕结果为“风景真漂亮”,该历史字幕结果对应的第二参考数据为“The scenery is beautiful”。
但是,字幕软件实时识别到的英语文本为“We should go home”,与第二参考数据“The scenery is beautiful”不一致。
此时,字幕软件可以判定正在显示的历史字幕结果“风景真漂亮”为错误的字幕结果。所以,如图24所示,字幕软件可以停止展示后续的历史字幕结果,并展示实时字幕结果“我们该回家了”。
而且,字幕软件可以根据实时字幕结果对应的第一参考数据,在存储器中重新查找与第一参考数据匹配的第二参考数据。
通过上述示例可知,在本实施例的字幕控制方法中,字幕软件可以获取实时字幕结果对应的第一参考数据,并在存储器中查找与第一参考数据匹配的第二参考数据。
如果存储器无法查找到与第一参考数据匹配的第二参考数据,则表示媒体播放软件正在播放的媒体内容为未被识别过的媒体内容,字幕软件可以继续展示实时字幕结果。
如果存储器可以查找到与第一参考数据匹配的第二参考数据,则表示媒体播放软件正在播放的媒体内容为被识别过的媒体内容,字幕软件可以从上述第二参考数据匹配的历史字幕结果开始,依次展示后续的历史字幕结果。
在现有的字幕显示方案中,媒体播放软件播放的媒体文件中嵌入有字幕文件,所以,媒体播放软件可以统一管理播放的媒体内容和相应的字幕结果。
在本实施例的字幕控制方法中,电子设备可以通过独立于媒体播放软件的字幕软件,对媒体播放软件正在播放的媒体内容的语音信号进行实时识别,根据识别结果确认正在播放的媒体内容是否为被识别过的媒体内容,以确定是展示实时识别到的实时字幕结果, 或者,展示与正在播放的媒体内容对应的历史字幕结果。
当媒体播放软件在播放未被识别过的媒体内容时,字幕软件可以持续性地对媒体播放软件正在播放的媒体内容的语音信号进行实时识别,并展示识别到的实时字幕结果。
当媒体播放软件在播放被识别过的媒体内容时,字幕软件可以展示正在播放的媒体内容对应的历史字幕结果,节约实时识别字幕结果所消耗的时间,降低字幕展示的延迟。
字幕软件在展示历史字幕结果时,可以在整句语音信号收音完成之前,直接展示该句语音信号对应的完整的历史字幕结果,便于用户观看,极大地提高了用户的观看体验。
而且,字幕软件可以通过断句识别、参考数据比对、收音时长比对等方式,识别媒体播放软件的播放速度,并相应调整历史字幕结果的播放速度,从而使各条历史字幕结果的播放速度与媒体播放软件的播放一致,避免实时展示的历史字幕结果与实时播放的媒体内容脱节,保障用户的观看体验。
此外,字幕软件还可以将实时识别到的第一参考数据与正在展示的历史字幕结果的第二参考数据进行比对。当第一参考数据与第二参考数据不一致时,字幕软件可以停止展示后续的历史字幕结果,并展示实时字幕结果,及时纠正错误的字幕结果,保障用户的观看体验。
另外,在本实施例中,字幕软件也可以支持字幕回放功能,其实现方式可以参考前一实施例所描述的方法,本实施例不再赘述。
应理解,上述实施例中各步骤的序号并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
在本申请所提供的实施例中,应该理解到,所揭露的装置/电子设备和方法,可以通过其它的方式实现。例如,以上所描述的装置/电子设备实施例仅仅是示意性的,例如,上述模块或单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个***,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通讯连接可以是通过一些接口,装置或单元的间接耦合或通讯连接,可以是电性,机械或其它的形式。
上述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
上述集成的模块/单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读存储介质中。基于这样的理解,本申请实现上述实施 例方法中的全部或部分流程,也可以通过计算机程序来指令相关的硬件来完成,上述的计算机程序可存储于一计算机可读存储介质中,该计算机程序在被处理器执行时,可实现上述各个方法实施例的步骤。其中,上述计算机程序包括计算机程序代码,所述计算机程序代码可以为源代码形式、对象代码形式、可执行文件或某些中间形式等。所述计算机可读存储介质可以包括:能够携带所述计算机程序代码的任何实体或装置、记录介质、U盘、移动硬盘、磁碟、光盘、计算机存储器、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、电载波信号、电信信号以及软件分发介质等。需要说明的是,所述计算机可读存储介质包含的内容可以根据司法管辖区内立法和专利实践的要求进行适当的增减,例如在某些司法管辖区,根据立法和专利实践,计算机可读存储介质不包括电载波信号和电信信号。
最后应说明的是:以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何在本申请揭露的技术范围内的变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (15)

  1. 一种字幕控制方法,其特征在于,包括:
    启动第一媒体软件,所述第一媒体软件用于播放媒体文件;
    启动第二字幕软件,所述第二字幕软件用于在所述第一媒体软件播放媒体文件时,将所述第一媒体软件播放的媒体内容的语音信号识别成相应的字幕结果并进行显示;
    通过所述第一媒体软件播放第一媒体文件;
    通过所述第二字幕软件检测所述第一媒体软件当前播放的第一媒体内容;
    若确定所述第一媒体内容为被识别过的媒体内容,则从历史保存的字幕结果中获取与所述第一媒体内容相对应的第一字幕结果进行显示。
  2. 如权利要求1所述的方法,其特征在于,在所述通过所述第二字幕软件检测所述第一媒体软件当前播放的第一媒体内容之后,还包括:
    若确定所述第一媒体内容为未被识别过的媒体内容,则将所述第一媒体内容的第一语音信号识别成第二字幕结果并进行显示。
  3. 如权利要求1或2所述的方法,其特征在于,所述检测所述第一媒体软件当前播放的第一媒体内容,包括:
    获取所述第一媒体内容对应的媒体参数,所述媒体参数包括所述第一媒体内容对应的时间戳;
    从历史保存的字幕结果中,查找与所述媒体参数相对应的第一字幕结果。
  4. 如权利要求3所述的方法,其特征在于,所述若确定所述第一媒体内容为被识别过的媒体内容,则从历史保存的字幕结果中获取与所述第一媒体内容相对应的第一字幕结果进行显示,包括:
    若查找到与所述媒体参数相对应的第一字幕结果,则显示所述第一字幕结果。
  5. 如权利要求1或2所述的方法,其特征在于,所述检测所述第一媒体软件当前播放的第一媒体内容,包括:
    对所述第一媒体内容的第一语音信号进行识别,得到第一参考数据;所述第一参考数据包括所述第一语音信号的语音特征、对所述第一语音信号进行语音识别得到的识别文本和所述识别文本对应的目标语言类型的翻译文本中的一项或多项;
    从历史保存的参考数据中,查找与所述第一参考数据匹配的第二参考数据。
  6. 如权利要求5所述的方法,其特征在于,所述若确定所述第一媒体内容为被识别过的媒体内容,则从历史保存的字幕结果中获取与所述第一媒体内容相对应的第一字幕结果进行显示,包括:
    若查找到与所述第一参考数据匹配的第二参考数据,则从历史保存的字幕结果中获取所述第二参考数据对应的第一字幕结果进行显示。
  7. 如权利要求6所述的方法,其特征在于,所述从历史保存的字幕结果中获取所述第二参考数据对应的第一字幕结果进行显示,包括:
    获取所述第一语音信号的断句结果,所述断句结果用于指示当前播放的一句第一语音信号已收音完成,或者,指示当前播放的一句第一语音信号尚未收音完成;
    根据所述断句结果,从所述第一字幕结果开始,每收音完一句第一语音信号,则展示下一条历史保存的字幕结果。
  8. 如权利要求6所述的方法,其特征在于,所述从历史保存的字幕结果中获取所述第二参考数据对应的第一字幕结果进行显示,包括:
    从所述第一字幕结果开始,将所述第一参考数据与正在展示的整句字幕结果的参考数据进行比对;
    当所述第一参考数据与正在展示的整句字幕结果的参考数据一致时,展示下一条历史保存的字幕结果。
  9. 如权利要求6所述的方法,其特征在于,所述从历史保存的字幕结果中获取所述第二参考数据对应的第一字幕结果进行显示,包括:
    从历史保存的收音时长中,获取所述第一字幕结果对应的第一收音时长;
    根据所述第一语音信号,确定所述第一字幕结果对应的第二收音时长;
    根据所述第一收音时长和所述第二收音时长确定调速参数;
    根据所述调速参数调整各历史保存的展示时长,得到各历史保存的字幕结果对应的实际展示时长;
    从所述第一字幕结果开始,根据各历史保存的字幕结果对应的实际展示时长,依次展示各历史保存的字幕结果。
  10. 如权利要求1至9中任一项所述的方法,其特征在于,在所述第一媒体软件启动后,展示媒体播放界面,所述媒体播放界面用于播放媒体文件;
    在所述第二字幕软件启动后,展示字幕展示框,所述字幕展示框用于显示所述第二字幕软件识别到的字幕结果;
    所述字幕展示框与所述媒体播放界面层叠显示,且所述字幕展示框位于所述媒体播放界面的上层。
  11. 如权利要求10所述的方法,其特征在于,当所述媒体播放界面处于横屏播放状态时,所述字幕展示框的宽度为第一宽度;
    当所述媒体播放界面处于竖屏播放状态时,所述字幕展示框的宽度为第二宽度;所述第一宽度大于或等于所述第二宽度。
  12. 一种电子设备,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机程序,其特征在于,所述处理器被配置为执行所述计算机程序时实现如权利要求1至11任一项所述的方法。
  13. 一种计算机可读存储介质,所述计算机可读存储介质被配置为存储有计算机程序,其特征在于,所述计算机程序被处理器执行时实现如权利要求1至11任一项所述的方法。
  14. 一种计算机程序产品,其特征在于,所述计算机程序产品被配置为在电子设备上运行时,使得电子设备执行如权利要求1至11任一项所述的方法。
  15. 一种芯片***,其特征在于,所述芯片***包括存储器和处理器,所述处理器被配置为执行所述存储器中存储的计算机程序,以实现如权利要求1至11任一项所述的方法。
PCT/CN2022/130303 2021-11-30 2022-11-07 字幕控制方法、电子设备及计算机可读存储介质 WO2023098412A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111447527.5A CN116205216A (zh) 2021-11-30 2021-11-30 字幕控制方法、电子设备及计算机可读存储介质
CN202111447527.5 2021-11-30

Publications (1)

Publication Number Publication Date
WO2023098412A1 true WO2023098412A1 (zh) 2023-06-08

Family

ID=86515151

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/130303 WO2023098412A1 (zh) 2021-11-30 2022-11-07 字幕控制方法、电子设备及计算机可读存储介质

Country Status (2)

Country Link
CN (1) CN116205216A (zh)
WO (1) WO2023098412A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103458321A (zh) * 2012-06-04 2013-12-18 联想(北京)有限公司 一种字幕加载方法及装置
CN108259971A (zh) * 2018-01-31 2018-07-06 百度在线网络技术(北京)有限公司 字幕添加方法、装置、服务器及存储介质
US20210250660A1 (en) * 2020-02-12 2021-08-12 Shanghai Bilibili Technology Co., Ltd. Implementation method and system of real-time subtitle in live broadcast and device
CN113630620A (zh) * 2020-05-06 2021-11-09 阿里巴巴集团控股有限公司 多媒体文件播放***、相关方法、装置及设备


Also Published As

Publication number Publication date
CN116205216A (zh) 2023-06-02

Similar Documents

Publication Publication Date Title
US20220130360A1 (en) Song Recording Method, Audio Correction Method, and Electronic Device
CN111314775B (zh) 一种视频拆分方法及电子设备
WO2020119455A1 (zh) 视频播放过程实现单词或语句复读的方法及电子设备
CN114255745A (zh) 一种人机交互的方法、电子设备及***
CN112214636A (zh) 音频文件的推荐方法、装置、电子设备以及可读存储介质
CN115312068B (zh) 语音控制方法、设备及存储介质
CN112154431A (zh) 一种人机交互的方法及电子设备
CN113488042B (zh) 一种语音控制方法及电子设备
EP4343756A1 (en) Cross-device dialogue service connection method, system, electronic device, and storage medium
CN114449333B (zh) 视频笔记生成方法及电子设备
CN114694646A (zh) 一种语音交互处理方法及相关装置
WO2023098412A1 (zh) 字幕控制方法、电子设备及计算机可读存储介质
CN116052648B (zh) 一种语音识别模型的训练方法、使用方法及训练***
WO2023273904A1 (zh) 音频数据的存储方法及其相关设备
WO2023040658A1 (zh) 语音交互方法及电子设备
CN114390341B (zh) 一种视频录制方法、电子设备、存储介质及芯片
CN113056908B (zh) 视频字幕合成方法、装置、存储介质及电子设备
CN115995236A (zh) 音色提取、模型训练方法、装置、设备、介质及程序
CN117153166B (zh) 语音唤醒方法、设备及存储介质
WO2023065854A1 (zh) 分布式语音控制方法及电子设备
CN116665643B (zh) 韵律标注方法、装置和终端设备
WO2023142757A1 (zh) 语音识别方法、电子设备及计算机可读存储介质
CN112562687B (zh) 音视频处理方法、装置、录音笔和存储介质
CN114299923A (zh) 音频识别方法、装置、电子设备及存储介质
CN116665635A (zh) 语音合成方法、电子设备及计算机可读存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22900217

Country of ref document: EP

Kind code of ref document: A1