WO2020134896A1 - Method and device for invoking a speech synthesis file

Publication number
WO2020134896A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
app
file
registered
synthesis file
Prior art date
Application number
PCT/CN2019/122545
Other languages
English (en)
Chinese (zh)
Inventor
韩喆
王磊
傅春霖
Original Assignee
阿里巴巴集团控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司
Publication of WO2020134896A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/60 Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
    • H04L67/63 Routing a service request depending on the request content or context

Definitions

  • This specification relates to the field of computers, and in particular to a method and device for calling a speech synthesis file.
  • the embodiments of the present specification provide a method and a device for calling a speech synthesis file, which solve the problems raised by the background art mentioned above.
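  • if it is detected that the voice synthesis file does not exist on the client, the voice synthesis file is downloaded from the server corresponding to the registered APP according to the pre-stored voice configuration file corresponding to the registered APP, the voice configuration file having a built-in download address of the speech synthesis file;
  • if it is detected that the client has the voice synthesis file, the voice synthesis file of the client is invoked so that the registered APP can perform voice playback according to the voice synthesis file.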
  • before detecting whether there is a voice synthesis file required by the registered APP on the client, the method further includes:
  • the delivered voice configuration file includes first verification information, which the server corresponding to the registered APP assigns to the registered APP after encrypting the delivered voice configuration file;
  • determining whether the first verification information matches the second verification information pre-stored by the client specifically includes:
  • the method further includes:
  • the voice basic training model is based on the registered APP
  • the method further includes:
  • the registered APP performs voice playback according to the voice synthesis file.
  • the registered APP performing voice playback according to the voice synthesis file specifically includes: the server corresponding to the registered APP encrypts the voice synthesis file according to a preset rule; after the encrypted voice synthesis file is decrypted by the built-in decryption module, the registered APP performs voice playback.
  • An apparatus for invoking a speech synthesis file, provided by an embodiment of this specification, includes:
  • the detection unit is used to detect whether there is a voice synthesis file required by the registered APP on the client, the registered APP being an APP that has registered in advance and needs to use the voice synthesis file;
  • the downloading unit is configured to download the voice synthesis file from the server corresponding to the registered APP according to the pre-stored voice configuration file corresponding to the registered APP if it is detected that the voice synthesis file does not exist on the client, the voice configuration file having a built-in download address of the speech synthesis file;
  • the calling unit is configured to call the voice synthesis file of the client if it is detected that the client has the voice synthesis file, so that the registered APP can perform voice playback according to the voice synthesis file.
  • the device further includes:
  • a pulling unit configured to pull the voice configuration file from the server corresponding to the registered APP
  • a receiving unit configured to receive the voice configuration file delivered by the server corresponding to the registered APP, where the delivered voice configuration file includes first verification information that the server assigns to the registered APP after encrypting the delivered voice configuration file;
  • the judging unit is used to judge whether the first verification information matches the second verification information pre-stored by the client;
  • the verification unit is configured to verify that the voice configuration file delivered is correct when it is determined that the first verification information matches the second verification information pre-stored by the client.
  • the judgment unit is specifically used to:
  • the device further includes:
  • the training unit is configured to send the voice data provided by the APP developer, reflecting the characteristics of the APP developer, to the server corresponding to the registered APP, so that the server trains the APP developer's customized voice model through the built-in voice basic training model and generates the speech synthesis file corresponding to the registered APP from the customized voice model according to the pre-stored text.
  • the voice basic training model is a model that is trained on several voice samples provided in advance according to the registered APPs' need to play voice and that can be shared by registered APPs.
  • the device further includes:
  • a calculation unit configured to calculate a first digest value corresponding to the speech synthesis file
  • the judging unit is further used to judge whether the second digest value corresponding to the speech synthesis file, pre-stored in the voice configuration file, is the same as the first digest value;
  • when the judging unit judges that the second digest value is the same as the first digest value, the registered APP performs voice playback according to the voice synthesis file.
  • the registered APP performing voice playback according to the voice synthesis file specifically includes: the server corresponding to the registered APP encrypts the voice synthesis file according to a preset rule; after the encrypted voice synthesis file is decrypted by the built-in decryption module, the registered APP performs voice playback.
  • a voice system provided by an embodiment of this specification includes a terminal and a server, and the terminal includes a voice SDK running in the terminal, a registered APP, and an APP developer terminal;
  • the APP developer terminal is used to send the voice data provided by the APP developer reflecting the characteristics of the APP developer to the server corresponding to the registered APP;
  • the server is used to train the APP developer's customized voice model through the built-in voice basic training model, and to input the pre-stored text into the APP developer's customized voice model to generate the voice synthesis file that the registered APP needs; the voice basic training model is a model that is trained on several voice samples provided in advance according to the registered APPs' need to play voice and that can be shared by registered APPs;
  • the voice SDK is used to: pull the voice configuration file from the server corresponding to the registered APP; receive the voice configuration file delivered by the server corresponding to the registered APP, where the delivered voice configuration file includes first verification information that the server assigns to the registered APP after encrypting the delivered voice configuration file; judge whether the first verification information matches the second verification information pre-stored by the client; when they match, verify that the delivered voice configuration file is correct; detect whether the client has the voice synthesis file required by the registered APP, the registered APP being an APP that has registered in advance and needs to use a voice synthesis file; if it is detected that the voice synthesis file does not exist on the client, download the voice synthesis file from the server corresponding to the registered APP according to the voice configuration file corresponding to the registered APP, the voice configuration file having a built-in download address of the voice synthesis file; and if it is detected that the client has the voice synthesis file, call the voice synthesis file of the client so that the registered APP can perform voice playback according to it.
  • a computer-readable medium provided by an embodiment of the present specification has stored thereon computer-readable instructions, and the computer-readable instructions may be executed by a processor to perform the following steps:
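  • if it is detected that the voice synthesis file does not exist on the client, the voice synthesis file is downloaded from the server corresponding to the registered APP according to the pre-stored voice configuration file corresponding to the registered APP, the voice configuration file having a built-in download address of the speech synthesis file;
  • if it is detected that the client has the voice synthesis file, the voice synthesis file of the client is invoked so that the registered APP can perform voice playback according to the voice synthesis file.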
  • An apparatus for calling a speech synthesis file includes a memory for storing computer program instructions and a processor for executing program instructions, wherein the computer program instructions, when executed by the processor, trigger the device to perform the following steps:
  • the detection unit is used to detect whether there is a voice synthesis file required by the registered APP on the client, the registered APP being an APP that has registered in advance and needs to use the voice synthesis file;
  • the downloading unit is configured to download the voice synthesis file from the server corresponding to the registered APP according to the pre-stored voice configuration file corresponding to the registered APP if it is detected that the voice synthesis file does not exist on the client, the voice configuration file having a built-in download address of the speech synthesis file;
  • the calling unit is configured to call the voice synthesis file of the client if it is detected that the client has the voice synthesis file, so that the registered APP can perform voice playback according to the voice synthesis file.
  • the APP developer can train the APP developer's customized voice model through the server corresponding to the registered APP, and then input the pre-stored text into the customized voice model to generate the APP developer's voice synthesis file. When the registered APP needs to use the speech synthesis file, the corresponding speech synthesis file is downloaded for the registered APP to play voice;
  • the voice system can support multiple registered APPs, so that the voice system is fully utilized.
  • FIG. 1 is a schematic flowchart of a method for invoking a speech synthesis file provided in Embodiment 1 of the present specification;
  • FIG. 2 is a schematic flowchart of a method for invoking a speech synthesis file provided in Embodiment 2 of this specification;
  • FIG. 3 is a schematic structural diagram of an apparatus for invoking a speech synthesis file provided in Embodiment 3 of this specification;
  • FIG. 4 is a schematic structural diagram of a voice system provided in Embodiment 4 of the present specification.
  • FIG. 1 is a schematic flowchart of a method for calling a speech synthesis file provided by an embodiment of the present specification.
  • the schematic flowchart includes:
  • Step S101: Detect whether the client has the voice synthesis file required by the registered APP. If it exists, execute step S102; if it does not exist, execute step S103.
  • In step S101 of the embodiment of the present specification, the step of detecting whether there is a voice synthesis file required by the registered APP on the client can be performed by the voice SDK.
  • the voice SDK provides an interface through which multiple APPs can be connected at the same time; that is, when an APP registers with the voice SDK, the APP's data is connected to the voice SDK.
  • the registered APP is an application that is registered with the voice SDK in advance and requires a voice synthesis file.
  • the voice SDK is a framework for APP developers when developing software.
  • the speech synthesis file is trained by the server corresponding to the registered APP according to the needs of the APP developer.
  • the APP developer sends voice data that reflects the characteristics of the APP developer to the server corresponding to the registered APP, so that the server trains the APP developer's customized voice model through the built-in voice basic training model and inputs the pre-stored text into that customized voice model to generate the voice synthesis file required by the registered APP.
  • the voice basic training model is a model that can be shared by registered APPs and is trained on several voice samples provided in advance according to the registered APPs' need to play voice. These voice samples are high-quality voice data stored on the server corresponding to the registered APP.
  • the sampling time of the high-quality voice data used for the voice basic training model is determined according to the accuracy required of the entire voice system. For example, when high accuracy is required, the sampling time can be 300 hours; when the accuracy required of the entire voice system is not high, a sampling time of 100 hours is selected.
  • In step S101 of the embodiment of the present specification, after the server corresponding to the registered APP has trained the voice basic training model, the APP developer uploads voice data reflecting the characteristics of the APP developer to the server, and a customized voice model for the APP developer is trained through the voice basic training model.
  • the voice data reflecting the characteristics of the APP developer is voice data recorded according to the language environment required by the APP developer. The APP developer only needs to upload a small amount of voice data to the server corresponding to the registered APP.
  • the voice basic training model can be understood as an intermediate model, trained on a large data set, that the server corresponding to the registered APP provides to the APP developer; the intermediate model is then tuned on the voice data uploaded by the APP developer to obtain a customized voice model reflecting the characteristics of the APP developer.
  • In step S101 of the embodiment of the present specification, the voice data uploaded by the APP developer needs to be reviewed, and the review is conducted by the management personnel of the voice system.
  • the review mechanism can be that the customized voice model reflecting the characteristics of the APP developer can be used normally only after it has been approved; that is, even if the customized voice model has been generated, it cannot be used normally until the reviewer approves it. Alternatively, the review mechanism can be that the registered APP can use the customized voice model normally regardless of whether the review has been passed, but once the reviewer detects that the customized voice model is unqualified, the model becomes invalid.
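The two review mechanisms described above can be sketched as a small policy check. This is only an illustrative sketch: the names `ReviewStatus`, `ReviewPolicy`, and `model_usable` are hypothetical and do not come from the specification.

```python
from enum import Enum


class ReviewStatus(Enum):
    PENDING = "pending"      # generated, not yet reviewed
    APPROVED = "approved"    # reviewer passed the model
    REJECTED = "rejected"    # reviewer found the model unqualified


class ReviewPolicy(Enum):
    APPROVE_FIRST = "approve_first"            # usable only after approval
    USE_UNTIL_REJECTED = "use_until_rejected"  # usable unless found unqualified


def model_usable(policy, status):
    """Return True if the customized voice model may be used normally."""
    if policy is ReviewPolicy.APPROVE_FIRST:
        return status is ReviewStatus.APPROVED
    # USE_UNTIL_REJECTED: the model only becomes invalid once it is rejected.
    return status is not ReviewStatus.REJECTED
```

Under the first policy a pending model is blocked; under the second it stays usable until a reviewer rejects it.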
  • In step S101 of the embodiment of the present specification, if the APP developer does not adopt this solution but uses a traditional method to meet its customization requirements, there are two options. One is that the APP developer directly uploads the voice data reflecting the characteristics of the APP developer without any processing, which makes the robustness low. The other is that the APP developer separately produces a customized voice model reflecting its characteristics; this process takes a long time, and the quality of the customized voice model cannot be guaranteed.
  • the voice system can also be applied to a video system, that is, the video basic training model is stored in the server corresponding to the registered APP.
  • Step S102: Invoke the speech synthesis file of the client.
  • when an APP registered with the voice SDK needs to use a voice synthesis file, the voice SDK first detects whether the file exists on the client. When the client has the voice synthesis file that needs to be called, the voice synthesis file stored on the client is called, and the registered APP can play voice according to it.
  • Step S103: Download the voice synthesis file from the server corresponding to the registered APP according to the pre-stored voice configuration file corresponding to the registered APP.
  • In step S103 of the embodiment of the present specification, the speech synthesis file is generated from the pre-stored text by the APP developer's customized speech model. If the speech synthesis file is found not to exist during the detection in step S101, it means that the registered APP has never downloaded the speech synthesis file before.
  • the voice configuration file has a built-in download address of the voice synthesis file, and the registered APP downloads the required voice synthesis file from that address so that it can perform voice playback according to the voice synthesis file.
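Steps S101 to S103 amount to a check-then-download lookup. The following minimal sketch assumes an in-memory cache and a configuration dictionary with a `download_url` key; these names, the `get_synthesis_file` function, and the example URL are hypothetical. The downloader is passed in as a function so that the sketch runs without a network.

```python
def get_synthesis_file(app_id, cache, configs, download):
    """Return the speech synthesis file for a registered APP.

    cache    -- dict mapping app_id to file bytes already on the client
    configs  -- dict mapping app_id to its voice configuration (a dict)
    download -- callable that takes a URL and returns the file bytes
    """
    # Step S101: detect whether the client already has the file.
    if app_id in cache:
        # Step S102: call the file stored on the client.
        return cache[app_id]
    # Step S103: download from the address built into the configuration file.
    url = configs[app_id]["download_url"]
    data = download(url)
    cache[app_id] = data  # keep it so later calls take the S102 branch
    return data


# Usage with a stand-in downloader that records each request.
calls = []

def fake_download(url):
    calls.append(url)
    return b"audio-bytes"

cache = {}
configs = {"pay_app": {"download_url": "https://example.invalid/pay_app.bin"}}
first = get_synthesis_file("pay_app", cache, configs, fake_download)
second = get_synthesis_file("pay_app", cache, configs, fake_download)
```

The second call is served from the client cache, so the downloader is invoked only once.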
  • In step S103 of the embodiment of the present specification, before the registered APP performs voice playback according to the voice synthesis file, the voice synthesis file also needs to be verified. The specific steps may be:
  • Step 1: Calculate the first digest value corresponding to the speech synthesis file.
  • the first digest value corresponding to the speech synthesis file is a parameter value used to check whether the downloaded speech synthesis file contains an error or has been tampered with.
  • an MD5 digest can be used for the implementation.
  • MD5 is a widely used cryptographic hash function that generates a 128-bit (16-byte) hash value, which makes it possible to check whether the downloaded voice configuration file contains an error or has been tampered with. For example, under Unix, much software is distributed together with a file that has the same file name and the extension .md5. This file usually contains only one line of text, which records the MD5 value of the downloaded file.
  • MD5 treats the entire file as one large text message and, through its irreversible string transformation algorithm, produces the MD5 message digest.
  • everyone has their own unique fingerprint, which is often the most trusted means for the judiciary to identify criminals; similarly, MD5 can generate a practically unique "digital fingerprint" for any file, regardless of its size, format, or number. If anyone makes any change to the file, its MD5 value, that is, the corresponding "digital fingerprint", will change.
  • the download server provides the MD5 value of a file in advance. After the user downloads the file, the MD5 value of the downloaded file is recalculated. By comparing whether the two values are the same, it can be determined whether the downloaded file contains an error or has been tampered with.
  • In step 1 of the embodiment of the present specification, the first digest value is calculated to check whether the downloaded speech synthesis file contains an error or has been tampered with, so that errors in the speech synthesis file are detected in real time. Once an error occurs in the content of the speech synthesis file, the error is reported directly, which prevents it from spreading in the application.
  • the check of the speech synthesis file can also be implemented using SHA256 digests.
  • Step 2: Determine whether the second digest value corresponding to the voice synthesis file, pre-stored in the voice configuration file, is the same as the first digest value. If they are the same, perform step 3; if they are not the same, return to step S103.
  • Step 3: The registered APP performs voice playback according to the voice synthesis file.
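The digest check in steps 1 to 3 can be sketched with Python's standard `hashlib` module. The function names and the retry limit `max_attempts` are assumptions made for illustration; the text only says to return to step S103 on a mismatch. SHA-256 is used as the default here because MD5 is known to be vulnerable to deliberately constructed collisions; passing `algorithm="md5"` gives the MD5 variant.

```python
import hashlib
import hmac


def digest_of(data, algorithm="sha256"):
    """Step 1: compute the first digest value of the downloaded bytes."""
    return hashlib.new(algorithm, data).hexdigest()


def verify_download(data, expected_hex, algorithm="sha256"):
    """Step 2: compare the computed digest with the one from the config."""
    return hmac.compare_digest(digest_of(data, algorithm), expected_hex.lower())


def fetch_verified(download, expected_hex, algorithm="sha256", max_attempts=3):
    """Download again (back to step S103) until the digest matches."""
    for _ in range(max_attempts):
        data = download()
        if verify_download(data, expected_hex, algorithm):
            return data  # Step 3: the file may now be used for playback
    raise ValueError("digest mismatch after repeated downloads")


# Usage: the first download is corrupted, the second one is intact.
payload = b"synthesized speech bytes"
expected = digest_of(payload)
attempts = iter([b"corrupted", payload])
result = fetch_verified(lambda: next(attempts), expected)
```

Bounding the number of retries is a design choice of this sketch, so that a persistently corrupted file raises an error instead of looping forever.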
  • the server corresponding to the registered APP can encrypt the voice synthesis file with its built-in private key. The registered APP then needs to decrypt the file with the public key stored in the decryption module before playing the voice.
  • a general voice database is configured in the voice basic training model, and the general voice database includes voice broadcasts of transaction amounts and times. That is, when numbers are entered in the text, the APP developer's customized speech model can convert them directly into a speech synthesis file that reads out a transaction amount or a time, rather than simply reading the digits. For example, when the text is written as 5:00, the speech played from the speech synthesis file is "5 o'clock".
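The "5:00" example can be illustrated as a text normalization pass applied before synthesis. This is a hypothetical sketch only: the specification attributes the behavior to the general voice database and the customized speech model, not to a regular expression, and the function name `normalize_times` is invented. Only whole hours are rewritten; other times are left unchanged because the text gives no rule for them.

```python
import re

# One or two hour digits, a colon, and exactly two minute digits.
_TIME = re.compile(r"\b(\d{1,2}):(\d{2})\b")


def normalize_times(text):
    """Rewrite whole-hour clock times so they are read as times."""
    def repl(match):
        hour, minute = int(match.group(1)), int(match.group(2))
        if minute == 0:
            return f"{hour} o'clock"
        return match.group(0)  # no rule given in the text: leave as written
    return _TIME.sub(repl, text)


spoken = normalize_times("The payment arrives at 5:00")
```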
  • when a registered APP needs to use a speech synthesis file, it is detected whether the client has cached the speech synthesis file, and the cached file is called preferentially when the client has it, which reduces the response time of the entire voice system.
  • FIG. 2 is a schematic flowchart of a method for calling a speech synthesis file provided by an embodiment of the present specification.
  • the schematic flowchart includes:
  • Step S201: Pull the voice configuration file from the server corresponding to the registered APP.
  • In step S201 of the embodiment of the present specification, the customized speech model corresponding to the registered APP converts the pre-stored text into a speech synthesis file, and the voice configuration file corresponding to the registered APP includes the voice list of the speech synthesis file.
  • Step S202: Receive the voice configuration file delivered by the server corresponding to the registered APP.
  • the delivered voice configuration file includes first verification information, which the server corresponding to the registered APP assigns to the registered APP after encrypting the delivered voice configuration file.
  • In step S202 of the embodiment of this specification, the developer's APP registers with the voice SDK, and the voice SDK is connected to a decryption module.
  • the public key used for decryption can be issued to the decryption module through TSM.
  • the public key is the unique public key corresponding to the registered APP. The server is configured with the corresponding private key, and the server corresponding to the registered APP encrypts the delivered voice configuration file with the private key.
  • the public key and private key are a key pair, the public key is the public part of the key pair, and the private key is the non-public part.
  • the key pair composed of the public key and the private key can be guaranteed to be unique.
  • when this key pair is used, data encrypted with one of the keys must be decrypted with the other key. For example, if the public key is used to encrypt data, the private key must be used to decrypt it; if the private key is used to encrypt data, the public key must be used to decrypt it; otherwise the decryption will not succeed.
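The key-pair property described above can be demonstrated with a toy RSA example. This is a sketch for illustration only: the primes are tiny, there is no padding, and the specification does not say which asymmetric algorithm is used. The variable and function names are hypothetical. A real system would use a vetted cryptographic library.

```python
# Toy RSA key pair with tiny primes; never use numbers like these in practice.
p, q = 61, 53
n = p * q                  # modulus shared by both keys
phi = (p - 1) * (q - 1)
e = 17                     # public exponent
d = pow(e, -1, phi)        # private exponent (modular inverse, Python 3.8+)


def apply_key(value, exponent):
    """Raw RSA operation: the same function encrypts and decrypts."""
    return pow(value, exponent, n)


message = 65  # must be smaller than n

# Encrypt with the public key, decrypt with the private key.
public_then_private = apply_key(apply_key(message, e), d)
# Encrypt with the private key, decrypt with the public key.
private_then_public = apply_key(apply_key(message, d), e)
# Applying the same key twice does not recover the message.
public_twice = apply_key(apply_key(message, e), e)
```

Both mixed-key round trips return the original value, while using the public key for both operations does not.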
  • the decryption module may be an SE module. The SE module is a module that ensures system security; it uses a security chip and a chip operating system (COS) to implement functions such as secure storage of data and encryption and decryption operations.
  • the main functions of the SE module in the security system include: secure storage of keys, data encryption operations, and secure storage of information.
  • the secure storage of keys can establish a relatively complete key management system to ensure that keys cannot be read.
  • Data encryption operations include support for reliable security algorithms, sensitive data ciphertext transmission, and data transmission tamper resistance.
  • the safe storage of information refers to a strict file access authority mechanism and reliable authentication algorithms and processes.
  • the public key is placed in the SE module.
  • SE modules can be packaged in various forms; common ones include smart cards and embedded security modules (eSE).
  • an embedded security module (eSE), with its built-in security operating system, meets the terminal's demand for secure key storage and data encryption services.
  • the voice system can be widely used in finance, map navigation, urban transportation, medical treatment, retail and other fields, and can protect the security of the system when it is used.
  • Step S203: Determine whether the first verification information matches the second verification information pre-stored by the client. If so, execute step S204; if not, the process ends.
  • In step S203 of the embodiment of the present specification, the second verification information corresponding to the registered APP, pre-stored in the secure operating environment of the client, is determined according to the identifier of the registered APP; it is then determined whether the first verification information and the second verification information match.
  • the identity of the registered APP is the identity information of the registered APP.
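The matching in step S203 can be sketched as a lookup by APP identifier followed by a comparison. The dictionary `SECURE_STORE` stands in for the client's secure operating environment, and the identifiers, token values, and function name are hypothetical; the specification does not define the format of the verification information.

```python
import hmac

# Hypothetical stand-in for the secure operating environment of the client:
# APP identifier -> pre-stored second verification information.
SECURE_STORE = {"pay_app": b"token-pay", "map_app": b"token-map"}


def config_is_trusted(app_id, first_verification):
    """Step S203: look up by APP identifier, then compare the two values."""
    second_verification = SECURE_STORE.get(app_id)
    if second_verification is None:
        return False  # unknown APP: there is nothing to match against
    # compare_digest avoids leaking how many leading bytes matched.
    return hmac.compare_digest(first_verification, second_verification)
```

A match corresponds to proceeding to step S204; a mismatch or an unknown identifier ends the process.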
  • Step S204: Verify that the delivered voice configuration file is correct.
  • Step S205: Detect whether the client has the voice synthesis file required by the registered APP. If it exists, execute step S206; if it does not exist, execute step S207.
  • Step S205 of the embodiment of the present specification is the same as step S101 described above and is not repeated here.
  • Step S206: Invoke the speech synthesis file of the client.
  • Step S206 of the embodiment of the present specification is the same as step S102 described above and is not repeated here.
  • Step S207: Download the voice synthesis file from the server corresponding to the registered APP according to the voice configuration file corresponding to the registered APP.
  • Step S207 of the embodiment of the present specification is the same as step S103 described above and is not repeated here.
  • the voice system in this embodiment also has to handle synchronization between the server and the registered APP.
  • the server can support an active push method; that is, when the client's voice synthesis file changes, the server actively pushes the change to the client.
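The active push method can be sketched as a publish and subscribe arrangement. This is a minimal in-process sketch under stated assumptions: the class `SynthesisFileServer`, its methods, and the callback mechanism are hypothetical, and a real deployment would push over a network connection rather than call a function.

```python
class SynthesisFileServer:
    """Hypothetical server that pushes file changes to subscribed clients."""

    def __init__(self):
        self._files = {}        # app_id -> current file bytes
        self._subscribers = {}  # app_id -> list of client callbacks

    def subscribe(self, app_id, on_change):
        self._subscribers.setdefault(app_id, []).append(on_change)

    def publish(self, app_id, data):
        if self._files.get(app_id) == data:
            return  # nothing changed, so nothing is pushed
        self._files[app_id] = data
        for callback in self._subscribers.get(app_id, []):
            callback(data)  # active push to each client


# Usage: the client keeps its cached copy in step with the server.
client_cache = {}
pushes = []

def on_change(data):
    pushes.append(data)
    client_cache["pay_app"] = data

server = SynthesisFileServer()
server.subscribe("pay_app", on_change)
server.publish("pay_app", b"v1")
server.publish("pay_app", b"v1")  # unchanged: no second push
server.publish("pay_app", b"v2")
```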
  • FIG. 3 is a schematic structural diagram of a calling device for a speech synthesis file provided by an embodiment of the present specification.
  • the schematic structural diagram includes: a detecting unit 1, a calling unit 2, a downloading unit 3, a pulling unit 4, a receiving unit 5, a judging unit 6, a verification unit 7, a training unit 8, and a calculation unit 9.
  • the detection unit 1 is used to detect whether there is a voice synthesis file required by the registered APP on the client; the registered APP is an APP that has registered in advance and needs to use the voice synthesis file.
  • the calling unit 2 is used to call the voice synthesis file of the client if it is detected that the client has a voice synthesis file, so that the registered APP can perform voice playback according to the voice synthesis file.
  • the downloading unit 3 is used to download the voice synthesis file from the server corresponding to the registered APP according to the pre-stored voice configuration file corresponding to the registered APP if the voice synthesis file does not exist on the client; the voice configuration file has a built-in download address of the voice synthesis file.
  • the pulling unit 4 is used to pull the voice configuration file from the server corresponding to the registered APP.
  • the receiving unit 5 is used to receive the voice configuration file delivered by the server corresponding to the registered APP.
  • the delivered voice configuration file includes first verification information, which the server corresponding to the registered APP assigns to the registered APP after encrypting the delivered voice configuration file.
  • the judging unit 6 is used to judge whether the first verification information matches the second verification information pre-stored by the client;
  • the verification unit 7 is configured to verify that the delivered voice configuration file is correct when it is determined that the first verification information matches the second verification information pre-stored by the client.
  • the judging unit 6 is specifically used for:
  • determining, according to the identifier of the registered APP, the second verification information corresponding to the registered APP pre-stored in the secure operating environment of the client, and judging whether the first verification information matches the second verification information;
  • the training unit 8 is used to send the voice data provided by the APP developer, reflecting the characteristics of the APP developer, to the server corresponding to the registered APP, so that the server trains the APP developer's customized voice model through the built-in voice basic training model and generates the voice synthesis file corresponding to the registered APP from the customized voice model according to the pre-stored text.
  • the voice basic training model is a model that is trained on several voice samples provided in advance according to the registered APPs' need to play voice and that can be shared by registered APPs.
  • the calculation unit 9 is used to calculate the first digest value corresponding to the speech synthesis file;
  • the judging unit 6 is also used to judge whether the second digest value corresponding to the voice synthesis file, pre-stored in the voice configuration file, is the same as the first digest value;
  • if they are the same, the registered APP performs voice playback according to the voice synthesis file.
  • the registered APP performs voice playback according to the speech synthesis file, including: the server corresponding to the registered APP encrypts the speech synthesis file according to the preset rules; the encrypted speech synthesis file is decrypted according to the built-in decryption module, and the registered APP performs speech Play.
  • the embodiments of the present specification also provide a computer-readable medium on which computer-readable instructions are stored.
  • the computer-readable instructions can be executed by a processor to perform the following steps:
  • if it is detected that the client does not have the voice synthesis file, downloading the voice synthesis file from the server corresponding to the registered APP according to the pre-stored voice configuration file corresponding to the registered APP, the voice configuration file having a built-in download address for the voice synthesis file;
  • if it is detected that the client has the voice synthesis file, calling the client's voice synthesis file so that the registered APP can perform voice playback according to the voice synthesis file.
  • An embodiment of the present specification also provides a calling device for a speech synthesis file.
  • the device includes a memory for storing computer program instructions and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, they trigger the device to perform the following steps:
  • the detection unit is used to detect whether the voice synthesis file required by the registered APP exists on the client, the registered APP being an APP registered in advance as needing to use the voice synthesis file;
  • the downloading unit is used to download the voice synthesis file from the server corresponding to the registered APP, according to the pre-stored voice configuration file corresponding to the registered APP, if it is detected that the voice synthesis file does not exist on the client; the voice configuration file has a built-in download address for the voice synthesis file;
  • the calling unit is used to call the voice synthesis file of the client if it is detected that the client has a voice synthesis file, so that the registered APP can perform voice playback according to the voice synthesis file.
  • a voice system provided by an embodiment of this specification includes a terminal and a server, and the terminal includes a voice SDK running in the terminal, a registered APP, and an APP developer terminal;
  • the APP developer terminal is used to send the voice data provided by the APP developer reflecting the characteristics of the APP developer to the server corresponding to the registered APP;
  • the server is used to train the APP developer's customized voice model through the built-in basic voice training model, and to feed the pre-stored text into the APP developer's customized voice model to generate the voice synthesis file required by the registered APP.
  • the basic voice training model is a model that can be shared by the registered APPs and is trained on several voice samples provided in advance according to the voice playback needs of the registered APPs;
  • the voice SDK is used to pull the voice configuration file from the server corresponding to the registered APP and to receive the voice configuration file delivered by that server.
  • the delivered voice configuration file includes the first verification information that the server assigns to the registered APP when it encrypts the voice configuration file.
  • the voice SDK is further used to: determine whether the first verification information matches the second verification information pre-stored by the client; when the two match, verify that the delivered voice configuration file is correct; detect whether the client has the voice synthesis file required by the registered APP, the registered APP being an application registered in advance as requiring the voice synthesis file; if it is detected that there is no voice synthesis file on the client, download the voice synthesis file from the server corresponding to the registered APP according to the pre-stored voice configuration file.
  • the voice configuration file has a built-in download address for the voice synthesis file; if it is detected that the voice synthesis file exists on the client, the voice SDK calls the client's voice synthesis file so that the registered APP can play voice according to it.
  • the embodiments of the present invention may be provided as methods, systems, or computer program products. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer usable program code.
  • each flow and/or block in the flowchart and/or block diagram and a combination of the flow and/or block in the flowchart and/or block diagram may be implemented by computer program instructions.
  • These computer program instructions can be provided to the processor of a computer, a dedicated computer, an embedded processor, or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, and the instruction device implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operating steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • the computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • the memory may include non-permanent memory in computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.
  • Computer-readable media, including permanent and non-permanent, removable and non-removable media, can store information by any method or technology.
  • the information may be computer-readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by computing devices.
  • computer-readable media does not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
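
The digest comparison performed by the calculation unit and the judging unit can be sketched as below. This is only an illustration under stated assumptions: the specification does not name a hash algorithm, so SHA-256 is assumed here, and the function names `first_digest` and `digest_matches` are hypothetical.

```python
import hashlib
import hmac


def first_digest(synthesis_file: bytes) -> str:
    """Compute the first digest value from the downloaded file.

    SHA-256 is an assumption; the specification does not name an algorithm.
    """
    return hashlib.sha256(synthesis_file).hexdigest()


def digest_matches(synthesis_file: bytes, second_digest: str) -> bool:
    """Compare the computed digest with the second digest value
    pre-stored in the voice configuration file (hex string assumed)."""
    # compare_digest runs in constant time for equal-length ASCII strings.
    return hmac.compare_digest(first_digest(synthesis_file), second_digest.lower())
```

Under this sketch, playback would proceed only when `digest_matches` returns `True`; what the client does on a mismatch is not stated in this part of the specification.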

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Information Transfer Between Computers (AREA)
  • Telephonic Communication Services (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention relates to a method and device for invoking a speech synthesis file. The method comprises: detecting whether a client has a speech synthesis file to be used by a registered application (S101), the registered application being an application registered in advance as requiring a speech synthesis file; if it is detected that the client does not have the speech synthesis file, downloading, according to a pre-stored voice configuration file corresponding to the registered application, the speech synthesis file from the server side corresponding to the registered application (S103), the voice configuration file containing the download address of the speech synthesis file; and if it is detected that the client has the speech synthesis file, invoking the client's speech synthesis file (S102), so that the registered application performs voice playback according to the speech synthesis file. When a registered application needs a speech synthesis file, the client is checked for it first; if the client has the file, the copy cached on the client is invoked preferentially, which reduces the response time of the voice system as a whole.
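
Steps S101 to S103 of the abstract amount to a cache-first lookup, which can be sketched as follows. This is a minimal sketch, not the patented implementation: the function name `invoke_synthesis_file`, the `fetch` callback, and the choice to write the downloaded file to the cache path are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Callable


def invoke_synthesis_file(
    cache_path: Path,
    download_url: str,
    fetch: Callable[[str], bytes],
) -> bytes:
    """Return the speech synthesis file, preferring the client's cached copy.

    download_url stands for the address carried in the voice configuration
    file; fetch is a hypothetical callback that performs the download.
    """
    # S101: detect whether the client already holds the file.
    if cache_path.exists():
        # S102: invoke the cached file directly, with no network round trip.
        return cache_path.read_bytes()
    # S103: download from the address in the voice configuration file.
    data = fetch(download_url)
    # Assumption: store the download so later calls take the S102 branch.
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_bytes(data)
    return data
```

Passing the downloader in as a callback keeps the sketch independent of any particular HTTP client, and the second call for the same path is served from the cache, which is the response-time saving the abstract describes.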
PCT/CN2019/122545 2018-12-26 2019-12-03 Method and device for invoking a speech synthesis file WO2020134896A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811596879.5 2018-12-26
CN201811596879.5A CN110021291B (zh) 2018-12-26 2018-12-26 Method and device for invoking a speech synthesis file

Publications (1)

Publication Number Publication Date
WO2020134896A1 (fr) 2020-07-02

Family

ID=67188692

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/122545 WO2020134896A1 (fr) 2018-12-26 2019-12-03 Method and device for invoking a speech synthesis file

Country Status (3)

Country Link
CN (1) CN110021291B (fr)
TW (1) TW202027027A (fr)
WO (1) WO2020134896A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113421542A (zh) * 2021-06-22 2021-09-21 广州小鹏汽车科技有限公司 Voice interaction method, server, voice interaction system and storage medium
US20230051062A1 (en) * 2020-05-26 2023-02-16 Apple Inc. Personalized voices for text messaging

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110021291B (zh) * 2018-12-26 2021-01-29 创新先进技术有限公司 Method and device for invoking a speech synthesis file
CN111953853A (zh) * 2020-07-31 2020-11-17 中国工商银行股份有限公司 Voice read-out processing method and device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030187658A1 (en) * 2002-03-29 2003-10-02 Jari Selin Method for text-to-speech service utilizing a uniform resource identifier
WO2005002160A1 (fr) * 2003-06-30 2005-01-06 Nortel Networks Limited Method and system for implementing text-to-speech instant messaging
CN101098507A (zh) * 2007-06-29 2008-01-02 中兴通讯股份有限公司 System and method for providing a unified development platform for speech synthesis applications
CN104992703A (zh) * 2015-07-24 2015-10-21 百度在线网络技术(北京)有限公司 Speech synthesis method and system
CN105161091A (zh) * 2015-08-24 2015-12-16 北京开元智信通软件有限公司 In-vehicle TTS voice broadcast method and system, and in-vehicle terminal
CN105354096A (zh) * 2015-10-29 2016-02-24 中国电子科技集团公司第二十八研究所 Method for automatically generating and broadcasting voice based on a B/S architecture
US20180075838A1 (en) * 2015-11-10 2018-03-15 Paul Wendell Mason Method and system for Using A Vocal Sample to Customize Text to Speech Applications
CN110021291A (zh) * 2018-12-26 2019-07-16 阿里巴巴集团控股有限公司 Method and device for invoking a speech synthesis file

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000021232A2 (fr) * 1998-10-02 2000-04-13 International Business Machines Corporation Interactive browser and interactive systems
US6901431B1 (en) * 1999-09-03 2005-05-31 Cisco Technology, Inc. Application server providing personalized voice enabled web application services using extensible markup language documents
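CN102968992B (zh) * 2012-11-26 2014-11-05 北京奇虎科技有限公司 Speech recognition processing method for a browser, and browser
CN103118002A (zh) * 2012-12-21 2013-05-22 北京飞漫软件技术有限公司 Method for implementing cloud storage management of data resources using voice as a key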
US9430465B2 (en) * 2013-05-13 2016-08-30 Facebook, Inc. Hybrid, offline/online speech translation system
US9444935B2 (en) * 2014-11-12 2016-09-13 24/7 Customer, Inc. Method and apparatus for facilitating speech application testing
CN107315958A (zh) * 2016-04-26 2017-11-03 北京京东尚科信息技术有限公司 Method and device for verifying the legitimacy of a data object
US20170345410A1 (en) * 2016-05-26 2017-11-30 Tyler Murray Smith Text to speech system with real-time amendment capability
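KR101806499B1 (ko) * 2016-06-10 2017-12-07 주식회사 지어소프트 File management method and file management device using the same
CN107123424B (zh) * 2017-04-27 2022-03-11 腾讯科技(深圳)有限公司 Audio file processing method and device
CN107391168B (zh) * 2017-06-08 2018-07-03 腾讯科技(深圳)有限公司 Animation loading method and device, and request processing method and device
CN107517252A (zh) * 2017-08-22 2017-12-26 福建中金在线信息科技有限公司 File download control method, device and system
CN108234636A (zh) * 2017-12-29 2018-06-29 阿里巴巴集团控股有限公司 Voice broadcast method, device and system, and intelligent broadcast equipment
CN108809960A (zh) * 2018-05-23 2018-11-13 北京五八信息技术有限公司 File upload and download method, device, equipment, system and storage medium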

Also Published As

Publication number Publication date
CN110021291B (zh) 2021-01-29
CN110021291A (zh) 2019-07-16
TW202027027A (zh) 2020-07-16

Similar Documents

Publication Publication Date Title
WO2020134896A1 (fr) Method and device for invoking a speech synthesis file
US20210064784A1 (en) Managing a smart contract on a blockchain
US10200198B2 (en) Making cryptographic claims about stored data using an anchoring system
US10958436B2 (en) Methods contract generator and validation server for access control of contract data in a distributed system with distributed consensus
US20190116038A1 (en) Attestation With Embedded Encryption Keys
US10410018B2 (en) Cryptographic assurances of data integrity for data crossing trust boundaries
CN111767578B (zh) Data verification method, device and equipment
US11263632B2 (en) Information sharing methods, apparatuses, and devices
CN110199284A (zh) Cross-platform enclave identity
WO2020253469A1 (fr) Hot update method and apparatus for a script file package
US9954900B2 (en) Automating the creation and maintenance of policy compliant environments
US11979498B2 (en) System and method for securely transferring data using generated encryption keys
US10554663B2 (en) Self-destructing smart data container
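CN110214323A (zh) Enclave abstraction model
CN110264194B (zh) Receipt storage method and node based on event function type
CN115033919A (zh) Data acquisition method, device and equipment based on a trusted device
CN115580413B (zh) Zero-trust multi-party data fusion computation method and device
CN112131595A (zh) Method and device for secure access to SQLite database files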
US8745375B2 (en) Handling of the usage of software in a disconnected computing environment
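CN111930846B (zh) Data processing method, device and equipment
CN113129017B (zh) Information sharing method, device and equipment
WO2019210471A1 (fr) Data invocation method and data invocation apparatus
CN110602051B (zh) Information processing method based on a consensus protocol, and related device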
US11138319B2 (en) Light-weight context tracking and repair for preventing integrity and confidentiality violations
US20210226771A1 (en) Method and system for authentication seal deployment in networked immutable transactions

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19903203

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19903203

Country of ref document: EP

Kind code of ref document: A1