CN109036374B - Data processing method and device - Google Patents

Data processing method and device

Info

Publication number
CN109036374B
CN109036374B (application CN201810720403.1A)
Authority
CN
China
Prior art keywords
voice
story
speech synthesis
family member
synthesis model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810720403.1A
Other languages
Chinese (zh)
Other versions
CN109036374A (en)
Inventor
于丽娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Shanghai Xiaodu Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201810720403.1A priority Critical patent/CN109036374B/en
Publication of CN109036374A publication Critical patent/CN109036374A/en
Application granted granted Critical
Publication of CN109036374B publication Critical patent/CN109036374B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

Embodiments of the present application provide a data processing method and device. The method comprises: receiving a play request input by a user, the play request containing information on the content to be played and the type of the play request; converting the content to be played into speech using a speech synthesis model corresponding to the type of the play request, to obtain a voice; and playing the voice. The speech synthesis model is an audio model built by analyzing and training on collected voice data of a child's family member. By obtaining speech synthesis models of family members for the different types of play request, and because different play requests correspond to different scenes, the application can convert the content to be played into a voice that both sounds like the family member and suits the current scene, and can be applied to parent-child interaction and parent-child reading.

Description

Data processing method and device
Technical field
Embodiments of the present application relate to computer technology, and in particular to a data processing method and device.
Background art
In the past two years, with the spread of artificial-intelligence interaction technology, intelligent robot products have developed rapidly. Among them, home-oriented service robots, and in particular companion robots for children, have been released one after another like mushrooms after rain.
Existing companion robots cannot produce a voice that is the same as, or similar to, that of a family member, nor can they produce a voice that suits the scene implied by a user's request.
Summary of the invention
Embodiments of the present application provide a data processing method and device to overcome the prior-art problem that a robot cannot produce a voice that both matches a family member and suits the scene.
In a first aspect, an embodiment of the present application provides a data processing method, comprising:
receiving a play request input by a user, the play request containing information on the content to be played and the type of the play request;
converting the content to be played into speech using a speech synthesis model corresponding to the type of the play request, to obtain a voice, where the speech synthesis model is an audio model built by analyzing and training on collected voice data of a child's family member, the voice data being the family member's voice recorded in the scene corresponding to the type of the play request; and
playing the voice.
In one possible design, the type of the play request is a story play request, and the information on the content to be played includes information on the story to be played;
converting the content into speech using the speech synthesis model corresponding to the type of the play request then comprises:
converting the story content indicated by the story information into speech using a first speech synthesis model, to obtain a story voice, where the first speech synthesis model is an audio model built by analyzing and training on collected first voice data of the child's family member, the first voice data being the family member's voice recorded while telling stories to the child; and
playing the voice then comprises: playing the story voice.
In one possible design, converting the story content indicated by the story information into speech using the first speech synthesis model comprises:
converting the story content indicated by the story information into speech using a locally stored first speech synthesis model, to obtain the story voice;
correspondingly, before receiving the play request input by the user, the method further includes: receiving the first speech synthesis model sent by a cloud server.
In one possible design, converting the story content indicated by the story information into speech using the first speech synthesis model comprises:
sending the story content indicated by the story information to a cloud server, so that the cloud server converts the story content into speech using the first speech synthesis model to obtain the story voice; and
receiving the story voice sent by the cloud server.
In one possible design, the information on the story to be played comprises: identification information of the story to be played;
or the text of the story to be played.
In one possible design, the play request further includes selection information for the first speech synthesis models corresponding to the child's family members;
correspondingly, converting the story content indicated by the story information into speech using the first speech synthesis model comprises:
converting the story content indicated by the story information into speech using the first speech synthesis model matching the selection information.
In one possible design, the play request further includes selection information for the first speech synthesis model corresponding to the father or the first speech synthesis model corresponding to the mother;
correspondingly, converting the story content indicated by the story information into speech using the first speech synthesis model comprises:
converting the story content indicated by the story information into speech using the first speech synthesis model matching the selection information.
In one possible design, the type of the play request is an everyday-voice play request, and the information on the content to be played includes the text to be played;
converting the content to be played into speech using the speech synthesis model corresponding to the type of the play request then comprises:
converting the text into speech using a second speech synthesis model, to obtain everyday speech, where the second speech synthesis model is an audio model built by analyzing and training on collected second voice data of the child's family member, the second voice data being the family member's voice recorded in ordinary dialogue; and
playing the voice then comprises: playing the everyday speech.
In one possible design, before converting the content into speech using the speech synthesis model corresponding to the type, the method further includes:
collecting third voice data of the child's family members in ordinary dialogue and first voice data of each family member telling stories to the child; and
sending the first voice data and the third voice data to a cloud server, so that the cloud server: performs cluster analysis on the third voice data to obtain second voice data corresponding to each family member; builds a general sound database and a personalized sound database for each family member; and, for each family member, trains on the second voice data in the member's general sound database to obtain the member's second speech synthesis model, and trains on the first voice data in the member's personalized sound database to obtain the member's first speech synthesis model.
In a second aspect, an embodiment of the present application provides a data processing device, comprising:
a receiving module, configured to receive a play request input by a user, the play request containing information on the content to be played and the type of the play request;
a text-to-speech module, configured to convert the content to be played into speech using a speech synthesis model corresponding to the type of the play request, to obtain a voice, where the speech synthesis model is an audio model built by analyzing and training on collected voice data of a child's family member, the voice data being the family member's voice recorded in the scene corresponding to the type of the play request; and
a playing module, configured to play the voice.
In one possible design, the type of the play request is a story play request, and the information on the content to be played includes information on the story to be played;
the text-to-speech module is then specifically configured to convert the story content indicated by the story information into speech using a first speech synthesis model, to obtain a story voice, where the first speech synthesis model is an audio model built by analyzing and training on collected first voice data of the child's family member, the first voice data being the family member's voice recorded while telling stories to the child; and
the playing module is specifically configured to play the story voice.
In one possible design, the text-to-speech module is specifically configured to convert the story content indicated by the story information into speech using a locally stored first speech synthesis model, to obtain the story voice; and
the receiving module is further configured to receive the first speech synthesis model sent by a cloud server.
In one possible design, the device further includes a sending module configured to send the story content indicated by the story information to a cloud server, so that the cloud server converts the story content into speech using the first speech synthesis model to obtain the story voice; and
the receiving module is further configured to receive the story voice sent by the cloud server.
In one possible design, the information on the story to be played comprises: identification information of the story to be played;
or the text of the story to be played.
In one possible design, the play request further includes selection information for the first speech synthesis models corresponding to the child's family members;
correspondingly, the text-to-speech module is specifically configured to convert the story content indicated by the story information into speech using the first speech synthesis model matching the selection information.
In one possible design, the play request further includes selection information for the first speech synthesis model corresponding to the father or the first speech synthesis model corresponding to the mother;
correspondingly, the text-to-speech module is specifically configured to convert the story content indicated by the story information into speech using the first speech synthesis model matching the selection information.
In one possible design, the type of the play request is an everyday-voice play request, and the information on the content to be played includes the text to be played;
the text-to-speech module is then specifically configured to convert the text into speech using a second speech synthesis model, to obtain everyday speech, where the second speech synthesis model is an audio model built by analyzing and training on collected second voice data of the child's family member, the second voice data being the family member's voice recorded in ordinary dialogue; and
the playing module is specifically configured to play the everyday speech.
In one possible design, the device further includes:
a collection module, configured to collect, before the text-to-speech module converts the content into speech using the speech synthesis model corresponding to the type, third voice data of the child's family members in ordinary dialogue and first voice data of each family member telling stories to the child; and
a sending module, configured to send the first voice data and the third voice data to a cloud server, so that the cloud server: performs cluster analysis on the third voice data to obtain second voice data corresponding to each family member; builds a general sound database and a personalized sound database for each family member; and, for each family member, trains on the second voice data in the member's general sound database to obtain the member's second speech synthesis model, and trains on the first voice data in the member's personalized sound database to obtain the member's first speech synthesis model.
In a third aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs the method of the first aspect or of any possible design of the first aspect.
In a fourth aspect, an embodiment of the present application provides a data processing device, including a processor and a memory, where:
the memory is configured to store a program; and
the processor is configured to execute the program stored in the memory, and when the program is executed, the processor performs the method of the first aspect or of any possible design of the first aspect.
In the present application, the user's voice data is collected separately for different scenes, and the voice data of each scene is trained to obtain a speech synthesis model corresponding to that scene. Thus, when the user inputs through a terminal device a play request of the type corresponding to a given scene (different types of play request correspond to different scenes), the content to be played can be converted into speech using the speech synthesis model corresponding to the type of the play request and then played. In other words, the data processing method of the application can convert the content to be played into a voice that suits the current scene.
Moreover, since each speech synthesis model corresponds to a family member, the data processing method can convert the content to be played into a voice that both sounds like the family member and suits the current scene, and can therefore be applied to parent-child interaction and parent-child reading.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present application or the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a system architecture diagram provided by the embodiments of the present application;
Fig. 2 is a first flowchart of the data processing method provided by the embodiments of the present application;
Fig. 3 is a second flowchart of the data processing method provided by the embodiments of the present application;
Fig. 4 is a third flowchart of the data processing method provided by the embodiments of the present application;
Fig. 5 is a first structural schematic diagram of the data processing device provided by the embodiments of the present application;
Fig. 6 is a second structural schematic diagram of the data processing device provided by the embodiments of the present application;
Fig. 7 is a structural schematic diagram of the terminal device provided by the embodiments of the present application.
Specific embodiment
To make the purposes, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the application without creative effort shall fall within the protection scope of the application.
Fig. 1 is a system architecture diagram provided by the embodiments of the present application. Referring to Fig. 1, the system architecture of this embodiment includes a terminal device 11 and a cloud server 12.
The terminal device 11 receives the user's play requests and collects the user's voice data; the cloud server 12 trains and stores speech synthesis models based on the user's voice data.
After obtaining a speech synthesis model, the cloud server 12 may also send it to the terminal device 11.
The data processing method of the embodiments of the present application is described in detail below with specific embodiments.
Fig. 2 is the first flowchart of the data processing method provided by the embodiments of the present application. As shown in Fig. 2, the method of this embodiment may include:
Step S101: receiving a play request input by a user, the play request containing information on the content to be played and the type of the play request;
Step S102: converting the content to be played into speech using a speech synthesis model corresponding to the type of the play request, to obtain a voice, where the speech synthesis model is an audio model built by analyzing and training on collected voice data of a child's family member, the voice data being the family member's voice recorded in the scene corresponding to the type of the play request;
Step S103: playing the voice.
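As a rough sketch of steps S101–S103, the dispatch by request type might look as follows. All names here (`PlayRequest`, `handle_play_request`, the stand-in model functions) are illustrative assumptions, not taken from the patent; real models would be trained voices.

```python
from dataclasses import dataclass

@dataclass
class PlayRequest:
    request_type: str          # "story" or "daily_speech" (S101)
    content: str               # text or story content to be played

def synthesize_story(text: str) -> str:
    # stand-in for the first speech synthesis model (storytelling scene)
    return f"[storytelling voice] {text}"

def synthesize_daily(text: str) -> str:
    # stand-in for the second speech synthesis model (ordinary dialogue)
    return f"[everyday voice] {text}"

# one synthesis model per request type, as the method requires
MODELS = {"story": synthesize_story, "daily_speech": synthesize_daily}

def handle_play_request(req: PlayRequest) -> str:
    """S102: pick the model matching the request type and convert the content."""
    return MODELS[req.request_type](req.content)

voice = handle_play_request(PlayRequest("story", "The Tortoise and the Hare"))
print(voice)  # S103 would play this voice on the terminal device
```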
Specifically, the executing entity of this embodiment may be a terminal device, and the terminal device may be a children's story machine.
For step S101, the play request input by the user may include a story play request or an everyday-voice play request. Understandably, if the user wants to listen to a story, a story play request is input; if another user sends a piece of text (which may be called the text to be played) from their own terminal device to this terminal device, then when the user chooses to play that text, the user has in effect input an everyday-voice play request.
An everyday-voice play request is a request to play the converted speech using the sound characteristics of ordinary dialogue.
That is, the story play request and the everyday-voice play request are two different types of play request, and each type corresponds to a scene.
For steps S102–S103, the content to be played is converted into speech using the speech synthesis model corresponding to the type of the play request, and the resulting voice is played.
Specifically, the sound characteristics of storytelling and of ordinary dialogue differ. For example, telling a story requires emotion matching the story's scene and a slower speaking rate, whereas ordinary dialogue needs little added emotion and a normal speaking rate. If the same speech synthesis model were used for both story play requests and everyday-voice play requests, the resulting voice could not suit one of the two corresponding scenes: the storytelling scene or the ordinary-dialogue scene.
Therefore, in this embodiment, according to the type included in the play request, the content to be played is converted into speech using the speech synthesis model corresponding to that type.
For example, if the type of the play request is a story play request, the content to be played is converted into speech using the first speech synthesis model corresponding to story play requests, and the resulting story voice is played;
if the type of the play request is an everyday-voice play request, the content to be played is converted into speech using the second speech synthesis model corresponding to everyday-voice play requests, and the resulting everyday speech is played.
The voice obtained in this way suits the scene corresponding to the play request, which improves the user's experience.
The terminal device may collect enough first voice data of person A in the storytelling scene and send this first voice data to the cloud server; the cloud server builds a personalized database corresponding to person A, and analyzes and trains on the first voice data in that database to obtain the first speech synthesis model corresponding to person A.
Likewise, the terminal device may collect enough first voice data of person B in the storytelling scene and send it to the cloud server; the cloud server builds a personalized database corresponding to person B, and analyzes and trains on the first voice data in that database to obtain the first speech synthesis model corresponding to person B.
That is, first speech synthesis models can be obtained for one or more people as needed. If the cloud server has obtained first speech synthesis models for several people and the play request is a story play request, the play request further includes selection information for the first speech synthesis model; that is, the user inputs the selection information through the terminal device. If the user wants to hear A tell the story, the selection information includes A's identifier. When playing the story, the terminal device or the cloud server converts the story text into speech using the first speech synthesis model matching the selection information, i.e. the first speech synthesis model corresponding to A.
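A minimal sketch of how the selection information might pick among several people's first speech synthesis models. The member identifiers and the registry below are assumptions for illustration; the stand-in lambdas take the place of trained voice models.

```python
# Hypothetical registry: one first speech synthesis model per person.
first_models = {
    "A": lambda text: f"[A's storytelling voice] {text}",
    "B": lambda text: f"[B's storytelling voice] {text}",
}

def convert_story(selection_id: str, story_text: str) -> str:
    """Convert story text with the model matching the selection information."""
    model = first_models[selection_id]
    return model(story_text)

print(convert_story("A", "Once upon a time..."))
```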
Understandably, for a children's story machine the user is generally a child. So that the child can still hear a family member's voice when the parent or another family member is absent — in other words, so that the story machine tells stories in a family member's voice — the terminal device may, for each family member, collect enough of that member's first voice data in the storytelling scene and send it to the cloud server; the cloud server builds the member's personalized sound database and analyzes and trains on the first voice data to obtain the member's first speech synthesis model. That is, each family member has a corresponding first speech synthesis model.
As for obtaining the second speech synthesis model: the terminal device collects enough third voice data of the family members in ordinary dialogue; cluster analysis is performed on the third voice data to obtain the second voice data corresponding to each family member, and a general sound database is built for each family member — that is, each family member has one general sound database. For each family member, training on the second voice data in the member's general sound database yields the member's second speech synthesis model. That is, each family member has a corresponding second speech synthesis model.
If the cloud server has obtained second speech synthesis models for several family members and the play request is an everyday-voice play request, the play request further includes selection information for the second speech synthesis model; that is, the user inputs the selection information through the terminal device. If the user wants to hear A's voice, the selection information includes A's identifier, and the terminal device or cloud server converts the text to be played into everyday speech using the second speech synthesis model matching the selection information, i.e. the one corresponding to A.
A clustering algorithm, such as the K-means clustering algorithm, may be used for the cluster analysis.
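As a toy illustration of the K-means step, the pure-Python sketch below clusters one-dimensional values. A real system would cluster multi-dimensional speaker features; the pitch-like numbers here are invented, and the naive evenly-spaced initialization is an assumption, not part of the patent.

```python
def kmeans_1d(values, k, iters=20):
    """Cluster scalar voice-feature values (e.g. average pitch) into k groups."""
    srt = sorted(values)
    # spread the initial centroids across the sorted range
    centroids = [srt[i * (len(srt) - 1) // (k - 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:  # assign each utterance to its nearest centroid
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # recompute each centroid as the mean of its cluster
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# six utterances of third voice data from two (unlabeled) speakers
feats = [110, 112, 108, 220, 225, 218]
cents, groups = kmeans_1d(feats, k=2)
print(sorted(groups[0]), sorted(groups[1]))  # two general voice sets
```

Each resulting group plays the role of one "general voice set" awaiting a family member's identifier.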
After the cloud server performs cluster analysis on the third voice data, several general voice sets are obtained, each corresponding to one family member. The corresponding family member's identifier can be added to each general voice set in the following ways:
In one implementation, for each general voice set, the cloud server sends a segment of the set's speech to the terminal device; the user identifies which family member it is from this segment and, after identification, inputs the identifier for that general voice set through the terminal device; the cloud server then receives the identifier sent by the terminal device.
In another implementation, for each general voice set, the cloud server adds a preselected identifier and sends the preselected identifier together with a segment of speech from the set to the terminal device for display; the user judges whether the preselected identifier is correct. If it is incorrect, the terminal device receives the correct identifier input by the user and sends it to the cloud server; if it is correct, the user inputs a confirmation instruction.
Understandably, the cloud server may perform the cluster analysis after a preset duration of third voice data has been collected. After each general voice set has been given a family member's identifier by the above methods, the terminal device continues to collect third voice data, and cluster analysis is used to assign the second voice data of the different family members contained in the new third voice data to the corresponding general voice sets.
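Once each general voice set carries a family member's identifier, routing newly collected utterances to the corresponding set could be as simple as a nearest-centroid lookup. The centroid values and member marks below are invented for illustration; this is a sketch of the incremental assignment step, not the patent's actual algorithm.

```python
# Hypothetical labeled centroids: one feature value per family member's
# general voice set, learned from the earlier cluster analysis.
labeled_centroids = {"father": 115.0, "mother": 220.0}

def route_utterance(feature: float) -> str:
    """Assign a new utterance to the family member with the nearest centroid."""
    return min(labeled_centroids, key=lambda m: abs(feature - labeled_centroids[m]))

general_db = {m: [] for m in labeled_centroids}  # per-member general voice database
for f in [118.0, 210.0, 112.5]:                  # newly collected third voice data
    general_db[route_utterance(f)].append(f)
print(general_db)
```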
The cloud server may build a general voice database for each family member; the voice data in each family member's general voice set is stored in the corresponding general voice database, and the identifier of the general voice database is the identifier of the corresponding family member.
In this embodiment, the user's voice data is collected separately for different scenes, and the voice data of each scene is trained to obtain a speech synthesis model corresponding to that scene. Thus, when the user inputs through a terminal device a play request of the type corresponding to a given scene (different types of play request correspond to different scenes), the content to be played can be converted into speech using the speech synthesis model corresponding to the type of the play request and then played. In other words, the data processing method of this embodiment can convert the content to be played into a voice that suits the current scene.
Moreover, since each speech synthesis model corresponds to a family member, the data processing method of this embodiment can convert the content to be played into a voice that both sounds like the family member and suits the current scene, and can therefore be applied to parent-child interaction and parent-child reading.
The data processing methods corresponding to the different types of play request are described below with specific embodiments.
Fig. 3 is the second flowchart of the data processing method provided by the embodiments of the present application. Referring to Fig. 3, the method of this embodiment includes:
Step S201: receiving a story play request input by a user, the story play request containing information on the story to be played;
Step S202: converting the story content indicated by the story information into speech using a first speech synthesis model, to obtain a story voice, where the first speech synthesis model is an audio model built by analyzing and training on collected first voice data of the child's family member, the first voice data being the family member's voice recorded while telling stories to the child;
Step S203: playing the story voice.
Specifically, the execution subject of this embodiment may be a terminal device, and the terminal device may be a companion robot, such as a children's story machine.
For step S201, the user inputting the story playing request through the terminal device includes:
receiving a story playing request input by the user, the story playing request including identification information of the story to be played; or,
receiving a story playing request input by the user, the story playing request including text information of the story to be played.
When the terminal device is a children's story machine, the ways in which the user inputs the story playing request include, but are not limited to, the following:
First way: the user directly clicks the play-story button on the story machine to input the story playing request; in this case, the story to be played is the story that the story machine should currently play according to its preset sequence.
Second way: a story list is displayed on the display screen of the story machine, and the user inputs the story playing request by selecting the story to be played in the story list; in this case, the story playing request includes the identification information of the story to be played.
Third way: an input box is displayed on the display screen of the story machine, and the user inputs the story playing request by entering the name of the story to be played in the input box; in this case, the story playing request includes the identification information of the story to be played.
Fourth way: an input box is displayed on the display screen of the story machine, and the user enters the text of the story to be played in the input box; in this case, the story playing request includes the text information of the story to be played.
Fifth way: the story machine has a scan function; by scanning the text corresponding to the story to be played, the text information of the story to be played is obtained, which constitutes the input of the story playing request. In this case, the story playing request includes the text information of the story to be played.
Sixth way: the user inputs the story playing request by voice; for example, the user inputs the voice "Pulling Up the Radish", and the story playing request then includes the identification information of the story to be played.
Further, if the cloud server obtains first speech synthesis models corresponding to multiple family members, the story playing request further includes: selection information for the first speech synthesis model corresponding to a family member. It can be understood that whichever family member's first speech synthesis model is selected to convert the story content corresponding to the story information into voice determines whose voice tells the story during playback.
For example, if the cloud server obtains a first speech synthesis model corresponding to the father and a first speech synthesis model corresponding to the mother, the story playing request includes selection information for either the father's or the mother's first speech synthesis model.
In this case, the ways in which the user inputs the story playing request include, but are not limited to, the following:
First way: a story list is displayed on the display screen of the story machine; the user first selects the story to be played in the story list and then selects a family member in a family-member selection list to input the story playing request. In this case, the story playing request includes the identification information of the story to be played and the selection information for the family member (that is, the selection information for the first speech synthesis model).
Second way: at least two input boxes are displayed on the display screen of the story machine; the user enters the name of the story to be played in the first input box and the appellation of a family member in the second input box to input the story playing request, where the appellation is the identifier of that family member's personalized voice database. In this case, the story playing request includes the identification information of the story to be played and the selection information for the first speech synthesis model.
Third way: at least two input boxes are displayed on the display screen of the story machine; the user enters the text of the story to be played in the first input box and the appellation of a family member in the second input box to input the story playing request. In this case, the story playing request includes the text information of the story to be played and the selection information for the first speech synthesis model.
Fourth way: the story machine has a scan function; the text information of the story to be played is obtained by scanning the corresponding text, and the user enters the appellation of a family member in an input box to input the story playing request. In this case, the story playing request includes the text information of the story to be played and the selection information for the first speech synthesis model.
Fifth way: the user inputs the story playing request by voice; for example, the user inputs the voice "Dad, tell Pulling Up the Radish", and the story playing request then includes the identification information of the story to be played and the selection information for the first speech synthesis model.
For step S202, performing voice conversion on the story content corresponding to the story information using the first speech synthesis model to obtain story voice includes, in one possible embodiment:
performing voice conversion on the story content corresponding to the story information using a locally stored first speech synthesis model to obtain story voice.
In this embodiment, before the story playing request input by the user is received, the method further includes:
receiving the first speech synthesis model sent by the cloud server; that is, each first speech synthesis model obtained by the cloud server is sent to the terminal device and stored there. After receiving the story playing request, the terminal device obtains the content of the story to be played (that is, the story content corresponding to the story information) according to the identification information or text information carried in the request, and then directly performs voice conversion on that story content using the locally stored speech synthesis model to obtain story voice.
If the story playing request includes "selection information for the first speech synthesis model corresponding to a family member", then performing voice conversion on the story content corresponding to the story information using the locally stored first speech synthesis model to obtain story voice includes:
performing voice conversion on the story content corresponding to the story information using the locally stored first speech synthesis model corresponding to the selection information to obtain story voice.
In another possible embodiment, performing voice conversion on the story content corresponding to the story information using the first speech synthesis model to obtain story voice includes:
sending the story content corresponding to the story information to the cloud server, so that the cloud server performs voice conversion on the story content using the first speech synthesis model to obtain story voice;
receiving the story voice sent by the cloud server.
In this embodiment, the terminal device may obtain the content of the story to be played (that is, the story content corresponding to the story information) according to the identification information or text information carried in the story playing request input by the user; it then sends the story content to the cloud server, so that the cloud server performs voice conversion on the story content using the first speech synthesis model to obtain story voice.
If the story playing request includes "selection information for the first speech synthesis model corresponding to a family member", receiving the story voice sent by the cloud server includes:
receiving the story voice obtained by the cloud server performing voice conversion on the story content using the first speech synthesis model corresponding to the selection information.
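The cloud-conversion variant amounts to the terminal packaging the story content (plus optional selection information) and the server choosing the model accordingly. The sketch below is illustrative only: the payload field names and the callable models are assumptions, not an interface defined in the patent:

```python
import json

def build_synthesis_payload(story_text, member_id=None):
    """Terminal side: package the story content, and the optional
    selection information for a family member's model, for the cloud."""
    payload = {"text": story_text}
    if member_id is not None:
        payload["model_selection"] = member_id
    return json.dumps(payload)

def cloud_synthesize(payload_json, first_models, default_member):
    """Cloud side: pick the first speech synthesis model according to the
    selection information and return the synthesized story voice."""
    req = json.loads(payload_json)
    member = req.get("model_selection", default_member)
    return first_models[member](req["text"])
```

In practice the payload would travel over the network and the return value would be audio rather than a string, but the selection logic is the same.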
In this embodiment, when the playing request is of the story type, the first speech synthesis model corresponding to the story-playing scenario of that type is used to perform voice conversion on the story content. The resulting story voice fits the current story-playing scenario and is the voice of a family member, which realizes parent-child reading and improves the user's experience with the story machine.
Fig. 4 is a third flowchart of the data processing method provided by the embodiments of the present application. Referring to Fig. 4, the method of this embodiment includes:
Step S301: receiving a daily-voice playing request input by the user, where the daily-voice playing request includes the text to be played;
Step S302: performing voice conversion on the text using a second speech synthesis model to obtain daily voice, where the second speech synthesis model is an audio model established by analyzing and training second voice data of a family member of the child, the second voice data being voice data of the family member under an ordinary-conversation scenario;
Step S303: playing the daily voice.
Specifically, when the execution subject of this embodiment is the story machine, for step S301: after a family member sends a piece of text to the story machine through his or her own terminal device, the story machine can display or issue a prompt indicating that text has been received and asking whether it should be played; if the user agrees, the user can input the daily-voice playing request through the story machine.
Further, if the cloud server obtains second speech synthesis models corresponding to multiple family members, the daily-voice playing request further includes: selection information for the second speech synthesis model corresponding to a family member. It can be understood that whichever family member's second speech synthesis model is selected determines whose voice is used to play the text.
For example, if the cloud server obtains a second speech synthesis model corresponding to the father and a second speech synthesis model corresponding to the mother, the daily-voice playing request further includes selection information for either the father's or the mother's second speech synthesis model.
For step S302, performing voice conversion on the text using the second speech synthesis model to obtain daily voice includes, in one possible embodiment:
performing voice conversion on the text using a locally stored second speech synthesis model to obtain daily voice.
In this embodiment, before the daily-voice playing request input by the user is received, the method further includes:
receiving the second speech synthesis model sent by the cloud server; that is, each second speech synthesis model obtained by the cloud server is sent to the terminal device and stored there. After receiving the daily-voice playing request, the terminal device directly performs voice conversion on the text to be played using the locally stored second speech synthesis model to obtain daily voice.
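The local variant can be sketched as a terminal that caches each model pushed from the cloud and then converts text without a round trip. The class and method names below are hypothetical, chosen only to mirror the two steps just described:

```python
class Terminal:
    """Caches second speech synthesis models pushed from the cloud and
    converts daily text locally."""
    def __init__(self):
        self._second_models = {}  # family member id -> model

    def receive_model(self, member, model):
        # Called when the cloud server sends a trained model (stored locally).
        self._second_models[member] = model

    def play_daily(self, text, member):
        # S302 performed on-device: no request to the cloud server is needed.
        return self._second_models[member](text)
```

The story-playing case with a locally stored first speech synthesis model would follow the same pattern with a second cache.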
If the daily-voice playing request includes "selection information for the second speech synthesis model corresponding to a family member", then performing voice conversion on the text to be played using the locally stored second speech synthesis model to obtain daily voice includes:
performing voice conversion on the text to be played using the locally stored second speech synthesis model corresponding to the selection information to obtain daily voice.
In another possible embodiment, performing voice conversion on the text to be played using the second speech synthesis model to obtain daily voice includes:
sending the text to be played to the cloud server, so that the cloud server performs voice conversion on the text using the second speech synthesis model to obtain daily voice;
receiving the daily voice sent by the cloud server.
In this embodiment, the terminal device may send the text to be played to the cloud server, so that the cloud server performs voice conversion on the text using the second speech synthesis model to obtain daily voice.
If the daily-voice playing request includes "selection information for the second speech synthesis model corresponding to a family member", receiving the daily voice sent by the cloud server includes:
receiving the daily voice obtained by the cloud server performing voice conversion on the text to be played using the second speech synthesis model corresponding to the selection information.
In this embodiment, when the playing request is of the daily-voice type, the second speech synthesis model corresponding to the daily-voice scenario is used to perform voice conversion on the text to be played. The resulting daily voice fits the current daily-voice or ordinary-conversation scenario and is the voice of a family member, which realizes parent-child interaction and improves the user's experience with the story machine.
Fig. 5 is a first structural schematic diagram of the data processing device provided by the embodiments of the present application. As shown in Fig. 5, the device of this embodiment may include: a receiving module 41, a text-to-speech conversion module 42, and a playing module 43.
The receiving module 41 is configured to receive the playing request input by the user, where the playing request includes the information on the content to be played and the type of the playing request.
The text-to-speech conversion module 42 is configured to perform voice conversion on the content to be played using the speech synthesis model corresponding to the type of the playing request to obtain voice; the speech synthesis model is an audio model established by analyzing and training voice data of a family member of the child, the voice data being voice data of the family member under the scenario corresponding to the type of the playing request.
The playing module 43 is configured to play the voice.
The device of this embodiment can be used to execute the technical solutions of the above method embodiments; the realization principle and technical effect are similar and are not repeated here.
In one possible design, the type of the playing request is a story playing request, and the information on the content to be played includes information on the story to be played;
the text-to-speech conversion module 42 is specifically configured to:
perform voice conversion on the story content corresponding to the story information using the first speech synthesis model to obtain story voice, where the first speech synthesis model is an audio model established by analyzing and training the first voice data of the family member of the child, the first voice data being voice data of the family member under the storytelling scenario for the child;
the playing module 43 is specifically configured to:
play the story voice.
In one possible design, the text-to-speech conversion module 42 is specifically configured to:
perform voice conversion on the story content corresponding to the story information using the locally stored first speech synthesis model to obtain story voice;
the receiving module 41 is further configured to: receive the first speech synthesis model sent by the cloud server.
In one possible design, the information on the story to be played includes: identification information of the story to be played;
or, text information of the story to be played.
In one possible design, the playing request further includes: selection information for the first speech synthesis model corresponding to each family member of the child;
correspondingly, the text-to-speech conversion module 42 is specifically configured to:
perform voice conversion on the story content corresponding to the story information using the first speech synthesis model corresponding to the selection information.
In one possible design, the playing request further includes: selection information for the first speech synthesis model corresponding to the father and the first speech synthesis model corresponding to the mother;
correspondingly, the text-to-speech conversion module 42 is specifically configured to:
perform voice conversion on the story content corresponding to the story information using the first speech synthesis model corresponding to the selection information.
In one possible design, the type of the playing request is a daily-voice playing request, and the information on the content to be played includes the text to be played;
the text-to-speech conversion module 42 is specifically configured to:
perform voice conversion on the text using the second speech synthesis model to obtain daily voice, where the second speech synthesis model is an audio model established by analyzing and training the second voice data of the family member of the child, the second voice data being voice data of the family member under the ordinary-conversation scenario;
the playing module 43 is specifically configured to:
play the daily voice.
The device of this embodiment can be used to execute the technical solutions of the above method embodiments; the realization principle and technical effect are similar and are not repeated here.
Fig. 6 is a second structural schematic diagram of the data processing device provided by the embodiments of the present application. As shown in Fig. 6, on the basis of the device structure shown in Fig. 5, the device of this embodiment may further include: a sending module 44 and a collection module 45.
The collection module 45 is configured to, before the text-to-speech conversion module performs voice conversion on the content using the speech synthesis model corresponding to the type to obtain voice, collect third voice data of the family members of the child under an ordinary-conversation scenario and first voice data of each family member of the child under a storytelling scenario for the child.
The sending module 44 is configured to send the first voice data and the third voice data to the cloud server, so that the cloud server: performs cluster analysis on the third voice data to obtain second voice data corresponding to each family member; establishes a generic voice database corresponding to each family member; establishes a personalized voice database corresponding to each family member; and, for each family member, trains on the second voice data included in that family member's generic voice database to obtain the family member's second speech synthesis model, and trains on the first voice data included in that family member's personalized voice database to obtain the family member's first speech synthesis model.
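The cloud-side cluster analysis groups unlabeled conversation clips by speaker before per-member training. The toy sketch below illustrates the idea with one-dimensional "embeddings" and a greedy nearest-centroid rule; real systems would use acoustic speaker embeddings and a proper clustering algorithm, and the threshold here is an arbitrary assumption:

```python
def cluster_by_speaker(clips, threshold=1.0):
    """clips: list of (clip_id, embedding) pairs. Greedily assigns each clip
    to the nearest existing cluster centroid, or opens a new cluster."""
    clusters = []  # each: {"centroid": float, "members": [clip_id, ...]}
    for clip_id, emb in clips:
        best = min(clusters, key=lambda c: abs(c["centroid"] - emb), default=None)
        if best is not None and abs(best["centroid"] - emb) <= threshold:
            best["members"].append(clip_id)
            n = len(best["members"])
            best["centroid"] += (emb - best["centroid"]) / n  # running mean
        else:
            clusters.append({"centroid": emb, "members": [clip_id]})
    return clusters
```

Each resulting cluster would correspond to one family member's second voice data, which then seeds that member's generic voice database and second speech synthesis model.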
The sending module 44 is further configured to: send the story content corresponding to the story information to the cloud server, so that the cloud server performs voice conversion on the story content using the first speech synthesis model to obtain story voice;
correspondingly, the receiving module 41 is further configured to: receive the story voice sent by the cloud server.
The device of this embodiment can be used to execute the technical solutions of the above method embodiments; the realization principle and technical effect are similar and are not repeated here.
An embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the method in the above method embodiments is executed.
Fig. 7 is a structural schematic diagram of the terminal device provided by the embodiments of the present application. Referring to Fig. 7, the device of this embodiment includes a processor 71, a memory 72, and a communication bus 73, the communication bus 73 being used for connection among the components of the device, wherein:
the memory 72 is configured to store a program;
the processor 71 is configured to execute the program stored in the memory, the processor being configured to execute the method in the above method embodiments when the program is executed.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments can be completed by program instructions and related hardware. The aforementioned program can be stored in a computer-readable storage medium; when executed, the program performs the steps of the above method embodiments. The aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical discs.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they can still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some or all of the technical features, and that such modifications or replacements do not take the essence of the corresponding technical solutions outside the scope of the technical solutions of the embodiments of the present application.

Claims (18)

1. A data processing method, characterized by comprising:
receiving a playing request input by a user, the playing request including information on content to be played and a type of the playing request;
performing voice conversion on the content to be played using a speech synthesis model corresponding to the type of the playing request to obtain voice; the speech synthesis model being an audio model established by analyzing and training voice data of a family member of a child; the voice data being voice data of the family member under a scenario corresponding to the type of the playing request;
playing the voice;
wherein, before the performing voice conversion on the content using the speech synthesis model corresponding to the type to obtain voice, the method further comprises:
collecting third voice data of the family members of the child under an ordinary-conversation scenario and first voice data of each family member of the child under a storytelling scenario for the child;
sending the first voice data and the third voice data to a cloud server, so that the cloud server: performs cluster analysis on the third voice data to obtain second voice data corresponding to each family member; establishes a generic voice database corresponding to each family member; establishes a personalized voice database corresponding to each family member; and, for each family member, trains on the second voice data included in the family member's generic voice database to obtain a second speech synthesis model of the family member, and trains on the first voice data included in the family member's personalized voice database to obtain a first speech synthesis model of the family member.
2. The method according to claim 1, wherein:
the type of the playing request is a story playing request, and the information on the content to be played includes information on a story to be played;
the performing voice conversion on the content using the speech synthesis model corresponding to the type of the playing request to obtain voice comprises:
performing voice conversion on story content corresponding to the story information using the first speech synthesis model to obtain story voice, wherein the first speech synthesis model is an audio model established by analyzing and training the first voice data of the family member of the child, the first voice data being voice data of the family member under the storytelling scenario for the child;
the playing the voice comprises:
playing the story voice.
3. The method according to claim 2, wherein the performing voice conversion on the story content corresponding to the story information using the first speech synthesis model to obtain story voice comprises:
performing voice conversion on the story content corresponding to the story information using a locally stored first speech synthesis model to obtain story voice;
correspondingly, before the receiving the playing request input by the user, the method further comprises:
receiving the first speech synthesis model sent by the cloud server.
4. The method according to claim 2, wherein the performing voice conversion on the story content corresponding to the story information using the first speech synthesis model to obtain story voice comprises:
sending the story content corresponding to the story information to the cloud server, so that the cloud server performs voice conversion on the story content using the first speech synthesis model to obtain story voice;
receiving the story voice sent by the cloud server.
5. The method according to any one of claims 2 to 4, wherein the information on the story to be played comprises: identification information of the story to be played;
or, text information of the story to be played.
6. The method according to any one of claims 2 to 4, wherein the playing request further includes: selection information for the first speech synthesis model corresponding to each family member of the child;
correspondingly, the performing voice conversion on the story content corresponding to the story information using the first speech synthesis model comprises:
performing voice conversion on the story content corresponding to the story information using the first speech synthesis model corresponding to the selection information.
7. The method according to any one of claims 2 to 4, wherein the playing request further includes: selection information for the first speech synthesis model corresponding to the father and the first speech synthesis model corresponding to the mother;
correspondingly, the performing voice conversion on the story content corresponding to the story information using the first speech synthesis model comprises:
performing voice conversion on the story content corresponding to the story information using the first speech synthesis model corresponding to the selection information.
8. The method according to claim 1, wherein:
the type of the playing request is a daily-voice playing request, and the information on the content to be played includes text to be played;
the performing voice conversion on the content to be played using the speech synthesis model corresponding to the type of the playing request to obtain voice comprises:
performing voice conversion on the text to be played using the second speech synthesis model to obtain daily voice, wherein the second speech synthesis model is an audio model established by analyzing and training the second voice data of the family member of the child, the second voice data being voice data of the family member under the ordinary-conversation scenario;
the playing the voice comprises:
playing the daily voice.
9. A data processing device, characterized by comprising:
a receiving module, configured to receive a playing request input by a user, the playing request including information on content to be played and a type of the playing request;
a text-to-speech conversion module, configured to perform voice conversion on the content to be played using a speech synthesis model corresponding to the type of the playing request to obtain voice; the speech synthesis model being an audio model established by analyzing and training voice data of a family member of a child; the voice data being voice data of the family member under a scenario corresponding to the type of the playing request;
a playing module, configured to play the voice;
a collection module, configured to, before the text-to-speech conversion module performs voice conversion on the content using the speech synthesis model corresponding to the type to obtain voice, collect third voice data of the family members of the child under an ordinary-conversation scenario and first voice data of each family member of the child under a storytelling scenario for the child;
a sending module, configured to send the first voice data and the third voice data to a cloud server, so that the cloud server: performs cluster analysis on the third voice data to obtain second voice data corresponding to each family member; establishes a generic voice database corresponding to each family member; establishes a personalized voice database corresponding to each family member; and, for each family member, trains on the second voice data included in the family member's generic voice database to obtain a second speech synthesis model of the family member, and trains on the first voice data included in the family member's personalized voice database to obtain a first speech synthesis model of the family member.
10. The device according to claim 9, wherein:
the type of the playing request is a story playing request, and the information on the content to be played includes information on a story to be played;
the text-to-speech conversion module is specifically configured to:
perform voice conversion on the story content corresponding to the story information using the first speech synthesis model to obtain story voice, wherein the first speech synthesis model is an audio model established by analyzing and training the first voice data of the family member of the child, the first voice data being voice data of the family member under the storytelling scenario for the child;
the playing module is specifically configured to:
play the story voice.
11. The device according to claim 10, wherein the text-to-speech conversion module is specifically configured to:
perform voice conversion on the story content corresponding to the story information using a locally stored first speech synthesis model to obtain story voice;
and the receiving module is further configured to: receive the first speech synthesis model sent by the cloud server.
12. The device according to claim 10, wherein the sending module is further configured to: send the story content corresponding to the story information to the cloud server, so that the cloud server performs voice conversion on the story content using the first speech synthesis model to obtain the story voice;
the receiving module is further configured to: receive the story voice sent by the cloud server.
13. The device according to any one of claims 10 to 12, wherein the information of the story to be played comprises: identification information of the story to be played;
or, text information of the story to be played.
14. The device according to any one of claims 10 to 12, wherein the playing request further comprises: selection information for the first speech synthesis model corresponding to each family member of the child;
correspondingly, the text conversion module is specifically configured to:
perform voice conversion on the story content corresponding to the story information using the first speech synthesis model corresponding to the selection information.
15. The device according to any one of claims 10 to 12, wherein the playing request further comprises: selection information for the first speech synthesis model corresponding to the father and the first speech synthesis model corresponding to the mother;
correspondingly, the text conversion module is specifically configured to:
perform voice conversion on the story content corresponding to the story information using the first speech synthesis model corresponding to the selection information.
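Claims 14 and 15 select among per-member first speech synthesis models according to the selection information carried in the playing request. A minimal sketch of that lookup, assuming a hypothetical registry keyed by family member; the `models` dict and the string-returning stand-ins are illustrative, not the patent's API:

```python
# Hypothetical registry of first speech synthesis models, one per family
# member (claims 14 and 15); lambdas stand in for trained models.
models = {
    "father": lambda text: f"[father's voice]{text}",
    "mother": lambda text: f"[mother's voice]{text}",
}

def synthesize_story(selection: str, story_text: str) -> str:
    """Convert story content using the first speech synthesis model
    named by the request's selection information."""
    try:
        model = models[selection]
    except KeyError:
        raise ValueError(f"no first speech synthesis model for: {selection!r}")
    return model(story_text)

print(synthesize_story("mother", "Once upon a time..."))
# → [mother's voice]Once upon a time...
```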
16. The device according to claim 9, wherein:
the type of the playing request is a daily voice playing request, and the information of the content to be played includes text to be played;
the text conversion module is specifically configured to:
perform voice conversion on the text using a second speech synthesis model to obtain daily voice, wherein the second speech synthesis model is a voice model established by analyzing and training the collected second voice data of a family member of the child, the second voice data being voice data of the family member in an ordinary dialogue scenario;
the playing module is specifically configured to:
play the daily voice.
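Claims 10 and 16 together describe a dispatch on the type of the playing request: a story playing request uses the first speech synthesis model (trained on storytelling-scene data), while a daily voice playing request uses the second (trained on ordinary-dialogue data). A minimal sketch under those assumptions, with string-returning stand-ins for the trained models:

```python
def first_model(text: str) -> str:
    """Stand-in for the first speech synthesis model (storytelling scene)."""
    return f"[story-style]{text}"

def second_model(text: str) -> str:
    """Stand-in for the second speech synthesis model (ordinary dialogue)."""
    return f"[daily-style]{text}"

def convert(request_type: str, content: str) -> str:
    # Claim 10: story playing request -> first model;
    # claim 16: daily voice playing request -> second model.
    if request_type == "story":
        return first_model(content)
    if request_type == "daily":
        return second_model(content)
    raise ValueError(f"unknown playing request type: {request_type!r}")

print(convert("story", "Once upon a time..."))  # → [story-style]Once upon a time...
print(convert("daily", "Good morning!"))        # → [daily-style]Good morning!
```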
17. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the method according to any one of claims 1 to 8 is performed.
18. A data processing device, comprising a processor and a memory, wherein:
the memory is configured to store a program;
the processor is configured to execute the program stored in the memory, and when the program is executed, the processor performs the method according to any one of claims 1 to 8.
CN201810720403.1A 2018-07-03 2018-07-03 Data processing method and device Active CN109036374B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810720403.1A CN109036374B (en) 2018-07-03 2018-07-03 Data processing method and device

Publications (2)

Publication Number Publication Date
CN109036374A CN109036374A (en) 2018-12-18
CN109036374B true CN109036374B (en) 2019-12-03

Family

ID=65521587

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810720403.1A Active CN109036374B (en) 2018-07-03 2018-07-03 Data processing method and device

Country Status (1)

Country Link
CN (1) CN109036374B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW202009924A (en) * 2018-08-16 2020-03-01 國立臺灣科技大學 Timbre-selectable human voice playback system, playback method thereof and computer-readable recording medium
CN110032355B (en) * 2018-12-24 2022-05-17 阿里巴巴集团控股有限公司 Voice playing method and device, terminal equipment and computer storage medium
CN109903748A (en) * 2019-02-14 2019-06-18 平安科技(深圳)有限公司 A kind of phoneme synthesizing method and device based on customized sound bank
CN110751940B (en) 2019-09-16 2021-06-11 百度在线网络技术(北京)有限公司 Method, device, equipment and computer storage medium for generating voice packet
CN110600000B (en) * 2019-09-29 2022-04-15 阿波罗智联(北京)科技有限公司 Voice broadcasting method and device, electronic equipment and storage medium
CN111696517A (en) * 2020-05-28 2020-09-22 平安科技(深圳)有限公司 Speech synthesis method, speech synthesis device, computer equipment and computer readable storage medium
CN111816168A (en) * 2020-07-21 2020-10-23 腾讯科技(深圳)有限公司 Model training method, voice playing method, device and storage medium
CN114024789A (en) * 2021-10-15 2022-02-08 北京金茂绿建科技有限公司 Voice playing method based on working mode and intelligent household equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104123857A (en) * 2014-07-16 2014-10-29 北京网梯科技发展有限公司 Device and method for achieving individualized touch reading
CN104318813A (en) * 2014-10-30 2015-01-28 天津侣途科技有限公司 Child early education method and system based on mobile internet
CN104992703A (en) * 2015-07-24 2015-10-21 百度在线网络技术(北京)有限公司 Speech synthesis method and system
CN107464554A (en) * 2017-09-28 2017-12-12 百度在线网络技术(北京)有限公司 Phonetic synthesis model generating method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170256177A1 (en) * 2016-03-01 2017-09-07 International Business Machines Corporation Genealogy and hereditary based analytics and delivery

Also Published As

Publication number Publication date
CN109036374A (en) 2018-12-18

Similar Documents

Publication Publication Date Title
CN109036374B (en) Data processing method and device
JP6952184B2 (en) View-based voice interaction methods, devices, servers, terminals and media
CN105304080B (en) Speech synthetic device and method
JP6876752B2 (en) Response method and equipment
CN107633719B (en) Anthropomorphic image artificial intelligence teaching system and method based on multi-language human-computer interaction
CN109272984A (en) Method and apparatus for interactive voice
CN107833574A (en) Method and apparatus for providing voice service
JP2019212288A (en) Method and device for outputting information
CN106200886A (en) A kind of intelligent movable toy manipulated alternately based on language and toy using method
CN108882101B (en) Playing control method, device, equipment and storage medium of intelligent sound box
CN100585663C (en) Language studying system
CN204650422U (en) A kind of intelligent movable toy manipulated alternately based on language
CN109119071A (en) A kind of training method and device of speech recognition modeling
CN109710799B (en) Voice interaction method, medium, device and computing equipment
CN104166547A (en) Channel control method and device
WO2021197301A1 (en) Auxiliary reading method and apparatus, storage medium, and electronic device
CN107959882B (en) Voice conversion method, device, terminal and medium based on video watching record
CN107908743A (en) Artificial intelligence application construction method and device
CN110349569A (en) The training and recognition methods of customized product language model and device
CN111339881A (en) Baby growth monitoring method and system based on emotion recognition
KR102396263B1 (en) A System for Smart Language Learning Services using Scripts
CN101819797A (en) Electronic device with interactive audio recording function and recording method thereof
CN116403583A (en) Voice data processing method and device, nonvolatile storage medium and vehicle
CN110503991A (en) Voice broadcast method, device, electronic equipment and storage medium
CN110070869A (en) Voice interface generation method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210519

Address after: 100085 Baidu Building, 10 Shangdi Tenth Street, Haidian District, Beijing

Patentee after: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd.

Patentee after: Shanghai Xiaodu Technology Co.,Ltd.

Address before: 100085 Baidu Building, 10 Shangdi Tenth Street, Haidian District, Beijing

Patentee before: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd.