CN110427459A - Visualized generation method, system and the platform of speech recognition network - Google Patents
- Publication number
- CN110427459A CN110427459A CN201910719492.2A CN201910719492A CN110427459A CN 110427459 A CN110427459 A CN 110427459A CN 201910719492 A CN201910719492 A CN 201910719492A CN 110427459 A CN110427459 A CN 110427459A
- Authority
- CN
- China
- Prior art keywords
- language model
- general
- speech recognition
- corpus
- wfst
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 47
- 230000003993 interaction Effects 0.000 claims abstract description 34
- 238000004519 manufacturing process Methods 0.000 claims abstract description 33
- 238000012360 testing method Methods 0.000 claims description 50
- 238000011161 development Methods 0.000 claims description 30
- 230000000007 visual effect Effects 0.000 claims description 30
- 230000009193 crawling Effects 0.000 claims description 9
- 230000009471 action Effects 0.000 claims description 8
- 238000012549 training Methods 0.000 abstract description 19
- 230000015572 biosynthetic process Effects 0.000 abstract description 6
- 238000003786 synthesis reaction Methods 0.000 abstract description 5
- 230000015654 memory Effects 0.000 description 17
- 238000003860 storage Methods 0.000 description 16
- 230000008569 process Effects 0.000 description 10
- 238000004590 computer program Methods 0.000 description 8
- 230000006870 function Effects 0.000 description 8
- 238000012545 processing Methods 0.000 description 8
- 239000000463 material Substances 0.000 description 7
- 238000003672 processing method Methods 0.000 description 5
- 230000005236 sound signal Effects 0.000 description 5
- 238000004891 communication Methods 0.000 description 4
- 238000010586 diagram Methods 0.000 description 4
- 230000002452 interceptive effect Effects 0.000 description 4
- 238000012800 visualization Methods 0.000 description 4
- 238000012217 deletion Methods 0.000 description 3
- 230000037430 deletion Effects 0.000 description 3
- 238000013515 script Methods 0.000 description 3
- 238000005516 engineering process Methods 0.000 description 2
- 239000000203 mixture Substances 0.000 description 2
- 238000010295 mobile communication Methods 0.000 description 2
- 238000000926 separation method Methods 0.000 description 2
- 230000006399 behavior Effects 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000005540 biological transmission Effects 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 238000006243 chemical reaction Methods 0.000 description 1
- 239000012141 concentrate Substances 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 238000009472 formulation Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000010606 normalization Methods 0.000 description 1
- 238000005457 optimization Methods 0.000 description 1
- 230000000717 retained effect Effects 0.000 description 1
- 238000012216 screening Methods 0.000 description 1
- 238000012163 sequencing technique Methods 0.000 description 1
- 239000007787 solid Substances 0.000 description 1
- 238000000528 statistical test Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L2015/0638—Interactive procedures
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Machine Translation (AREA)
Abstract
The present invention discloses a visualized generation method for a speech recognition network. The method comprises: receiving a keyword through a human-computer interaction interface; selecting a current domain field from a plurality of preset general domain fields, each general domain field corresponding to a plurality of preset crawl words and a plurality of corresponding preset Web crawl pages; obtaining a general corpus; obtaining a specific corpus; and training the general corpus to obtain a general language model and training the specific corpus to obtain a specific language model. After the WFST speech recognition network of the general language model and the WFST speech recognition network of the specific language model are connected in parallel, they are combined with an acoustic model and a pronunciation lexicon, and a WFST speech recognition network is synthesized through composition, determinization and minimization operations. By configuring the system on the same platform, the training speed of the language model is accelerated, the production cycle is shortened, manpower consumption is reduced, and labor cost is saved. Meanwhile, by merging the general language model network and the specific language model, the accuracy and efficiency of speech recognition are improved.
Description
Technical field
The invention belongs to the technical field of speech recognition, and more particularly to a visualized generation method, system and platform for a speech recognition network.
Background art
At present there are few visualized language model production systems on the market, and most language models are custom-built at the command-line level. Language model production plays a pivotal role in speech recognition, and every speech company has its own team responsible for models, but most models are built on the command line. In the prior art, the customization process of a model built on the command line is uncontrollable, version management is poor, risk is uncontrollable, and the process is not streamlined. The cause of the above drawbacks is that training commands and various scripts are entered manually on the command line. Manual training on the command line lacks continuous and effective supervision and checking, which makes both the process and the risk uncontrollable. The inefficient operation of the command line cannot satisfy multi-task language model training, and the process is complicated. Meanwhile, in the prior art, visualization in model production is poor, which is inconvenient for model production.
In view of the above problems, the methods currently available on the market for addressing them are as follows: formulating a standard process for language model training, standardized management of scripts, unified management of data, developing more effective scripts, automating each step, and arranging multiple people to cross-check step by step. These solutions have not been integrated, nor do they solve the problems as a whole with one complete system.
It follows that when a visualized speech recognition network is used for speech recognition in the prior art, the customization process during generation is uncontrollable, versions are difficult to manage, and multi-task language model training cannot be satisfied. Meanwhile, visualization in model production is poor and simultaneous editing by multiple users is inconvenient, which reduces the generation efficiency and accuracy of speech recognition models.
Summary of the invention
Embodiments of the present invention provide a language model generation method and apparatus, which solve at least one of the above technical problems.
In a first aspect, a visualized generation method for a speech recognition network is provided. The method can run on the Web side and comprises:
Step S101, receiving a keyword through a human-computer interaction interface, and selecting a current domain field from a plurality of preset general domain fields, each general domain field corresponding to a plurality of preset crawl words and a plurality of corresponding preset Web crawl pages.
Step S102, obtaining the corresponding preset crawl words according to the current domain field, crawling the plurality of preset Web crawl pages corresponding to the current domain field according to the preset crawl words to obtain a first crawl result, and obtaining a general corpus according to the first crawl result.
Step S103, setting the keyword as a current crawler crawl word, crawling, on the Web side, the result pages returned by a set search engine according to the current crawler crawl word to obtain a second crawl result, and obtaining a specific corpus according to the second crawl result.
Step S104, training a general language model in ARPA format based on the general corpus, and training a specific language model in ARPA format based on the specific corpus. The file information of the general language model and the file information of the specific language model each include a version number with an identifying function.
Step S105, merging the general language model and the specific language model, and synthesizing a WFST speech recognition network after combining them with acoustic model and pronunciation lexicon data.
In a preferred embodiment of the present invention, the method further includes, after step S105, step S106: testing the WFST speech recognition network separately according to the set test sets of a plurality of configured interfaces, obtaining the test recognition data of the plurality of configured interfaces, and displaying the test recognition data of the plurality of configured interfaces, the test recognition data including identification information of the corresponding configured interface.
In a preferred embodiment of the present invention, step S102 further includes step S1021: scoring the entries in the general corpus with a scoring language model to obtain a score corresponding to each entry; if the score of an entry is greater than a set threshold, the entry is retained, otherwise the entry is deleted from the general corpus.
In a preferred embodiment of the present invention, step S103 further includes step S1031: obtaining the ranking of each entry of the specific corpus in the set search engine, intercepting a set number of entries counted backwards from the first place in the set search engine ranking, and updating the specific corpus.
In a preferred embodiment of the present invention, the step of training a general language model in ARPA format based on the general corpus in step S104 includes: adding a set mandatory-parameter button on the human-computer interaction interface, and training the general language model in ARPA format based on the general corpus if selection information of the set mandatory-parameter button is received. The step of testing the WFST speech recognition network separately according to the set test sets of the plurality of configured interfaces in step S106 includes: adding a set mandatory-parameter button on the human-computer interaction interface, and testing the WFST speech recognition network according to the set test set if selection information of the set mandatory-parameter button is received.
In a preferred embodiment of the present invention, the step of merging the general language model and the specific language model in step S105 is: converting the general language model into WFST form, converting the specific language model into WFST form, adding a start node before the first node of the general language model converted into WFST form and the first node of the specific language model converted into WFST form, and merging the general language model and the specific language model.
In a preferred embodiment of the present invention, step S102 further includes generating a run button for step S102 on the human-computer interaction interface and enabling the run button of step S102 when the run of step S101 ends. Step S103 further includes generating a run button for step S103 on the human-computer interaction interface and enabling the run button of step S103 when the run of step S102 ends.
Step S104 further includes generating a run button for step S104 on the human-computer interaction interface and enabling the run button of step S104 when the run of step S103 ends. Step S105 further includes generating a run button for step S105 on the human-computer interaction interface and enabling the run button of step S105 when the run of step S104 ends.
In a second aspect, a visual production system for a speech recognition network is provided, comprising a user interaction unit, a general corpus acquisition unit, a specific corpus acquisition unit, a language model acquisition unit and a WFST speech recognition network acquisition unit.
The user interaction unit is configured to receive a keyword through a human-computer interaction interface and to select a current domain field from a plurality of preset general domain fields, each general domain field corresponding to a plurality of preset crawl words and a plurality of corresponding preset Web crawl pages.
The general corpus acquisition unit is configured to obtain the corresponding preset crawl words according to the current domain field, crawl the plurality of preset Web crawl pages corresponding to the current domain field according to the preset crawl words to obtain a first crawl result, and obtain a general corpus according to the first crawl result.
The specific corpus acquisition unit is configured to set the keyword as a current crawler crawl word, crawl, on the Web side, the result pages returned by a set search engine according to the current crawler crawl word to obtain a second crawl result, and obtain a specific corpus according to the second crawl result.
The language model acquisition unit is configured to train a general language model in ARPA format based on the general corpus and to train a specific language model in ARPA format based on the specific corpus; the file information of the general language model and the file information of the specific language model each include a version number with an identifying function.
The WFST speech recognition network acquisition unit merges the general language model and the specific language model, and synthesizes a WFST speech recognition network after combining them with acoustic model and pronunciation lexicon data.
A preferred embodiment of the visual production system of the present invention further comprises a test unit. The test unit is configured to test the WFST speech recognition network separately according to the set test sets of a plurality of configured interfaces, obtain the test recognition data of the plurality of configured interfaces, and display the test recognition data of the plurality of configured interfaces, the test recognition data including identification information of the corresponding configured interface.
In a third aspect, a visual production platform for a speech recognition network of the present invention is provided. The platform is loaded with the visual production system for a speech recognition network of the present invention, and the system allows a plurality of development groups to operate simultaneously. Each development group includes a plurality of developers, and each developer can use a separate unit, a separate unit being a single unit in the visual production system for a speech recognition network of the present invention.
The visual production platform is configured to store the general language models and specific language models generated or used by the plurality of development groups, and establishes a plurality of version-number correspondences according to the version numbers of the general language models and the version numbers of the specific language models generated or used by the plurality of development groups.
A current development group can select a current model from the general language models and specific language models stored on the visual production platform. If the current development group deletes, replaces or edits the current model, the visual production platform notifies the corresponding development groups according to the plurality of version-number correspondences, and the current development group then operates on the current model according to the return information of the corresponding development groups.
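The platform's version bookkeeping described above can be sketched as a small registry. This is an illustrative sketch only — the class, method and group names are invented, and the patent does not specify the platform's data structures:

```python
class ModelRegistry:
    """Sketch of the platform's version correspondences: record which
    development groups generated or use each (general, specific) model
    version pair, so the platform knows whom to notify before a current
    model is deleted, replaced or edited."""

    def __init__(self):
        # (general_version, specific_version) -> set of development groups
        self.correspondences = {}

    def register(self, group, general_version, specific_version):
        key = (general_version, specific_version)
        self.correspondences.setdefault(key, set()).add(group)

    def groups_to_notify(self, general_version, specific_version, acting_group):
        """All groups tied to the same version pair, except the acting one."""
        key = (general_version, specific_version)
        return self.correspondences.get(key, set()) - {acting_group}
```

For example, if two groups share the same model versions, an edit by one group yields the other group as the party to notify.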
In a fourth aspect, an electronic device is provided, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor so that the at least one processor can perform the steps of the method of any embodiment of the present invention.
In a fifth aspect, an embodiment of the present invention further provides a computer program product comprising a computer program stored on a non-volatile computer-readable storage medium, the computer program including program instructions which, when executed by a computer, cause the computer to perform the steps of the method of any embodiment of the present invention.
By configuring the system on the same platform, the present invention accelerates the training speed of the language model, shortens the production cycle, and helps isolate the products of multiple users from one another. Manpower consumption can be reduced and labor cost saved. Meanwhile, by merging the general language model network and the specific language model, the accuracy and efficiency of speech recognition are improved.
Brief description of the drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of the visualized generation method for a speech recognition network provided by an embodiment of the present invention.
Fig. 2 is a flowchart of the visualized generation method for a speech recognition network provided by another embodiment of the present invention.
Fig. 3 is a detailed flowchart of step S102 provided by an embodiment of the present invention.
Fig. 4 is a detailed flowchart of step S103 provided by an embodiment of the present invention.
Fig. 5 is a schematic composition diagram of the visual production system for a speech recognition network provided by an embodiment of the present invention.
Fig. 6 is a schematic composition diagram of the visual production system for a speech recognition network provided by another embodiment of the present invention.
Fig. 7 is a flowchart of the visualized generation method for a speech recognition network provided by yet another embodiment of the present invention.
Fig. 8 is a structural schematic diagram of an electronic device provided by an embodiment of the present invention.
Detailed description of embodiments
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below in conjunction with the drawings. Obviously, the described embodiments are some, rather than all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the present invention.
In one aspect, the present invention provides a visualized generation method for a speech recognition network. The method can run on the Web side. As shown in Fig. 1, the visualized generation method of the speech recognition network in the present invention includes:
Step S101, obtaining a keyword and a general domain field.
In this step, a keyword is received through a human-computer interaction interface, and a current domain field is selected from a plurality of preset general domain fields, each general domain field corresponding to a plurality of preset crawl words and a plurality of corresponding preset Web crawl pages.
For example, the plurality of general domain fields includes the three general domain fields "circuit", "chemistry" and "machinery". The above three fields are displayed on the interactive interface of the user terminal, and the user selects, for example, "circuit" on the interactive interface as the current general domain field. The device on which the user terminal displays the interactive interface is an intelligent terminal or a touch-screen device. The user terminal locally, or a remote side remotely connected to the user terminal, prestores the plurality of preset crawlers or preset crawl information corresponding to "circuit", "chemistry" and "machinery", together with the Web crawl page information corresponding to "circuit", "chemistry" and "machinery". For example, the Web crawl page information corresponding to "circuit" comprises the Web crawl pages of websites used in occasions such as electronics-industry science popularization and applications.
In addition, the user inputs the keyword on the interactive interface. The keyword refers to the specific field that the user especially needs to recognize within the field corresponding to the general domain field. For example, when the domain field selected by the user is "circuit", the input keyword can be a special circuit term such as "discrete device circuit", "integrated device circuit" or "analog circuit", which is conducive to improving the preparation of the corpus.
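The data flow of step S101 can be sketched as a preset lookup table. This is a minimal illustration only — the field names, crawl words and URLs below are invented stand-ins, not values from the patent:

```python
# Illustrative preset mapping from general domain fields to crawl words
# and Web crawl pages (all entries are hypothetical examples).
PRESET_DOMAINS = {
    "circuit":   {"crawl_words": ["discrete device circuit", "analog circuit"],
                  "pages": ["https://example.org/ee-wiki", "https://example.org/ee-news"]},
    "chemistry": {"crawl_words": ["organic synthesis", "catalyst"],
                  "pages": ["https://example.org/chem-wiki"]},
    "machinery": {"crawl_words": ["gearbox", "bearing"],
                  "pages": ["https://example.org/mech-wiki"]},
}

def select_domain(current_field, keyword):
    """Step S101 as data flow: resolve the chosen general domain field to
    its preset crawl words and pages, and carry the user keyword forward
    as the crawler crawl word used in step S103."""
    preset = PRESET_DOMAINS[current_field]
    return {"field": current_field,
            "preset_crawl_words": preset["crawl_words"],
            "preset_pages": preset["pages"],
            "crawler_crawl_word": keyword}
```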
Step S102, obtaining a general corpus.
In this step, the corresponding preset crawl words are obtained according to the current domain field, the plurality of preset Web crawl pages corresponding to the current domain field are crawled according to the preset crawl words to obtain a first crawl result, and a general corpus is obtained according to the first crawl result.
Step S103, obtaining a specific corpus.
In this step, the keyword is set as the current crawler crawl word, the result pages returned by a set search engine are crawled on the Web side according to the current crawler crawl word to obtain a second crawl result, and a specific corpus is obtained according to the second crawl result.
Step S104, obtaining a general language model and a specific language model.
In this step, a general language model in ARPA format is trained based on the general corpus, and a specific language model in ARPA format is trained based on the specific corpus; the file information of the general language model and the file information of the specific language model each include a version number with an identifying function.
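To make the shape of step S104's output concrete, the following sketch counts unigrams over a tokenized corpus and emits a minimal ARPA-format model with an attached version number. It is a toy illustration under stated simplifications — no smoothing, backoff or higher-order n-grams; a real pipeline would use an n-gram toolkit such as SRILM or KenLM:

```python
import math
from collections import Counter

def train_unigram_arpa(corpus_lines, version="v1.0"):
    """Toy unigram trainer: emit a minimal ARPA-format language model
    (log10 probabilities, no smoothing) plus the version number that
    the patent attaches to the model's file information."""
    tokens = [t for line in corpus_lines for t in line.split()]
    counts = Counter(tokens)
    total = sum(counts.values())
    lines = ["\\data\\", f"ngram 1={len(counts)}", "", "\\1-grams:"]
    for word, c in sorted(counts.items()):
        lines.append(f"{math.log10(c / total):.6f}\t{word}")
    lines += ["", "\\end\\"]
    return {"version": version, "arpa": "\n".join(lines)}
```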
Step S105, synthesizing a WFST speech recognition network.
The general language model and the specific language model are merged, and a WFST speech recognition network is synthesized after combining them with acoustic model and pronunciation lexicon data.
Thus, by connecting the WFST speech recognition network of the general language model and the WFST speech recognition network of the specific language model in parallel, both general-language recognition and specific-language recognition can be taken into account, and the two recognition approaches can be aggregated in the same recognition network during speech recognition, improving the accuracy of recognition in a given specific field.
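The parallel connection ("union") of the two language-model WFSTs can be sketched as follows. This is a simplified illustration of the merging step only — real systems operate on OpenFst machines and would follow the union with composition, determinization and minimization; the dict-based FST encoding below is an assumption of this sketch:

```python
def union_wfsts(fst_a, fst_b):
    """Connect two WFSTs in parallel: renumber the states of both
    machines into one state space, then add a new start state (0) with
    epsilon arcs into each original start state.  Each FST is a dict:
      {"start": s, "finals": {s, ...}, "arcs": {s: [(in, out, weight, next)]}}
    """
    def shift(fst, offset):
        return {
            "start": fst["start"] + offset,
            "finals": {s + offset for s in fst["finals"]},
            "arcs": {s + offset: [(i, o, w, n + offset) for i, o, w, n in arcs]
                     for s, arcs in fst["arcs"].items()},
        }

    # Number of states in fst_a (highest state id seen, plus one).
    n_a = max(max(fst_a["arcs"], default=0),
              max((n for arcs in fst_a["arcs"].values() for *_, n in arcs),
                  default=0)) + 1
    a = shift(fst_a, 1)          # state 0 is reserved for the new start node
    b = shift(fst_b, 1 + n_a)
    arcs = {0: [("<eps>", "<eps>", 0.0, a["start"]),
                ("<eps>", "<eps>", 0.0, b["start"])]}
    arcs.update(a["arcs"])
    arcs.update(b["arcs"])
    return {"start": 0, "finals": a["finals"] | b["finals"], "arcs": arcs}
```

A decoder entering state 0 can then follow either epsilon arc, so paths through the general model and the specific model coexist in one network, matching the start-node construction described in step S105.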
In a preferred embodiment, as shown in Fig. 2, the method further includes, after step S105:
Step S106, testing the WFST speech recognition network.
In this step, the WFST speech recognition network is tested separately according to the set test sets of a plurality of configured interfaces, the test recognition data of the plurality of configured interfaces are obtained, and the test recognition data of the plurality of configured interfaces are displayed, the test recognition data including identification information of the corresponding configured interface.
In a preferred embodiment, as shown in Fig. 3, step S102 further includes:
Step S1021, scoring the entries in the general corpus.
In this step, the entries in the general corpus are scored by a scoring language model to obtain a score corresponding to each entry. If the score of an entry is greater than a set threshold, the entry is retained in the general corpus; otherwise, the entry is deleted from the general corpus. Screening the entries of the general corpus in this way reduces the deviation rate of the entries, reduces the storage space occupied by the entries, and improves the speed of operations on the entries.
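Step S1021's filter can be sketched with a toy scoring model. The unigram scorer and the unseen-word floor below are illustrative stand-ins — the patent does not specify the scoring language model:

```python
import math

def filter_corpus_by_score(entries, scoring_counts, threshold):
    """Score each candidate entry with a toy unigram scoring language
    model (average log10 unigram probability, with a small floor for
    unseen words) and keep only entries whose score exceeds the set
    threshold, as in step S1021."""
    total = sum(scoring_counts.values())
    floor = math.log10(0.5 / total)   # pseudo-probability for unseen words

    def score(entry):
        logs = [math.log10(scoring_counts[w] / total) if w in scoring_counts else floor
                for w in entry.split()]
        return sum(logs) / len(logs)

    return [e for e in entries if score(e) > threshold]
```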
In a preferred embodiment, as shown in Fig. 4, step S103 further includes:
Step S1031, optimizing the entries of the specific corpus.
In this step, the ranking of each entry of the specific corpus in the set search engine is obtained, the set number of entries counted backwards from the first place in the set search engine ranking are intercepted, and the specific corpus is updated. The entries of the specific corpus are thereby optimized, selecting the more frequently used words, which improves the generality of the entries, reduces the storage space occupied by the entries, and in turn improves the speed of operations on the entries.
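The truncation in step S1031 amounts to a sort-and-slice. A minimal sketch, assuming each entry comes paired with its (illustrative) search-engine rank:

```python
def update_specific_corpus(entries_with_rank, keep_count):
    """Keep only the top-ranked entries: sort the specific corpus by
    search-engine rank and intercept the first `keep_count` entries,
    counting from first place, as in step S1031."""
    ranked = sorted(entries_with_rank, key=lambda item: item[1])  # rank 1 = first
    return [entry for entry, _rank in ranked[:keep_count]]
```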
In a preferred embodiment, the step of training the general language model in ARPA format based on the general corpus in step S104 includes:
adding a set mandatory-parameter button on the human-computer interaction interface, and training the general language model in ARPA format based on the general corpus if selection information of the set mandatory-parameter button is received.
The step of testing the WFST speech recognition network separately according to the set test sets of the plurality of configured interfaces in step S106 includes:
adding a set mandatory-parameter button on the human-computer interaction interface, and testing the WFST speech recognition network according to the set test set if selection information of the set mandatory-parameter button is received.
By setting the "mandatory parameter", the error rate of developers during development is reduced, which in turn improves development quality.
In a preferred embodiment, the step of connecting the WFST speech recognition network of the general language model and the WFST speech recognition network of the specific language model in parallel in step S105 is: converting the general language model into WFST form, converting the specific language model into WFST form, adding a start node before the first node of the general language model converted into WFST form and the first node of the specific language model converted into WFST form, and merging the general language model and the specific language model.
In a preferred embodiment, step S102 further includes generating a run button for step S102 on the human-computer interaction interface and enabling the run button of step S102 when the run of step S101 ends. Step S103 further includes generating a run button for step S103 on the human-computer interaction interface and enabling the run button of step S103 when the run of step S102 ends.
Step S104 further includes generating a run button for step S104 on the human-computer interaction interface and enabling the run button of step S104 when the run of step S103 ends.
Step S105 further includes generating a run button for step S105 on the human-computer interaction interface and enabling the run button of step S105 when the run of step S104 ends.
On the one hand this improves the visualization of the operating process; by constraining developers to follow the execution and development schedule, it improves the consistency and normativity of the generated speech recognition network model, and at the same time improves development efficiency because misoperations are reduced.
In terms of another kind of the invention, as shown in figure 5, the present invention also provides the Visual Productions of speech recognition network
System.The system includes user interaction unit 101, general corpus acquiring unit 201, specific corpus acquiring unit 301, language
Model acquiring unit 401 and WFST speech recognition network acquiring unit 501.
The user interaction unit 101 is configured to receive a keyword through the human-computer interaction interface and to choose a current domain field from multiple preset general domain fields, where each general domain field corresponds to multiple preset crawl words and multiple corresponding preset Web crawl pages.
The general corpus acquiring unit 201 is configured to obtain the preset crawl words corresponding to the current domain field, crawl the multiple preset Web crawl pages corresponding to the current domain field according to the preset crawl words to obtain a first crawl result, and obtain the general corpus from the first crawl result.
The specific corpus acquiring unit 301 is configured to set the keyword as the current crawl word, crawl the return pages of a set search engine at the Web side according to the current crawl word to obtain a second crawl result, and obtain the specific corpus from the second crawl result.
The language model acquiring unit 401 is configured to train a general language model in arpa format based on the general corpus, and to train a particular language model in arpa format based on the specific corpus; the file information of the general language model and the file information of the particular language model each contain a version number that serves as an identifier.
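The patent states only that the file information carries an identifying version number; it does not specify the storage scheme. As an assumption for illustration, the sketch below writes the version as a `#version:` first line ahead of the model body and reads it back:

```python
# Hypothetical version tagging for a model file. The "#version:" header
# line is an invented convention, not part of the ARPA format.

def tag_model(path, body, version):
    """Write a model file whose first line records its version."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"#version: {version}\n")
        f.write(body)

def read_version(path):
    """Return the version recorded in the file, or None if absent."""
    with open(path, encoding="utf-8") as f:
        first = f.readline().strip()
    if first.startswith("#version:"):
        return first.split(":", 1)[1].strip()
    return None
```

A consumer would strip the header line before passing the remaining body to any tool that expects a plain ARPA file.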
The WFST speech recognition network acquiring unit 501 is configured to merge the general language model and the particular language model, and to synthesize the WFST speech recognition network after combining them with the acoustic model and pronunciation dictionary data.
In one embodiment of the visual production system of the speech recognition network of the invention, as shown in Fig. 6, the system further includes a test unit 601. The test unit 601 is configured to test the WFST speech recognition network with a set test set for each of multiple configured interfaces, obtain the test recognition data of the multiple configured interfaces, and display those test recognition data; the test recognition data contain the identification information of the corresponding configured interface.
In another aspect of the invention, a visual production platform of a speech recognition network is also provided, on which the visual production system of the speech recognition network of the present invention is loaded. The system allows multiple development groups to operate simultaneously; each of the multiple development groups includes multiple developers, and each developer can use one separate unit. A separate unit is a single unit in the visual production system of the speech recognition network, for example one of the user interaction unit 101, the general corpus acquiring unit 201, the specific corpus acquiring unit 301, the language model acquiring unit 401 and the WFST speech recognition network acquiring unit 501.
The visual production platform is configured to store the general language models and particular language models generated or used by the multiple development groups, and to establish multiple version-number correspondences according to the version numbers of the general language models and the version numbers of the particular language models generated or used by the multiple development groups.
A current development group can select a current model from the general language models and particular language models stored on the visual production platform. If the current development group deletes, replaces or edits the current model, the visual production platform notifies the corresponding development groups according to the multiple version-number correspondences, and the current development group operates on the current model according to the return information of those groups. This avoids resource conflicts caused by resource sharing when multiple developers develop on the same platform, and improves the reliability and consistency of the development platform.
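The version-correspondence bookkeeping described above can be sketched as follows. The data layout (a table mapping version numbers to the development groups that use them) and the confirmation protocol are illustrative assumptions:

```python
# Sketch of the delete-with-confirmation flow backed by a version table.

def groups_to_notify(version_table, version):
    """version_table maps a model version -> set of development groups
    that generated or use that version."""
    return set(version_table.get(version, ()))

def delete_version(version_table, models, version, confirmations):
    """Delete a model version only if every dependent development group
    has returned a confirmation; otherwise leave everything intact."""
    dependents = groups_to_notify(version_table, version)
    if not dependents.issubset(confirmations):
        return False  # keep the model; some group has not confirmed
    models.pop(version, None)
    version_table.pop(version, None)
    return True
```

The same lookup guards replace and edit operations, so one group cannot silently invalidate a model another group depends on.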
It is worth noting that the units in the embodiments disclosed by the invention are not limited to the disclosed scheme; in addition, the related functional modules can also be realized by a hardware processor, for example the separate modules can be realized by a processor, which is not repeated here.
In another embodiment of the invention, another visualized generation method of a speech recognition network is provided. The method comprises:
1) Because this software system is a complete platform, flow control can be performed by the server within a single program, avoiding the manual omission of a step;
The above flow control covers three aspects:
1. Mandatory-parameter buttons are set during model training, testing and so on; when a required option is unselected, the operation cannot continue.
2. From training to testing is a complete pipeline with a fixed order; when a preceding step has not been performed, the following steps are displayed in grey.
3. The system verifies parameters, for example checking whether the pronunciation dictionary matches the word dictionary, and returns an error message if they do not match.
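The dictionary check in point 3 can be sketched as follows; the data shapes (a word list and a word-to-pronunciations mapping) are assumptions for illustration:

```python
# Sketch of the pronunciation-dictionary / word-dictionary consistency check.

def check_lexicon(word_dict, pron_dict):
    """Every word in the word dictionary must have at least one entry in
    the pronunciation dictionary; otherwise return an error message."""
    missing = [w for w in word_dict if w not in pron_dict]
    if missing:
        return {"ok": False,
                "error": "words missing pronunciations: " + ", ".join(sorted(missing))}
    return {"ok": True}
```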
2) For versions, the software system provides dedicated visual version control and can check related dependencies between versions through the program, avoiding changes to other versions caused by delete or modify operations. Here, a related dependency means: when performing operations such as deletion or modification, the system checks by table lookup whether any other model uses the version in question, returns a prompt to be confirmed, and only performs the deletion or modification after the user clicks to confirm.
3) Because the software system provides a Web operation interface, it is simpler to use than the command line and simplifies the flow operations;
4) Carrying out flow control, version control and flow simplification through system programs in this way reduces the risk of language model training.
With reference to Fig. 7, first, the production of a language model needs data support, so the first step is corpus management. The collection of this part of the corpus includes crawling network corpora and generating artificial corpora. Corpus management includes operations such as normalization, deletion and moving of corpora.
The corpus crawling above is of two kinds. First, fixed network crawlers are preset in the system; when data of a certain field are needed, the field is chosen on the Web page and crawling starts. Second, keyword crawling is provided: the user fills in a keyword at the Web side, and the system crawler then searches the major search engines and extracts text from the returned entries. The text is screened as follows: the general language model scores each entry; when the score exceeds a certain threshold the entry is retained, otherwise it is deleted; and the first N entries are extracted according to the ranking of the entries in the search engine.
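The screening rule above can be sketched in code: score each returned entry with the general language model, keep entries above a threshold, then keep only the first N in search-engine order. The unigram "language model" here is a toy stand-in for a real arpa model:

```python
# Sketch of entry screening: LM score threshold plus top-N truncation.

def score(entry, unigram_logprob, floor=-10.0):
    """Average per-word log-probability under a toy unigram model;
    unknown words get a floor score."""
    words = entry.split()
    if not words:
        return floor
    return sum(unigram_logprob.get(w, floor) for w in words) / len(words)

def screen(entries, unigram_logprob, threshold, top_n):
    """entries are already in search-engine ranking order; keep at most
    top_n entries whose score exceeds threshold, preserving that order."""
    kept = [e for e in entries if score(e, unigram_logprob) > threshold]
    return kept[:top_n]
```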
Second, once the corpus exists, arpa language model training is carried out, comprising three parts: general language model generation, custom language model generation and language model management. The language model management above provides delete and move buttons; deletion means deleting the language model referred to in the file system, and moving mainly changes the storage location within the file system.
The training above means: the general language model is mainly obtained by the user choosing and training on the corpora of the various large fields preset in the system, while a custom language model is trained on corpora crawled from user-provided keywords or on corpora provided directly. The difference between the two lies mainly in the choice of corpus; they are operated on different pages, and the difference is reflected in the model ID.
Third, the resource management module mainly combines the generated language model with the acoustic model and the pronunciation dictionary to generate the WFST speech recognition network. Generating the WFST speech recognition network from the language model, the acoustic model and the pronunciation dictionary is achieved through composition, determinization and minimization operations after merging. The union of the general and custom language model WFST networks is mainly achieved by adding a start node in front of the two networks so that the two networks are connected in parallel. During recognition decoding, both the general and the custom language model WFST speech recognition networks can be searched.
In order to realize the purpose of project settings, an operation for the union of the two WFST speech recognition networks is provided. Resource management supplies the input of the decoding module, so it connects the language model module and the decoding module. The module also provides a WFST resource management function.
Fourth, decoding test management mainly provides interfaces of various configurations for statistically testing the performance of a test set on the new resources.
The Web side is built with front-end technologies such as html and css to realize the visualization effect.
The server side uses flask to build the interfaces that call data processing, model training and testing, and communicates with the front end. Data are transmitted in json form.
The underlying operations such as data processing and model training combine open-source toolkits, with the source code written in the python language.
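The patent names flask and json for the front-end/server exchange. To stay dependency-free, the sketch below models a single hypothetical training endpoint as a plain function over JSON strings; the route concept, field names and returned model ID scheme are all assumptions:

```python
# Sketch of one server-side JSON interface (the flask routing layer is
# omitted; only the request/response handling is modelled).

import json

def handle_train_request(raw_json):
    """Emulates a training endpoint: parse the JSON body, validate it,
    and return a JSON response string."""
    try:
        req = json.loads(raw_json)
    except json.JSONDecodeError:
        return json.dumps({"ok": False, "error": "invalid json"})
    corpus_id = req.get("corpus_id")
    if corpus_id is None:
        return json.dumps({"ok": False, "error": "corpus_id is required"})
    # A real implementation would launch arpa language model training here.
    return json.dumps({"ok": True, "model_id": f"lm-{corpus_id}"})
```

Under flask, the same body would sit inside a route handler that reads `request` data and returns the response string.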
In other embodiments, the present invention further provides a non-volatile computer storage medium. The computer storage medium stores computer-executable instructions that can execute the speech processing and application methods in any of the above method embodiments.
As an implementation, the non-volatile computer storage medium of the invention stores computer-executable instructions, and the computer-executable instructions are set as follows:
Step S101: receive a keyword through the human-computer interaction interface; choose a current domain field from multiple preset general domain fields, where each general domain field corresponds to multiple preset crawl words and multiple corresponding preset Web crawl pages;
Step S102: according to the preset crawl words corresponding to the current domain field, crawl on the multiple preset Web crawl pages corresponding to that field, and obtain the general corpus from the crawl result;
Step S103: set the keyword as the current crawl word; the current crawler crawls the return pages of a set search engine at the Web side, and the specific corpus is obtained from the crawl result;
Step S104: obtain the general language model by training an arpa language model on the general corpus, and obtain the particular language model by training an arpa language model on the specific corpus; the file information of the general language model and the file information of the particular language model each contain a version number that serves as an identifier;
Step S105: connect the WFST speech recognition network of the general language model in parallel with the WFST speech recognition network of the particular language model, then combine them with the acoustic model and the pronunciation dictionary, and synthesize the WFST speech recognition network through composition, determinization and minimization operations.
As a non-volatile computer-readable storage medium, it can be used to store non-volatile software programs and non-volatile computer-executable programs and modules, such as the program instructions/modules corresponding to the speech signal processing method in the embodiments of the present invention. One or more program instructions are stored in the non-volatile computer-readable storage medium; when executed by a processor, they perform the speech signal processing method in any of the above method embodiments.
The non-volatile computer-readable storage medium may include a program storage area and a data storage area, where the program storage area may store the operating system and the application programs required by at least one function, and the data storage area may store data created by the use of the speech signal processing device, and the like. In addition, the non-volatile computer-readable storage medium may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk memory device, flash memory device or other non-volatile solid-state memory device. In some embodiments, the non-volatile computer-readable storage medium optionally includes memories remotely located relative to the processor, and these remote memories can be connected to the speech signal processing device through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks and combinations thereof.
An embodiment of the present invention also provides a computer program product. The computer program product includes a computer program stored on a non-volatile computer-readable storage medium; the computer program includes program instructions which, when executed by a computer, cause the computer to execute any of the above speech signal processing methods.
Fig. 8 is a structural schematic diagram of the electronic device provided by an embodiment of the present invention. As shown in Fig. 8, the device includes one or more processors 710 and a memory 720, with one processor 710 taken as an example in Fig. 8. The device for the speech signal processing method may also include an input unit 730 and an output unit 740. The processor 710, the memory 720, the input unit 730 and the output unit 740 can be connected by a bus or in other ways; connection by a bus is taken as an example in Fig. 8. The memory 720 is the above non-volatile computer-readable storage medium. By running the non-volatile software programs, instructions and modules stored in the memory 720, the processor 710 executes the various functional applications and data processing of the server, that is, realizes the speech signal processing method of the above method embodiments. The input unit 730 can receive input numeric or character information and generate key signal inputs related to the user settings and function control of the device. The output unit 740 may include a display device such as a display screen.
The above product can execute the methods provided by the embodiments of the present invention and has the functional modules and beneficial effects corresponding to the execution of those methods. For technical details not described in detail in this embodiment, reference can be made to the methods provided by the embodiments of the present invention.
As an implementation, the above electronic device can be applied to the visual production platform of the speech recognition network and includes: at least one processor; and a memory communicatively connected to the at least one processor; where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor is able to:
receive a keyword through the human-computer interaction interface; choose a current domain field from multiple preset general domain fields, where each general domain field corresponds to multiple preset crawl words and multiple corresponding preset Web crawl pages;
crawl on the multiple preset Web crawl pages corresponding to the current domain field according to the corresponding preset crawl words, and obtain the general corpus from the crawl result;
set the keyword as the current crawl word, crawl the return pages of a set search engine at the Web side according to the current crawl word, and obtain the specific corpus from the crawl result;
obtain the general language model by training an arpa language model on the general corpus, and obtain the particular language model by training an arpa language model on the specific corpus; the file information of the general language model and the file information of the particular language model each contain a version number that serves as an identifier;
connect the WFST speech recognition network of the general language model in parallel with the WFST speech recognition network of the particular language model, then combine them with the acoustic model and the pronunciation dictionary, and synthesize the WFST speech recognition network through composition, determinization and minimization operations.
The electronic device of the embodiments of the present invention exists in various forms, including but not limited to:
(1) Mobile communication devices: these devices are characterized by mobile communication functions, with voice and data communication as the main goal. This type of terminal includes smart phones (such as the iPhone), multimedia phones, feature phones and low-end phones.
(2) Ultra-mobile personal computer devices: these devices belong to the category of personal computers, have computing and processing functions, and generally also have mobile Internet access. This type of terminal includes PDA, MID and UMPC devices, such as the iPad.
(3) Portable entertainment devices: these devices can display and play multimedia content. They include audio and video players (such as the iPod), handheld game devices, e-books, intelligent toys and portable vehicle navigation devices.
(4) Servers: devices providing computing services. A server consists of a processor, hard disk, memory, system bus and so on; its architecture is similar to that of a general-purpose computer, but because highly reliable services must be provided, the requirements on processing capability, stability, reliability, security, scalability and manageability are higher.
(5) Other electronic devices with data interaction functions.
The unit embodiments described above are merely schematic; the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of this embodiment. Those of ordinary skill in the art can understand and implement it without creative labour.
Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be realized by means of software plus a necessary general hardware platform, and of course also by hardware. Based on this understanding, the above technical solution, or the part of it that contributes over the existing technology, can essentially be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk or an optical disc, and includes several instructions to make a computer device (which can be a personal computer, a server, a network device, etc.) execute the methods of each embodiment or certain parts of the embodiments.
Finally, it should be noted that the above embodiments are merely illustrative of the technical solutions of the present invention rather than limiting them. Although the invention has been explained in detail with reference to the foregoing embodiments, those skilled in the art should understand that they can still modify the technical solutions recorded in the foregoing embodiments or make equivalent replacements of some of the technical features; and these modifications or replacements do not make the essence of the corresponding technical solutions depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A visualized generation method of a speech recognition network, the method being operable at the Web side, the method comprising:
Step S101: receiving a keyword through a human-computer interaction interface; choosing a current domain field from multiple preset general domain fields, where each general domain field corresponds to multiple preset crawl words and multiple corresponding preset Web crawl pages;
Step S102: obtaining the preset crawl words corresponding to the current domain field, crawling the multiple preset Web crawl pages corresponding to the current domain field according to the preset crawl words to obtain a first crawl result, and obtaining a general corpus from the first crawl result;
Step S103: setting the keyword as a current crawl word, crawling the return pages of a set search engine at the Web side according to the current crawl word to obtain a second crawl result, and obtaining a specific corpus from the second crawl result;
Step S104: training a general language model in arpa format based on the general corpus, and training a particular language model in arpa format based on the specific corpus; the file information of the general language model and the file information of the particular language model each containing a version number that serves as an identifier;
Step S105: merging the general language model and the particular language model, and synthesizing a WFST speech recognition network after combining them with acoustic model and pronunciation dictionary data.
2. The method according to claim 1, further comprising, after step S105:
Step S106: testing the WFST speech recognition network with a set test set for each of multiple configured interfaces, obtaining the test recognition data of the multiple configured interfaces, and displaying the test recognition data of the multiple configured interfaces, the test recognition data containing the identification information of the corresponding configured interface.
3. The method according to claim 1, wherein step S102 further comprises:
Step S1021: scoring the entries in the general corpus with a scoring language model to obtain a score for each entry; if the score of an entry is greater than a set threshold, retaining the entry, and otherwise deleting the entry from the general corpus.
4. The method according to claim 1, wherein step S103 further comprises:
Step S1031: obtaining the ranking of each entry of the specific corpus in the set search engine, intercepting the entries ranked from first to a set number in the search engine ranking, and updating the specific corpus accordingly.
5. The method according to claim 2, wherein the step in S104 of training the general language model in arpa format based on the general corpus comprises:
adding a set mandatory-parameter button on the human-computer interaction interface; if the selection information of the set mandatory-parameter button is received, training the general language model in arpa format based on the general corpus;
and the step in S106 of testing the WFST speech recognition network with the set test set for each of the multiple configured interfaces comprises:
adding a set mandatory-parameter button on the human-computer interaction interface; if the selection information of the set mandatory-parameter button is received, testing the WFST speech recognition network with the set test set.
6. The method according to claim 1, wherein the step in S105 of merging the general language model and the particular language model is:
converting the general language model into WFST form, converting the particular language model into WFST form, adding a start node before the first nodes of the general language model converted into WFST form and the particular language model converted into WFST form, and merging the general language model and the particular language model.
7. The method according to claim 1, wherein:
step S102 further comprises generating a run button for step S102 on the human-computer interaction interface; if step S101 has finished running, enabling the run button of step S102;
step S103 further comprises generating a run button for step S103 on the human-computer interaction interface; if step S102 has finished running, enabling the run button of step S103;
step S104 further comprises generating a run button for step S104 on the human-computer interaction interface; if step S103 has finished running, enabling the run button of step S104;
step S105 further comprises generating a run button for step S105 on the human-computer interaction interface; if step S104 has finished running, enabling the run button of step S105.
8. A visual production system of a speech recognition network, comprising a user interaction unit, a general corpus acquiring unit, a specific corpus acquiring unit, a language model acquiring unit and a WFST speech recognition network acquiring unit;
the user interaction unit is configured to receive a keyword through a human-computer interaction interface, and to choose a current domain field from multiple preset general domain fields, where each general domain field corresponds to multiple preset crawl words and multiple corresponding preset Web crawl pages;
the general corpus acquiring unit is configured to obtain the preset crawl words corresponding to the current domain field, crawl the multiple preset Web crawl pages corresponding to the current domain field according to the preset crawl words to obtain a first crawl result, and obtain a general corpus from the first crawl result;
the specific corpus acquiring unit is configured to set the keyword as a current crawl word, crawl the return pages of a set search engine at the Web side according to the current crawl word to obtain a second crawl result, and obtain a specific corpus from the second crawl result;
the language model acquiring unit is configured to train a general language model in arpa format based on the general corpus, and to train a particular language model in arpa format based on the specific corpus; the file information of the general language model and the file information of the particular language model each contain a version number that serves as an identifier;
the WFST speech recognition network acquiring unit merges the general language model and the particular language model, and synthesizes a WFST speech recognition network after combining them with acoustic model and pronunciation dictionary data.
9. The system according to claim 8, further comprising a test unit;
the test unit is configured to test the WFST speech recognition network with a set test set for each of multiple configured interfaces, obtain the test recognition data of the multiple configured interfaces, and display the test recognition data of the multiple configured interfaces, the test recognition data containing the identification information of the corresponding configured interface.
10. A visual production platform of a speech recognition network, wherein the system of claim 8 or 9 is loaded on the platform; the system allows multiple development groups to operate simultaneously, each of the multiple development groups includes multiple developers, and each developer can use one separate unit; the separate unit is one of the units in the visual production system of claim 8 or 9;
the visual production platform is configured to store the general language models and particular language models generated or used by the multiple development groups, and the visual production platform establishes multiple version-number correspondences according to the version numbers of the general language models and the version numbers of the particular language models generated or used by the multiple development groups;
a current development group can select a current model from the general language models and particular language models stored on the visual production platform; if the current development group deletes, replaces or edits the current model, the visual production platform notifies the corresponding development group according to the multiple version-number correspondences, and the current development group operates on the current model according to the return information of the corresponding development group.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910719492.2A CN110427459B (en) | 2019-08-05 | 2019-08-05 | Visual generation method, system and platform of voice recognition network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910719492.2A CN110427459B (en) | 2019-08-05 | 2019-08-05 | Visual generation method, system and platform of voice recognition network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110427459A true CN110427459A (en) | 2019-11-08 |
CN110427459B CN110427459B (en) | 2021-09-17 |
Family
ID=68414250
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910719492.2A Active CN110427459B (en) | 2019-08-05 | 2019-08-05 | Visual generation method, system and platform of voice recognition network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110427459B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111145727A (en) * | 2019-12-02 | 2020-05-12 | 云知声智能科技股份有限公司 | Method and device for recognizing digital string by voice |
CN111933146A (en) * | 2020-10-13 | 2020-11-13 | 苏州思必驰信息科技有限公司 | Speech recognition system and method |
CN111951788A (en) * | 2020-08-10 | 2020-11-17 | 百度在线网络技术(北京)有限公司 | Language model optimization method and device, electronic equipment and storage medium |
CN113111642A (en) * | 2020-01-13 | 2021-07-13 | 京东方科技集团股份有限公司 | Natural language identification model generation method, natural language processing method and equipment |
CN113223522A (en) * | 2021-04-26 | 2021-08-06 | 北京百度网讯科技有限公司 | Speech recognition method, apparatus, device and storage medium |
- 2019-08-05: CN application CN201910719492.2A filed; granted and published as CN110427459B (status: Active)
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2309487A1 (en) * | 2009-09-11 | 2011-04-13 | Honda Research Institute Europe GmbH | Automatic speech recognition system integrating multiple sequence alignment for model bootstrapping |
CN101923854A (en) * | 2010-08-31 | 2010-12-22 | Institute of Computing Technology, Chinese Academy of Sciences | Interactive speech recognition system and method |
CN102760436A (en) * | 2012-08-09 | 2012-10-31 | Kaifeng City Company, Henan Provincial Tobacco Company | Voice lexicon screening method |
CN107705787A (en) * | 2017-09-25 | 2018-02-16 | Beijing Jietong Huasheng Technology Co., Ltd. | Speech recognition method and device |
CN108492820A (en) * | 2018-03-20 | 2018-09-04 | South China University of Technology | Chinese speech recognition method based on a recurrent neural network language model and a deep neural network acoustic model |
CN109976702A (en) * | 2019-03-20 | 2019-07-05 | Qingdao Hisense Electric Co., Ltd. | Speech recognition method, device and terminal |
Non-Patent Citations (2)
Title |
---|
Palaskar, Shruti, et al.: "End-to-End Multimodal Speech Recognition", 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) * |
Zhang, Zhinan: "Research on Automatic Construction of Speech Corpora and Minimal Speech Annotation", China Masters' Theses Full-text Database (electronic journal) * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111145727A (en) * | 2019-12-02 | 2020-05-12 | 云知声智能科技股份有限公司 | Method and device for recognizing digital string by voice |
CN113111642A (en) * | 2020-01-13 | 2021-07-13 | 京东方科技集团股份有限公司 | Natural language identification model generation method, natural language processing method and equipment |
CN111951788A (en) * | 2020-08-10 | 2020-11-17 | 百度在线网络技术(北京)有限公司 | Language model optimization method and device, electronic equipment and storage medium |
CN111933146A (en) * | 2020-10-13 | 2020-11-13 | 苏州思必驰信息科技有限公司 | Speech recognition system and method |
CN113223522A (en) * | 2021-04-26 | 2021-08-06 | 北京百度网讯科技有限公司 | Speech recognition method, apparatus, device and storage medium |
CN113223522B (en) * | 2021-04-26 | 2022-05-03 | 北京百度网讯科技有限公司 | Speech recognition method, apparatus, device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110427459B (en) | 2021-09-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110427459A (en) | Visual generation method, system and platform of a speech recognition network | |
US11030412B2 (en) | System and method for chatbot conversation construction and management | |
CN111177569B (en) | Recommendation processing method, device and equipment based on artificial intelligence | |
CN106570106A (en) | Method and device for converting voice information into an expression during input | |
CN107077841A (en) | Superstructure recurrent neural network for text-to-speech | |
US20050080628A1 (en) | System, method, and programming language for developing and running dialogs between a user and a virtual agent | |
CN110222827A (en) | Training method for a text-based depression-judgment network model | |
CN109710137A (en) | Skill priority configuration method and system for a voice dialogue platform | |
CA2365743A1 (en) | Apparatus for design and simulation of dialogue | |
CN109948151A (en) | Method for constructing a voice assistant | |
CN108959436A (en) | Dictionary editing method and system for a voice dialogue platform | |
CN109697979A (en) | Voice assistant skill adding method, device, storage medium and server | |
CN109313668B (en) | System and method for constructing a conversational understanding system | |
CN110136689A (en) | Song synthesis method, device and storage medium based on transfer learning | |
CN109119067A (en) | Speech synthesis method and device | |
CN111081280A (en) | Text-independent speech emotion recognition method and device, and emotion recognition algorithm model generation method | |
CN112000330B (en) | Configuration method, device, equipment and computer storage medium for modeling parameters | |
CN110349569A (en) | Training and recognition method and device for a customized product language model | |
CN111145745A (en) | Conversation flow customization method and device | |
CN109032731A (en) | Voice interaction method and system for an operating system based on semantic understanding | |
CN110032355A (en) | Speech playing method, device, terminal device and computer storage medium | |
CN108170676A (en) | Story creation method, system and terminal | |
CN108831444A (en) | Semantic resource training method and system for a voice dialogue platform | |
CN106844499A (en) | Multi-turn session interaction method and device | |
CN109657125A (en) | Data processing method, device, equipment and storage medium based on web crawlers | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information |
Address after: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province; Applicant after: Sipic Technology Co.,Ltd.; Address before: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province; Applicant before: AI SPEECH Ltd. |
GR01 | Patent grant | ||