CN112466283B - Cooperative software voice recognition system - Google Patents

Publication number
CN112466283B
Authority
CN
China
Prior art keywords
service
voice recognition
file
server
audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011190095.XA
Other languages
Chinese (zh)
Other versions
CN112466283A (en)
Inventor
温正棋
李博
刘进涛
任斌
李振龙
周仔恒
郑夺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Simulation Center
Original Assignee
Beijing Simulation Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Simulation Center
Priority to CN202011190095.XA
Publication of CN112466283A
Application granted
Publication of CN112466283B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H04L67/025 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP] for remote control or remote monitoring of applications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004 Server selection for load balancing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/14 Session management
    • H04L67/146 Markers for unambiguous identification of a particular session, e.g. session cookie or URL-encoding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/60 Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
    • H04L67/63 Routing a service request depending on the request content or context

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention provides a collaborative software speech recognition system comprising a client, an automatic voice recognition platform and a server. The client is provided with collaborative software, which receives requests and classifies them; the client handles recording, transcoding and parameter extraction. The automatic voice recognition platform processes the voice recognition request, allocates a server to the request according to a dynamic routing adaptive algorithm, and invokes that server. The server receives the voice recognition request and starts a recognition service.

Description

Cooperative software voice recognition system
Technical Field
The invention belongs to the technical field of electronic information, and particularly relates to a collaborative software voice recognition system.
Background
Speech recognition technology, also known as automatic speech recognition (ASR), converts the lexical content of human speech into computer-readable input; in many scenarios it can be understood simply as converting speech into text. At present, speech recognition is a relatively general-purpose technology in the field of artificial intelligence and is generally used as an auxiliary tool within an application scenario. Examples include speech-to-text conversion in call centers and simultaneous interpretation, where the entire speaking process must be recognized in real time. In such scenarios there is little demand for highly concurrent, frequent calls to the recognition service. Therefore, most current speech recognition service architectures are monolithic, with a single recognition service performing all of the storage, transcoding and recognition work.
With the development of speech recognition technology, recognition accuracy and speed have improved continuously and the technology is now applied across many industries, so speech recognition solutions for high-concurrency scenarios are urgently needed. In the daily communication of users of collaborative software, voice messages are used very frequently, are produced in huge quantities, and are highly time-sensitive. Much of this communication takes place within groups, where the number of participating users is very large and messages are highly repetitive. In addition, voice input is an essential requirement in collaborative software: speech must be converted into text promptly and presented to the message sender for correction.
Therefore, a method for building speech recognition into collaborative software is needed, one that organically combines the usage scenarios of collaborative software and of speech recognition to meet the demand for high-concurrency speech recognition in those scenarios.
Disclosure of Invention
In order to solve at least one of the above technical problems, the present invention provides a collaborative software speech recognition system with the following technical solution:
a collaborative software speech recognition system, comprising: a client, an automatic voice recognition platform and a server;
the client is provided with collaborative software, and the collaborative software receives the request and classifies the request; the client handles recording, transcoding and parameter extraction;
the automatic voice recognition platform processes the voice recognition request, allocates a server for the voice recognition request according to a dynamic routing adaptive algorithm, and invokes the server;
the server receives the voice recognition request and starts a recognition service.
The automatic voice recognition platform is based on a micro-service architecture and comprises a micro-service gateway, a micro-service registry and audio related services;
wherein,
the micro service gateway receives the voice recognition request;
the micro-service registry analyzes the voice recognition request and allocates a server for the voice recognition request based on the dynamic routing adaptation algorithm;
and the micro-service gateway pulls the allocated server from the micro-service registry and invokes it.
The micro-service registry receives the load parameters of the servers, calculates the load value of each server, sorts the servers according to the load values, and selects the optimal server to process the voice recognition request.
The collaborative software receives the voice recognition service, recognizes the audio file by utilizing acoustic parameters, extracts MFCC parameters of the audio file to generate an audio parameter file, and sends the audio parameter file to a file service of the automatic voice recognition platform for storage.
The server receives the voice recognition request and initiates a recognition service, including,
the voice recognition request comprises a resource ID;
the server looks up the corresponding recognition result in a result cache server according to the resource ID, and returns the recognition result if the lookup succeeds;
and if the lookup fails, the server retrieves the corresponding audio parameter file from the file cache server according to the resource ID and dispatches it to the corresponding automatic voice recognition service queue according to the duration of the audio file.
The audio related services include a file service and a voice file recognition service;
the file service includes: file storage service and result cache service;
the voice file recognition service includes:
short audio file recognition services, medium audio file recognition services, long audio file recognition services, and real-time speech recognition services.
The automatic voice recognition service queues include: a short audio file automatic voice recognition service queue, a medium audio file automatic voice recognition service queue, and a real-time automatic voice recognition service queue.
The medium audio file recognition service segments the audio parameter file using voice activity detection (VAD), the segments are processed by the short audio file recognition service, a processing interface merges the multiple recognition results and returns the merged result to the collaborative software, and the recognition result and the corresponding resource ID are stored in a cache.
The server distributes the audio files with the duration exceeding 120 seconds and the audio files needing to be recognized in real time to the real-time automatic voice recognition service queue; distributing the audio files with the duration longer than 10 seconds and shorter than 120 seconds to a medium audio file automatic voice recognition service queue; and distributing the audio files with the duration less than 10 seconds to the short audio file automatic voice recognition service queue.
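The duration-based dispatch described above can be sketched as follows. This is a minimal Python illustration, not the patented implementation: the `AsrQueue` names and the `select_queue` function are hypothetical, and because the text does not say where files of exactly 10 s or exactly 120 s go, the sketch assumes they fall into the shorter bucket.

```python
from enum import Enum


class AsrQueue(Enum):
    """Hypothetical names for the three service queues named in the text."""
    SHORT = "short-audio-asr"
    MEDIUM = "medium-audio-asr"
    REALTIME = "realtime-asr"


SHORT_LIMIT_S = 10.0   # upper bound of the short bucket, from the text
LONG_LIMIT_S = 120.0   # upper bound of the medium bucket, from the text


def select_queue(duration_s: float, realtime: bool = False) -> AsrQueue:
    """Pick a queue from the audio duration and the real-time flag."""
    if duration_s < 0:
        raise ValueError("duration must be non-negative")
    # Real-time requests and files longer than 120 s share the real-time queue.
    if realtime or duration_s > LONG_LIMIT_S:
        return AsrQueue.REALTIME
    # Assumption: a file of exactly 10 s is treated as short,
    # and a file of exactly 120 s as medium.
    if duration_s > SHORT_LIMIT_S:
        return AsrQueue.MEDIUM
    return AsrQueue.SHORT
```

For example, `select_queue(60)` yields the medium queue, while `select_queue(3, realtime=True)` is sent to the real-time queue regardless of its length.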
The beneficial effects of the invention are as follows:
the invention provides a collaborative software voice recognition system. The system is designed by adopting a micro-service architecture as a whole, and different identification micro-services are designed according to the message length of the request, so that the real-time performance and the high reliability of the identification system under high concurrency are ensured.
Drawings
The following describes the embodiments of the present invention in further detail with reference to the drawings.
FIG. 1 is an overall flow chart of a collaborative software speech recognition system in accordance with an embodiment of the present invention;
FIG. 2 is a dynamic route adaptation flow chart of a collaborative software speech recognition system according to an embodiment of the present invention;
FIG. 3 is a logic diagram of service division of a collaborative software speech recognition system according to an embodiment of the present invention.
Detailed Description
The following describes a collaborative software speech recognition system according to the present invention in detail with reference to the drawings and examples, which are given for illustration only and are not intended to limit the scope of the invention.
A collaborative software speech recognition system, as shown in FIG. 1, comprises a client and a speech recognition service platform.
The voice recognition micro-service platform comprises a micro-service gateway, a micro-service registry and audio-related services. The audio-related services comprise a file storage service, a result cache service, a short audio file recognition service, a medium audio file recognition service, a long audio file recognition service and a real-time voice recognition service; the result cache service and the file storage service each use a separate cache.
In a preferred embodiment, the voice recognition micro-service platform is based on the Spring Cloud micro-service architecture, the micro-service gateway is implemented on Zuul, and the micro-service registry is implemented on Eureka.
In the system, the voice service is divided between the client and the platform: recording, transcoding and parameter extraction are performed by the client, while voice recognition is performed by the voice service platform. The client extracts the MFCC parameters of the user's audio file. Placing acoustic parameter extraction on the client makes effective use of the client's computing capacity, which reduces server-side development and project cost, reduces the volume of the files to be cached, and reduces the cost of storing raw audio.
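To make the client-side parameter extraction concrete, the following is a minimal pure-Python sketch of a textbook MFCC pipeline (pre-emphasis, framing, Hamming window, power spectrum, mel filterbank, log, DCT-II). The patent does not give any of these details: the function names, the 8 kHz sample rate, the 256-sample frames, the 0.97 pre-emphasis factor, the 20 filters and the 13 coefficients are all assumptions chosen for illustration, and a real client would normally use an optimized FFT rather than the naive DFT shown here.

```python
import cmath
import math


def hz_to_mel(freq_hz):
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (slow, but dependency-free)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale up to Nyquist."""
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mels = [low + i * (high - low) / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for i in range(1, num_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):       # rising edge
            filt[k] = (k - left) / (center - left)
        for k in range(center, right):      # falling edge
            filt[k] = (right - k) / (right - center)
        bank.append(filt)
    return bank


def mfcc(samples, sample_rate=8000, frame_len=256, hop=128,
         num_filters=20, num_ceps=13):
    """Return one list of num_ceps coefficients per frame."""
    # Pre-emphasis boosts high frequencies before analysis.
    emphasized = samples[:1] + [samples[i] - 0.97 * samples[i - 1]
                                for i in range(1, len(samples))]
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    bank = mel_filterbank(num_filters, frame_len, sample_rate)
    features = []
    for start in range(0, len(emphasized) - frame_len + 1, hop):
        frame = [emphasized[start + n] * window[n] for n in range(frame_len)]
        spectrum = power_spectrum(frame)
        # Small epsilon avoids log(0) for empty or silent filters.
        log_energy = [math.log(sum(f * s for f, s in zip(filt, spectrum)) + 1e-10)
                      for filt in bank]
        # DCT-II decorrelates the log filterbank energies.
        ceps = [sum(log_energy[m] * math.cos(math.pi * c * (m + 0.5) / num_filters)
                    for m in range(num_filters))
                for c in range(num_ceps)]
        features.append(ceps)
    return features
```

Under these assumptions the client would upload a small matrix of coefficients per frame instead of the raw waveform, which is the reduction in cached file volume the paragraph above refers to.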
The client receives the voice recognition request and transmits it to the automatic voice recognition platform based on the micro-service architecture. The micro-service gateway forwards the voice recognition request to the micro-service registry, which analyzes the request and allocates an optimal server to it according to the dynamic routing adaptive algorithm. As shown in FIG. 2, each server reports the task state of its node once every 2 seconds. In one embodiment, the maximum number of service connections C1 of server A, the number of service connections currently being processed C2, and the CPU utilization C3, memory utilization C4 and disk utilization C5 of the computing node are used as the parameters for evaluating the load of the server. The micro-service gateway calculates the Load value of each server from the load parameters and the weight of each parameter. Server A presets the total connection weight W1, the current connection weight W2, the CPU weight W3, the memory weight W4 and the disk weight W5, and the Load value of the server is Load = W1·C1 + W2·C2 + W3·C3 + W4·C4 + W5·C5. The micro-service gateway sorts the Load values of the servers by size and divides the servers into intervals according to the sorting result; each time, a server in the optimal interval is selected as the next computing node. After the micro-service registry selects the server, the micro-service gateway pulls the server from the micro-service registry and calls it.
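The load evaluation above can be illustrated with a short Python sketch. Several things here are assumptions rather than statements of the patent: the load formula is read as a plain weighted sum of C1 to C5, a lower Load value is taken to mean a less loaded server, the weight values are invented (with a negative W1 so that more connection capacity lowers the score), and the names `ServerReport`, `load_value`, `rank_servers` and `select_server` are hypothetical. The division of sorted servers into intervals is simplified to taking the head of the sorted list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerReport:
    """One periodic load report from a recognition server (hypothetical type)."""
    name: str
    max_connections: float      # C1
    active_connections: float   # C2
    cpu_utilization: float      # C3, in the range 0..1
    memory_utilization: float   # C4, in the range 0..1
    disk_utilization: float     # C5, in the range 0..1


# Assumed weights W1..W5; the patent gives no numeric values.
WEIGHTS = (-0.01, 0.05, 1.0, 0.5, 0.2)


def load_value(report, weights=WEIGHTS):
    """Weighted sum W1*C1 + ... + W5*C5 (assumed reading of the formula)."""
    params = (
        report.max_connections,
        report.active_connections,
        report.cpu_utilization,
        report.memory_utilization,
        report.disk_utilization,
    )
    return sum(w * c for w, c in zip(weights, params))


def rank_servers(reports, weights=WEIGHTS):
    """Sort servers from least to most loaded."""
    return sorted(reports, key=lambda r: load_value(r, weights))


def select_server(reports, weights=WEIGHTS):
    """Pick the least loaded server as the next computing node."""
    ranked = rank_servers(reports, weights)
    if not ranked:
        raise ValueError("no servers have reported their load")
    return ranked[0]
```

With a busy server A (80 of 100 connections, 90% CPU) and an idle server B (10 of 100 connections, 20% CPU), `select_server` returns B, so new requests drift toward the servers with the most headroom as fresh reports arrive.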
In a preferred embodiment, the client distinguishes real-time recognition service and audio file recognition service according to the interaction behavior, and marks the voice recognition service class corresponding to the audio file recognition service according to the duration of the audio file, wherein the voice recognition service class comprises a short audio file, a medium audio file and a long audio file.
In an alternative embodiment, the micro service registry distinguishes real-time recognition service and audio file recognition service according to the interaction behavior, and marks the voice recognition service class corresponding to the audio file recognition service according to the duration of the audio file, including a short audio file, a medium audio file and a long audio file.
In an alternative embodiment, the server distinguishes real-time recognition service and audio file recognition service according to the interaction behavior, and marks the voice recognition service class corresponding to the audio file recognition service according to the duration of the audio file, wherein the voice recognition service class comprises a short audio file, a medium audio file and a long audio file.
The three file types correspond to three duration intervals, 0-10 s, 10-120 s and more than 120 s, and are handled by the short audio file recognition service, the medium audio file recognition service and the long audio file recognition service respectively. After receiving a voice recognition request, the server first calls the result storage service and looks up the recognition result in the result storage server according to the resource ID transmitted by the client. If a result is found, it is returned directly to the client. If no result is found, the audio is processed according to its file type; processing starts by fetching the audio parameter file of the audio file from the file cache service according to the resource ID. For short audio files shorter than 10 s, the client or the server calls the HTTP interface of the short audio file recognition service, which analyzes the corresponding audio parameter file. For medium audio files, the HTTP interface of the medium audio file recognition service is called; this service segments the audio parameter file using voice activity detection (VAD) and numbers the segments, which are independent of one another. The numbered segments are placed in an internal queue of the medium audio service and recognized using a thread pool. The segment recognition results are sorted by number, merged in order, and returned to the client or the server. For long audio files exceeding 120 s, the server calls the WebSocket service of the real-time recognition service for processing.
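The medium-audio path described above (segment, number, recognize in a thread pool, sort, merge) can be sketched in Python as follows. This is an illustration under stated assumptions: the simple energy-threshold splitter stands in for whatever VAD the system actually uses, `recognize_segment` is a caller-supplied placeholder for the short audio recognition service, and the function names, threshold and silence length are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def split_segments(frame_energies, threshold=0.5, min_silence_frames=3):
    """Return (start, end) frame ranges of speech, split at long silent runs.

    A frame counts as speech when its energy reaches the threshold; a run of
    at least min_silence_frames quiet frames closes the current segment.
    """
    segments = []
    start = None
    silence = 0
    for i, energy in enumerate(frame_energies):
        if energy >= threshold:
            if start is None:
                start = i
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_silence_frames:
                segments.append((start, i - silence + 1))
                start, silence = None, 0
    if start is not None:
        # Close the last segment, dropping any trailing quiet frames.
        segments.append((start, len(frame_energies) - silence))
    return segments


def recognize_medium_audio(frames, frame_energies, recognize_segment, max_workers=4):
    """Recognize each numbered segment in a thread pool and merge in order."""
    numbered = list(enumerate(split_segments(frame_energies)))
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(recognize_segment, frames[start:end]): number
            for number, (start, end) in numbered
        }
        # Segments may finish in any order, so keep the number with each result.
        for future in as_completed(futures):
            results.append((futures[future], future.result()))
    results.sort(key=lambda item: item[0])
    return " ".join(text for _, text in results)
```

Because each segment carries its number, the workers can finish in any order and the merged text still follows the original speech order.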
The server stores the recognition result and the corresponding resource ID in the result storage server, so that later requests carrying the same resource ID can be matched against the stored result.
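The cache-first flow described above, in which a result is looked up by resource ID, recognized on a miss, and then stored, can be sketched as follows. The class and parameter names are hypothetical, plain dictionaries stand in for the result storage server and the file cache service, and `recognize` is a placeholder for whichever recognition service the request is routed to.

```python
class RecognitionServer:
    """Minimal sketch of a server that answers repeated requests from a cache."""

    def __init__(self, result_cache, file_cache, recognize):
        self.result_cache = result_cache  # resource ID -> recognition result
        self.file_cache = file_cache      # resource ID -> audio parameter file
        self.recognize = recognize        # callable: parameter file -> text

    def handle(self, resource_id):
        # Step 1: return the stored result if this resource was already recognized.
        cached = self.result_cache.get(resource_id)
        if cached is not None:
            return cached
        # Step 2: on a miss, fetch the audio parameter file by resource ID.
        if resource_id not in self.file_cache:
            raise KeyError(f"no audio parameter file for resource {resource_id!r}")
        parameter_file = self.file_cache[resource_id]
        # Step 3: recognize, then store the result for later requests.
        text = self.recognize(parameter_file)
        self.result_cache[resource_id] = text
        return text
```

In a group chat where many members open the same voice message, only the first request reaches the recognizer; the rest are served from the result cache.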

Claims (6)

1. A collaborative software speech recognition system, the system comprising: the system comprises a client, an automatic voice recognition platform and a server;
the client is provided with cooperative software, and the cooperative software receives a voice recognition request, and records, transcodes and extracts parameters;
at least one of the client, the automatic voice recognition platform and the server distinguishes real-time recognition service and audio file recognition service according to interaction behaviors, and marks the voice recognition service class corresponding to the audio file recognition service according to the duration of the audio file, wherein the voice recognition service class comprises a short audio file, a medium audio file and a long audio file;
the automatic voice recognition platform processes the voice recognition request and distributes a server for the voice recognition request according to a dynamic routing self-adaptive algorithm and invokes the server;
the server receives the voice recognition request and initiates a recognition service, including,
the voice recognition request comprises a resource ID;
the server searches a corresponding identification result in a result cache server according to the resource ID, and returns the identification result if the identification is successful;
the server searches a corresponding audio parameter file in a file cache server according to the resource ID and distributes the corresponding audio parameter file to a corresponding automatic voice recognition service queue according to the audio time of the audio file;
the automatic voice recognition platform is based on a micro-service architecture and comprises a micro-service gateway, a micro-service registry and audio related services;
wherein,
the micro service gateway receives the voice recognition request;
the micro-service registry analyzes the voice recognition request and allocates a server for the voice recognition request based on the dynamic routing adaptation algorithm;
the micro service gateway pulls the server and calls the server;
the audio related services include a file service and a voice file recognition service;
the file service includes: file storage service and result cache service;
the voice file recognition service includes:
short audio file recognition services, medium audio file recognition services, long audio file recognition services, and real-time speech recognition services.
2. The system of claim 1, characterized in that
and the micro-service registration center receives the load parameters of the servers and calculates the load value of each server, sorts the servers according to the load values, and selects the optimal server to process the voice recognition request.
3. The system of claim 1, characterized in that
the collaborative software receives the voice recognition service, recognizes the audio file by utilizing acoustic parameters, extracts MFCC parameters of the audio file to generate an audio parameter file, and sends the audio parameter file to a file service of the automatic voice recognition platform for storage.
4. The system of claim 1, characterized in that
the automatic voice recognition service queues include: a short audio file automatic voice recognition service queue, a medium audio file automatic voice recognition service queue, and a real-time automatic voice recognition service queue.
5. The system of claim 1, characterized in that
the medium audio file recognition service segments the audio parameter file using voice activity detection (VAD), the segments are processed by the short audio file recognition service, a processing interface merges the multiple recognition results and returns the merged result to the collaborative software, and the recognition result and the corresponding resource ID are stored in a cache.
6. The system of claim 4, characterized in that
the server distributes the audio files with the duration exceeding 120 seconds and the audio files needing to be recognized in real time to the real-time automatic voice recognition service queue; distributing the audio files with the duration longer than 10 seconds and shorter than 120 seconds to a medium audio file automatic voice recognition service queue; and distributing the audio files with the duration less than 10 seconds to the short audio file automatic voice recognition service queue.
CN202011190095.XA 2020-10-30 2020-10-30 Cooperative software voice recognition system Active CN112466283B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011190095.XA CN112466283B (en) 2020-10-30 2020-10-30 Cooperative software voice recognition system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011190095.XA CN112466283B (en) 2020-10-30 2020-10-30 Cooperative software voice recognition system

Publications (2)

Publication Number Publication Date
CN112466283A (en) 2021-03-09
CN112466283B (en) 2023-12-01

Family

ID=74834757

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011190095.XA Active CN112466283B (en) 2020-10-30 2020-10-30 Cooperative software voice recognition system

Country Status (1)

Country Link
CN (1) CN112466283B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102546542A (en) * 2010-12-20 2012-07-04 福建星网视易信息***有限公司 Electronic system and embedded device and transit device of electronic system
CN102571833A (en) * 2010-12-15 2012-07-11 盛乐信息技术(上海)有限公司 Distributed speech recognition system and distributed speech recognition method based on server cluster
CN103325371A (en) * 2013-06-05 2013-09-25 杭州网豆数字技术有限公司 Voice recognition system and method based on cloud
CN104795069A (en) * 2014-01-21 2015-07-22 腾讯科技(深圳)有限公司 Speech recognition method and server
CN106356066A (en) * 2016-08-30 2017-01-25 孟玲 Speech recognition system based on cloud computing
CN109462647A (en) * 2018-11-12 2019-03-12 平安科技(深圳)有限公司 Resource allocation methods, device and computer equipment based on data analysis
CN110309350A (en) * 2018-03-21 2019-10-08 腾讯科技(深圳)有限公司 Recite processing method, system, device, medium and the electronic equipment of task
CN111526208A (en) * 2020-05-06 2020-08-11 重庆邮电大学 High-concurrency cloud platform file transmission optimization method based on micro-service
CN111785277A (en) * 2020-06-29 2020-10-16 北京捷通华声科技股份有限公司 Speech recognition method, speech recognition device, computer-readable storage medium and processor

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104700836B (en) * 2013-12-10 2019-01-29 阿里巴巴集团控股有限公司 A kind of audio recognition method and system

Also Published As

Publication number Publication date
CN112466283A (en) 2021-03-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant