CN112466283B

CN112466283B - Cooperative software voice recognition system

Info

Publication number: CN112466283B
Application number: CN202011190095.XA
Authority: CN
Inventors: 温正棋; 李博; 刘进涛; 任斌; 李振龙; 周仔恒; 郑夺
Original assignee: Beijing Simulation Center
Current assignee: Beijing Simulation Center
Priority date: 2020-10-30
Filing date: 2020-10-30
Publication date: 2023-12-01
Anticipated expiration: 2040-10-30
Also published as: CN112466283A

Abstract

The invention provides a collaborative software speech recognition system comprising a client, an automatic voice recognition platform and a server. The client is provided with collaborative software, which receives requests and classifies them; the client handles recording, transcoding and parameter extraction. The automatic voice recognition platform processes each voice recognition request, allocates a server to the request according to a dynamic routing adaptive algorithm, and invokes that server. The server receives the voice recognition request and starts a recognition service.

Description

Cooperative software voice recognition system

Technical Field

The invention belongs to the technical field of electronic information, and particularly relates to a collaborative software voice recognition system.

Background

Speech recognition technology, also known as automatic speech recognition (ASR), converts the lexical content of human speech into computer-readable input; in many scenarios it can be understood simply as converting speech into text. At present, speech recognition is a relatively general-purpose technology in the field of artificial intelligence and is typically used as an auxiliary tool within an application scenario. Examples include speech-to-text conversion in call centers and real-time recognition of an entire speech in simultaneous interpretation. In such scenarios the demand for highly concurrent, frequent calls is low. Most current voice recognition service architectures are therefore monolithic, with a single recognition service performing all of the storage, transcoding and recognition work.

As voice recognition technology develops, recognition accuracy and speed continue to improve and the technology is being applied across many industries, yet few voice recognition solutions exist for high-concurrency scenarios, and they are urgently needed. In the daily communication of collaborative software users, voice messages are used very frequently, are produced in huge quantities and are highly time-sensitive. Much of this communication takes place within groups, where the number of participating users is very large and message repetition is high. In addition, voice input is an essential requirement in collaborative software: speech must be converted into text promptly and presented to the message sender for correction.

Therefore, a method for building voice recognition into collaborative software is needed, one that organically combines the usage scenarios of collaborative software with those of voice recognition to meet the demand for high-concurrency voice recognition in these scenarios.

Disclosure of Invention

In order to solve at least one of the above technical problems, the present invention provides a collaborative software speech recognition system, which has the following technical scheme:

a collaborative software speech recognition system, the system comprising: a client, an automatic voice recognition platform and a server;

the client is provided with collaborative software, which receives the request and classifies it; the client handles recording, transcoding and parameter extraction;

the automatic voice recognition platform processes the voice recognition request, allocates a server to the request according to a dynamic routing adaptive algorithm, and invokes the server;

the server receives the voice recognition request and starts a recognition service.

The automatic voice recognition platform is based on a micro-service architecture and comprises a micro-service gateway, a micro-service registry and audio related services;

wherein,

the micro service gateway receives the voice recognition request;

the micro-service registry analyzes the voice recognition request and allocates a server for the voice recognition request based on the dynamic routing adaptation algorithm;

and the micro-service gateway pulls the allocated server from the micro-service registry and invokes it.

The micro-service registry receives the load parameters of the servers, calculates the load value of each server, sorts the servers by load value, and selects the optimal server to process the voice recognition request.

The collaborative software receives the voice recognition service, recognizes the audio file by utilizing acoustic parameters, extracts MFCC parameters of the audio file to generate an audio parameter file, and sends the audio parameter file to a file service of the automatic voice recognition platform for storage.

The server receives the voice recognition request and initiates a recognition service, including,

the voice recognition request comprises a resource ID;

the server searches a result cache server for the corresponding recognition result according to the resource ID, and returns the recognition result if the lookup succeeds;

and if the lookup fails, the server retrieves the corresponding audio parameter file from the file cache server according to the resource ID and assigns it to the corresponding automatic voice recognition service queue according to the duration of the audio file.
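The lookup order described above (result cache first, then the file cache, then a service queue) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class `RecognitionServer`, the in-memory dictionaries standing in for the result cache server and the file cache server, the `duration_s` field and the `choose_queue` callback are all hypothetical names introduced here.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, DefaultDict, Dict, List, Optional


@dataclass
class RecognitionServer:
    """Hypothetical in-memory stand-in for the server and its two caches."""

    # Maps an audio duration in seconds to a queue name (assumed callback).
    choose_queue: Callable[[float], str]
    # Resource ID -> previously recognized text (stands in for the result cache server).
    result_cache: Dict[str, str] = field(default_factory=dict)
    # Resource ID -> audio parameter file record (stands in for the file cache server).
    file_cache: Dict[str, dict] = field(default_factory=dict)
    # Queue name -> resource IDs waiting for recognition.
    queues: DefaultDict[str, List[str]] = field(
        default_factory=lambda: defaultdict(list)
    )

    def handle(self, resource_id: str) -> Optional[str]:
        """Return a cached result, or enqueue the request and return None."""
        cached = self.result_cache.get(resource_id)
        if cached is not None:
            return cached  # cache hit: no recognition work is needed
        # Cache miss: fetch the audio parameter file (KeyError if it was never stored).
        params = self.file_cache[resource_id]
        # Pick a queue from the audio duration and enqueue the request.
        self.queues[self.choose_queue(params["duration_s"])].append(resource_id)
        return None
```

Because many group members may request the same voice message, checking the result cache before anything else means a repeated resource ID costs one dictionary lookup rather than a second recognition pass.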

The audio related services include a file service and a voice file recognition service;

the file service includes: file storage service and result cache service;

the voice file recognition service includes:

short audio file recognition services, medium audio file recognition services, long audio file recognition services, and real-time speech recognition services.

The automatic voice recognition service queues include: a short audio file automatic voice recognition service queue, a medium audio file automatic voice recognition service queue, and a real-time automatic voice recognition service queue.

The medium audio file recognition service segments the audio parameter file using voice activity detection (VAD); the segments are processed by the short audio file recognition service, and a processing interface merges the recognition results, returns the merged result to the collaborative software, and stores the recognition result with its corresponding resource ID in a cache.
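The segmentation step can be illustrated with a simple energy-threshold splitter. The source does not specify which VAD method is used, so the approach below is a swapped-in stand-in chosen for brevity; the function name `split_on_silence`, the per-frame energy input, and the `threshold` and `min_silence` defaults are assumptions made for this sketch.

```python
from typing import List, Sequence, Tuple


def split_on_silence(
    energies: Sequence[float],
    threshold: float = 0.1,   # assumed energy level separating speech from silence
    min_silence: int = 3,     # assumed number of silent frames that ends a segment
) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges of speech, with end exclusive."""
    segments: List[Tuple[int, int]] = []
    start = None      # first frame of the segment currently being built
    silent_run = 0    # consecutive silent frames seen inside that segment
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence:
                # Close the segment at the last speech frame before the pause.
                segments.append((start, i - silent_run + 1))
                start = None
                silent_run = 0
    if start is not None:
        # Audio ended mid-segment: drop any trailing silent frames.
        segments.append((start, len(energies) - silent_run))
    return segments
```

Cutting only at pauses of at least `min_silence` frames keeps each segment a self-contained stretch of speech, which is what allows the segments to be recognized independently by the short audio file recognition service.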

The server assigns audio files longer than 120 seconds, and audio that must be recognized in real time, to the real-time automatic voice recognition service queue; audio files longer than 10 seconds and shorter than 120 seconds to the medium audio file automatic voice recognition service queue; and audio files shorter than 10 seconds to the short audio file automatic voice recognition service queue.
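The duration thresholds above amount to a small routing function, sketched here for illustration. The function name, the queue labels and the `realtime` flag are hypothetical. The text does not say where audio of exactly 10 seconds or exactly 120 seconds belongs; this sketch assumes both boundary values go to the medium queue.

```python
SHORT_THRESHOLD_S = 10.0   # below this: short audio queue (from the text)
LONG_THRESHOLD_S = 120.0   # above this: real-time queue (from the text)


def select_queue(duration_s: float, realtime: bool = False) -> str:
    """Pick a recognition service queue from the audio duration in seconds."""
    if realtime or duration_s > LONG_THRESHOLD_S:
        # Long audio and live recognition share the real-time queue.
        return "realtime"
    if duration_s < SHORT_THRESHOLD_S:
        return "short"
    # Everything in between, including the two boundary values (assumption).
    return "medium"
```

For example, `select_queue(45.0)` returns `"medium"`, while `select_queue(3.0, realtime=True)` returns `"realtime"` regardless of the duration.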

The beneficial effects of the invention are as follows:

The invention provides a collaborative software speech recognition system. The system as a whole adopts a micro-service architecture, and separate recognition micro-services are designed for different request message lengths, ensuring real-time performance and high reliability of the recognition system under high concurrency.

Drawings

The following describes the embodiments of the present invention in further detail with reference to the drawings.

FIG. 1 is an overall flow chart of a collaborative software speech recognition system in accordance with an embodiment of the present invention;

FIG. 2 is a dynamic route adaptation flow chart of a collaborative software speech recognition system according to an embodiment of the present invention;

FIG. 3 is a logic diagram of service division of a collaborative software speech recognition system according to an embodiment of the present invention.

Detailed Description

The following describes a collaborative software speech recognition system according to the present invention in detail with reference to the drawings and examples, which are given for illustration only and are not intended to limit the scope of the invention.

A collaborative software speech recognition system, as shown in FIG. 1, comprises a client and a voice recognition micro-service platform.

The voice recognition micro-service platform comprises a micro-service gateway, a micro-service registry and audio related services. The audio related services comprise a file storage service, a result cache service, a short voice file recognition service, a medium voice file recognition service, a long voice file recognition service and a real-time voice recognition service; the result cache service and the file storage service each use a separate cache.

In a preferred embodiment, the voice recognition micro-service platform is based on the Spring Cloud micro-service architecture, the micro-service gateway is implemented on the basis of Zuul, and the micro-service registry is implemented on the basis of Eureka.

In the system, the voice service is divided between the client and the platform: recording, transcoding and parameter extraction are performed by the client, while voice recognition is performed by the self-service voice service platform. The client extracts the MFCC parameters of the user's audio file. Placing acoustic parameter extraction on the client makes effective use of the client's computing capacity, which reduces server development and project costs, reduces the volume of files to be cached, and reduces the cost of storing the original speech.

The client receives the voice recognition request and transmits it to the automatic voice recognition platform based on the micro-service architecture. The micro-service gateway forwards the request to the micro-service registry, which analyzes the request and allocates an optimal server to it according to the dynamic routing adaptive algorithm, as shown in FIG. 2. Each server node reports its task state once every 2 seconds.

In one embodiment, the maximum service connection number C1 of server A, the number of service connections currently being processed C2, and the CPU utilization C3, memory utilization C4 and disk utilization C5 of the computing node are used as the parameters for evaluating the server's load. The micro-service gateway calculates the load value of each server from the load parameters and their weights. Server A presets a total connection weight W1, a current connection weight W2, a CPU weight W3, a memory weight W4 and a disk weight W5, and its load value is the weighted sum Load = W1×C1 + W2×C2 + W3×C3 + W4×C4 + W5×C5. The micro-service gateway sorts the load values of the servers by size and divides the servers into intervals according to the sorting result; each time, a server in the optimal interval is selected as the next computing node. After the micro-service registry selects the server, the micro-service gateway pulls the selected server from the micro-service registry and invokes it.
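A minimal sketch of the load-based selection follows, assuming the load value is a weighted sum of the five reported parameters and that the server with the lowest load value is the optimal one. The `LoadReport` type, the function names and the weight values in the usage note are hypothetical; the interval partitioning and the 2-second reporting loop are omitted.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class LoadReport:
    """One periodic report from a server node (field names are assumptions)."""

    server: str
    max_connections: float     # C1
    active_connections: float  # C2
    cpu: float                 # C3, utilization in [0, 1]
    memory: float              # C4, utilization in [0, 1]
    disk: float                # C5, utilization in [0, 1]


def load_value(report: LoadReport, weights: Sequence[float]) -> float:
    """Weighted sum W1*C1 + W2*C2 + W3*C3 + W4*C4 + W5*C5."""
    params = (
        report.max_connections,
        report.active_connections,
        report.cpu,
        report.memory,
        report.disk,
    )
    return sum(w * c for w, c in zip(weights, params))


def rank_servers(reports: List[LoadReport], weights: Sequence[float]) -> List[LoadReport]:
    """Sort servers from least to most loaded."""
    return sorted(reports, key=lambda r: load_value(r, weights))


def select_server(reports: List[LoadReport], weights: Sequence[float]) -> LoadReport:
    """Pick the least-loaded server as the next computing node (assumed rule)."""
    return rank_servers(reports, weights)[0]
```

With illustrative weights `(0.001, 0.01, 0.5, 0.3, 0.2)`, a server reporting 80 of 100 connections at 90% CPU scores well above one reporting 10 of 100 connections at 20% CPU, so `select_server` returns the latter.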

In a preferred embodiment, the client distinguishes real-time recognition service and audio file recognition service according to the interaction behavior, and marks the voice recognition service class corresponding to the audio file recognition service according to the duration of the audio file, wherein the voice recognition service class comprises a short audio file, a medium audio file and a long audio file.

In an alternative embodiment, the micro service registry distinguishes real-time recognition service and audio file recognition service according to the interaction behavior, and marks the voice recognition service class corresponding to the audio file recognition service according to the duration of the audio file, including a short audio file, a medium audio file and a long audio file.

In an alternative embodiment, the server distinguishes real-time recognition service and audio file recognition service according to the interaction behavior, and marks the voice recognition service class corresponding to the audio file recognition service according to the duration of the audio file, wherein the voice recognition service class comprises a short audio file, a medium audio file and a long audio file.

The three file classes correspond to three duration ranges, 0-10 s, 10-120 s and more than 120 s, and are handled by the short audio file recognition service, the medium audio file recognition service and the long audio file recognition service respectively. After receiving a voice recognition request, the server first calls the result storage service and searches the result storage server for a recognition result using the resource ID transmitted by the client. On a hit, the recognition result is returned directly to the client. If no result is found, the audio is processed according to its file class; processing begins by retrieving the audio parameter file of the audio file from the file cache service using the resource ID.

For short audio files of less than 10 s, the client or the server calls the HTTP short audio file recognition service interface, which analyzes the corresponding audio parameter file.

For medium audio files, the HTTP medium audio recognition service interface is called. The medium audio recognition service segments the audio parameter file using voice activity detection (VAD) and numbers the segments, which are mutually independent and do not affect one another. The numbered segments are placed in an internal queue of the medium audio service and recognized by a thread pool. The segment recognition results are sorted by number, merged in order, and returned to the client or the server.

For long audio files exceeding 120 s, the server calls the WebSocket service of the real-time recognition service.

The server stores each recognition result together with its resource ID in the result storage server, so that later requests for the same resource can be matched against it.
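The medium audio flow (number the segments, recognize them with a thread pool, sort by number, merge) can be sketched with Python's standard `concurrent.futures` module. The function `recognize_medium`, the `recognize_short` callback standing in for the short audio file recognition service, and the choice of a single space as the merge separator are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, List, Sequence, Tuple


def recognize_medium(
    fragments: Sequence[str],
    recognize_short: Callable[[str], str],  # assumed stand-in for the short audio service
    max_workers: int = 4,
) -> str:
    """Recognize fragments concurrently and merge the results in original order."""
    results: List[Tuple[int, str]] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Number each fragment so its position survives out-of-order completion.
        future_to_number = {
            pool.submit(recognize_short, fragment): number
            for number, fragment in enumerate(fragments)
        }
        # Collect results as workers finish, in whatever order that happens.
        for future in as_completed(future_to_number):
            results.append((future_to_number[future], future.result()))
    # Restore the original order by fragment number, then merge.
    results.sort(key=lambda pair: pair[0])
    return " ".join(text for _, text in results)
```

Because the fragments are independent, a slow fragment does not block the others; the numbering is what makes it safe to merge results that arrive out of order.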

Claims

1. A collaborative software speech recognition system, the system comprising: a client, an automatic voice recognition platform and a server;

the client is provided with cooperative software, and the cooperative software receives a voice recognition request, and records, transcodes and extracts parameters;

at least one of the client, the automatic voice recognition platform and the server distinguishes real-time recognition service and audio file recognition service according to interaction behaviors, and marks the voice recognition service class corresponding to the audio file recognition service according to the duration of the audio file, wherein the voice recognition service class comprises a short audio file, a medium audio file and a long audio file;

the voice recognition request comprises a resource ID;

the server searches a corresponding audio parameter file in a file cache server according to the resource ID and distributes the corresponding audio parameter file to a corresponding automatic voice recognition service queue according to the audio time of the audio file;

wherein,

the micro service gateway receives the voice recognition request;

the micro service gateway pulls the server and calls the server;

the file service includes: file storage service and result cache service;

the voice file recognition service includes:

2. The system of claim 1, wherein the system further comprises a controller configured to control the controller,

3. The system of claim 1, wherein the system further comprises a controller configured to control the controller,

4. The system of claim 1, wherein the system further comprises a controller configured to control the controller,

5. The system of claim 1, wherein the system further comprises a controller configured to control the controller,

6. The system of claim 4, wherein the system further comprises a controller configured to control the controller,