CN112466283A - Collaborative software voice recognition system - Google Patents
- Publication number
- CN112466283A (application CN202011190095.XA)
- Authority
- CN
- China
- Prior art keywords
- service
- file
- audio
- server
- voice recognition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/02—Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
- H04L67/025—Protocols based on web technology, e.g. hypertext transfer protocol [HTTP] for remote control or remote monitoring of applications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/06—Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1001—Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
- H04L67/1004—Server selection for load balancing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/14—Session management
- H04L67/146—Markers for unambiguous identification of a particular session, e.g. session cookie or URL-encoding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/60—Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
- H04L67/63—Routing a service request depending on the request content or context
Abstract
The invention provides a collaborative software speech recognition system comprising a client, an automatic speech recognition platform, and servers. The client is installed with collaborative software, which receives requests and classifies them; recording, transcoding, and parameter extraction are processed by the client. The automatic speech recognition platform processes each speech recognition request, assigns it a server according to a dynamic-routing adaptive algorithm, and invokes that server. The server receives the speech recognition request and starts a recognition service.
Description
Technical Field
The invention belongs to the technical field of electronic information, and particularly relates to a collaborative software voice recognition system.
Background
Speech recognition technology, also called Automatic Speech Recognition (ASR), converts the vocabulary content of human speech into computer-readable input; in many scenarios it can be understood simply as converting speech into text. Speech recognition is now a fairly general technology in the field of artificial intelligence and is typically used as an auxiliary tool within an application. Typical scenarios include call centers, speech-to-text dictation, and simultaneous interpretation, where the entire speech stream must be recognized in real time; in these scenarios the demand for highly concurrent, frequent calls is not especially high. Consequently, most existing speech recognition service architectures are monolithic: a single recognition service performs all of the storage, transcoding, and recognition work.
As speech recognition technology has developed, recognition accuracy and speed have continuously improved, speech technology has begun to be applied comprehensively across industries, and speech recognition solutions for high-concurrency scenarios are now urgently needed. In the daily communication of users within collaborative software, voice messages are used very frequently, in huge volumes, and with strong timeliness requirements. Many exchanges occur in groups, the number of participating users is very large, and the volume of repeated messages is large. In addition, foreign-language voice input is an essential requirement in collaborative software: speech must be converted into text promptly so that the message sender can review and correct it.
Therefore, a way of building speech recognition for collaborative software is needed that organically combines the collaborative-software usage scenario with the speech recognition usage scenario and satisfies the high-concurrency speech recognition demand in that setting.
Disclosure of Invention
To solve at least one of the above technical problems, the present invention provides a collaborative software speech recognition system with the following technical scheme:

A collaborative software speech recognition system, the system comprising: a client, an automatic speech recognition platform, and servers;

the client is installed with collaborative software; the collaborative software receives requests and classifies them, with recording, transcoding, and parameter extraction processed by the client;

the automatic speech recognition platform processes each speech recognition request, assigns it a server according to a dynamic-routing adaptive algorithm, and invokes that server;

and the server receives the speech recognition request and starts a recognition service.
The automatic voice recognition platform is based on a micro-service architecture and comprises a micro-service gateway, a micro-service registration center and audio related services;
wherein,
the micro service gateway receives the voice recognition request;
the micro service registration center analyzes the voice recognition request and distributes a server for the voice recognition request based on the dynamic routing self-adaptive algorithm;
and the micro-service gateway pulls the selected server from the registry and invokes it.
The micro-service registration center receives the load parameters of the servers, calculates the load value of each server, sorts the servers according to the load values, and selects the optimal server to process the voice recognition request.
And the collaborative software receives the speech recognition service; the audio file is recognized by means of its acoustic parameters: the client extracts the MFCC parameters of the audio file to generate an audio parameter file and sends the audio parameter file to the file service of the automatic speech recognition platform for storage.
The server receives the voice recognition request and initiates a recognition service, including,
the voice recognition request comprises a resource ID;
the server looks up the corresponding recognition result in a result cache server by the resource ID, and if the lookup succeeds, the recognition result is returned;

and when the lookup fails, the server retrieves the corresponding audio parameter file from a file cache server by the resource ID and distributes it to the appropriate automatic speech recognition service queue according to the audio file's duration.
The audio-related services include file services and voice file recognition services;
the file service includes: file storage service and result caching service;
the voice file recognition service includes:
short audio file recognition service, medium audio file recognition service, long audio file recognition service, and real-time voice recognition service.
The identifying a service queue comprises: short audio file automatic speech recognition service, medium audio file automatic speech recognition service, and real-time automatic speech recognition service.
The medium audio file recognition service fragments the audio parameter file using voice breakpoint detection technology; after fragmentation, the fragments are processed by the short audio file recognition service, and the processing interface merges the multiple recognition results, returns the merged result to the collaborative software, and stores the result with its corresponding resource ID in a cache.
The server distributes the audio files with the time length exceeding 120 seconds and the audio files needing real-time identification to the real-time automatic voice identification service queue; distributing the audio files with the time length of more than 10 seconds and less than 120 seconds to an automatic voice recognition service queue of the medium audio files; and distributing the audio files with the duration less than 10 seconds to the short audio file automatic speech recognition service queue.
The invention has the following beneficial effects:
the invention provides a collaborative software voice recognition system. The system is designed by adopting a micro-service architecture, and different identification micro-services are designed according to the message length of a request, so that the real-time performance and the high reliability of the identification system under the condition of high concurrency are ensured.
Drawings
The following describes embodiments of the present invention in further detail with reference to the accompanying drawings.
FIG. 1 is a general flow chart of a collaborative software speech recognition system according to an embodiment of the present invention;
FIG. 2 is a flow chart of dynamic routing adaptation for a collaborative software speech recognition system according to an embodiment of the present invention;
fig. 3 is a logic diagram of service partitioning for a cooperative software speech recognition system according to an embodiment of the present invention.
Detailed Description
The following detailed description of the cooperative software speech recognition system according to the present invention is provided in conjunction with the accompanying drawings and embodiments, which are provided for the purpose of illustration and are not intended to limit the scope of the invention.
A collaborative software speech recognition system, as shown in FIG. 1, includes a client and a speech recognition micro-service platform.
The speech recognition micro-service platform comprises a micro-service gateway, a micro-service registry, and audio-related services. The audio-related services include a file storage service, a result caching service, a short audio file recognition service, a medium audio file recognition service, a long audio file recognition service, and a real-time speech recognition service; the result caching service and the file storage service each use their own cache.
In a preferred embodiment, the speech recognition micro-service platform is based on the Spring Cloud micro-service architecture, the micro-service gateway is implemented with Zuul, and the micro-service registry is implemented with Eureka.
In this system, the client partitions the voice service: recording, transcoding, and parameter extraction are implemented on the client, while the speech recognition service is provided by the platform's own speech service. The client extracts the MFCC parameters of the user's audio file, so acoustic-parameter extraction takes place on the client. This makes effective use of the client's computing power, reduces server-side development and project cost, shrinks the volume of files that must be cached, and lowers the cost of storing original audio.
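The client-side extraction step above can be sketched with a minimal, self-contained MFCC routine. This is an illustrative numpy sketch, not the patent's implementation; the sample rate, frame/hop sizes, filter count, and coefficient count are assumed values chosen for demonstration.

```python
import numpy as np

def extract_mfcc(signal, sr=16000, frame_len=400, hop=160,
                 nfft=512, n_mels=26, n_ceps=13):
    """Return an (n_frames, n_ceps) MFCC matrix for a mono PCM signal."""
    # Pre-emphasis boosts high frequencies before analysis
    sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice into overlapping frames and apply a Hamming window
    n_frames = 1 + (len(sig) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = sig[idx] * np.hamming(frame_len)
    # Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, nfft)) ** 2 / nfft
    # Triangular mel filterbank
    def mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = np.floor((nfft + 1) * hz(np.linspace(mel(0), mel(sr / 2),
                                               n_mels + 2)) / sr).astype(int)
    fbank = np.zeros((n_mels, nfft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, c, hi = pts[m - 1], pts[m], pts[m + 1]
        fbank[m - 1, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)
        fbank[m - 1, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)
    logmel = np.log(power @ fbank.T + 1e-10)
    # DCT-II decorrelates the log-mel energies; keep the first n_ceps coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_mels))
    return logmel @ dct.T
```

The resulting matrix (rather than the raw audio) is what would be uploaded as the "audio parameter file", which is why the cached files are much smaller than the original recordings.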
The client receives a speech recognition request and forwards it to the automatic speech recognition platform, which is based on a micro-service architecture. The micro-service gateway passes the request to the micro-service registry, which identifies the request and assigns it an optimal server according to the dynamic-routing adaptive algorithm. As shown in FIG. 2, each server reports its node's task state every 2 seconds. In one embodiment, server A's maximum service connection count C1, currently processed connection count C2, and the compute node's CPU utilization C3, memory utilization C4, and disk utilization C5 serve as the parameters for evaluating server load. The micro-service gateway computes each server's load value from these parameters and their weights: with server A's preset total-connection weight W1, current-connection weight W2, CPU weight W3, memory weight W4, and disk weight W5, the load value is Load = W1×C1 − W2×C2 − W3×C3 − W4×C4 − W5×C5. The gateway sorts the servers by load value, divides them into intervals according to the ranking, and each time selects one server from the best interval as the next compute node. After the micro-service registry selects a server, the gateway pulls that server from the registry and invokes it.
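The scoring and interval selection can be sketched as follows. The weight values, the interval fraction, and the server reports are illustrative assumptions, not values given in the patent.

```python
import random

def load_value(c, w):
    # Load = W1*C1 - W2*C2 - W3*C3 - W4*C4 - W5*C5: total capacity counts
    # positively, current connections and resource usage count negatively.
    return w[0] * c[0] - sum(wi * ci for wi, ci in zip(w[1:], c[1:]))

def pick_server(reports, weights, interval=0.25):
    # Rank servers by descending load value, keep the best interval,
    # and pick one node from it as the next compute node.
    ranked = sorted(reports, key=lambda r: load_value(reports[r], weights),
                    reverse=True)
    best = ranked[:max(1, int(len(ranked) * interval))]
    return random.choice(best)

# Hypothetical per-server report: (C1 max conns, C2 current conns,
# C3 cpu, C4 mem, C5 disk utilization)
reports = {
    "A": (100, 80, 0.9, 0.8, 0.5),   # nearly saturated
    "B": (100, 10, 0.2, 0.3, 0.4),   # lightly loaded
}
w = (1.0, 1.0, 10.0, 10.0, 10.0)
print(pick_server(reports, w))       # prints "B": the lightly loaded node wins
```

Choosing randomly within the best interval, rather than always taking the single top server, spreads consecutive requests across comparable nodes instead of piling them onto one.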
In a preferred embodiment, the client distinguishes the real-time identification service and the audio file identification service according to the interaction behavior, and marks the voice identification service types corresponding to the audio file identification service according to the duration of the audio file, wherein the voice identification service types comprise a short audio file, a medium audio file and a long audio file.
In an optional embodiment, the micro service registry distinguishes real-time identification services and audio file identification services according to the interaction behavior, and marks the audio file identification services with corresponding voice identification service categories according to the duration of the audio files, wherein the voice identification service categories comprise short audio files, medium audio files and long audio files.
In an optional embodiment, the server distinguishes real-time identification services and audio file identification services according to the interaction behavior, and marks the voice identification service classes corresponding to the audio file identification services according to the duration of the audio file, wherein the voice identification service classes comprise short audio files, medium audio files and long audio files.
The three file classes correspond to three duration intervals, 0–10 s, 10–120 s, and over 120 s, and respectively to the short, medium, and long audio file recognition services. After the server receives a speech recognition service request, as shown in FIG. 3, it first calls the result storage service and looks up a recognition result in the result storage server using the resource ID passed by the client; on a hit, the result is returned directly to the client. If no result exists, the audio is processed according to its file type, and during processing the audio parameter file is fetched from the file cache service by resource ID. For short audio files under 10 s, the client or server calls the short audio file recognition service's HTTP interface, which parses the corresponding audio parameter file. For the medium audio file recognition service, the medium audio HTTP interface is called; this service fragments the audio parameter file using audio breakpoint detection (voice activity detection, VAD) and numbers the fragments. Each fragment is independent, so the results do not affect one another. The numbered fragments are placed into the medium audio service's internal queue, and a thread pool recognizes the speech of the fragments in the queue. The fragment recognition results are sorted by number, merged in order, and returned to the client or server.
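The duration-based dispatch and fragment merging described above can be sketched as follows. The queue names are hypothetical, and the handling of the exact 10 s and 120 s boundaries is an assumption, since the patent leaves it unspecified.

```python
def route(duration_s, realtime=False):
    """Pick an ASR service queue from audio duration, per FIG. 3's intervals."""
    if realtime or duration_s > 120:
        return "realtime_asr_queue"        # long audio / live streams
    if duration_s > 10:
        return "medium_audio_asr_queue"    # fragmented via VAD, then merged
    return "short_audio_asr_queue"         # recognized in one shot

def merge_fragments(numbered_results):
    """Merge independent VAD-fragment results in fragment-number order."""
    return "".join(text for _, text in sorted(numbered_results))

print(route(5))      # short_audio_asr_queue
print(route(45))     # medium_audio_asr_queue
print(route(300))    # realtime_asr_queue
print(merge_fragments([(2, "world"), (1, "hello ")]))  # hello world
```

Because each fragment carries its number, the thread pool may finish fragments out of order without affecting the merged transcript.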
For long audio files exceeding 120 s, the server calls the websocket service of the real-time recognition service for processing. The server stores the processed recognition result together with its resource ID in the result storage server, so that subsequent requests for the same resource can be answered from the cache.
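The resource-ID keyed lookup-then-store flow that both paths share can be sketched as follows; the class and function names are hypothetical.

```python
class ResultCache:
    """Maps a resource ID to its stored recognition result."""
    def __init__(self):
        self._store = {}

    def get(self, resource_id):
        return self._store.get(resource_id)

    def put(self, resource_id, text):
        self._store[resource_id] = text

def recognize(resource_id, cache, run_asr):
    """Return a cached result on a hit; otherwise recognize and store."""
    hit = cache.get(resource_id)
    if hit is not None:
        return hit                      # cache hit: no recognition needed
    text = run_asr(resource_id)         # miss: invoke the actual ASR service
    cache.put(resource_id, text)
    return text

cache = ResultCache()
calls = []
asr = lambda rid: (calls.append(rid), f"text-for-{rid}")[1]
print(recognize("r1", cache, asr))      # text-for-r1 (ASR runs)
print(recognize("r1", cache, asr))      # text-for-r1 (served from cache)
print(len(calls))                       # 1
```

This is what makes group chat a favorable workload: a voice message forwarded to many recipients is recognized once and served from the result store thereafter.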
Claims (9)
1. A collaborative software speech recognition system, the system comprising: the system comprises a client, an automatic voice recognition platform and a server;
the client side is provided with cooperative software, the cooperative software receives a request and classifies the request as follows: processing the recording, transcoding and parameter extraction by the client;
the automatic voice recognition platform processes the voice recognition request and distributes a server for the voice recognition request according to a dynamic routing self-adaptive algorithm and calls the server;
and the server receives the voice recognition request and starts a recognition service.
2. The system of claim 1,
the automatic voice recognition platform is based on a micro-service architecture and comprises a micro-service gateway, a micro-service registration center and audio related services;
wherein,
the micro service gateway receives the voice recognition request;
the micro service registration center analyzes the voice recognition request and distributes a server for the voice recognition request based on the dynamic routing self-adaptive algorithm;
and the micro-service gateway pulls the selected server from the registry and invokes it.
3. The system of claim 2,
the micro-service registration center receives the load parameters of the servers, calculates the load value of each server, sorts the servers according to the load values, and selects the optimal server to process the voice recognition request.
4. The system of claim 1,
and the collaborative software receives the speech recognition service; the audio file is recognized by means of its acoustic parameters: the client extracts the MFCC parameters of the audio file to generate an audio parameter file and sends the audio parameter file to the file service of the automatic speech recognition platform for storage.
5. The system of claim 1,
the server receives the voice recognition request and initiates a recognition service, including,
the voice recognition request comprises a resource ID;
the server looks up the corresponding recognition result in a result cache server by the resource ID, and if the lookup succeeds, the recognition result is returned;

and when the lookup fails, the server retrieves the corresponding audio parameter file from a file cache server by the resource ID and distributes it to the appropriate automatic speech recognition service queue according to the audio file's duration.
6. The system of claim 2,
the audio-related services include file services and voice file recognition services;
the file service includes: file storage service and result caching service;
the voice file recognition service includes:
short audio file recognition service, medium audio file recognition service, long audio file recognition service, and real-time voice recognition service.
7. The system of claim 5,
the identifying a service queue comprises: short audio file automatic speech recognition service, medium audio file automatic speech recognition service, and real-time automatic speech recognition service.
8. The system of claim 6,
the medium audio file recognition service fragments the audio parameter file using voice breakpoint detection technology; after fragmentation, the fragments are processed by the short audio file recognition service, and the processing interface merges the multiple recognition results, returns the merged result to the collaborative software, and stores the result with its corresponding resource ID in a cache.
9. The system of claim 1,
the server distributes the audio files with the time length exceeding 120 seconds and the audio files needing real-time identification to the real-time automatic voice identification service queue; distributing the audio files with the time length of more than 10 seconds and less than 120 seconds to an automatic voice recognition service queue of the medium audio files; and distributing the audio files with the duration less than 10 seconds to the short audio file automatic speech recognition service queue.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011190095.XA CN112466283B (en) | 2020-10-30 | 2020-10-30 | Cooperative software voice recognition system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112466283A true CN112466283A (en) | 2021-03-09 |
CN112466283B CN112466283B (en) | 2023-12-01 |
Family
ID=74834757
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011190095.XA Active CN112466283B (en) | 2020-10-30 | 2020-10-30 | Cooperative software voice recognition system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112466283B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115037904A (en) * | 2022-05-13 | 2022-09-09 | 广东润联信息技术有限公司 | Converged cloud video conference management platform and conference access method |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102546542A (en) * | 2010-12-20 | 2012-07-04 | 福建星网视易信息***有限公司 | Electronic system and embedded device and transit device of electronic system |
CN102571833A (en) * | 2010-12-15 | 2012-07-11 | 盛乐信息技术(上海)有限公司 | Distributed speech recognition system and distributed speech recognition method based on server cluster |
CN103325371A (en) * | 2013-06-05 | 2013-09-25 | 杭州网豆数字技术有限公司 | Voice recognition system and method based on cloud |
US20150162003A1 (en) * | 2013-12-10 | 2015-06-11 | Alibaba Group Holding Limited | Method and system for speech recognition processing |
CN104795069A (en) * | 2014-01-21 | 2015-07-22 | 腾讯科技(深圳)有限公司 | Speech recognition method and server |
CN106356066A (en) * | 2016-08-30 | 2017-01-25 | 孟玲 | Speech recognition system based on cloud computing |
CN109462647A (en) * | 2018-11-12 | 2019-03-12 | 平安科技(深圳)有限公司 | Resource allocation methods, device and computer equipment based on data analysis |
CN110309350A (en) * | 2018-03-21 | 2019-10-08 | 腾讯科技(深圳)有限公司 | Recite processing method, system, device, medium and the electronic equipment of task |
CN111526208A (en) * | 2020-05-06 | 2020-08-11 | 重庆邮电大学 | High-concurrency cloud platform file transmission optimization method based on micro-service |
CN111785277A (en) * | 2020-06-29 | 2020-10-16 | 北京捷通华声科技股份有限公司 | Speech recognition method, speech recognition device, computer-readable storage medium and processor |
Also Published As
Publication number | Publication date |
---|---|
CN112466283B (en) | 2023-12-01 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||