CN116822531A

CN116822531A - Method and device for constructing a script library, storage medium and electronic device

Info

Publication number: CN116822531A
Application number: CN202310472006.8A
Authority: CN
Inventors: 赵越
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2023-04-24
Filing date: 2023-04-24
Publication date: 2023-09-29

Abstract

The application relates to the technical fields of digital healthcare and computers, and specifically discloses a method and a device for constructing a script library, a storage medium and an electronic device. The method comprises: collecting, for each doctor, a script text set containing a plurality of script texts; bucketing the script texts in the script text set to obtain a plurality of buckets of script texts; clustering the script texts in each bucket with a density-based clustering algorithm to obtain a plurality of first script clusters for each bucket; and merging the first script clusters to obtain a plurality of second script clusters, thereby obtaining a target script library. The application can increase the speed of script library construction and shorten construction time.

Description

Method and device for constructing a script library, storage medium and electronic device

Technical Field

The present application relates to the fields of digital healthcare and computer technology, and in particular to a method and a device for constructing a script library, a storage medium, and an electronic device.

Background

In recent years, online consultation systems have emerged in large numbers, public acceptance has gradually grown, and traffic to these systems has increased substantially. However, most of these systems require the consulting doctor to type each reply. Under highly concurrent user requests, older doctors cannot type fast enough, and new doctors who have only just begun taking consultations cannot come up with a suitable script for a reply in a short time, so response times during consultations are long. Script recommendation techniques have therefore been developed, in which a script library is constructed and used to recommend scripts. However, existing script library construction processes are time-consuming and inefficient, and the resulting libraries are not accurate enough, which in turn makes subsequent script recommendations inaccurate.

Therefore, a method for constructing a script library is needed to solve the problems of long construction time and low construction efficiency in the prior art.

Disclosure of Invention

In view of the above, the application provides a method and a device for constructing a script library, a storage medium and an electronic device, whose main purpose is to solve the problems of long time consumption and large storage space occupation in the current script library construction process.

In order to solve the above problems, the present application provides a method for constructing a script library, comprising:

collecting, for each doctor, a script text set containing a plurality of script texts;

bucketing the script texts in the script text set to obtain a plurality of buckets of script texts;

clustering the script texts in each bucket with a density-based clustering algorithm to obtain a plurality of first script clusters for each bucket;

and merging the first script clusters to obtain a plurality of second script clusters, thereby obtaining a target script library.

Optionally, collecting, for each doctor, a script text set containing a plurality of script texts comprises:

collecting online consultation records of each doctor to obtain the script text set;

and/or collecting recorded audio consultation records of each doctor, and extracting the spoken text from those recordings to obtain the script text set.

Optionally, before bucketing the script texts in the script text set to obtain a plurality of buckets of script texts, the method further comprises:

screening the script texts in the script text set to obtain script texts that are not in question form;

and sequentially performing de-duplication and stop-word removal on the screened script texts to obtain preprocessed script texts for bucketing.

Optionally, bucketing the script texts in the script text set to obtain a plurality of buckets of script texts comprises:

computing, with the SimHash algorithm, a hash value for each script text;

computing the similarity between any two script texts based on the hash values;

and comparing each similarity with a preset similarity value, so that script texts whose similarity is higher than the preset similarity value are placed in the same bucket.

Optionally, clustering the script texts in each bucket to obtain a plurality of first script clusters for each bucket comprises:

processing each script text to obtain a text vector for that script text;

and clustering the text vectors of the script texts in the same bucket with a density-based clustering algorithm to obtain a plurality of first script clusters for each bucket.

Optionally, merging the first script clusters comprises:

taking the first script clusters in any one bucket as reference script clusters, and taking the first script clusters in the remaining buckets as non-reference script clusters;

computing the cosine distance between each non-reference script cluster and each reference script cluster;

and comparing each cosine distance with a preset distance threshold, and, when the cosine distance is determined to be smaller than the preset distance, merging the non-reference script cluster for which that cosine distance was computed into the reference script cluster for which it was computed.

Optionally, after obtaining the plurality of second script clusters, the method further comprises:

configuring a script label for each second script cluster based on the consultation topic and consultation keywords of that cluster.

Optionally, the script library construction method further comprises:

configuring a number of doctor identifiers for each script text in each second script cluster based on how frequently each doctor uses each script text.

In order to solve the above problems, the present application provides a script library construction device, comprising:

a collection module, configured to collect, for each doctor, a script text set containing a plurality of script texts;

a bucketing module, configured to bucket the script texts in the script text set to obtain a plurality of buckets of script texts;

a clustering module, configured to cluster the script texts in each bucket with a density-based clustering algorithm to obtain a plurality of first script clusters for each bucket;

and a merging module, configured to merge the first script clusters to obtain a plurality of second script clusters, thereby obtaining a target script library.

In order to solve the above problems, the present application provides a storage medium storing a computer program which, when executed by a processor, implements the steps of any one of the script library construction methods described above.

In order to solve the above problems, the present application provides an electronic device comprising at least a memory and a processor, where the memory stores a computer program, and the processor implements the steps of any one of the script library construction methods described above when executing the computer program stored in the memory.

With the method, device, storage medium and electronic device for constructing a script library disclosed by the application, bucketing the script texts enables distributed storage of the script texts and allows the script texts in each bucket to be processed in parallel afterwards, which greatly increases processing speed and shortens processing time. In addition, clustering the script texts with a density-based clustering algorithm quickly aggregates similar script texts into first script clusters and improves the accuracy of the clustering result, so that similarity within each cluster is higher and coupling between clusters is lower, which provides a guarantee for accurately constructing the script library.

The foregoing is only an overview of the technical solution of the present application. So that the technical means of the application can be understood more clearly and implemented in accordance with the description, and so that the above and other objects, features and advantages of the application are more readily apparent, specific embodiments are set out below.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:

FIG. 1 is a flowchart of a script library construction method according to an embodiment of the present application;

FIG. 2 is a flowchart of the merging process according to another embodiment of the present application;

FIG. 3 is a block diagram of a script library construction device according to another embodiment of the present application;

FIG. 4 is a block diagram of an electronic device according to another embodiment of the present application.

Detailed Description

Various aspects and features of the present application are described herein with reference to the accompanying drawings.

It should be understood that various modifications may be made to the embodiments of the application herein. Therefore, the above description should not be taken as limiting, but merely as exemplification of the embodiments. Other modifications within the scope and spirit of the application will occur to persons of ordinary skill in the art.

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the application and, together with a general description of the application given above, and the detailed description of the embodiments given below, serve to explain the principles of the application.

These and other characteristics of the application will become apparent from the following description of a preferred form of embodiment, given as a non-limiting example, with reference to the accompanying drawings.

It is also to be understood that, although the application has been described with reference to some specific examples, those skilled in the art can certainly realize many other equivalent forms of the application.

The above and other aspects, features and advantages of the present application will become more apparent in light of the following detailed description when taken in conjunction with the accompanying drawings.

Specific embodiments of the present application will be described hereinafter with reference to the accompanying drawings; however, it is to be understood that the disclosed embodiments are merely exemplary of the application, which can be embodied in various forms. Well-known and/or repeated functions and constructions are not described in detail, to avoid obscuring the application in unnecessary or redundant detail. Therefore, specific structural and functional details disclosed herein are not intended to be limiting, but merely serve as a basis for the claims and as a representative basis for teaching one skilled in the art to variously employ the present application in virtually any appropriately detailed structure.

The specification may use the phrases "in one embodiment," "in another embodiment," "in yet another embodiment," or "in other embodiments," each of which may refer to one or more of the same or different embodiments in accordance with the application.

The embodiment of the application provides a method for constructing a script library, which can be applied to electronic devices such as terminals and servers, and in particular to the script library construction that such devices perform before online consultations take place. Once the script library is constructed, reply scripts can be provided to doctors accurately and quickly during online consultation, which helps improve the speed and quality of online consultation. As shown in FIG. 1, the method for constructing a script library in this embodiment includes:

Step S101, collecting, for each doctor, a script text set containing a plurality of script texts;

In a specific implementation, this step may obtain the script texts from historical consultation records or from historical audio recordings.

Step S102, bucketing the script texts in the script text set to obtain a plurality of buckets of script texts;

In a specific implementation, each script text may first be preprocessed, and the preprocessed script texts are then bucketed. The preprocessing is as follows: screening the script texts in the script text set to obtain script texts that are not in question form; and sequentially performing de-duplication and stop-word removal on the screened script texts to obtain preprocessed script texts for bucketing. Preprocessing the script texts reduces the data volume, avoids data redundancy, and lays a foundation for shortening the construction time of the script library. In this step, the number of buckets may be preset, for example to 3, 5 or 8, and may be adjusted according to actual needs. The SimHash algorithm may be used to bucket the script texts.

Step S103, clustering the script texts in each bucket with a density-based clustering algorithm to obtain a plurality of first script clusters for each bucket;

In a specific implementation, clustering with a density-based clustering algorithm improves the accuracy of the clustering result and lays a foundation for constructing a more accurate script library later. In the clustering process, the degree of similarity of the script texts is determined according to the hash value of each script text, and the script texts with higher similarity in the same bucket are then grouped together to form a first script cluster. Each bucket thereby yields a corresponding number of first script clusters.

Step S104, merging the first script clusters to obtain a plurality of second script clusters, thereby obtaining a target script library.

In a specific implementation of this step, the merging may be performed based on the cosine distance between first script clusters; that is, first script clusters whose cosine distance is smaller than a preset distance are merged to obtain a second script cluster.
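For reference, cosine distance is commonly defined as one minus cosine similarity. The text does not spell out the formula, so the form below is an assumption about what is meant; the symbols $c_i$ and $c_j$ (cluster centers) and $\tau$ (the preset distance) are introduced here only for illustration:

```latex
d_{\cos}(c_i, c_j) = 1 - \frac{c_i \cdot c_j}{\lVert c_i \rVert \, \lVert c_j \rVert},
\qquad \text{merge the two clusters if } d_{\cos}(c_i, c_j) < \tau
```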

With this script library construction method, bucketing the script texts enables distributed storage of the script texts and allows the script texts in each bucket to be processed in parallel afterwards, which greatly increases processing speed and shortens processing time. In addition, clustering the script texts with a density-based clustering algorithm quickly aggregates similar script texts into first script clusters and improves the accuracy of the clustering result, so that similarity within each cluster is higher and coupling between clusters is lower, which provides a guarantee for accurately constructing the script library.

The application further provides a script library construction method, which specifically comprises the following steps:

Step S201, collecting online consultation records of each doctor to obtain the script text set; and/or collecting recorded audio consultation records of each doctor, and extracting the spoken text from those recordings to obtain the script text set;

In a specific implementation, for example, the original historical consultation messages sent by doctors through an online consultation system may be collected, and a large volume of text is then extracted from those messages, yielding the consultation-related script text set.

Step S202, screening the script texts in the script text set to obtain script texts that are not in question form; and sequentially performing de-duplication and stop-word removal on the screened script texts to obtain preprocessed script texts for bucketing;

In a specific implementation, the preprocessing is as follows:

First, the script texts are screened to remove questions. On the doctor side of a consultation, questions are usually supplied directly by the system, for example "20 minutes have passed; are you still there?", so questions in the text can be removed during the preprocessing stage.

Stop-word removal may then be performed to remove stop words from the text, for example words such as "good" and "people".

De-duplication may then be performed: where texts are exactly identical, only one copy is kept, which avoids data redundancy.
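The preprocessing described above can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the question test (a trailing question mark), the whitespace tokenisation, the `STOP_WORDS` set and the function names are all assumptions made for the example.

```python
# Hypothetical stop-word list; the text only gives "good" and "people" as examples.
STOP_WORDS = {"good", "people"}


def is_question(text: str) -> bool:
    """Assumed heuristic: a text ending in a question mark is a question."""
    return text.rstrip().endswith(("?", "？"))


def remove_stop_words(text: str) -> str:
    """Drop stop words from a whitespace-tokenised text."""
    return " ".join(t for t in text.split() if t.lower() not in STOP_WORDS)


def preprocess(texts):
    """Filter out questions, strip stop words, then drop exact duplicates.

    The order of first occurrence is preserved.
    """
    seen = set()
    result = []
    for text in texts:
        if is_question(text):
            continue
        cleaned = remove_stop_words(text)
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result


raw = [
    "20 minutes have passed; are you still there?",
    "good drink more water",
    "drink more water",
    "avoid the allergen",
]
print(preprocess(raw))
```

Chinese text would need a proper word segmenter instead of `str.split`; that choice is left open here.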

Step S203, computing, with the SimHash algorithm, a hash value for each script text; computing the similarity between any two script texts based on the hash values; and comparing each similarity with a preset similarity value, so that script texts whose similarity is higher than the preset similarity value are placed in the same bucket, thereby obtaining a plurality of buckets of script texts;

In a specific implementation, the number of script texts is huge, and clustering them directly would consume a great deal of memory and time. The mass of script texts is therefore divided into a plurality of buckets for distributed storage, and the subsequent operations within each bucket are processed in parallel, which greatly increases processing speed. Specifically, the SimHash algorithm may be used for bucketing: each script text is mapped to a 0/1 binary string of 2 to the power of f bits, the hash value of each script text is computed from that string, and bucketing is finally performed based on the similarity of the hash values. Because similar texts produce similar hash values, similar script texts are placed in the same bucket, which lays a foundation for accurately constructing the script library later. In this step, the number of buckets may be set to 8, with f = 3.
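A rough sketch of SimHash-based bucketing is given below. It is an illustration under stated assumptions rather than the patent's procedure: the 64-bit fingerprint, the MD5 token hash, the similarity measure (one minus normalised Hamming distance), the greedy "join the first similar bucket" rule, the threshold value 0.8 and all function names are choices made for the example.

```python
import hashlib

BITS = 64  # assumed fingerprint width for this sketch


def simhash(text: str) -> int:
    """Compute a 64-bit SimHash fingerprint from whitespace tokens."""
    weights = [0] * BITS
    for token in text.split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # 64-bit token hash
        for i in range(BITS):
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def similarity(a: int, b: int) -> float:
    """One minus the normalised Hamming distance between two fingerprints."""
    return 1.0 - bin(a ^ b).count("1") / BITS


def bucketize(texts, threshold: float = 0.8):
    """Greedy bucketing: a text joins the first bucket whose seed fingerprint
    is more similar than the threshold, otherwise it opens a new bucket."""
    buckets = []  # list of (seed_fingerprint, member_texts)
    for text in texts:
        fp = simhash(text)
        for seed, members in buckets:
            if similarity(fp, seed) > threshold:
                members.append(text)
                break
        else:
            buckets.append((fp, [text]))
    return [members for _, members in buckets]
```

With this rule the number of buckets is not fixed in advance; capping it at a preset count such as 8 would need an additional step that is not shown.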

Step S204, processing each script text to obtain a text vector for that script text; and clustering the text vectors of the script texts in the same bucket with a density-based clustering algorithm to obtain a plurality of first script clusters for each bucket;

In this step, a simple de-duplication may also be performed before clustering, that is, removing from each bucket script texts that differ only in individual words. For example, of "the cause of urticaria is contact with an allergen" and "the cause of urticaria is that you came into contact with an allergen", only one sentence is kept as a candidate for the script library. This secondary de-duplication removes script texts that are very similar to one another, which further reduces the data volume and further increases the speed of script library construction.

After the secondary de-duplication, the clustering operation may be performed. The pre-trained text-vector representation model Sentence-BERT is used: the script texts in each bucket are input to the model to obtain a text vector for each script text. Clustering is then performed with the density-based DBSCAN algorithm. The clustering computes a first cosine distance between pairs of text vectors in the same bucket and aggregates the script texts whose first cosine distance is smaller than a preset value, yielding a plurality of first script clusters for each bucket. In a specific implementation, DBSCAN clustering may be implemented with the sklearn library in Python, for example with the parameters eps=5 and min_samples=3. After clustering, the mean of the text vectors of the script texts in each first script cluster may be computed and used as the center of that cluster.
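To make the clustering step concrete, the sketch below runs a small, self-contained DBSCAN over precomputed vectors using cosine distance and then averages each cluster to get its center. It is a didactic stand-in, not the sklearn implementation the text mentions and not the patent's code: the toy 2-D vectors replace Sentence-BERT embeddings, eps=0.05 is an illustrative value rather than the setting given in the text, and the function names are hypothetical. Vectors are assumed to be non-zero.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def dbscan(vectors, eps, min_samples):
    """Return one label per vector; -1 marks noise."""
    n = len(vectors)
    labels = [None] * n

    def neighbours(i):
        # Includes i itself, so min_samples counts the point in question.
        return [j for j in range(n)
                if cosine_distance(vectors[i], vectors[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_samples:  # j is a core point: expand
                queue.extend(reachable)
    return labels


def cluster_centers(vectors, labels):
    """Mean vector of each cluster, ignoring noise."""
    groups = {}
    for vec, label in zip(vectors, labels):
        if label != -1:
            groups.setdefault(label, []).append(vec)
    return {label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in groups.items()}


vectors = [[1, 0], [0.99, 0.1], [0.98, 0.05],
           [0, 1], [0.1, 0.99], [0.05, 0.98],
           [-1, -1]]
labels = dbscan(vectors, eps=0.05, min_samples=3)
centers = cluster_centers(vectors, labels)
```

In practice the same structure applies with real embeddings; only the vectors and the eps value change.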

Step S205, taking the first script clusters in any one bucket as reference script clusters, and taking the first script clusters in the remaining buckets as non-reference script clusters; computing the cosine distance between each non-reference script cluster and each reference script cluster; and comparing each cosine distance with a preset distance threshold, and, when the cosine distance is smaller than the preset distance, merging the non-reference script cluster for which that cosine distance was computed into the reference script cluster for which it was computed, thereby obtaining a plurality of second script clusters;

In a specific implementation, any bucket may be selected as the reference bucket, and the first script clusters in that bucket serve in turn as reference script clusters. The first script clusters in the other buckets then serve in turn as non-reference script clusters, and a second cosine distance between each of them and each reference script cluster is computed. For each non-reference script cluster, the minimum second cosine distance is determined and compared with a preset distance. If the minimum second cosine distance is smaller than the preset distance, the non-reference script cluster is merged into the reference script cluster for which that minimum was obtained; if it is greater than or equal to the preset distance, the non-reference script cluster is added to the reference bucket.

As shown in FIG. 2, the merging proceeds as follows. The set of all first script clusters in bucket No. 1 is denoted M, and the set of all first script clusters in the remaining buckets is denoted candi_M. A first script cluster is taken from candi_M and denoted m. The cosine distance between m and the center of each first script cluster in M is computed, and the minimum distance minDis and the cluster minClu at that minimum distance are recorded. If minDis is smaller than a threshold, m is merged with minClu; otherwise m is added to M as a new cluster. m is then removed from candi_M. If candi_M is now empty, the final merged result M is returned and the procedure ends; otherwise another cluster is taken from candi_M and the above flow is repeated, until the merging of the first script clusters is complete.
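The merging loop can be sketched as follows, keeping the names M, candi_M, m, minDis and minClu from the description as variable names. Several details are assumptions made for this example: each cluster is represented as a list of vectors, the cluster m is compared through its own mean vector, and a cluster's center is recomputed after something is merged into it.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def center(cluster):
    """Mean vector of a cluster given as a list of equal-length vectors."""
    return [sum(col) / len(cluster) for col in zip(*cluster)]


def merge_clusters(M, candi_M, threshold):
    """Merge every cluster of candi_M into M, or append it as a new cluster."""
    M = [list(c) for c in M]              # work on copies
    candi_M = [list(c) for c in candi_M]
    while candi_M:
        m = candi_M.pop(0)                # take m and remove it from candi_M
        m_center = center(m)
        min_dis, min_clu = None, None
        for idx, cluster in enumerate(M):
            d = cosine_distance(m_center, center(cluster))
            if min_dis is None or d < min_dis:
                min_dis, min_clu = d, idx
        if min_dis is not None and min_dis < threshold:
            M[min_clu].extend(m)          # merge m with minClu
        else:
            M.append(m)                   # m becomes a new cluster in M
    return M


reference = [[[1, 0], [0.9, 0.1]], [[0, 1]]]
candidates = [[[0.95, 0.05]], [[-1, 0]]]
merged = merge_clusters(reference, candidates, threshold=0.1)
print([len(c) for c in merged])
```

Because a merged cluster immediately takes part in later comparisons, the result can depend on the order in which clusters are drawn from candi_M.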

In a specific implementation, after the final merging is completed, the more frequently occurring texts in each cluster may be retained as the candidate script library, and the key topic and key central sentence of each cluster are extracted, so that corresponding labels can later be configured for the script texts based on consultation topics and consultation keywords, for reference.

Step S206, configuring a script label for each second script cluster based on the consultation topic and consultation keywords of that cluster.

In a specific implementation, after a script library containing a plurality of second script clusters is obtained, a corresponding script label may be configured for each script cluster. With the labels configured, when consultation information is later received, the matching script clusters can be found by keyword matching against the consultation information, and the script texts in those clusters are then displayed, thereby realizing script recommendation.

In a specific implementation of this embodiment, a number of doctor identifiers may be configured for each script text in each second script cluster based on how frequently each doctor uses each script text. When consultation information is received and the matching script clusters have been found by keyword matching, the script texts belonging to the target doctor can then be determined from the doctor identifiers of the script texts in those clusters and the identifier of the target doctor currently conducting the consultation. That is, the script texts the target doctor uses frequently are displayed and recommended, realizing personalized script recommendation.
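A possible shape for label matching and doctor-specific recommendation is sketched below. The data layout is an assumption for illustration only: the `ScriptCluster` class, the per-doctor usage counts standing in for doctor identifiers, the `recommend` function and the `top_k` parameter are all hypothetical and not taken from the text.

```python
from dataclasses import dataclass, field


@dataclass
class ScriptCluster:
    # Topic / keyword labels configured for the cluster.
    labels: set
    # Maps script text -> {doctor_id: usage count}.
    usage: dict = field(default_factory=dict)


def recommend(library, query_keywords, doctor_id, top_k=3):
    """Return the target doctor's most-used scripts from matching clusters."""
    wanted = set(query_keywords)
    matches = []
    for cluster in library:
        if not cluster.labels & wanted:   # no keyword matches this cluster
            continue
        for text, per_doctor in cluster.usage.items():
            count = per_doctor.get(doctor_id, 0)
            if count > 0:
                matches.append((count, text))
    # Most frequently used first; ties broken alphabetically.
    matches.sort(key=lambda pair: (-pair[0], pair[1]))
    return [text for _, text in matches[:top_k]]


library = [
    ScriptCluster(
        labels={"urticaria", "allergy"},
        usage={
            "Avoid contact with the allergen.": {"dr_a": 5, "dr_b": 1},
            "Take the medication as prescribed.": {"dr_a": 2},
        },
    ),
    ScriptCluster(
        labels={"cold"},
        usage={"Drink plenty of water.": {"dr_a": 9}},
    ),
]
print(recommend(library, ["urticaria"], "dr_a"))
```

Keeping usage counts rather than a flat list of identifiers makes it easy to rank scripts per doctor; a frequency cut-off could turn the counts back into identifiers.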

By bucketing the texts and clustering the buckets in parallel, the method keeps storage and efficiency balanced within an acceptable range when the volume of input text is large, improves the efficiency of script library construction, and solves the problem of long construction time.

With the script library constructed in this embodiment, doctors can be assisted during consultations: the scripts a doctor prefers are selected from the library in real time and recommended, and the doctor can send a script simply by clicking it, which shortens consultation time and improves consultation efficiency.

Another embodiment of the present application provides a script library construction device. As shown in FIG. 3, the script library construction device in this embodiment includes:

a collection module 11, configured to collect, for each doctor, a script text set containing a plurality of script texts;

a bucketing module 12, configured to bucket the script texts in the script text set to obtain a plurality of buckets of script texts;

a clustering module 13, configured to cluster the script texts in each bucket with a density-based clustering algorithm to obtain a plurality of first script clusters for each bucket;

and a merging module 14, configured to merge the first script clusters to obtain a plurality of second script clusters, thereby obtaining a target script library.

In a specific implementation process of this embodiment, the acquisition module is specifically configured to: acquiring on-line inquiry records of all doctor objects to obtain the speaking text set; and/or collecting recording inquiry records of each doctor, and extracting language text from the voice inquiry records to obtain the speaking text set.

The speech library constructing device in the implementation further comprises a preprocessing module, wherein the preprocessing module is used for: screening each of the speaking texts in the speaking text set to obtain a speaking text in a non-question form; and sequentially carrying out de-duplication processing and stop word removal processing on each screened speaking text to obtain preprocessed speaking text for barrel separation processing.

In a specific implementation process of this embodiment, the barrel splitting module is specifically configured to: based on each voice text, respectively adopting a SimHash algorithm to calculate a hash value corresponding to each voice text; calculating and obtaining the similarity between any two voice texts based on the hash values; and comparing each similarity with a preset similarity value to divide the speaking text corresponding to the similarity value higher than the preset similarity value into the same barrel.

In a specific implementation process of this embodiment, the clustering module is specifically configured to: based on each of the conversation texts, processing to obtain text vectors corresponding to each of the conversation texts; and clustering the text vectors of the voice texts in the same barrel by adopting a density-based clustering algorithm to obtain a plurality of first voice clusters corresponding to the barrels.

In a specific implementation process of this embodiment, the merging module is specifically configured to: taking the first microphone cluster in any barrel as a reference microphone cluster, and taking the first microphone clusters except the rest barrels as non-reference microphone clusters; calculating cosine distances between each non-reference microphone cluster and each reference microphone cluster; and comparing each cosine distance with a preset distance threshold value, and adding the non-reference microphone cluster for calculating the cosine distance into the non-reference microphone cluster for calculating the cosine distance under the condition that the cosine distance is determined to be smaller than the preset distance.

In a specific implementation of this embodiment, the speech library construction device further includes a configuration module, where the configuration module is configured to: configure a speech label for each second speech cluster based on the question subject and the question keywords of that second speech cluster.

In a specific implementation of this embodiment, the speech library construction device further includes an identification module, where the identification module is configured to: configure several doctor identifiers for each speech text within each second speech cluster based on how frequently each doctor uses that speech text.
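One way to picture the frequency-based doctor identifiers is the sketch below. It assumes that usage is available as a log of (doctor identifier, speech text) events for one second speech cluster, and that the identifiers kept for each text are those of its `top_n` most frequent users. The log format, the `top_n` cutoff, and the function name are illustrative assumptions.

```python
from collections import Counter


def doctor_ids_for_cluster(usage_log, top_n=3):
    """usage_log: iterable of (doctor_id, speech_text) usage events.
    Returns {speech_text: [doctor_id, ...]} ordered by descending usage count."""
    counts = {}
    for doctor_id, text in usage_log:
        counts.setdefault(text, Counter())[doctor_id] += 1
    return {
        text: [doctor for doctor, _ in counter.most_common(top_n)]
        for text, counter in counts.items()
    }
```

With such a mapping, a recommendation step could prefer the speech texts a given doctor already uses most often.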

With the speech library construction device of this embodiment, bucketing the speech texts allows them to be stored in a distributed manner, so that the speech texts in each bucket can subsequently be processed in parallel, which greatly increases the processing speed and shortens the processing time. Meanwhile, clustering the speech texts with a density-based clustering algorithm allows similar speech texts to be aggregated quickly into first speech clusters and improves the accuracy of the clustering result, so that the similarity within each cluster is higher and the coupling between clusters is lower, which provides a guarantee for accurately constructing the speech library.
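The per-bucket parallelism mentioned above can be sketched with Python's standard `concurrent.futures` module. The worker `cluster_bucket` is a hypothetical placeholder that simply returns one cluster of unique texts; in a real pipeline it would run the density-based clustering for that bucket. A thread pool is used here for simplicity; for CPU-bound clustering in CPython, a process pool or a distributed framework would be the more likely choice.

```python
from concurrent.futures import ThreadPoolExecutor


def cluster_bucket(bucket):
    """Hypothetical placeholder worker: returns a single cluster of unique texts."""
    return [sorted(set(bucket))]


def process_buckets(buckets, worker=cluster_bucket, max_workers=4):
    """Apply the worker to every bucket concurrently, preserving bucket order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, buckets))
```

Because the buckets are independent of each other, no synchronization between workers is needed until the merging step.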

Another embodiment of the present application provides a storage medium storing a computer program which, when executed by a processor, performs the method steps of:

step one, acquiring, for each doctor object, a speech text set containing a plurality of speech texts;

step two, bucketing each speech text in the speech text set to obtain speech texts in a plurality of buckets;

step three, clustering the speech texts in each bucket using a density-based clustering algorithm to obtain a plurality of first speech clusters corresponding to each bucket;

and step four, merging the first speech clusters to obtain a plurality of second speech clusters, so as to obtain a target speech library.

For the specific implementation of the above method steps, refer to any of the above embodiments of the speech library construction method; the details are not repeated here.

According to the application, bucketing the speech texts allows them to be stored in a distributed manner, so that the speech texts in each bucket can subsequently be processed in parallel, which greatly increases the processing speed and shortens the processing time. Meanwhile, clustering the speech texts with a density-based clustering algorithm allows similar speech texts to be aggregated quickly into first speech clusters and improves the accuracy of the clustering result, so that the similarity within each cluster is higher and the coupling between clusters is lower, which provides a guarantee for accurately constructing the speech library.

Another embodiment of the present application provides an electronic device, as shown in fig. 4, including at least a memory 1 and a processor 2, where the memory 1 stores a computer program, and the processor 2 implements the following method steps when executing the computer program stored in the memory 1:

The above embodiments are only exemplary embodiments of the present application and are not intended to limit it; the scope of protection of the present application is defined by the claims. Those skilled in the art may make various modifications and equivalent arrangements to the present application, and such modifications and equivalents are also intended to fall within its spirit and scope.

Claims

1. A speech library construction method, characterized by comprising the following steps:

2. The method of claim 1, wherein acquiring, for each doctor object, a speech text set containing a plurality of speech texts comprises:

3. The method of claim 1, wherein before bucketing each speech text in the speech text set to obtain speech texts in a plurality of buckets, the method further comprises:

4. The method of claim 1, wherein bucketing each speech text in the speech text set to obtain speech texts in a plurality of buckets comprises:

5. The method of claim 1, wherein clustering the speech texts in each bucket using a density-based clustering algorithm to obtain a plurality of first speech clusters corresponding to each bucket comprises:

6. The method of claim 1, wherein merging the first speech clusters comprises:

7. The method of claim 1, wherein after obtaining the plurality of second speech clusters, the method further comprises:

8. The method of claim 1, wherein the method further comprises:

9. A speech library construction device, comprising:

10. A storage medium storing a computer program which, when executed by a processor, implements the steps of the speech library construction method of any one of claims 1-8.