CN109584865B - Application program control method and device, readable storage medium and terminal equipment - Google Patents


Info

Publication number
CN109584865B
CN109584865B (application CN201811210044.1A)
Authority
CN
China
Prior art keywords
control instruction
word
voice
keyword
frequency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811210044.1A
Other languages
Chinese (zh)
Other versions
CN109584865A (en)
Inventor
董亚荣 (Dong Yarong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201811210044.1A
Publication of CN109584865A
Application granted
Publication of CN109584865B
Legal status: Active


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 Arrangements for executing specific programs
    • G06F 9/451 Execution arrangements for user interfaces
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to the field of computer technologies, and in particular to an application control method and device, a computer-readable storage medium, and a terminal device. After receiving a voice acquisition instruction, the method collects voice information input by a user, performs speech recognition on the collected voice information to obtain corresponding text information, determines a target control instruction for an application program through matching-degree calculation, and controls the application program to execute the operation corresponding to the target control instruction. With the embodiments of the invention, a user can issue application control instructions by voice, and the application automatically executes the corresponding operation; the interaction is simple, efficiency is greatly improved, and the user experience is better.

Description

Application program control method and device, readable storage medium and terminal equipment
Technical Field
The present invention relates to the field of computer technologies, and in particular, to an application control method, an application control device, a computer readable storage medium, and a terminal device.
Background
With the development of technology, more and more enterprises have adopted electronic office systems: users can apply for leave, business trips, reimbursement, outgoing work, and so on directly in office applications, which greatly improves work efficiency compared with traditional paper-based applications. However, operating existing office applications is still cumbersome: the corresponding function options can only be reached through repeated clicking and searching, which is time-consuming, labor-intensive, and makes for a poor user experience.
Disclosure of Invention
In view of the above, the embodiments of the present invention provide an application control method, an apparatus, a computer readable storage medium, and a terminal device, so as to solve the problem that the existing office application has complicated operation and poor user experience.
A first aspect of an embodiment of the present invention provides an application control method, which may include:
after receiving a voice acquisition instruction, acquiring voice information input by a user, wherein the voice information comprises a control instruction of an application program;
performing voice recognition on the collected voice information to obtain text information corresponding to the voice information;
respectively calculating the matching degree between the text information and each control instruction in a preset control instruction set;
and selecting a control instruction with highest matching degree with the text information from the control instruction set as a target control instruction for the application program, and controlling the application program to execute an operation corresponding to the target control instruction.
A second aspect of an embodiment of the present invention provides an application control apparatus, which may include:
the voice information acquisition module is used for acquiring voice information input by a user after receiving a voice acquisition instruction, wherein the voice information comprises a control instruction of an application program;
The voice recognition module is used for carrying out voice recognition on the collected voice information to obtain text information corresponding to the voice information;
the matching degree calculation module is used for calculating the matching degree between the text information and each control instruction in a preset control instruction set respectively;
The target control instruction selecting module is used for selecting a control instruction with highest matching degree with the text information from the control instruction set as a target control instruction for the application program;
and the operation execution module is used for controlling the application program to execute the operation corresponding to the target control instruction.
A third aspect of embodiments of the present invention provides a computer readable storage medium storing computer readable instructions which when executed by a processor perform the steps of:
after receiving a voice acquisition instruction, acquiring voice information input by a user, wherein the voice information comprises a control instruction of an application program;
performing voice recognition on the collected voice information to obtain text information corresponding to the voice information;
respectively calculating the matching degree between the text information and each control instruction in a preset control instruction set;
and selecting a control instruction with highest matching degree with the text information from the control instruction set as a target control instruction for the application program, and controlling the application program to execute an operation corresponding to the target control instruction.
A fourth aspect of the embodiments of the present invention provides a terminal device comprising a memory, a processor and computer readable instructions stored in the memory and executable on the processor, the processor executing the computer readable instructions to perform the steps of:
after receiving a voice acquisition instruction, acquiring voice information input by a user, wherein the voice information comprises a control instruction of an application program;
performing voice recognition on the collected voice information to obtain text information corresponding to the voice information;
respectively calculating the matching degree between the text information and each control instruction in a preset control instruction set;
and selecting a control instruction with highest matching degree with the text information from the control instruction set as a target control instruction for the application program, and controlling the application program to execute an operation corresponding to the target control instruction.
Compared with the prior art, the embodiments of the present invention have the following beneficial effects: after receiving a voice acquisition instruction, voice information input by a user is collected; speech recognition is performed on the collected voice information to obtain corresponding text information; a target control instruction for an application program is then determined through matching-degree calculation; and the application program is controlled to execute the operation corresponding to the target control instruction. With the embodiments of the invention, a user can issue application control instructions by voice, and the application automatically executes the corresponding operation; the interaction is simple, efficiency is greatly improved, and the user experience is better.
Drawings
To illustrate the technical solutions of the embodiments more clearly, the drawings used in the embodiments or in the description of the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a flowchart of an embodiment of a method for controlling an application program according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of calculating the degree of matching between text information and each control instruction in a preset control instruction set, respectively;
FIG. 3 is a schematic flow chart of computing voiceprint feature vectors of speech information;
FIG. 4 is a block diagram of an embodiment of an application control device according to an embodiment of the present invention;
Fig. 5 is a schematic block diagram of a terminal device in an embodiment of the present invention.
Detailed Description
To make the objects, features, and advantages of the present invention clearer, the technical solutions in the embodiments are described in detail below with reference to the accompanying drawings. The embodiments described below are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on these embodiments without inventive effort fall within the scope of the invention.
Referring to fig. 1, an embodiment of an application control method according to an embodiment of the present invention may include:
step S101, after receiving a voice acquisition instruction, acquiring voice information input by a user.
The voice information includes a control instruction for an application program. While the user holds down the voice input button, the application collects voice information; when the user releases the button, a voice acquisition termination instruction is issued to the application, and the application ends the collection.
Step S102, performing voice recognition on the collected voice information to obtain text information corresponding to the voice information.
Speech recognition converts a segment of speech into corresponding text and mainly comprises feature extraction, an acoustic model, a language model, and decoding. To extract features effectively, preprocessing such as filtering and framing is usually applied to the collected voice information first, so that the audio signal to be analyzed is properly extracted from the raw signal.
Feature extraction works transform the speech information from the time domain to the frequency domain, providing the acoustic model with the appropriate feature vectors.
The score of each feature vector against the acoustic model is then calculated. A hidden Markov model (HMM) is preferably used for acoustic modeling in this embodiment. A Markov model is a discrete-time finite-state automaton; "hidden" means that the internal states of the model are not observable from outside, and only the output value at each moment can be observed. For speech recognition systems, the output values are usually the acoustic features computed from individual frames. The HMM makes two assumptions to characterize speech: the transition to a state depends only on the previous state, and the output value depends only on the current state (or current state transition), which greatly reduces model complexity. In speech recognition, HMMs are typically built with a left-to-right topology using self-loops and skip transitions; a phoneme is a three- to five-state HMM, a word is the concatenation of the HMMs of its constituent phonemes, and the full model for continuous speech recognition combines word HMMs with silence models.
The language model computes the probability of the candidate word sequences for the speech according to linguistic theory. This embodiment preferably uses an N-gram language model, which assumes that the occurrence of the N-th word depends only on the preceding N-1 words and on no other word; the probability of a whole sentence is then the product of the conditional probabilities of its words. These probabilities can be obtained by counting co-occurrences of N words in a corpus; bigram (N=2) and trigram (N=3) models are the most common. Language-model quality is typically measured by cross entropy and perplexity. Cross entropy reflects how difficult the model finds the text, or, from a compression viewpoint, how many bits are needed on average to encode each word. Perplexity is the average number of branching choices the model assigns to the text; its reciprocal can be regarded as the average probability of each word. Smoothing assigns probability mass to unseen N-gram combinations so that any word sequence receives a nonzero probability from the language model.
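The bigram case described above can be sketched in a few lines. The toy corpus, the add-one smoothing, and all counts below are illustrative assumptions, not the patent's actual model:

```python
from collections import Counter

def train_bigram(corpus):
    """Count unigrams and bigrams from a list of tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        tokens = ["<s>"] + sent          # sentence-start marker
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def sentence_prob(sent, unigrams, bigrams, vocab_size, alpha=1.0):
    """P(sentence) as a product of add-alpha smoothed bigram probabilities."""
    prob = 1.0
    tokens = ["<s>"] + sent
    for prev, cur in zip(tokens, tokens[1:]):
        prob *= (bigrams[(prev, cur)] + alpha) / (unigrams[prev] + alpha * vocab_size)
    return prob

corpus = [["i", "want", "to", "travel"], ["i", "want", "leave"]]
uni, bi = train_bigram(corpus)
vocab = len({w for s in corpus for w in s}) + 1   # +1 for "<s>"
p = sentence_prob(["i", "want", "leave"], uni, bi, vocab)
```

Smoothing is visible here: even a bigram never seen in the corpus would still receive the nonzero probability `alpha / (count + alpha * vocab)`.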
And finally, decoding the phrase sequence according to the existing dictionary to obtain text information corresponding to the voice information.
Step S103, the matching degree between the text information and each control instruction in a preset control instruction set is calculated respectively.
The control instruction set may include, but is not limited to, control instructions for creating a leave application, creating a business-trip application, creating a reimbursement application, creating an outgoing application, and the like.
As shown in fig. 2, step S103 may specifically include the following procedures:
Step S1031, determining the keyword set corresponding to each control instruction, and calculating the classification discrimination degree of each keyword in every keyword set.
Firstly, word segmentation is performed on each corpus entry in a preset corpus to obtain individual words.
The corpus comprises corpus sub-libraries corresponding to the respective control instructions; each sub-library can be built from statistics over large-scale user data. Specifically, the sentences that users habitually use when issuing a given control instruction are collected and added to the corpus sub-library corresponding to that instruction. For example, if user A habitually issues the instruction for creating a business-trip application with the sentence "I want to go on a business trip", and user B with the sentence "please help me create a business-trip application", both sentences are added as corpus entries to the sub-library corresponding to the instruction for creating a business-trip application.
Word segmentation splits a corpus entry into individual words. In this embodiment, entries can be segmented against a general-purpose dictionary, so that the resulting tokens are ordinary words; characters not found in the dictionary are split off individually. When the characters could form a word in either direction, the split is decided by word frequency. For example, the phrase 要求神拜佛 can be segmented as 要求/神拜佛 if the word 要求 ("ask") has the higher frequency, or as 要/求神拜佛 if 求神拜佛 ("pray to the gods for blessing") has the higher frequency.
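The frequency-based disambiguation described above can be illustrated with a minimal greedy segmenter. The dictionary, the frequencies, and the forward-scan strategy are all simplifying assumptions for illustration:

```python
def segment(text, freq_dict, max_len=4):
    """Greedy forward segmentation: at each position, among the dictionary
    words that match, pick the one with the highest frequency; fall back
    to a single character when nothing matches."""
    result, i = [], 0
    while i < len(text):
        candidates = [text[i:i + k] for k in range(1, max_len + 1)
                      if text[i:i + k] in freq_dict]
        word = max(candidates, key=lambda w: freq_dict[w]) if candidates else text[i]
        result.append(word)
        i += len(word)
    return result

# Hypothetical frequencies: the split chosen for the ambiguous phrase
# depends on which competing dictionary entry is more frequent.
seg_a = segment("要求神拜佛", {"要": 900, "要求": 50, "求神拜佛": 10})
seg_b = segment("要求神拜佛", {"要": 50, "要求": 900, "神拜佛": 10})
```

With the first dictionary the cheaper word 要 wins and the remainder matches 求神拜佛; with the second, 要求 wins and the remainder is segmented separately.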
Then, the frequency of each word in each corpus sub-library is counted, and the classification discrimination degree of each word is calculated according to the following formula:
ClassDeg_w = MAX(FreqSeq_w) / MEAN(FreqSeq'_w)
wherein w is the sequence number of a word, 1 ≤ w ≤ WordNum, WordNum is the total number of words, FreqSeq_w is the frequency sequence of the w-th word across the corpus sub-libraries, FreqSeq_w = [Freq_{w,1}, Freq_{w,2}, ..., Freq_{w,c}, ..., Freq_{w,ClassNum}], Freq_{w,c} is the frequency of the w-th word in the corpus sub-library corresponding to the c-th control instruction, FreqSeq'_w is the sequence remaining after the maximum value is removed from FreqSeq_w, namely FreqSeq'_w = FreqSeq_w - MAX(FreqSeq_w), MAX is the maximum function, MEAN is the arithmetic-mean function, and ClassDeg_w is the classification discrimination degree of the w-th word;
Then, words whose classification discrimination degree is greater than a preset discrimination threshold are selected as keywords; each keyword is assigned to the control instruction whose corpus sub-library yields the maximum value in FreqSeq_w.
The discrimination threshold may be set according to the practical situation; for example, it may be set to 5, 10, 20, or another value.
The control instruction corresponding to each keyword may be determined according to the following formula:
TgtKwSet_w = argmax(FreqSeq_w) = argmax(Freq_{w,1}, Freq_{w,2}, ..., Freq_{w,c}, ..., Freq_{w,ClassNum})
wherein TgtKwSet_w is the sequence number of the control instruction corresponding to the w-th keyword;
For example, suppose the word "illness" occurs 1000 times in the corpus sub-library corresponding to creating a leave application, 20 times in the sub-library corresponding to creating a reimbursement application, and once in the sub-library corresponding to creating a business-trip application. Its classification discrimination degree is then 1000 / ((20 + 1) / 2) ≈ 95.2.
This exceeds the discrimination threshold, so "illness" is determined to be a keyword; and since it occurs most frequently in the corpus sub-library corresponding to creating a leave application, it is assigned to the control instruction for creating a leave application.
Finally, the keywords corresponding to the c-th control instruction are assembled into the keyword set for the c-th control instruction, as shown in the following table:
Control instruction | Keyword set
Control instruction 1 | Set 1 = {keyword 1, keyword 2, keyword 3}
Control instruction 2 | Set 2 = {keyword 4, keyword 5, keyword 6}
Control instruction 3 | Set 3 = {keyword 7, keyword 8}
…… | ……
For example, the keyword set for the control instruction for creating a leave application may include keywords such as "ask for leave", "marriage leave", "ill", and "paternity leave".
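A minimal sketch of the keyword-selection step, assuming the discrimination degree is the largest per-corpus frequency divided by the mean of the remaining frequencies (a reconstruction consistent with the "illness" numbers in the text; the patent gives the formula only as an image):

```python
def class_discrimination(freq_seq):
    """Discrimination degree of a word across the corpus sub-libraries:
    assumed to be the largest frequency divided by the mean of the
    remaining frequencies."""
    peak = max(freq_seq)
    rest = list(freq_seq)
    rest.remove(peak)                      # drop one instance of the maximum
    mean_rest = sum(rest) / len(rest) if rest else 0.0
    return peak / mean_rest if mean_rest > 0 else float("inf")

def assign_keyword(freq_seq, threshold):
    """Return the index of the control instruction a keyword belongs to
    (the argmax of its frequency sequence), or None if the word is not
    discriminative enough to be a keyword."""
    if class_discrimination(freq_seq) <= threshold:
        return None
    return max(range(len(freq_seq)), key=lambda c: freq_seq[c])

# "illness": 1000 occurrences in the leave-application sub-library,
# 20 in reimbursement, 1 in business trip.
deg = class_discrimination([1000, 20, 1])
target = assign_keyword([1000, 20, 1], threshold=10)   # index 0: leave
```

A word that appears with similar frequency everywhere, e.g. `[5, 4, 6]`, falls below the threshold and is discarded.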
Step S1032, counting the occurrence frequency of each keyword in the text information.
Step S1033, calculating the matching degree between the text information and each control instruction.
Preferably, the matching degree between the text information and each control instruction may be calculated according to the following formula:
MatchDeg_c = Σ_{kn=1}^{KwNum_c} MsgKwNum_{c,kn} × ClassDeg_{c,kn}
wherein c is the sequence number of a control instruction, 1 ≤ c ≤ ClassNum, ClassNum is the total number of control instructions, kn is the sequence number of a keyword, 1 ≤ kn ≤ KwNum_c, KwNum_c is the total number of keywords in the keyword set corresponding to the c-th control instruction, MsgKwNum_{c,kn} is the number of occurrences in the text information of the kn-th keyword of the keyword set corresponding to the c-th control instruction, ClassDeg_{c,kn} is the classification discrimination degree of the kn-th keyword of that keyword set, and MatchDeg_c is the matching degree between the text information and the c-th control instruction.
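This discrimination-weighted count can be sketched directly; the tokens and discrimination-degree values below are illustrative:

```python
def matching_degree(text_tokens, keyword_set, class_deg):
    """MatchDeg_c: sum over the keywords of instruction c of
    (occurrences of the keyword in the text) * (its discrimination degree).

    text_tokens: list of words from the recognized text
    keyword_set: list of keywords for one control instruction
    class_deg:   dict mapping keyword -> discrimination degree
    """
    return sum(text_tokens.count(kw) * class_deg[kw] for kw in keyword_set)

tokens = ["i", "ill", "want", "leave"]
deg = {"ill": 95.0, "leave": 40.0, "travel": 80.0}
m_leave = matching_degree(tokens, ["ill", "leave"], deg)   # 95 + 40
m_trip = matching_degree(tokens, ["travel"], deg)          # no hit
```

Keywords with a high discrimination degree dominate the score, so a single strongly class-specific word like "ill" is enough to pull the text toward the leave-application instruction.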
And step S104, selecting a control instruction with highest matching degree with the text information from the control instruction set as a target control instruction for the application program.
The target control instruction for the application program may be determined according to the following formula:
TargetCmd = argmax(MatchDegSeq)
= argmax(MatchDeg_1, MatchDeg_2, ..., MatchDeg_c, ..., MatchDeg_ClassNum)
wherein MatchDegSeq = (MatchDeg_1, MatchDeg_2, ..., MatchDeg_c, ..., MatchDeg_ClassNum) is the sequence of matching degrees between the text information and the control instructions, and TargetCmd is the sequence number of the finally determined target control instruction for the application program.
Step S105, controlling the application program to execute an operation corresponding to the target control instruction.
Once the operation the user wants to perform has been determined, the corresponding steps can be executed automatically. For example, if the user's voice input is "I am ill", speech recognition and matching calculation determine that the user wants to create a leave application; the application is then controlled to open the corresponding interface automatically and create a new leave application for the user.
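Steps S104 and S105 together amount to an argmax over the matching-degree sequence followed by a dispatch. The handlers below are hypothetical stand-ins for opening the real interfaces:

```python
def select_target(match_degs):
    """TargetCmd = argmax over the matching-degree sequence."""
    return max(range(len(match_degs)), key=lambda c: match_degs[c])

# Hypothetical handlers; a real application would open the corresponding
# operation interface instead of returning a string.
handlers = [
    lambda: "open leave-application form",
    lambda: "open business-trip form",
    lambda: "open reimbursement form",
]

match_deg_seq = [135.0, 0.0, 12.5]          # illustrative MatchDeg values
action = handlers[select_target(match_deg_seq)]()
```

With the sample scores, instruction 0 (the leave application) has the highest matching degree, so its handler is invoked.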
Preferably, for security, after the voice information is collected and before speech recognition is performed on it, identity authentication may be performed on the voice information to prevent other users from impersonating the current user.
First, a voiceprint feature vector of the speech information is calculated.
As shown in fig. 3, the calculation process of the voiceprint feature vector of the voice information may include:
Step S301, dividing the voice information into M voice subsections.
Wherein M is an integer greater than 1, and its specific value may be set according to practical situations, for example, it may be set to 3, 5, 10 or other values, etc.
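Step S301 can be sketched as an even split of the sample sequence. How the patent balances leftover samples is not specified, so letting the first segments absorb the extras is an assumption:

```python
def split_segments(samples, m):
    """Divide a speech sample sequence into m nearly equal sub-segments;
    the first (len(samples) % m) segments receive one extra sample."""
    base, extra = divmod(len(samples), m)
    segs, start = [], 0
    for i in range(m):
        size = base + (1 if i < extra else 0)
        segs.append(samples[start:start + size])
        start += size
    return segs

segs = split_segments(list(range(10)), 3)   # segment lengths 4, 3, 3
```

The concatenation of the sub-segments reproduces the original sequence, so no samples are dropped or duplicated.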
Step S302, respectively calculating the Mel-frequency cepstral coefficient (MFCC) vector of each speech sub-segment.
Preferably, the MFCC vector of each speech sub-segment may be calculated according to the following formula:
MelVec_m = MFCCFuc(SubVoice_m)
wherein m is the sequence number of a speech sub-segment, 1 ≤ m ≤ M, SubVoice_m is the m-th speech sub-segment, MFCCFuc is a preset MFCC calculation function, MelVec_m is the MFCC vector of the m-th speech sub-segment, MelVec_m = (MelCoe_{m,1}, MelCoe_{m,2}, ..., MelCoe_{m,n}, ..., MelCoe_{m,N}), and MelCoe_{m,n} is the n-th MFCC of the m-th speech sub-segment.
Step S303, respectively calculating the weight coefficient of each voice subsection.
Preferably, a weight coefficient is calculated for each speech sub-segment, wherein Weight_m is the weight coefficient of the m-th speech sub-segment.
And S304, constructing the voiceprint feature vector of the voice information.
Preferably, the voiceprint feature vector of the voice information may be constructed as
VoPrintVec = (VpElem_1, VpElem_2, ..., VpElem_n, ..., VpElem_N)
wherein VoPrintVec is the voiceprint feature vector of the voice information and VpElem_n is its n-th element.
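One plausible way to combine the per-segment MFCC vectors and segment weights into a single voiceprint vector is an element-wise weighted sum. Both this combination rule and the uniform weights below are assumptions, since the patent's formulas for VpElem_n and Weight_m appear only as images:

```python
def voiceprint_vector(mel_vecs, weights):
    """Combine M per-segment MFCC vectors (each of length N) into one
    voiceprint vector: element n is the weighted sum over segments of
    the n-th coefficient. Weighted-sum form is an assumption."""
    n_coeffs = len(mel_vecs[0])
    return [sum(w * vec[n] for w, vec in zip(weights, mel_vecs))
            for n in range(n_coeffs)]

mel_vecs = [[1.0, 2.0], [3.0, 4.0]]   # M = 2 segments, N = 2 coefficients
weights = [0.5, 0.5]                  # uniform weights as a placeholder
vp = voiceprint_vector(mel_vecs, weights)
```

With uniform weights this reduces to averaging the per-segment MFCC vectors coefficient by coefficient.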
And then, inquiring the reference feature vector corresponding to the user in a preset database.
The reference feature vector is a voiceprint feature vector extracted in advance from the voice of the user corresponding to the currently logged-in account; its calculation is similar to the process described above and is not repeated here.
Then, the similarity between the voiceprint feature vector of the voice information and the reference feature vector is calculated, for example as the cosine similarity:
SimDeg = Σ_{n=1}^{N} (VpElem_n × StVpElem_n) / ( sqrt(Σ_{n=1}^{N} VpElem_n²) × sqrt(Σ_{n=1}^{N} StVpElem_n²) )
wherein n is the element index of the voiceprint feature vector, 1 ≤ n ≤ N, N is the total number of elements of the voiceprint feature vector, VpElem_n is the n-th element of the voiceprint feature vector of the voice information, StVpElem_n is the n-th element of the reference feature vector, and SimDeg is the similarity between the voiceprint feature vector of the voice information and the reference feature vector.
If the similarity between the voiceprint feature vector of the voice information and the reference feature vector is greater than a preset similarity threshold, the speaker is the user corresponding to the currently logged-in account, and the step of performing speech recognition on the collected voice information and the subsequent steps are executed. If the similarity is less than or equal to the similarity threshold, the speaker is not the user corresponding to the currently logged-in account; in that case the voice information is ignored, and speech recognition and the subsequent steps are not performed.
The similarity threshold may be set according to practical situations, for example, it may be set to 70%, 80%, 90%, or other values, and so on.
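Assuming the similarity measure is cosine similarity (the patent's formula is not reproduced in the text), the authentication gate can be sketched as:

```python
import math

def cosine_similarity(a, b):
    """SimDeg between the voiceprint vector and the reference vector,
    assumed here to be cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def authenticate(vp, reference, threshold=0.8):
    """Proceed to speech recognition only when the speaker's voiceprint
    is close enough to the enrolled reference."""
    return cosine_similarity(vp, reference) > threshold

ok = authenticate([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])   # identical voiceprints
bad = authenticate([1.0, 0.0], [0.0, 1.0])            # orthogonal voiceprints
```

The 0.8 default mirrors the 70 percent to 90 percent range suggested in the text; it is a tunable parameter, not a prescribed value.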
Preferably, in this embodiment, different operation permissions may be set for different users, and each user may only control the application program to execute the operations within his or her permission set, as shown in the following table:
User | Operation permissions
User 1 | Permission set 1 = {operation 1, operation 2, operation 3, operation 4}
User 2 | Permission set 2 = {operation 1, operation 3, operation 4}
User 3 | Permission set 3 = {operation 1, operation 2}
…… | ……
After the operation corresponding to the target control instruction is determined, the system checks whether the user has the corresponding operation permission. If not, the operation is not executed and the user is prompted accordingly; if so, the step of controlling the application program to execute the operation corresponding to the target control instruction is performed.
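The permission check amounts to a set lookup before dispatch; the user names and operation identifiers below are hypothetical:

```python
# Hypothetical permission table mirroring the one in the text.
PERMISSIONS = {
    "user1": {"op1", "op2", "op3", "op4"},
    "user2": {"op1", "op3", "op4"},
    "user3": {"op1", "op2"},
}

def execute_if_permitted(user, operation):
    """Run the operation only when the user's permission set contains it;
    otherwise return a prompt instead of executing anything."""
    if operation in PERMISSIONS.get(user, set()):
        return f"executed {operation}"
    return f"permission denied for {operation}"

r1 = execute_if_permitted("user3", "op2")   # within user3's permission set
r2 = execute_if_permitted("user3", "op4")   # outside user3's permission set
```

Unknown users fall through to an empty permission set, so they are denied by default.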
In summary, after receiving a voice acquisition instruction, the embodiments of the present invention collect voice information input by a user, perform speech recognition on the collected voice information to obtain corresponding text information, determine a target control instruction for an application program through matching-degree calculation, and control the application program to execute the operation corresponding to the target control instruction. With the embodiments of the invention, a user can issue application control instructions by voice, and the application automatically executes the corresponding operation; the interaction is simple, efficiency is greatly improved, and the user experience is better.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.
Corresponding to an application control method described in the above embodiments, fig. 4 shows a block diagram of an embodiment of an application control device according to an embodiment of the present invention.
In this embodiment, an application control device may include:
the voice information acquisition module 401 is configured to acquire voice information input by a user after receiving a voice acquisition instruction, where the voice information includes a control instruction of an application program;
the voice recognition module 402 is configured to perform voice recognition on the collected voice information to obtain text information corresponding to the voice information;
A matching degree calculating module 403, configured to calculate matching degrees between the text information and each control instruction in a preset control instruction set respectively;
A target control instruction selecting module 404, configured to select, from the control instruction set, a control instruction with the highest matching degree with the text information as a target control instruction for the application program;
And an operation execution module 405, configured to control the application program to execute an operation corresponding to the target control instruction.
Further, the matching degree calculating module may include:
The keyword set determining unit is used for determining the keyword set corresponding to each control instruction and calculating the classification discrimination degree of each keyword in every keyword set;
The frequency statistics unit is used for respectively counting the frequency of each keyword in the text information;
the matching degree calculating unit is configured to calculate the matching degree between the text information and each control instruction according to the following formula:

MatchDeg_c = Σ_{kn=1}^{KwNum_c} MsgKWNum_{c,kn} × ClassDeg_{c,kn}

where c is the sequence number of the control instruction, 1 ≤ c ≤ ClassNum, ClassNum is the total number of control instructions, kn is the sequence number of the keyword, 1 ≤ kn ≤ KwNum_c, KwNum_c is the total number of keywords in the keyword set corresponding to the c-th control instruction, MsgKWNum_{c,kn} is the number of occurrences in the text information of the kn-th keyword in the keyword set corresponding to the c-th control instruction, ClassDeg_{c,kn} is the classification discrimination degree of the kn-th keyword in the keyword set corresponding to the c-th control instruction, and MatchDeg_c is the matching degree between the text information and the c-th control instruction.
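As a concrete illustration of this matching degree calculation, a minimal sketch follows; the keyword sets, discrimination degrees, and input text are hypothetical values, not taken from the patent.

```python
# Hypothetical keyword sets: control instruction -> {keyword: ClassDeg}
keyword_sets = {
    1: {"open": 2.5, "launch": 1.8},
    2: {"close": 3.0, "exit": 2.2},
}

def match_degree(text_words, keywords):
    """MatchDeg_c: sum over keywords of (occurrences in text) x ClassDeg."""
    return sum(text_words.count(kw) * deg for kw, deg in keywords.items())

words = "please open the app open now".split()
scores = {c: match_degree(words, kws) for c, kws in keyword_sets.items()}
target = max(scores, key=scores.get)
# "open" occurs twice, so instruction 1 scores 2 x 2.5 = 5.0; instruction 2 scores 0
```

The instruction with the highest score is then taken as the target control instruction, as described in the method embodiments.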
Further, the keyword set determining unit may include:
the word segmentation processing subunit is configured to perform word segmentation on each corpus in a preset corpus database to obtain individual words, where the corpus database includes corpus sub-databases respectively corresponding to the control instructions;
the frequency statistics subunit is configured to count the frequency of occurrence of each word in each corpus sub-database;
the classification discrimination degree calculating subunit is configured to calculate the classification discrimination degree of each word according to the following formula:

ClassDeg_w = MAX(FreqSeq_w) / MAX(FreqSeq'_w)

where w is the sequence number of the word, 1 ≤ w ≤ WordNum, WordNum is the total number of words, FreqSeq_w is the frequency sequence of the w-th word across the corpus sub-databases, FreqSeq_w = [Freq_{w,1}, Freq_{w,2}, ..., Freq_{w,c}, ..., Freq_{w,ClassNum}], Freq_{w,c} is the frequency of occurrence of the w-th word in the corpus sub-database corresponding to the c-th control instruction, FreqSeq'_w is the sequence remaining after the maximum value is removed from FreqSeq_w, namely FreqSeq'_w = FreqSeq_w − MAX(FreqSeq_w), MAX is the maximum function, and ClassDeg_w is the classification discrimination degree of the w-th word;
the keyword determining subunit is configured to select, as keywords, the words whose classification discrimination degree is greater than a preset threshold, and to determine the control instruction corresponding to each keyword according to the following formula:

TgtKwSet_w = argmax(FreqSeq_w) = argmax(Freq_{w,1}, Freq_{w,2}, ..., Freq_{w,c}, ..., Freq_{w,ClassNum})

where TgtKwSet_w is the sequence number of the control instruction corresponding to the w-th keyword;
and the keyword set construction subunit is used for constructing each keyword corresponding to the c-th control instruction into a keyword set corresponding to the c-th control instruction.
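The word-segmentation, frequency-counting, discrimination-degree, and keyword-selection steps performed by these subunits can be sketched as follows. The toy corpus, the threshold value, and the choice to clamp a zero second-largest frequency to 1 are all assumptions made for illustration.

```python
from collections import Counter

# Toy corpus sub-databases, one per control instruction (hypothetical)
corpus = {
    1: ["open the app", "open music", "launch the app"],
    2: ["close the app", "exit the music", "close the window"],
}

# Frequency of each word in each corpus sub-database
freq = {c: Counter(w for doc in docs for w in doc.split())
        for c, docs in corpus.items()}
vocab = {w for f in freq.values() for w in f}

def class_deg(word):
    """ClassDeg_w = max frequency / second-largest frequency across the
    sub-databases (a zero second-largest is clamped to 1 here, an assumption,
    to avoid division by zero)."""
    seq = sorted((freq[c][word] for c in corpus), reverse=True)
    return seq[0] / max(seq[1], 1)

threshold = 1.5  # hypothetical discrimination threshold
keywords = {w: max(corpus, key=lambda c: freq[c][w])
            for w in vocab if class_deg(w) > threshold}
# "open" occurs 2x under instruction 1 and 0x under instruction 2, giving
# ClassDeg = 2 > 1.5, so it becomes a keyword of instruction 1; a common
# word like "the" (frequencies 2 and 3) is rejected
```

Grouping the selected keywords by their assigned instruction number then yields the keyword set for each control instruction.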
Further, the application control device may further include:
The voiceprint feature vector calculation module is used for calculating voiceprint feature vectors of the voice information;
The reference feature vector query module is used for querying a reference feature vector corresponding to the user in a preset database;
the similarity calculation module is configured to calculate the similarity between the voiceprint feature vector of the voice information and the reference feature vector according to the following formula:

where n is the element sequence number of the voiceprint feature vector of the voice information, 1 ≤ n ≤ N, N is the total number of elements of the voiceprint feature vector, VpElem_n is the n-th element of the voiceprint feature vector of the voice information, StVpElem_n is the n-th element of the reference feature vector, and SimDeg is the similarity between the voiceprint feature vector of the voice information and the reference feature vector.
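The similarity formula itself appears as an image in the original publication and is not reproduced in this text. Purely as a hedged illustration, the sketch below assumes a cosine similarity over the N vector elements; this is consistent with the element-wise definitions above but is not confirmed by this excerpt.

```python
import math

def sim_deg(vp, ref):
    """Assumed cosine similarity between the voiceprint feature vector
    (VpElem_1..VpElem_N) and the reference vector (StVpElem_1..StVpElem_N).
    The patent's actual formula is not shown in this excerpt."""
    dot = sum(a * b for a, b in zip(vp, ref))
    norm = math.sqrt(sum(a * a for a in vp)) * math.sqrt(sum(b * b for b in ref))
    return dot / norm if norm else 0.0

same = sim_deg([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])   # identical vectors
orthogonal = sim_deg([1.0, 0.0], [0.0, 1.0])       # unrelated vectors
```

Whatever the exact formula, the method embodiments proceed with speech recognition only when this similarity exceeds a preset threshold.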
Further, the voiceprint feature vector calculation module may include:
A voice sub-segment dividing unit, configured to divide the voice information into M voice sub-segments, where M is an integer greater than 1;
the MFCC vector calculating unit is configured to calculate the Mel-frequency cepstral coefficient (MFCC) vector of each voice sub-segment according to the following formula:

MelVec_m = MFCCFuc(SubVoice_m)

where m is the sequence number of the voice sub-segment, 1 ≤ m ≤ M, SubVoice_m is the m-th voice sub-segment, MFCCFuc is a preset MFCC calculation function, MelVec_m is the MFCC vector of the m-th voice sub-segment, MelVec_m = (MelCoe_{m,1}, MelCoe_{m,2}, ..., MelCoe_{m,n}, ..., MelCoe_{m,N}), and MelCoe_{m,n} is the n-th MFCC of the m-th voice sub-segment;
the weight coefficient calculating unit is configured to calculate the weight coefficient of each voice sub-segment according to the following formula:

where Weight_m is the weight coefficient of the m-th voice sub-segment;
and the voiceprint feature vector construction unit is configured to construct the voiceprint feature vector of the voice information according to the following formula:

VoPrintVec = (VpElem_1, VpElem_2, ..., VpElem_n, ..., VpElem_N)

where VoPrintVec is the voiceprint feature vector of the voice information.
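Putting the sub-segment division, per-segment MFCC extraction, weighting, and vector construction together, the following is a sketch under stated assumptions: a stand-in feature function takes the place of MFCCFuc, and uniform weights are used, since the patent's weight formula and MFCC implementation are not shown in this excerpt.

```python
def split_segments(samples, m):
    """Divide the signal into M roughly equal voice sub-segments."""
    step = len(samples) // m
    return [samples[i * step:(i + 1) * step] for i in range(m)]

def stand_in_mfcc(segment, n=4):
    """Placeholder for MFCCFuc: returns an N-element coefficient vector.
    A real implementation would compute Mel-frequency cepstral coefficients."""
    mean = sum(segment) / len(segment)
    return [mean * (k + 1) for k in range(n)]

def voiceprint(samples, m=3, n=4):
    segments = split_segments(samples, m)
    weights = [1.0 / m] * m  # assumed uniform Weight_m; the formula is not shown
    mel = [stand_in_mfcc(seg, n) for seg in segments]
    # VpElem_n taken here as the weighted sum of the n-th coefficient over segments
    return [sum(w * vec[k] for w, vec in zip(weights, mel)) for k in range(n)]

vec = voiceprint([0.1] * 30)  # 30 samples of a constant toy "signal"
```

The resulting N-element vector plays the role of VoPrintVec and would be compared against the stored reference vector by the similarity calculation module.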
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described apparatus, modules and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts not described or detailed in one embodiment, reference may be made to the related descriptions of the other embodiments.
Fig. 5 shows a schematic block diagram of a terminal device according to an embodiment of the present invention, and for convenience of explanation, only a portion related to the embodiment of the present invention is shown.
In this embodiment, the terminal device 5 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. The terminal device 5 may include a processor 50, a memory 51, and computer-readable instructions 52 stored in the memory 51 and executable on the processor 50, such as computer-readable instructions for performing the application control method described above. When executing the computer-readable instructions 52, the processor 50 implements the steps of the application control method embodiments described above, such as steps S101 to S105 shown in fig. 1. Alternatively, when executing the computer-readable instructions 52, the processor 50 implements the functions of the modules/units in the foregoing apparatus embodiments, such as the functions of modules 401 to 405 shown in fig. 4.
Illustratively, the computer-readable instructions 52 may be partitioned into one or more modules/units that are stored in the memory 51 and executed by the processor 50 to implement the present invention. The one or more modules/units may be a series of computer-readable instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution of the computer-readable instructions 52 in the terminal device 5.
The processor 50 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 51 may be an internal storage unit of the terminal device 5, such as a hard disk or memory of the terminal device 5. The memory 51 may also be an external storage device of the terminal device 5, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the terminal device 5. Further, the memory 51 may include both an internal storage unit and an external storage device of the terminal device 5. The memory 51 is used to store the computer-readable instructions and other instructions and data required by the terminal device 5, and may also be used to temporarily store data that has been output or is to be output.
The functional units in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
If the integrated units are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes a number of computer-readable instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes media capable of storing computer-readable instructions, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. An application control method, comprising:
after receiving a voice acquisition instruction, acquiring voice information input by a user, wherein the voice information comprises a control instruction of an application program;
performing voice recognition on the collected voice information to obtain text information corresponding to the voice information;
performing word segmentation on each corpus in a preset corpus database to obtain individual words, where the corpus database includes corpus sub-databases respectively corresponding to the control instructions in a preset control instruction set; counting the frequency of occurrence of each word in each corpus sub-database; taking the ratio of the maximum value to the second-largest value of the frequencies of occurrence of the w-th word across the corpus sub-databases as the classification discrimination degree of the w-th word, where w is the sequence number of the word, 1 ≤ w ≤ WordNum, and WordNum is the total number of words; selecting, as keywords, the words whose classification discrimination degree is greater than a preset threshold, and taking the control instruction corresponding to the corpus sub-database in which the w-th keyword occurs with the maximum frequency as the control instruction corresponding to the w-th keyword; and constructing the keywords corresponding to the c-th control instruction into the keyword set corresponding to the c-th control instruction, where c is the sequence number of the control instruction, 1 ≤ c ≤ ClassNum, and ClassNum is the total number of control instructions;
respectively counting the occurrence frequency of each keyword in the text information;
taking the sum of the products of the frequency of occurrence in the text information and the classification discrimination degree of each keyword in the keyword set corresponding to the c-th control instruction as the matching degree between the text information and the c-th control instruction;
and selecting a control instruction with highest matching degree with the text information from the control instruction set as a target control instruction for the application program, and controlling the application program to execute an operation corresponding to the target control instruction.
2. The application control method according to claim 1, wherein the step of taking, as the matching degree between the text information and the c-th control instruction, the sum of the products of the frequency of occurrence in the text information and the classification discrimination degree of each keyword in the keyword set corresponding to the c-th control instruction comprises:
calculating the matching degree between the text information and the c-th control instruction according to the following formula:

MatchDeg_c = Σ_{kn=1}^{KwNum_c} MsgKWNum_{c,kn} × ClassDeg_{c,kn}

where kn is the sequence number of the keyword, 1 ≤ kn ≤ KwNum_c, KwNum_c is the total number of keywords in the keyword set corresponding to the c-th control instruction, MsgKWNum_{c,kn} is the number of occurrences in the text information of the kn-th keyword in the keyword set corresponding to the c-th control instruction, ClassDeg_{c,kn} is the classification discrimination degree of the kn-th keyword in the keyword set corresponding to the c-th control instruction, and MatchDeg_c is the matching degree between the text information and the c-th control instruction.
3. The application control method according to claim 1, wherein the step of taking, as the classification discrimination degree of the w-th word, the ratio of the maximum value to the second-largest value of the frequencies of occurrence of the w-th word across the corpus sub-databases comprises:
calculating the classification discrimination degree of the w-th word according to the following formula:

ClassDeg_w = MAX(FreqSeq_w) / MAX(FreqSeq'_w)

where FreqSeq_w is the frequency sequence of the w-th word across the corpus sub-databases, FreqSeq_w = [Freq_{w,1}, Freq_{w,2}, ..., Freq_{w,c}, ..., Freq_{w,ClassNum}], Freq_{w,c} is the frequency of occurrence of the w-th word in the corpus sub-database corresponding to the c-th control instruction, FreqSeq'_w is the sequence remaining after the maximum value is removed from FreqSeq_w, namely FreqSeq'_w = FreqSeq_w − MAX(FreqSeq_w), MAX is the maximum function, and ClassDeg_w is the classification discrimination degree of the w-th word.
4. The application control method according to any one of claims 1 to 3, characterized by further comprising, before performing speech recognition on the collected speech information:
Calculating the voiceprint feature vector of the voice information, and inquiring the reference feature vector corresponding to the user in a preset database;
Calculating the similarity between the voiceprint feature vector of the voice information and the reference feature vector according to the following formula:
where n is the element sequence number of the voiceprint feature vector of the voice information, 1 ≤ n ≤ N, N is the total number of elements of the voiceprint feature vector, VpElem_n is the n-th element of the voiceprint feature vector of the voice information, StVpElem_n is the n-th element of the reference feature vector, and SimDeg is the similarity between the voiceprint feature vector of the voice information and the reference feature vector;
And if the similarity between the voiceprint feature vector of the voice information and the reference feature vector is greater than a preset similarity threshold, executing the step of carrying out voice recognition on the acquired voice information and the subsequent steps.
5. The application control method according to claim 4, wherein the calculating the voiceprint feature vector of the voice information includes:
dividing the voice information into M voice sub-segments, where M is an integer greater than 1;
calculating the Mel-frequency cepstral coefficient (MFCC) vector of each voice sub-segment according to the following formula:

MelVec_m = MFCCFuc(SubVoice_m)

where m is the sequence number of the voice sub-segment, 1 ≤ m ≤ M, SubVoice_m is the m-th voice sub-segment, MFCCFuc is a preset MFCC calculation function, MelVec_m is the MFCC vector of the m-th voice sub-segment, MelVec_m = (MelCoe_{m,1}, MelCoe_{m,2}, ..., MelCoe_{m,n}, ..., MelCoe_{m,N}), and MelCoe_{m,n} is the n-th MFCC of the m-th voice sub-segment;
calculating the weight coefficient of each voice sub-segment according to the following formula:

where Weight_m is the weight coefficient of the m-th voice sub-segment;
and constructing the voiceprint feature vector of the voice information according to the following formula:

VoPrintVec = (VpElem_1, VpElem_2, ..., VpElem_n, ..., VpElem_N)

where VoPrintVec is the voiceprint feature vector of the voice information.
6. An application control apparatus, comprising:
the voice information acquisition module is used for acquiring voice information input by a user after receiving a voice acquisition instruction, wherein the voice information comprises a control instruction of an application program;
The voice recognition module is used for carrying out voice recognition on the collected voice information to obtain text information corresponding to the voice information;
the matching degree calculation module is configured to: perform word segmentation on each corpus in a preset corpus database to obtain individual words, where the corpus database includes corpus sub-databases respectively corresponding to the control instructions in a preset control instruction set; count the frequency of occurrence of each word in each corpus sub-database; take the ratio of the maximum value to the second-largest value of the frequencies of occurrence of the w-th word across the corpus sub-databases as the classification discrimination degree of the w-th word, where w is the sequence number of the word, 1 ≤ w ≤ WordNum, and WordNum is the total number of words; select, as keywords, the words whose classification discrimination degree is greater than a preset threshold, and take the control instruction corresponding to the corpus sub-database in which the w-th keyword occurs with the maximum frequency as the control instruction corresponding to the w-th keyword; construct the keywords corresponding to the c-th control instruction into the keyword set corresponding to the c-th control instruction, where c is the sequence number of the control instruction, 1 ≤ c ≤ ClassNum, and ClassNum is the total number of control instructions; count the frequency of occurrence of each keyword in the text information; and take the sum of the products of the frequency of occurrence in the text information and the classification discrimination degree of each keyword in the keyword set corresponding to the c-th control instruction as the matching degree between the text information and the c-th control instruction;
The target control instruction selecting module is used for selecting a control instruction with highest matching degree with the text information from the control instruction set as a target control instruction for the application program;
and the operation execution module is used for controlling the application program to execute the operation corresponding to the target control instruction.
7. The application control device according to claim 6, wherein the matching degree calculation module includes:
the keyword set determining unit is configured to determine the keyword set corresponding to each control instruction, and to calculate the classification discrimination degree of each keyword in each keyword set;
The frequency statistics unit is used for respectively counting the frequency of each keyword in the text information;
the matching degree calculating unit is configured to calculate the matching degree between the text information and each control instruction according to the following formula:

MatchDeg_c = Σ_{kn=1}^{KwNum_c} MsgKWNum_{c,kn} × ClassDeg_{c,kn}

where kn is the sequence number of the keyword, 1 ≤ kn ≤ KwNum_c, KwNum_c is the total number of keywords in the keyword set corresponding to the c-th control instruction, MsgKWNum_{c,kn} is the number of occurrences in the text information of the kn-th keyword in the keyword set corresponding to the c-th control instruction, ClassDeg_{c,kn} is the classification discrimination degree of the kn-th keyword in the keyword set corresponding to the c-th control instruction, and MatchDeg_c is the matching degree between the text information and the c-th control instruction.
8. The application control device according to claim 7, wherein the keyword set determination unit includes:
the word segmentation processing subunit is configured to perform word segmentation on each corpus in the corpus database to obtain individual words;
the frequency statistics subunit is configured to count the frequency of occurrence of each word in each corpus sub-database;
the classification discrimination degree calculating subunit is configured to calculate the classification discrimination degree of each word according to the following formula:

ClassDeg_w = MAX(FreqSeq_w) / MAX(FreqSeq'_w)

where FreqSeq_w is the frequency sequence of the w-th word across the corpus sub-databases, FreqSeq_w = [Freq_{w,1}, Freq_{w,2}, ..., Freq_{w,c}, ..., Freq_{w,ClassNum}], Freq_{w,c} is the frequency of occurrence of the w-th word in the corpus sub-database corresponding to the c-th control instruction, FreqSeq'_w is the sequence remaining after the maximum value is removed from FreqSeq_w, namely FreqSeq'_w = FreqSeq_w − MAX(FreqSeq_w), MAX is the maximum function, and ClassDeg_w is the classification discrimination degree of the w-th word;
the keyword determining subunit is configured to select, as keywords, the words whose classification discrimination degree is greater than a preset threshold, and to determine the control instruction corresponding to each keyword according to the following formula:
TgtKwSet_w = argmax(FreqSeq_w) = argmax(Freq_{w,1}, Freq_{w,2}, ..., Freq_{w,c}, ..., Freq_{w,ClassNum})
Wherein TgtKwSet w is the sequence number of the control instruction corresponding to the w-th keyword;
and the keyword set construction subunit is used for constructing each keyword corresponding to the c-th control instruction into a keyword set corresponding to the c-th control instruction.
9. A computer readable storage medium storing computer readable instructions which, when executed by a processor, implement the steps of the application control method of any one of claims 1 to 5.
10. A terminal device comprising a memory, a processor and computer readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer readable instructions, implements the steps of the application control method of any one of claims 1 to 5.
CN201811210044.1A 2018-10-17 2018-10-17 Application program control method and device, readable storage medium and terminal equipment Active CN109584865B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811210044.1A CN109584865B (en) 2018-10-17 2018-10-17 Application program control method and device, readable storage medium and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811210044.1A CN109584865B (en) 2018-10-17 2018-10-17 Application program control method and device, readable storage medium and terminal equipment

Publications (2)

Publication Number Publication Date
CN109584865A CN109584865A (en) 2019-04-05
CN109584865B true CN109584865B (en) 2024-05-31

Family

ID=65920096

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811210044.1A Active CN109584865B (en) 2018-10-17 2018-10-17 Application program control method and device, readable storage medium and terminal equipment

Country Status (1)

Country Link
CN (1) CN109584865B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110147216A (en) * 2019-04-16 2019-08-20 深圳壹账通智能科技有限公司 Page switching method, device, computer equipment and the storage medium of application program
CN110109365A (en) * 2019-04-24 2019-08-09 平安科技(深圳)有限公司 Speaker control method, device and computer readable storage medium
CN110171005A (en) * 2019-06-10 2019-08-27 杭州任你说智能科技有限公司 A kind of tourism robot system based on intelligent sound box
CN111292742A (en) * 2020-01-14 2020-06-16 京东数字科技控股有限公司 Data processing method and device, electronic equipment and computer storage medium
CN112825030B (en) * 2020-02-28 2023-09-19 腾讯科技(深圳)有限公司 Application program control method, device, equipment and storage medium
CN112599125A (en) * 2020-12-02 2021-04-02 一汽资本控股有限公司 Voice office processing method and device, terminal and storage medium
CN112581957B (en) * 2020-12-04 2023-04-11 浪潮电子信息产业股份有限公司 Computer voice control method, system and related device
CN116189673A (en) * 2021-11-29 2023-05-30 中兴通讯股份有限公司 Voice control method, terminal equipment, server and storage medium
CN114298026A (en) * 2021-12-03 2022-04-08 阿里健康科技(杭州)有限公司 Semantic analysis method, referral processing method and device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06161488A (en) * 1992-11-17 1994-06-07 Ricoh Co Ltd Speech recognizing device
CN101123428A (en) * 2006-08-09 2008-02-13 马昊 Intelligent electronic remote control switch for voice recognition capable of dynamic setting
CN101447185A (en) * 2008-12-08 2009-06-03 深圳市北科瑞声科技有限公司 Audio frequency rapid classification method based on content
CN201514761U (en) * 2009-09-23 2010-06-23 上海大屯能源股份有限公司 Household voice controller
CN107329843A (en) * 2017-06-30 2017-11-07 百度在线网络技术(北京)有限公司 Application program sound control method, device, equipment and storage medium
CN107492374A (en) * 2017-10-11 2017-12-19 深圳市汉普电子技术开发有限公司 A kind of sound control method, smart machine and storage medium
CN108182937A (en) * 2018-01-17 2018-06-19 出门问问信息科技有限公司 Keyword recognition method, device, equipment and storage medium
CN108597512A (en) * 2018-04-27 2018-09-28 努比亚技术有限公司 Method for controlling mobile terminal, mobile terminal and computer readable storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2690174C (en) * 2009-01-13 2014-10-14 Crim (Centre De Recherche Informatique De Montreal) Identifying keyword occurrences in audio data


Also Published As

Publication number Publication date
CN109584865A (en) 2019-04-05

Similar Documents

Publication Publication Date Title
CN109584865B (en) Application program control method and device, readable storage medium and terminal equipment
CN109509470B (en) Voice interaction method and device, computer readable storage medium and terminal equipment
WO2020082560A1 (en) Method, apparatus and device for extracting text keyword, as well as computer readable storage medium
JP5853029B2 (en) Passphrase modeling device and method for speaker verification, and speaker verification system
Wang et al. Acoustic segment modeling with spectral clustering methods
US7475013B2 (en) Speaker recognition using local models
CN109360572B (en) Call separation method and device, computer equipment and storage medium
US20120323560A1 (en) Method for symbolic correction in human-machine interfaces
WO2017084334A1 (en) Language recognition method, apparatus and device and computer storage medium
WO2021114841A1 (en) User report generating method and terminal device
WO2009101837A1 (en) Mark insertion device and mark insertion method
CN108538286A (en) A kind of method and computer of speech recognition
WO2017198031A1 (en) Semantic parsing method and apparatus
CN106202065B (en) Across the language topic detecting method of one kind and system
CN113314119B (en) Voice recognition intelligent household control method and device
KR101677859B1 (en) Method for generating system response using knowledgy base and apparatus for performing the method
CN112151015A (en) Keyword detection method and device, electronic equipment and storage medium
Harwath et al. Zero resource spoken audio corpus analysis
CN112669842A (en) Man-machine conversation control method, device, computer equipment and storage medium
CN111400489B (en) Dialog text abstract generating method and device, electronic equipment and storage medium
Chien Association pattern language modeling
JP6910002B2 (en) Dialogue estimation method, dialogue activity estimation device and program
EP4024393A2 (en) Training a speech recognition model
CN108899016B (en) Voice text normalization method, device and equipment and readable storage medium
CN114023336A (en) Model training method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant