Disclosure of Invention
An object of embodiments of the present application is to provide a data identification method, apparatus, electronic device, and computer-readable storage medium capable of improving recognition accuracy. The specific technical solutions are as follows:
in order to achieve the above object, an embodiment of the present application discloses a data identification method, including:
acquiring voice data to be recognized of a current recognition scene;
inputting the voice data to be recognized into a pre-constructed target recognition model to obtain a recognition result of the voice data to be recognized, wherein the recognition result comprises text information contained in the voice data to be recognized;
wherein the model parameters of the target recognition model are determined based on respective candidate model parameters of a plurality of candidate recognition models trained in advance; the plurality of candidate recognition models have the same model structure as the target recognition model; the plurality of candidate recognition models are obtained by training, in a meta-learning manner, based on sample data of candidate recognition scenes other than the current recognition scene; and the training sets adopted by the plurality of candidate recognition models differ from one another.
Optionally, the process for constructing the target recognition model includes:
calculating a weighted sum of the candidate model parameters of each candidate recognition model based on preset weights, and taking the weighted sum as the target model parameters;
and determining a target recognition model based on the target model parameters.
Optionally, the determining the target recognition model based on the target model parameters includes:
and determining a model with the target model parameters as a target recognition model.
Optionally, the determining the target recognition model based on the target model parameters includes:
determining a model with the target model parameters as an identification model to be corrected;
and training the recognition model to be corrected based on the sample voice data of the current recognition scene to obtain a target recognition model.
Optionally, the process of obtaining the candidate model parameters of the plurality of candidate recognition models includes:
for each candidate recognition scene, acquiring sample data corresponding to the candidate recognition scene;
dividing the sample data corresponding to the candidate recognition scene into a plurality of sample data sets according to the number of a plurality of preset recognition tasks corresponding to the candidate recognition scene, the plurality of sample data sets respectively serving as the sample data sets corresponding to the plurality of preset recognition tasks;
for each preset recognition task, training an initial recognition model of a preset model structure based on the corresponding sample data set;
when the number of training rounds reaches a preset number, acquiring the model parameters of the trained initial recognition model as model parameters to be processed;
adjusting the original model parameters of the initial recognition model based on the model parameter difference corresponding to each preset recognition task, to obtain the candidate model parameters of the candidate recognition model corresponding to the candidate recognition scene; wherein the model parameter difference corresponding to one preset recognition task is the difference between the model parameters to be processed corresponding to the preset recognition task and the original model parameters.
In a second aspect, to achieve the above object, an embodiment of the present application discloses a data identification device, including:
the voice data to be recognized acquisition module is used for acquiring voice data to be recognized of the current recognition scene;
the recognition module is used for inputting the voice data to be recognized into a pre-constructed target recognition model to obtain a recognition result of the voice data to be recognized; the recognition result comprises text information contained in the voice data to be recognized;
the model parameters of the target recognition model are determined based on respective candidate model parameters of a plurality of candidate recognition models trained in advance; the plurality of candidate recognition models have the same model structure as the target recognition model; the plurality of candidate recognition models are obtained by training, in a meta-learning manner, based on sample data of candidate recognition scenes other than the current recognition scene; and the training sets adopted by the plurality of candidate recognition models differ from one another.
Optionally, the apparatus further includes:
the target model parameter acquisition module is used for calculating a weighted sum of the candidate model parameters of each candidate recognition model based on preset weights, and taking the weighted sum as the target model parameters;
and the target recognition model acquisition module is used for determining a target recognition model based on the target model parameters.
Optionally, the target recognition model acquisition module is specifically configured to determine a model with the target model parameters as the target recognition model.
Optionally, the target recognition model acquisition module includes:
the to-be-corrected identification model acquisition module is used for determining a model with the target model parameters as the to-be-corrected identification model;
the target recognition model acquisition sub-module is used for training the recognition model to be corrected based on the sample voice data of the current recognition scene to obtain a target recognition model.
Optionally, the apparatus further includes:
the sample data acquisition module is used for acquiring, for each candidate recognition scene, sample data corresponding to the candidate recognition scene;
the sample data set acquisition module is used for dividing the sample data corresponding to the candidate recognition scene into a plurality of sample data sets according to the number of a plurality of preset recognition tasks corresponding to the candidate recognition scene, the plurality of sample data sets respectively serving as the sample data sets corresponding to the plurality of preset recognition tasks;
the training module is used for training, for each preset recognition task, an initial recognition model of a preset model structure based on the corresponding sample data set;
the to-be-processed model parameter acquisition module is used for acquiring, when the number of training rounds reaches a preset number, the model parameters of the trained initial recognition model as model parameters to be processed;
the candidate model parameter acquisition module is used for adjusting the original model parameters of the initial recognition model based on the model parameter differences corresponding to the preset recognition tasks, to obtain the candidate model parameters of the candidate recognition model corresponding to the candidate recognition scene; wherein the model parameter difference corresponding to one preset recognition task is the difference between the model parameters to be processed corresponding to the preset recognition task and the original model parameters.
On the other hand, in order to achieve the above object, the embodiment of the application also discloses an electronic device, which includes a memory and a processor;
the memory is used for storing a computer program;
the processor is configured to implement the data identification method according to the first aspect when executing the program stored in the memory.
On the other hand, in order to achieve the above object, an embodiment of the present application further discloses a computer readable storage medium, in which a computer program is stored, the computer program implementing the data identification method according to the first aspect when being executed by a processor.
In order to achieve the above object, on the other hand, a computer program product comprising instructions which, when run on a computer, cause the computer to perform the data identification method according to the first aspect is also disclosed.
The embodiment of the application provides a data identification method: acquiring voice data to be recognized of a current recognition scene; and inputting the voice data to be recognized into a pre-constructed target recognition model to obtain a recognition result of the voice data to be recognized, the recognition result including text information contained in the voice data to be recognized. The model parameters of the target recognition model are determined based on respective candidate model parameters of a plurality of candidate recognition models trained in advance; the plurality of candidate recognition models have the same model structure as the target recognition model; the plurality of candidate recognition models are obtained by training, in a meta-learning manner, based on sample data of candidate recognition scenes other than the current recognition scene; and the training sets adopted by the plurality of candidate recognition models differ from one another.
Based on the meta-learning manner, a candidate recognition model obtained by training on sample data of a candidate recognition scene can also be applied to other recognition scenes. Furthermore, the target recognition model determined by combining the candidate model parameters of the plurality of candidate recognition models can effectively recognize the voice data to be recognized of the current recognition scene without being trained on sample voice data of the current recognition scene. Therefore, even if there is little sample voice data of the current recognition scene, a target recognition model with high precision can be obtained, improving recognition accuracy.
Of course, not all of the above-described advantages need be achieved simultaneously in practicing any one of the products or methods of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
In the related art, a large amount of sample data needs to be acquired in order to improve recognition accuracy; that is, if there is little sample data for the current recognition scene, a recognition model with high accuracy cannot be obtained, resulting in low recognition accuracy.
In order to solve the above-mentioned problems, an embodiment of the present application provides a data identification method, referring to fig. 1, fig. 1 is a flowchart of the data identification method provided in the embodiment of the present application, where the method may include the following steps:
s101: and acquiring voice data to be recognized of the current recognition scene.
S102: and inputting the voice data to be recognized into a pre-constructed target recognition model to obtain a recognition result of the voice data to be recognized.
The recognition result comprises text information contained in the voice data to be recognized. The model parameters of the target recognition model are determined based on the respective candidate model parameters of a plurality of candidate recognition models trained in advance. The plurality of candidate recognition models have the same model structure as the target recognition model. The plurality of candidate recognition models are obtained by training, in a meta-learning manner, based on sample data of candidate recognition scenes other than the current recognition scene, and the training sets adopted by the plurality of candidate recognition models differ from one another.
According to the data identification method provided by the embodiment of the application, based on the meta-learning manner, a candidate recognition model obtained by training on sample data of a candidate recognition scene can also be applied to other recognition scenes. Furthermore, the target recognition model determined by combining the candidate model parameters of the plurality of candidate recognition models can effectively recognize the voice data to be recognized of the current recognition scene without being trained on sample voice data of the current recognition scene. Therefore, even if there is little sample voice data of the current recognition scene, a target recognition model with high precision can be obtained, improving recognition accuracy.
For step S101, the current recognition scene may be a scene in which voice data is recognized; for example, the current recognition scene may be a scene of recognizing the speech of a particular speaker in a video, or a scene of recognizing Tibetan-language voice data. The candidate recognition scene differs from the current recognition scene; that is, the sample data corresponding to the candidate recognition scene differs from the sample data corresponding to the current recognition scene. The candidate recognition scene may be a scene of recognizing voice data or a scene of recognizing image data, but is not limited thereto.
In one embodiment, referring to FIG. 2, the process of constructing the object recognition model may include the steps of:
s201: and calculating a weighted sum of the candidate model parameters of each candidate recognition model based on the preset weight, and taking the weighted sum as a target model parameter.
S202: based on the target model parameters, a target recognition model is determined.
The preset weights may be set empirically by a technician.
In the embodiment of the application, after the candidate recognition models are obtained through training based on the corresponding sample data in advance, the model parameters (i.e., the candidate model parameters) of each candidate recognition model may be obtained.
Furthermore, a weighted sum of candidate model parameters of each candidate recognition model can be calculated according to preset weights to obtain target model parameters, and the target recognition model is determined based on the target model parameters.
It will be appreciated that the model structures of the respective candidate recognition models are identical, and each candidate recognition model may have a plurality of candidate model parameters. Thus, when calculating the weighted sum of the candidate model parameters of the respective candidate recognition models, the weighted sum of the values of each model parameter across the candidate recognition models may be calculated, thereby obtaining the target model parameters.
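The layer-wise weighted combination described above can be sketched in a few lines of Python. This is an illustrative sketch, not the application's implementation: the parameter names (`conv1.w`, `fc1.w`), the scalar parameter values, and the weights are all hypothetical; real model parameters would be tensors combined in the same element-wise way.

```python
def weighted_combine(param_dicts, weights):
    """Weighted sum of identically structured parameter dicts,
    one dict per candidate recognition model."""
    combined = {}
    for name in param_dicts[0]:
        combined[name] = sum(w * d[name] for w, d in zip(weights, param_dicts))
    return combined

# Three candidate models with the same structure (scalar stand-ins per layer).
candidates = [
    {"conv1.w": 0.2, "fc1.w": 1.0},
    {"conv1.w": 0.4, "fc1.w": 0.0},
    {"conv1.w": 0.6, "fc1.w": -1.0},
]
# Preset weights, one per candidate model, summing to 1.
target = weighted_combine(candidates, [0.3, 0.3, 0.4])
# target["conv1.w"] == 0.3*0.2 + 0.3*0.4 + 0.4*0.6 == 0.42
```

Because all candidate models share one model structure, every parameter name exists in every dict, so the combination is well defined for each parameter.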
In one embodiment, referring to fig. 3, the step S202 may include the following steps:
s2021: a model with target model parameters is determined as a target recognition model.
In this embodiment of the present application, after determining the target model parameter, the model with the target model parameter may be directly determined as the target recognition model, that is, the model parameter of the target recognition model is the target model parameter.
In one embodiment, in order to further improve the accuracy of the object recognition model, referring to fig. 4, the step S202 may include the following steps:
s2022: and determining a model with the target model parameters as the identification model to be corrected.
S2023: training the recognition model to be corrected based on the sample voice data of the current recognition scene to obtain a target recognition model.
In the embodiment of the present application, after determining the target model parameter, a model with the target model parameter may be determined as the identification model to be corrected, that is, the model parameter of the identification model to be corrected is the target model parameter.
Then, sample voice data of the current recognition scene can be obtained, and training is carried out on the recognition model to be corrected according to the obtained sample voice data until convergence, so that the target recognition model is obtained.
Based on this processing, only a small amount of sample voice data of the current recognition scene is needed for the target recognition model to converge, so that a target recognition model with high recognition accuracy is obtained and recognition accuracy can be improved.
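The correction step can be illustrated with a toy example: starting from a combined parameter and taking a few gradient steps on the scarce sample data of the current scene. A one-parameter linear model stands in for the recognition network here; the model, data, learning rate, and epoch count are all illustrative assumptions, not values from the application.

```python
def fine_tune(theta, samples, lr=0.1, epochs=50):
    """Correct the parameter theta by gradient descent on the
    mean squared error of y = theta * x over the scene samples."""
    for _ in range(epochs):
        grad = sum(2 * (theta * x - y) * x for x, y in samples) / len(samples)
        theta -= lr * grad
    return theta

theta0 = 1.0                        # parameter from the weighted combination
samples = [(1.0, 2.0), (2.0, 4.0)]  # few labelled samples; true parameter is 2
theta = fine_tune(theta0, samples)  # converges close to 2.0
```

Even with only two samples the parameter converges, mirroring the point above: little sample voice data of the current scene suffices once the combined parameters provide a good starting point.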
In one embodiment, referring to fig. 5, the process of obtaining the candidate model parameters may include the steps of:
S501: for each candidate recognition scene, acquiring sample data corresponding to the candidate recognition scene.
S502: dividing the sample data corresponding to the candidate recognition scene into a plurality of sample data sets according to the number of a plurality of preset recognition tasks corresponding to the candidate recognition scene, the plurality of sample data sets respectively serving as the sample data sets corresponding to the plurality of preset recognition tasks.
S503: and training an initial recognition model of the preset model structure based on the corresponding sample data set aiming at each preset recognition task.
S504: and when the number of the rounds of training reaches the preset number, acquiring the model parameters of the initial recognition model after training as the model parameters to be processed.
S505: and adjusting the original model parameters of the initial recognition model based on the model parameter difference values corresponding to the preset recognition tasks to obtain the candidate model parameters of the candidate recognition model corresponding to the candidate recognition scene.
The model parameter difference value corresponding to one preset recognition task is represented by the following formula: the difference value between the to-be-processed model parameters corresponding to the preset recognition task and the original model parameters.
In the embodiment of the application, for each candidate recognition scene, the sample data corresponding to the candidate recognition scene may be acquired. Then, the sample data can be divided according to the number of the plurality of preset recognition tasks corresponding to the candidate recognition scene, to obtain a plurality of sample data sets.
Each sample data set corresponds to one preset recognition task; furthermore, for each preset recognition task, the initial recognition model of the preset model structure can be trained based on the corresponding sample data set, to obtain the corresponding model parameters to be processed.
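The division of a scene's sample data into per-task sets can be sketched as follows. The round-robin split is one possible strategy, chosen here for illustration; the application does not specify how the division is performed.

```python
def split_into_tasks(samples, m):
    """Divide the scene's sample data into m task-specific sets
    by round-robin assignment."""
    return [samples[i::m] for i in range(m)]

data = list(range(10))                 # stand-in for 10 labelled voice samples
task_sets = split_into_tasks(data, 5)  # one set per preset recognition task
# 5 sets of 2 samples each, e.g. task_sets[0] == [0, 5]
```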
For example, for a speech recognition scenario, the preset recognition task may be a speech keyword recognition task, a continuous speech recognition task, an isolated word recognition task, or the like; for the scene of image recognition, the preset recognition task may be a target detection task, a gesture recognition task, or the like.
After the initial recognition model is trained based on each sample data set, the original model parameters of the initial recognition model can be adjusted by combining the model parameters to be processed corresponding to each preset recognition task, so as to obtain the candidate model parameters. In this way, a recognition model with the candidate model parameters can effectively recognize the voice data corresponding to the candidate recognition scene, and the recognition accuracy of the target recognition model can be further improved.
In one implementation, the candidate model parameters of the candidate recognition model corresponding to the candidate recognition scene may be calculated based on a preset formula.
The preset formula is:

θ' = θ + α · (1/m) · Σ_{i=1}^{m} (θ_i^K − θ)

where θ' represents the candidate model parameter of the candidate recognition model corresponding to the candidate recognition scene, θ represents the original model parameter of the initial recognition model, α represents the learning rate, m represents the number of preset recognition tasks, and θ_i^K represents the model parameter obtained by training the initial recognition model for K rounds, based on the corresponding sample data set, for the i-th preset recognition task.
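Assuming the preset formula is the meta-update θ' = θ + α·(1/m)·Σᵢ(θᵢᴷ − θ) implied by the parameter differences described above, it can be sketched as follows with scalar parameters; the concrete values are illustrative only.

```python
def candidate_parameter(theta, thetas_K, alpha):
    """Adjust the original parameter theta by the mean difference between
    the m per-task parameters (after K training rounds) and theta."""
    m = len(thetas_K)
    return theta + alpha * sum(t - theta for t in thetas_K) / m

theta = 0.0                           # original model parameter
thetas_K = [1.0, 2.0, 3.0, 4.0, 5.0]  # m = 5 task-adapted parameters
theta_prime = candidate_parameter(theta, thetas_K, alpha=0.95)
# theta_prime == 0.0 + 0.95 * (15.0 / 5) == 2.85
```

Moving the original parameters toward the average of the task-adapted parameters is what lets the resulting candidate model generalize across the scene's preset recognition tasks.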
In one embodiment, the initial recognition model of the preset model structure may include: an input layer, convolution layer 1, convolution layer 2, convolution layer 3, convolution layer 4, fully connected layer 1, and fully connected layer 2.
Each of the above convolution layers may include 64 convolution kernels of size 3×3, and the convolution kernels perform convolution with a stride of 2. After the convolution result of each convolution layer is processed by BN (Batch Normalization), it is activated using the ReLU function. Fully connected layer 1 and fully connected layer 2 each consist of one neuron, and the activation function is sigmoid.
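As a sanity check of the stated structure, the spatial size after the four stride-2 convolution layers can be computed with the standard output-size formula. The 64×64 input size and the padding of 1 are assumptions for illustration; the application does not state them.

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Output spatial size of a square convolution layer."""
    return (size + 2 * padding - kernel) // stride + 1

size = 64            # assumed square input, 64 x 64
for _ in range(4):   # convolution layers 1-4, each 64 kernels of 3x3, stride 2
    size = conv_out(size)
# size == 4: each stride-2 layer halves the spatial resolution
```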
The candidate recognition scenes include: a scene of performing voice recognition based on the Va-language isolated-word data set, a scene of performing voice recognition based on the time isolated-word data set, and a scene of performing voice recognition based on a daily voice data set recorded by users.
The number of preset recognition tasks corresponding to the Va-language isolated-word data set is 5, the number of preset recognition tasks corresponding to the time isolated-word data set is 5, and the number of preset recognition tasks corresponding to the daily voice data set recorded by users is 5.
Then, the Va-language isolated-word data set can be divided into 5 sample data sets; for each preset recognition task, an initial recognition model of the preset model structure is trained based on the corresponding sample data set; and further, based on the above preset formula, the candidate model parameters corresponding to the scene of performing voice recognition based on the Va-language isolated-word data set can be calculated.
Specifically, the number of iterations during training may be 500, the optimizer for adjusting the model parameters during training may use SGD (Stochastic Gradient Descent), and in the above preset formula, m = 5, K = 20, and α = 0.95.
Similarly, the time isolated-word data set can be divided into 5 sample data sets; for each preset recognition task, the initial recognition model of the preset model structure can be trained based on the corresponding sample data set; and further, based on the above preset formula, the candidate model parameters corresponding to the scene of performing voice recognition based on the time isolated-word data set can be calculated.
Specifically, the number of iterations during training may be 1500, the optimizer for adjusting the model parameters during training may use SGD, and in the above preset formula, m = 5, K = 15, and α = 0.95.
In the same way, the daily voice data set recorded by users can be divided into 5 sample data sets; for each preset recognition task, the initial recognition model of the preset model structure is trained based on the corresponding sample data set; and further, based on the above preset formula, the candidate model parameters corresponding to the scene of performing voice recognition based on the daily voice data set recorded by users can be calculated.
Specifically, the number of iterations during training may be 2000, the optimizer for adjusting the model parameters during training may use SGD, and in the above preset formula, m = 5, K = 25, and α = 1.
The target model parameters may then be calculated based on formula (1):

θ* = A · θ'_1 + B · θ'_2 + C · θ'_3    (1)

where A, B, and C each represent a weight; θ'_1, θ'_2, and θ'_3 represent the candidate model parameters corresponding to the three candidate recognition scenes above; and θ* represents the target model parameters. For example, A may be 0.3, B may be 0.3, and C may be 0.4.
The target recognition model with the target model parameters θ* can be used for recognizing voice data of other scenes. Alternatively, the weights A, B, and C may be adjusted so that the resulting target recognition model can be used for other types of recognition tasks, for example, recognizing image data.
Based on the same inventive concept, the embodiment of the present application further provides a data identification device, referring to fig. 6, fig. 6 is a structural diagram of the data identification device provided in the embodiment of the present application, where the device includes:
the to-be-recognized voice data obtaining module 601 is configured to obtain to-be-recognized voice data of a current recognition scene;
the recognition module 602 is configured to input the voice data to be recognized into a pre-constructed target recognition model, so as to obtain a recognition result of the voice data to be recognized; the recognition result comprises text information contained in the voice data to be recognized;
the model parameters of the target recognition model are determined based on respective candidate model parameters of a plurality of candidate recognition models trained in advance; the plurality of candidate recognition models have the same model structure as the target recognition model; the plurality of candidate recognition models are obtained by training, in a meta-learning manner, based on sample data of candidate recognition scenes other than the current recognition scene; and the training sets adopted by the plurality of candidate recognition models differ from one another.
In one embodiment, the apparatus further comprises:
the target model parameter acquisition module is used for calculating a weighted sum of the candidate model parameters of each candidate recognition model based on preset weights, and taking the weighted sum as the target model parameters;
and the target recognition model acquisition module is used for determining a target recognition model based on the target model parameters.
In one embodiment, the target recognition model acquisition module is specifically configured to determine a model with the target model parameters as the target recognition model.
In one embodiment, the object recognition model acquisition module includes:
the to-be-corrected identification model acquisition module is used for determining a model with the target model parameters as the to-be-corrected identification model;
the target recognition model acquisition sub-module is used for training the recognition model to be corrected based on the sample voice data of the current recognition scene to obtain a target recognition model.
In one embodiment, the apparatus further comprises:
the sample data acquisition module is used for acquiring, for each candidate recognition scene, sample data corresponding to the candidate recognition scene;
the sample data set acquisition module is used for dividing the sample data corresponding to the candidate recognition scene into a plurality of sample data sets according to the number of a plurality of preset recognition tasks corresponding to the candidate recognition scene, the plurality of sample data sets respectively serving as the sample data sets corresponding to the plurality of preset recognition tasks;
the training module is used for training, for each preset recognition task, an initial recognition model of a preset model structure based on the corresponding sample data set;
the to-be-processed model parameter acquisition module is used for acquiring, when the number of training rounds reaches a preset number, the model parameters of the trained initial recognition model as model parameters to be processed;
the candidate model parameter acquisition module is used for adjusting the original model parameters of the initial recognition model based on the model parameter differences corresponding to the preset recognition tasks, to obtain the candidate model parameters of the candidate recognition model corresponding to the candidate recognition scene; wherein the model parameter difference corresponding to one preset recognition task is the difference between the model parameters to be processed corresponding to the preset recognition task and the original model parameters.
The embodiment of the application also provides an electronic device, as shown in fig. 7, including a memory 701 and a processor 702;
a memory 701 for storing a computer program;
the processor 702 is configured to implement the data identification method provided in the embodiment of the present application when executing the program stored in the memory 701.
Specifically, the data identification method includes:
acquiring voice data to be recognized of a current recognition scene;
inputting the voice data to be recognized into a pre-constructed target recognition model to obtain a recognition result of the voice data to be recognized; the recognition result comprises text information contained in the voice data to be recognized;
the model parameters of the target recognition model are determined based on respective candidate model parameters of a plurality of candidate recognition models trained in advance; the plurality of candidate recognition models have the same model structure as the target recognition model; the plurality of candidate recognition models are obtained by training, in a meta-learning manner, based on sample data of candidate recognition scenes other than the current recognition scene; and the training sets adopted by the plurality of candidate recognition models differ from one another.
It should be noted that other implementation manners of the data identification method are partially the same as those of the foregoing method embodiment, and are not repeated here.
The electronic device may be provided with a communication interface for enabling communication between the electronic device and another device.
The processor, the communication interface and the memory perform communication with each other through a communication bus, where the communication bus may be a peripheral component interconnect standard (Peripheral Component Interconnect, abbreviated as PCI) bus or an extended industry standard architecture (Extended Industry Standard Architecture, abbreviated as EISA) bus. The communication bus may be classified as an address bus, a data bus, a control bus, or the like.
The memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU for short), a network processor (Network Processor, NP for short), etc.; but also digital signal processors (Digital Signal Processor, DSP for short), application specific integrated circuits (Application Specific Integrated Circuit, ASIC for short), field-programmable gate arrays (Field-Programmable Gate Array, FPGA for short) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components.
In yet another embodiment provided herein, there is also provided a computer readable storage medium having stored therein a computer program which when executed by a processor implements the steps of any of the data identification methods described above.
In yet another embodiment provided herein, there is also provided a computer program product containing instructions that, when run on a computer, cause the computer to perform any of the data identification methods of the above embodiments.
In the above embodiments, the implementation may be in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center containing an integration of one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., a Solid State Disk (SSD)), etc.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In this specification, each embodiment is described in a related manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for apparatus, systems, electronic devices, computer readable storage media, and computer program product embodiments, the description is relatively simple as it is substantially similar to method embodiments, as relevant points are found in the partial description of method embodiments.
The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the scope of the present application. Any modifications, equivalent substitutions, improvements, etc. that are within the spirit and principles of the present application are intended to be included within the scope of the present application.