CN113066486B - Data identification method, device, electronic equipment and computer readable storage medium

Data identification method, device, electronic equipment and computer readable storage medium

Info

Publication number
CN113066486B
Authority
CN
China
Prior art keywords
recognition
model
target
alternative
scene
Prior art date
Legal status
Active
Application number
CN202110319650.2A
Other languages
Chinese (zh)
Other versions
CN113066486A (en)
Inventor
李森
Current Assignee
Wuxi Jinyun Zhilian Technology Co ltd
Beijing Kingsoft Cloud Network Technology Co Ltd
Original Assignee
Beijing Kingsoft Cloud Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Kingsoft Cloud Network Technology Co Ltd
Priority to CN202110319650.2A
Publication of CN113066486A
Application granted
Publication of CN113066486B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/063 Training
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/16 Speech classification or search using artificial neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application provide a data identification method, an apparatus, an electronic device and a computer readable storage medium, relating to the technical field of deep learning. The method includes: acquiring voice data to be recognized of a current recognition scene; and inputting the voice data to be recognized into a pre-constructed target recognition model to obtain a recognition result of the voice data to be recognized. The model parameters of the target recognition model are determined based on the respective candidate model parameters of a plurality of candidate recognition models trained in advance; the plurality of alternative recognition models have the same model structure as the target recognition model; the plurality of alternative recognition models are obtained by training, in a meta-learning manner, based on sample data of alternative recognition scenes other than the current recognition scene, and the training sets adopted by the plurality of alternative recognition models are different. With this processing, a recognition model of relatively high accuracy can be obtained even when only a small amount of sample voice data is available for the current recognition scene, thereby improving recognition accuracy.

Description

Data identification method, device, electronic equipment and computer readable storage medium
Technical Field
The present disclosure relates to the field of deep learning technologies, and in particular, to a data identification method, apparatus, electronic device, and computer readable storage medium.
Background
With the rapid development of computer technology, deep learning technology is widely used in many fields. Based on deep learning, different kinds of data, such as speech data and image data, can be recognized.
In the related art, for a given speech recognition scene, a recognition model with an initial structure may be trained based on a speech training set corresponding to that recognition scene, where the training set may include sample speech data. After convergence is achieved, the voice data to be recognized corresponding to the recognition scene can be recognized by the trained recognition model. For example, the recognition scene may be a scene for recognizing a host language, or a scene for recognizing the Tibetan language.
However, in the above process, a large amount of sample voice data needs to be acquired in order to improve the accuracy of recognition; that is, if only a small amount of sample voice data is available for the current recognition scene, a sufficiently accurate recognition model cannot be obtained, resulting in low recognition accuracy.
Disclosure of Invention
An object of the embodiments of the present application is to provide a data identification method, apparatus, electronic device, and computer-readable storage medium, which can improve the accuracy of recognition. The specific technical solutions are as follows:
In a first aspect, in order to achieve the above object, an embodiment of the present application discloses a data identification method, including:
acquiring voice data to be recognized of a current recognition scene;
inputting the voice data to be recognized into a pre-constructed target recognition model to obtain a recognition result of the voice data to be recognized; the recognition result comprises text information contained in the voice data to be recognized;
the model parameters of the target recognition model are determined based on the respective candidate model parameters of a plurality of candidate recognition models trained in advance; the plurality of alternative recognition models have the same model structure as the target recognition model; the plurality of alternative recognition models are obtained by training, in a meta-learning manner, based on sample data of alternative recognition scenes other than the current recognition scene, and the training sets adopted by the plurality of alternative recognition models are different.
Optionally, the process for constructing the target recognition model includes:
calculating a weighted sum of the alternative model parameters of each alternative recognition model based on preset weights, and taking the weighted sum as the target model parameters;
and determining a target recognition model based on the target model parameters.
Optionally, the determining the target recognition model based on the target model parameters includes:
and determining a model with the target model parameters as a target recognition model.
Optionally, the determining the target recognition model based on the target model parameters includes:
determining a model with the target model parameters as an identification model to be corrected;
and training the recognition model to be corrected based on the sample voice data of the current recognition scene to obtain a target recognition model.
Optionally, the process of obtaining the candidate model parameters of the plurality of candidate recognition models includes:
for each alternative recognition scene, acquiring sample data corresponding to the alternative recognition scene;
dividing the sample data corresponding to the alternative recognition scene into a plurality of sample data sets according to the number of preset recognition tasks corresponding to the alternative recognition scene, the sample data sets serving respectively as the sample data sets corresponding to the preset recognition tasks;
for each preset recognition task, training an initial recognition model with a preset model structure based on the corresponding sample data set;
when the number of training rounds reaches a preset number, obtaining the model parameters of the trained initial recognition model as to-be-processed model parameters;
and adjusting the original model parameters of the initial recognition model based on the model parameter difference corresponding to each preset recognition task, to obtain the candidate model parameters of the candidate recognition model corresponding to the alternative recognition scene; wherein the model parameter difference corresponding to one preset recognition task is the difference between the to-be-processed model parameters corresponding to that preset recognition task and the original model parameters.
In a second aspect, to achieve the above object, an embodiment of the present application discloses a data identification device, including:
the voice data to be recognized acquisition module is used for acquiring voice data to be recognized of the current recognition scene;
the recognition module is used for inputting the voice data to be recognized into a pre-constructed target recognition model to obtain a recognition result of the voice data to be recognized; the recognition result comprises text information contained in the voice data to be recognized;
the model parameters of the target recognition model are determined based on the respective candidate model parameters of a plurality of candidate recognition models trained in advance; the plurality of alternative recognition models have the same model structure as the target recognition model; the plurality of alternative recognition models are obtained by training, in a meta-learning manner, based on sample data of alternative recognition scenes other than the current recognition scene, and the training sets adopted by the plurality of alternative recognition models are different.
Optionally, the apparatus further includes:
the target model parameter acquisition module is used for calculating the weighted sum of the candidate model parameters of each candidate recognition model based on the preset weight to serve as the target model parameters;
and the target recognition model acquisition module is used for determining a target recognition model based on the target model parameters.
Optionally, the target recognition model acquisition module is specifically configured to determine a model with the target model parameters as the target recognition model.
Optionally, the target recognition model acquisition module includes:
the to-be-corrected identification model acquisition module is used for determining a model with the target model parameters as the to-be-corrected identification model;
the target recognition model acquisition sub-module is used for training the recognition model to be corrected based on the sample voice data of the current recognition scene to obtain a target recognition model.
Optionally, the apparatus further includes:
the sample data acquisition module is used for acquiring sample data corresponding to each alternative identification scene aiming at each alternative identification scene;
the sample data set acquisition module is used for dividing sample data corresponding to the alternative identification scene into a plurality of sample data sets according to the number of a plurality of preset identification tasks corresponding to the alternative identification scene, and the plurality of sample data sets are respectively used as sample data sets corresponding to the plurality of preset identification tasks;
the training module is used for training an initial recognition model of the preset model structure based on the corresponding sample data set aiming at each preset recognition task;
the to-be-processed model parameter acquisition module is used for acquiring the model parameters of the initial recognition model after training when the number of rounds of training reaches a preset number, and taking the model parameters as to-be-processed model parameters;
the alternative model parameter acquisition module is used for adjusting the original model parameters of the initial recognition model based on the model parameter difference values corresponding to the preset recognition tasks, to obtain the alternative model parameters of an alternative recognition model corresponding to the alternative recognition scene; wherein the model parameter difference value corresponding to one preset recognition task is the difference between the to-be-processed model parameters corresponding to that preset recognition task and the original model parameters.
On the other hand, in order to achieve the above object, the embodiment of the application also discloses an electronic device, which includes a memory and a processor;
the memory is used for storing a computer program;
the processor is configured to implement the data identification method according to the first aspect when executing the program stored in the memory.
On the other hand, in order to achieve the above object, an embodiment of the present application further discloses a computer readable storage medium, in which a computer program is stored, the computer program implementing the data identification method according to the first aspect when being executed by a processor.
On the other hand, in order to achieve the above object, a computer program product is also disclosed, comprising instructions which, when run on a computer, cause the computer to perform the data identification method according to the first aspect.
The embodiments of the present application provide a data identification method, which includes: acquiring voice data to be recognized of a current recognition scene; and inputting the voice data to be recognized into a pre-constructed target recognition model to obtain a recognition result of the voice data to be recognized, where the recognition result comprises text information contained in the voice data to be recognized. The model parameters of the target recognition model are determined based on the respective candidate model parameters of a plurality of candidate recognition models trained in advance; the plurality of alternative recognition models have the same model structure as the target recognition model; the plurality of alternative recognition models are obtained by training, in a meta-learning manner, based on sample data of alternative recognition scenes other than the current recognition scene, and the training sets adopted by the plurality of alternative recognition models are different.
Based on the meta-learning manner, an alternative recognition model obtained by training on the sample data of an alternative recognition scene can also be applied to other recognition scenes. Furthermore, the target recognition model determined by combining the respective candidate model parameters of the plurality of candidate recognition models can effectively recognize the voice data to be recognized of the current recognition scene without being trained on sample voice data of the current recognition scene. Therefore, even if only a small amount of sample voice data is available for the current recognition scene, a target recognition model of relatively high accuracy can be obtained, and the recognition accuracy is improved.
Of course, not all of the above-described advantages need be achieved simultaneously in practicing any one of the products or methods of the present application.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a data identification method provided in an embodiment of the present application;
FIG. 2 is a flowchart of generating a target recognition model during data recognition according to an embodiment of the present application;
FIG. 3 is a flowchart of another method for generating a target recognition model during data recognition according to an embodiment of the present application;
FIG. 4 is a flowchart of another method for generating a target recognition model during data recognition according to an embodiment of the present application;
FIG. 5 is a flowchart of generating alternative model parameters for data identification according to an embodiment of the present application;
fig. 6 is a block diagram of a data identification device according to an embodiment of the present application;
fig. 7 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
In the related art, in order to improve the accuracy of recognition, a large amount of sample data needs to be acquired; that is, if only a small amount of sample data is available for the current recognition scene, a sufficiently accurate recognition model cannot be obtained, resulting in low recognition accuracy.
In order to solve the above-mentioned problems, an embodiment of the present application provides a data identification method, referring to fig. 1, fig. 1 is a flowchart of the data identification method provided in the embodiment of the present application, where the method may include the following steps:
s101: and acquiring voice data to be recognized of the current recognition scene.
S102: and inputting the voice data to be recognized into a pre-constructed target recognition model to obtain a recognition result of the voice data to be recognized.
The recognition result comprises text information contained in the voice data to be recognized. The model parameters of the target recognition model are determined based on the respective candidate model parameters of a plurality of candidate recognition models trained in advance. The plurality of alternative recognition models have the same model structure as the target recognition model. The plurality of alternative recognition models are obtained by training, in a meta-learning manner, based on sample data of alternative recognition scenes other than the current recognition scene, and the training sets adopted by the plurality of alternative recognition models are different.
According to the data identification method provided by the embodiments of the present application, based on the meta-learning manner, an alternative recognition model obtained by training on the sample data of an alternative recognition scene can also be applied to other recognition scenes. Furthermore, the target recognition model determined by combining the respective candidate model parameters of the plurality of candidate recognition models can effectively recognize the voice data to be recognized of the current recognition scene without being trained on sample voice data of the current recognition scene. Therefore, even if only a small amount of sample voice data is available for the current recognition scene, a target recognition model of relatively high accuracy can be obtained, and the recognition accuracy is improved.
For step S101, the current recognition scene may be a scene in which voice data is recognized; for example, the current recognition scene may be a scene in which the speech of a particular speaker in a video is recognized, or a scene in which Tibetan-language voice data is recognized. An alternative recognition scene is different from the current recognition scene, that is, the sample data corresponding to an alternative recognition scene is different from the sample data corresponding to the current recognition scene. An alternative recognition scene may be a scene in which voice data is recognized, or a scene in which image data is recognized, but is not limited thereto.
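As a minimal illustrative sketch of steps S101 and S102, assuming a PyTorch target recognition model and a hypothetical feature-extraction helper (neither of which is prescribed by the above description), the recognition flow may look as follows:

import torch

def recognize(wave_path, target_model, extract_features):
    # S101: acquire the to-be-recognized voice data of the current recognition scene
    # (extract_features is a hypothetical helper turning raw audio into a feature tensor).
    features = extract_features(wave_path)
    # S102: input the voice data into the pre-constructed target recognition model.
    target_model.eval()
    with torch.no_grad():
        recognition_result = target_model(features)
    return recognition_result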
In one embodiment, referring to FIG. 2, the process of constructing the target recognition model may include the following steps:
s201: and calculating a weighted sum of the candidate model parameters of each candidate recognition model based on the preset weight, and taking the weighted sum as a target model parameter.
S202: based on the target model parameters, a target recognition model is determined.
The preset weights may be set empirically by a technician.
In the embodiment of the application, after the candidate recognition models are obtained through training based on the corresponding sample data in advance, the model parameters (i.e., the candidate model parameters) of each candidate recognition model may be obtained.
Furthermore, a weighted sum of candidate model parameters of each candidate recognition model can be calculated according to preset weights to obtain target model parameters, and the target recognition model is determined based on the target model parameters.
It will be appreciated that the model structures of the respective alternative recognition models are identical, and that each alternative recognition model may have multiple model parameters. Therefore, when calculating the weighted sum of the alternative model parameters of the respective alternative recognition models, a weighted sum of the values of each model parameter across the alternative recognition models may be calculated, thereby obtaining the target model parameters.
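A minimal sketch of steps S201 and S202, assuming the alternative model parameters are available as PyTorch state dictionaries and the preset weights are supplied empirically (all names below are illustrative assumptions):

def weighted_sum_parameters(state_dicts, weights):
    # state_dicts: one state_dict per alternative recognition model (same model structure).
    # weights:     one preset weight per alternative recognition model.
    target_parameters = {}
    for name in state_dicts[0]:
        # For each model parameter, take the weighted sum of its values across the models.
        target_parameters[name] = sum(w * sd[name] for w, sd in zip(weights, state_dicts))
    return target_parameters

# Example usage (S202): load the combined parameters into a model with the same structure.
# target_model.load_state_dict(weighted_sum_parameters([sd1, sd2, sd3], [0.3, 0.3, 0.4]))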
In one embodiment, referring to fig. 3, the step S202 may include the following steps:
s2021: a model with target model parameters is determined as a target recognition model.
In this embodiment of the present application, after determining the target model parameter, the model with the target model parameter may be directly determined as the target recognition model, that is, the model parameter of the target recognition model is the target model parameter.
In one embodiment, in order to further improve the accuracy of the target recognition model, referring to fig. 4, the step S202 may include the following steps:
s2022: and determining a model with the target model parameters as the identification model to be corrected.
S2023: training the recognition model to be corrected based on the sample voice data of the current recognition scene to obtain a target recognition model.
In the embodiment of the present application, after determining the target model parameter, a model with the target model parameter may be determined as the identification model to be corrected, that is, the model parameter of the identification model to be corrected is the target model parameter.
Then, sample voice data of the current recognition scene can be obtained, and training is carried out on the recognition model to be corrected according to the obtained sample voice data until convergence, so that the target recognition model is obtained.
With this processing, only a small amount of sample voice data of the current recognition scene is needed for the model to converge, so that a target recognition model with relatively high recognition accuracy is obtained and the recognition accuracy can be improved.
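A short sketch of steps S2022 and S2023 under stated assumptions: a binary-cross-entropy loss (matching the sigmoid output described later) and an SGD optimizer are assumed here, since the description does not fix either choice.

import torch

def train_model_to_be_corrected(model_to_be_corrected, current_scene_loader, epochs=10, lr=0.01):
    # Fine-tune the model carrying the target model parameters on the (small amount of)
    # sample voice data of the current recognition scene to obtain the target recognition model.
    optimizer = torch.optim.SGD(model_to_be_corrected.parameters(), lr=lr)
    loss_fn = torch.nn.BCELoss()  # assumed loss; pairs with a sigmoid output
    model_to_be_corrected.train()
    for _ in range(epochs):  # iterate until (approximate) convergence
        for features, labels in current_scene_loader:
            optimizer.zero_grad()
            loss = loss_fn(model_to_be_corrected(features), labels)
            loss.backward()
            optimizer.step()
    return model_to_be_corrected  # the target recognition model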
In one embodiment, referring to fig. 5, the acquisition process of the alternative model parameters may include the steps of:
s501: and acquiring sample data corresponding to each alternative identification scene according to each alternative identification scene.
S502: sample data corresponding to the alternative recognition scene is divided into a plurality of sample data sets according to the number of a plurality of preset recognition tasks corresponding to the alternative recognition scene, and the sample data sets are respectively used as sample data sets corresponding to the preset recognition tasks.
S503: and training an initial recognition model of the preset model structure based on the corresponding sample data set aiming at each preset recognition task.
S504: and when the number of the rounds of training reaches the preset number, acquiring the model parameters of the initial recognition model after training as the model parameters to be processed.
S505: and adjusting the original model parameters of the initial recognition model based on the model parameter difference values corresponding to the preset recognition tasks to obtain the candidate model parameters of the candidate recognition model corresponding to the candidate recognition scene.
The model parameter difference value corresponding to one preset recognition task is represented by the following formula: the difference value between the to-be-processed model parameters corresponding to the preset recognition task and the original model parameters.
In the embodiments of the present application, for each alternative recognition scene, sample data corresponding to the alternative recognition scene may be acquired. Then, the sample data can be divided according to the number of preset recognition tasks corresponding to the alternative recognition scene, so as to obtain a plurality of sample data sets.
Each sample data set corresponds to a preset recognition task, and further, for each preset recognition task, the initial recognition model of the preset model structure can be trained based on the corresponding sample data set to obtain corresponding model parameters to be processed.
For example, for a speech recognition scenario, the preset recognition task may be a speech keyword recognition task, a continuous speech recognition task, an isolated word recognition task, or the like; for the scene of image recognition, the preset recognition task may be a target detection task, a gesture recognition task, or the like.
After the initial recognition model is trained based on each sample data set, the original model parameters of the initial recognition model can be adjusted by combining the to-be-processed model parameters corresponding to each preset recognition task, and the alternative model parameters are obtained, so that the recognition model with the alternative model parameters can effectively recognize the voice data corresponding to the alternative recognition scene, and further, the recognition accuracy of the target recognition model can be improved.
In one implementation, the candidate model parameters of the candidate recognition model corresponding to the candidate recognition scene may be calculated based on a preset formula.
Wherein, the preset formula is:

θ' = θ + (α/m) · Σ_{i=1}^{m} (θ_i^(K) − θ)

where θ' represents the alternative model parameters of the alternative recognition model corresponding to the alternative recognition scene, θ represents the original model parameters of the initial recognition model, α represents the learning rate, m represents the number of the plurality of preset recognition tasks, and θ_i^(K) represents the model parameters obtained by training the initial recognition model for K rounds, based on the corresponding sample data set, for the i-th preset recognition task.
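A minimal sketch of steps S501 to S505 and of the preset formula above, assuming PyTorch models and a hypothetical per-round training helper; it illustrates the averaged parameter-difference update under those assumptions rather than prescribing an implementation:

import copy
import torch

def compute_alternative_parameters(initial_model, task_loaders, k_rounds, alpha, train_one_round):
    # task_loaders: one data loader per preset recognition task of the alternative scene (S502).
    # train_one_round: hypothetical helper that performs one round of training in place (S503).
    original = copy.deepcopy(initial_model.state_dict())  # original model parameters (theta)
    m = len(task_loaders)
    accumulated_diff = {name: torch.zeros_like(p) for name, p in original.items()}

    for loader in task_loaders:
        task_model = copy.deepcopy(initial_model)
        for _ in range(k_rounds):  # S504: train for K rounds on this task's sample data set
            train_one_round(task_model, loader)
        to_be_processed = task_model.state_dict()  # to-be-processed parameters (theta_i^K)
        for name in accumulated_diff:
            accumulated_diff[name] += to_be_processed[name] - original[name]

    # S505 / preset formula: theta' = theta + (alpha/m) * sum_i (theta_i^K - theta)
    return {name: original[name] + alpha * accumulated_diff[name] / m for name in original}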
In one embodiment, the initial recognition model with the preset model structure may include: an input layer, convolution layer 1, convolution layer 2, convolution layer 3, convolution layer 4, fully connected layer 1, and fully connected layer 2.
Each of the above convolution layers may include 64 convolution kernels of size 3×3, and the convolution is performed with a stride of 2. The convolution result of each convolution layer is processed by BN (Batch Normalization) and then activated using the ReLU function. Fully connected layer 1 and fully connected layer 2 each consist of one neuron, and the activation function is sigmoid.
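A non-authoritative PyTorch sketch of the initial recognition model with the preset model structure described above; the number of input channels and the pooling applied before fully connected layer 1 are assumptions, since the description leaves them unspecified:

import torch.nn as nn

class InitialRecognitionModel(nn.Module):
    # Input layer, convolution layers 1-4 (64 kernels of 3x3, stride 2, BN + ReLU each),
    # fully connected layers 1 and 2 (one neuron each, sigmoid activation).
    def __init__(self, in_channels=1):  # assumed single-channel input feature map
        super().__init__()
        layers = []
        channels = in_channels
        for _ in range(4):  # convolution layers 1 to 4
            layers += [
                nn.Conv2d(channels, 64, kernel_size=3, stride=2),
                nn.BatchNorm2d(64),  # BN processing of the convolution result
                nn.ReLU(),           # ReLU activation
            ]
            channels = 64
        self.features = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool2d(1)  # assumed pooling before the fully connected layers
        self.fc1 = nn.Linear(64, 1)          # fully connected layer 1: one neuron
        self.fc2 = nn.Linear(1, 1)           # fully connected layer 2: one neuron
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.pool(self.features(x)).flatten(1)
        x = self.sigmoid(self.fc1(x))
        return self.sigmoid(self.fc2(x))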
The alternative recognition scenes include: a scene of voice recognition based on a Va-language isolated word data set, a scene of voice recognition based on a time isolated word data set, and a scene of voice recognition based on a daily voice data set recorded by users.
The number of preset recognition tasks corresponding to the Va-language isolated word data set is 5, the number of preset recognition tasks corresponding to the time isolated word data set is 5, and the number of preset recognition tasks corresponding to the daily voice data set recorded by users is 5.
Then, the Va-language isolated word data set can be divided into 5 sample data sets; for each preset recognition task, an initial recognition model of the preset model structure is trained based on the corresponding sample data set; and further, based on the preset formula, the alternative model parameters θ'_1 corresponding to the scene of voice recognition based on the Va-language isolated word data set can be calculated.
Specifically, the number of iterations during training may be 500, the optimizer used to adjust the model parameters during training may be SGD (Stochastic Gradient Descent), and in the above preset formula m=5, K=20 and α=0.95.
Similarly, the time isolated word data set can be divided into 5 sample data sets; for each preset recognition task, the initial recognition model of the preset model structure is trained based on the corresponding sample data set; and further, based on the preset formula, the alternative model parameters θ'_2 corresponding to the scene of voice recognition based on the time isolated word data set can be calculated.
Specifically, the number of iterations during training may be 1500, the optimizer used to adjust the model parameters during training may be SGD, and in the above preset formula m=5, K=15 and α=0.95.
In the same way, the daily voice data set recorded by users can be divided into 5 sample data sets; for each preset recognition task, the initial recognition model of the preset model structure is trained based on the corresponding sample data set; and further, based on the preset formula, the alternative model parameters θ'_3 corresponding to the scene of voice recognition based on the daily voice data set recorded by users can be calculated.
Specifically, the number of iterations during training may be 2000, the optimizer used to adjust the model parameters during training may be SGD, and in the above preset formula m=5, K=25 and α=1.
The target model parameters may then be calculated based on equation (1):

θ* = A · θ'_1 + B · θ'_2 + C · θ'_3    (1)

wherein A, B and C each represent a weight, θ'_1, θ'_2 and θ'_3 represent the alternative model parameters calculated above for the three alternative recognition scenes, and θ* represents the target model parameters. For example, A may be 0.3, B may be 0.3, and C may be 0.4.
The target recognition model with the target model parameters θ* can be used to recognize voice data of other scenes; alternatively, the weights A, B and C may be adjusted, and the resulting target recognition model may be used for other types of recognition tasks, for example, to recognize image data.
Based on the same inventive concept, the embodiment of the present application further provides a data identification device, referring to fig. 6, fig. 6 is a structural diagram of the data identification device provided in the embodiment of the present application, where the device includes:
the to-be-recognized voice data obtaining module 601 is configured to obtain to-be-recognized voice data of a current recognition scene;
the recognition module 602 is configured to input the voice data to be recognized into a pre-constructed target recognition model, so as to obtain a recognition result of the voice data to be recognized; the recognition result comprises text information contained in the voice data to be recognized;
the model parameters of the target recognition model are determined based on the respective candidate model parameters of a plurality of candidate recognition models trained in advance; the plurality of alternative recognition models have the same model structure as the target recognition model; the plurality of alternative recognition models are obtained by training, in a meta-learning manner, based on sample data of alternative recognition scenes other than the current recognition scene, and the training sets adopted by the plurality of alternative recognition models are different.
In one embodiment, the apparatus further comprises:
the target model parameter acquisition module is used for calculating the weighted sum of the candidate model parameters of each candidate recognition model based on the preset weight to serve as the target model parameters;
and the target recognition model acquisition module is used for determining a target recognition model based on the target model parameters.
In one embodiment, the target recognition model acquisition module is specifically configured to determine a model with the target model parameters as the target recognition model.
In one embodiment, the object recognition model acquisition module includes:
the to-be-corrected identification model acquisition module is used for determining a model with the target model parameters as the to-be-corrected identification model;
the target recognition model acquisition sub-module is used for training the recognition model to be corrected based on the sample voice data of the current recognition scene to obtain a target recognition model.
In one embodiment, the apparatus further comprises:
the sample data acquisition module is used for acquiring sample data corresponding to each alternative identification scene aiming at each alternative identification scene;
the sample data set acquisition module is used for dividing sample data corresponding to the alternative identification scene into a plurality of sample data sets according to the number of a plurality of preset identification tasks corresponding to the alternative identification scene, and the plurality of sample data sets are respectively used as sample data sets corresponding to the plurality of preset identification tasks;
the training module is used for training an initial recognition model of the preset model structure based on the corresponding sample data set aiming at each preset recognition task;
the to-be-processed model parameter acquisition module is used for acquiring the model parameters of the initial recognition model after training when the number of rounds of training reaches a preset number, and taking the model parameters as to-be-processed model parameters;
the alternative model parameter acquisition module is used for adjusting the original model parameters of the initial recognition model based on the model parameter difference values corresponding to the preset recognition tasks, to obtain the alternative model parameters of an alternative recognition model corresponding to the alternative recognition scene; wherein the model parameter difference value corresponding to one preset recognition task is the difference between the to-be-processed model parameters corresponding to that preset recognition task and the original model parameters.
The embodiment of the application also provides an electronic device, as shown in fig. 7, including a memory 701 and a processor 702;
a memory 701 for storing a computer program;
the processor 702 is configured to implement the data identification method provided in the embodiment of the present application when executing the program stored in the memory 701.
Specifically, the data identification method includes:
acquiring voice data to be recognized of a current recognition scene;
inputting the voice data to be recognized into a pre-constructed target recognition model to obtain a recognition result of the voice data to be recognized; the recognition result comprises text information contained in the voice data to be recognized;
the model parameters of the target recognition model are determined based on the respective candidate model parameters of a plurality of candidate recognition models trained in advance; the plurality of alternative recognition models have the same model structure as the target recognition model; the plurality of alternative recognition models are obtained by training, in a meta-learning manner, based on sample data of alternative recognition scenes other than the current recognition scene, and the training sets adopted by the plurality of alternative recognition models are different.
It should be noted that other implementation manners of the data identification method are partially the same as those of the foregoing method embodiment, and are not repeated here.
The electronic device may be provided with a communication interface for enabling communication between the electronic device and another device.
The processor, the communication interface and the memory perform communication with each other through a communication bus, where the communication bus may be a peripheral component interconnect standard (Peripheral Component Interconnect, abbreviated as PCI) bus or an extended industry standard architecture (Extended Industry Standard Architecture, abbreviated as EISA) bus. The communication bus may be classified as an address bus, a data bus, a control bus, or the like.
The Memory may include random access Memory (Random Access Memory, RAM) or Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU for short), a network processor (Network Processor, NP for short), etc.; but also digital signal processors (Digital Signal Processor, DSP for short), application specific integrated circuits (Application Specific Integrated Circuit, ASIC for short), field-programmable gate arrays (Field-Programmable Gate Array, FPGA for short) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components.
In yet another embodiment provided herein, there is also provided a computer readable storage medium having stored therein a computer program which when executed by a processor implements the steps of any of the data identification methods described above.
In yet another embodiment provided herein, there is also provided a computer program product containing instructions that, when run on a computer, cause the computer to perform any of the data identification methods of the above embodiments.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, produces a flow or function in accordance with embodiments of the present application, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in or transmitted from one computer-readable storage medium to another, for example, by wired (e.g., coaxial cable, optical fiber, digital Subscriber Line (DSL)), or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid State Disk (SSD)), etc.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In this specification, each embodiment is described in a related manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for apparatus, systems, electronic devices, computer readable storage media, and computer program product embodiments, the description is relatively simple as it is substantially similar to method embodiments, as relevant points are found in the partial description of method embodiments.
The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the scope of the present application. Any modifications, equivalent substitutions, improvements, etc. that are within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (12)

1. A method of data identification, the method comprising:
acquiring voice data to be recognized of a current recognition scene;
inputting the voice data to be recognized into a pre-constructed target recognition model to obtain a recognition result of the voice data to be recognized; the recognition result comprises text information contained in the voice data to be recognized;
the model parameters of the target recognition model are determined based on the respective candidate model parameters of a plurality of candidate recognition models trained in advance; the plurality of alternative recognition models have the same model structure as the target recognition model; and the plurality of alternative recognition models are obtained by training, in a meta-learning manner, based on sample data of alternative recognition scenes other than the current recognition scene, and the training sets adopted by the plurality of alternative recognition models are different, so that the target recognition model does not need to be trained based on sample voice data of the current recognition scene.
2. The method of claim 1, wherein the process of constructing the object recognition model comprises:
calculating a weighted sum of the alternative model parameters of each alternative recognition model based on preset weights, and taking the weighted sum as the target model parameters;
and determining a target recognition model based on the target model parameters.
3. The method of claim 2, wherein the determining a target recognition model based on the target model parameters comprises:
and determining a model with the target model parameters as a target recognition model.
4. The method of claim 2, wherein the determining a target recognition model based on the target model parameters comprises:
determining a model with the target model parameters as a recognition model to be corrected;
and training the recognition model to be corrected based on the sample voice data of the current recognition scene to obtain a target recognition model.
5. The method of claim 1, wherein the process of obtaining the candidate model parameters for the plurality of candidate recognition models comprises:
for each alternative recognition scene, acquiring sample data corresponding to the alternative recognition scene;
dividing sample data corresponding to the alternative recognition scene into a plurality of sample data sets according to the number of a plurality of preset recognition tasks corresponding to the alternative recognition scene, wherein the sample data sets are respectively used as the sample data sets corresponding to the preset recognition tasks;
for each preset recognition task, training an initial recognition model of a preset model structure based on the corresponding sample data set;
when the number of training rounds reaches a preset number, obtaining model parameters of the trained initial recognition model as to-be-processed model parameters;
and adjusting the original model parameters of the initial recognition model based on the model parameter difference value corresponding to each preset recognition task, to obtain the candidate model parameters of the candidate recognition model corresponding to the candidate recognition scene; wherein the model parameter difference value corresponding to one preset recognition task is the difference between the to-be-processed model parameters corresponding to that preset recognition task and the original model parameters.
6. A data recognition device, the device comprising:
the voice data to be recognized acquisition module is used for acquiring voice data to be recognized of the current recognition scene;
the recognition module is used for inputting the voice data to be recognized into a pre-constructed target recognition model to obtain a recognition result of the voice data to be recognized; the recognition result comprises text information contained in the voice data to be recognized;
the model parameters of the target recognition model are determined based on the respective candidate model parameters of a plurality of candidate recognition models trained in advance; the plurality of alternative recognition models have the same model structure as the target recognition model; and the plurality of alternative recognition models are obtained by training, in a meta-learning manner, based on sample data of alternative recognition scenes other than the current recognition scene, and the training sets adopted by the plurality of alternative recognition models are different, so that the target recognition model does not need to be trained based on sample voice data of the current recognition scene.
7. The apparatus of claim 6, wherein the apparatus further comprises:
the target model parameter acquisition module is used for calculating the weighted sum of the candidate model parameters of each candidate recognition model based on the preset weight to serve as the target model parameters;
and the target recognition model acquisition module is used for determining a target recognition model based on the target model parameters.
8. The apparatus according to claim 7, wherein the target recognition model acquisition module is configured to determine a model having the target model parameters as the target recognition model.
9. The apparatus of claim 7, wherein the object recognition model acquisition module comprises:
the to-be-corrected identification model acquisition module is used for determining a model with the target model parameters as the to-be-corrected identification model;
the target recognition model acquisition sub-module is used for training the recognition model to be corrected based on the sample voice data of the current recognition scene to obtain a target recognition model.
10. The apparatus of claim 6, wherein the apparatus further comprises:
the sample data acquisition module is used for acquiring sample data corresponding to each alternative identification scene aiming at each alternative identification scene;
the sample data set acquisition module is used for dividing sample data corresponding to the alternative identification scene into a plurality of sample data sets according to the number of a plurality of preset identification tasks corresponding to the alternative identification scene, and the plurality of sample data sets are respectively used as sample data sets corresponding to the plurality of preset identification tasks;
the training module is used for training an initial recognition model of the preset model structure based on the corresponding sample data set aiming at each preset recognition task;
the to-be-processed model parameter acquisition module is used for acquiring the model parameters of the initial recognition model after training when the number of rounds of training reaches a preset number, and taking the model parameters as to-be-processed model parameters;
the alternative model parameter acquisition module is used for adjusting the original model parameters of the initial recognition model based on the model parameter difference values corresponding to the preset recognition tasks, to obtain the alternative model parameters of an alternative recognition model corresponding to the alternative recognition scene; wherein the model parameter difference value corresponding to one preset recognition task is the difference between the to-be-processed model parameters corresponding to that preset recognition task and the original model parameters.
11. An electronic device comprising a memory and a processor;
the memory is used for storing a computer program;
the processor is configured to implement the method steps of any one of claims 1-5 when executing a program stored on the memory.
12. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored therein a computer program which, when executed by a processor, implements the method steps of any of claims 1-5.
CN202110319650.2A 2021-03-25 2021-03-25 Data identification method, device, electronic equipment and computer readable storage medium Active CN113066486B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110319650.2A CN113066486B (en) 2021-03-25 2021-03-25 Data identification method, device, electronic equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110319650.2A CN113066486B (en) 2021-03-25 2021-03-25 Data identification method, device, electronic equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN113066486A CN113066486A (en) 2021-07-02
CN113066486B true CN113066486B (en) 2023-06-09

Family

ID=76561818

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110319650.2A Active CN113066486B (en) 2021-03-25 2021-03-25 Data identification method, device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113066486B (en)

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6026359A (en) * 1996-09-20 2000-02-15 Nippon Telegraph And Telephone Corporation Scheme for model adaptation in pattern recognition based on Taylor expansion
CN111797854B (en) * 2019-04-09 2023-12-15 Oppo广东移动通信有限公司 Scene model building method and device, storage medium and electronic equipment
CN112434717B (en) * 2019-08-26 2024-03-08 杭州海康威视数字技术股份有限公司 Model training method and device
CN110675864A (en) * 2019-09-12 2020-01-10 上海依图信息技术有限公司 Voice recognition method and device
CN111508479B (en) * 2020-04-16 2022-11-22 重庆农村商业银行股份有限公司 Voice recognition method, device, equipment and storage medium
CN111613212B (en) * 2020-05-13 2023-10-31 携程旅游信息技术(上海)有限公司 Speech recognition method, system, electronic device and storage medium
CN112489637B (en) * 2020-11-03 2024-03-26 北京百度网讯科技有限公司 Speech recognition method and device

Also Published As

Publication number Publication date
CN113066486A (en) 2021-07-02


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240527

Address after: No.006, 6th floor, building 4, No.33 yard, middle Xierqi Road, Haidian District, Beijing 100085

Patentee after: BEIJING KINGSOFT CLOUD NETWORK TECHNOLOGY Co.,Ltd.

Country or region after: China

Patentee after: Wuxi Jinyun Zhilian Technology Co.,Ltd.

Address before: No.006, 6th floor, building 4, No.33 yard, middle Xierqi Road, Haidian District, Beijing 100085

Patentee before: BEIJING KINGSOFT CLOUD NETWORK TECHNOLOGY Co.,Ltd.

Country or region before: China
