CN113066486A - Data identification method and device, electronic equipment and computer readable storage medium - Google Patents


Info

Publication number
CN113066486A
Authority
CN
China
Prior art keywords
model
recognition
candidate
target
scene
Prior art date
Legal status
Granted
Application number
CN202110319650.2A
Other languages
Chinese (zh)
Other versions
CN113066486B (en
Inventor
李森
Current Assignee
Wuxi Jinyun Zhilian Technology Co ltd
Beijing Kingsoft Cloud Network Technology Co Ltd
Original Assignee
Beijing Kingsoft Cloud Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Kingsoft Cloud Network Technology Co Ltd filed Critical Beijing Kingsoft Cloud Network Technology Co Ltd
Priority to CN202110319650.2A priority Critical patent/CN113066486B/en
Publication of CN113066486A publication Critical patent/CN113066486A/en
Application granted granted Critical
Publication of CN113066486B publication Critical patent/CN113066486B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/06: Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/063: Training
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/08: Speech classification or search
    • G10L 15/16: Speech classification or search using artificial neural networks


Abstract

Embodiments of the present application provide a data identification method and apparatus, an electronic device, and a computer-readable storage medium, relating to the technical field of deep learning. The method includes: acquiring to-be-recognized speech data of a current recognition scene; and inputting the to-be-recognized speech data into a pre-constructed target recognition model to obtain a recognition result of the to-be-recognized speech data. The model parameters of the target recognition model are determined based on candidate model parameters of a plurality of pre-trained candidate recognition models. The plurality of candidate recognition models have the same model structure as the target recognition model. The plurality of candidate recognition models are trained, in a meta-learning manner, based on sample data of candidate recognition scenes other than the current recognition scene, and the training sets adopted by the plurality of candidate recognition models are different. With this processing, a high-accuracy recognition model can be obtained even when little sample speech data is available for the current recognition scene, improving recognition accuracy.

Description

Data identification method and device, electronic equipment and computer readable storage medium
Technical Field
The present application relates to the field of deep learning technologies, and in particular, to a data identification method, an apparatus, an electronic device, and a computer-readable storage medium.
Background
With the rapid development of computer technology, deep learning technology is widely applied in many fields. Based on deep learning, different kinds of data, such as speech data and image data, can be identified.
In the related art, for a given speech recognition scene, a recognition model with an initial structure may be trained based on a speech training set corresponding to that scene, where the training set may include sample speech data. Once the model converges, the to-be-recognized speech data corresponding to that scene can be recognized with the trained recognition model. For example, the recognition scene may be one in which the fortress language is recognized, or one in which the Tibetan language is recognized.
However, in the above process, a large amount of sample speech data is needed to achieve high recognition accuracy; that is, if only a small amount of sample speech data is available for the current recognition scene, a sufficiently accurate recognition model cannot be obtained, resulting in low recognition accuracy.
Disclosure of Invention
An object of the embodiments of the present application is to provide a data identification method, apparatus, electronic device and computer-readable storage medium, which can improve the accuracy of identification. The specific technical scheme is as follows:
in a first aspect, in order to achieve the above object, an embodiment of the present application discloses a data identification method, where the method includes:
acquiring voice data to be recognized of a current recognition scene;
inputting the voice data to be recognized into a pre-constructed target recognition model to obtain a recognition result of the voice data to be recognized; the recognition result comprises text information contained in the voice data to be recognized;
wherein the model parameters of the target recognition model are determined based on candidate model parameters of a plurality of pre-trained candidate recognition models; the model structures of the plurality of candidate recognition models and the target recognition model are the same; the plurality of candidate recognition models are trained, in a meta-learning manner, based on sample data of candidate recognition scenes other than the current recognition scene, and the training sets adopted by the plurality of candidate recognition models are different.
Optionally, the process of constructing the target recognition model includes:
based on the preset weight, calculating the weighted sum of the candidate model parameters of each candidate recognition model as a target model parameter;
and determining a target recognition model based on the target model parameters.
Optionally, the determining a target recognition model based on the target model parameters includes:
and determining the model with the target model parameters as a target recognition model.
Optionally, the determining a target recognition model based on the target model parameters includes:
determining a model with the target model parameters as a recognition model to be corrected;
and training the recognition model to be corrected based on the sample voice data of the current recognition scene to obtain a target recognition model.
Optionally, the process of obtaining candidate model parameters of the multiple candidate recognition models includes:
acquiring sample data corresponding to each alternative identification scene;
dividing sample data corresponding to the candidate identification scene into a plurality of sample data groups according to the number of a plurality of preset identification tasks corresponding to the candidate identification scene, and respectively taking the sample data groups as the sample data groups corresponding to the plurality of preset identification tasks;
aiming at each preset recognition task, training an initial recognition model of a preset model structure based on a corresponding sample data set;
when the number of rounds of training reaches a preset number, acquiring model parameters of the initial recognition model after training as model parameters to be processed;
adjusting the original model parameters of the initial recognition model based on the model parameter differences corresponding to the preset recognition tasks, to obtain candidate model parameters of the candidate recognition model corresponding to the candidate recognition scene; where the model parameter difference corresponding to a preset recognition task represents the difference between the to-be-processed model parameters corresponding to that preset recognition task and the original model parameters.
In a second aspect, in order to achieve the above object, an embodiment of the present application discloses a data identification apparatus, including:
the voice data to be recognized acquiring module is used for acquiring the voice data to be recognized of the current recognition scene;
the recognition module is used for inputting the voice data to be recognized into a pre-constructed target recognition model to obtain a recognition result of the voice data to be recognized; the recognition result comprises text information contained in the voice data to be recognized;
wherein the model parameters of the target recognition model are determined based on candidate model parameters of a plurality of pre-trained candidate recognition models; the model structures of the plurality of candidate recognition models and the target recognition model are the same; the plurality of candidate recognition models are trained, in a meta-learning manner, based on sample data of candidate recognition scenes other than the current recognition scene, and the training sets adopted by the plurality of candidate recognition models are different.
Optionally, the apparatus further comprises:
the target model parameter acquisition module is used for calculating the weighted sum of the alternative model parameters of each alternative recognition model as a target model parameter based on preset weight;
and the target recognition model acquisition module is used for determining a target recognition model based on the target model parameters.
Optionally, the target recognition model obtaining module is specifically configured to determine a model with the target model parameters as a target recognition model.
Optionally, the target recognition model obtaining module includes:
the identification model to be corrected acquisition module is used for determining a model with the target model parameters as the identification model to be corrected;
and the target recognition model obtaining submodule is used for training the recognition model to be corrected based on the sample voice data of the current recognition scene to obtain the target recognition model.
Optionally, the apparatus further comprises:
the sample data acquisition module is used for acquiring sample data corresponding to each alternative identification scene;
a sample data group acquisition module, configured to divide sample data corresponding to the candidate identification scene into a plurality of sample data groups according to the number of the plurality of preset identification tasks corresponding to the candidate identification scene, where the sample data groups are respectively used as sample data groups corresponding to the plurality of preset identification tasks;
the training module is used for training an initial recognition model of a preset model structure based on the corresponding sample data group aiming at each preset recognition task;
the model parameter acquisition module to be processed is used for acquiring the model parameters of the initial recognition model after training as the model parameters to be processed when the number of rounds of training reaches the preset number;
the candidate model parameter acquisition module is used for adjusting the original model parameters of the initial recognition model based on the model parameter differences corresponding to the preset recognition tasks, to obtain candidate model parameters of the candidate recognition model corresponding to the candidate recognition scene; where the model parameter difference corresponding to a preset recognition task represents the difference between the to-be-processed model parameters corresponding to that preset recognition task and the original model parameters.
On the other hand, in order to achieve the above object, an embodiment of the present application further discloses an electronic device, which includes a memory and a processor;
the memory is used for storing a computer program;
the processor is configured to implement the data identification method according to the first aspect when executing the program stored in the memory.
On the other hand, in order to achieve the above object, an embodiment of the present application further discloses a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the data identification method according to the first aspect.
On the other hand, in order to achieve the above object, an embodiment of the present application further discloses a computer program product containing instructions, which when run on a computer, causes the computer to execute the data identification method according to the first aspect.
An embodiment of the present application provides a data identification method, which includes: acquiring to-be-recognized speech data of a current recognition scene; and inputting the to-be-recognized speech data into a pre-constructed target recognition model to obtain a recognition result of the to-be-recognized speech data, where the recognition result includes text information contained in the to-be-recognized speech data. The model parameters of the target recognition model are determined based on candidate model parameters of a plurality of pre-trained candidate recognition models; the model structures of the plurality of candidate recognition models and the target recognition model are the same; and the plurality of candidate recognition models are trained, in a meta-learning manner, based on sample data of candidate recognition scenes other than the current recognition scene, with different training sets.
Based on the meta-learning mode, the candidate recognition model obtained by training the sample data of the candidate recognition scene can be suitable for other recognition scenes. Furthermore, the target recognition model determined by combining the candidate model parameters of the multiple candidate recognition models can effectively recognize the voice data to be recognized of the current recognition scene without training based on the sample voice data of the current recognition scene, so that even if the sample voice data of the current recognition scene is less, the target recognition model with higher precision can be obtained, and the recognition accuracy is improved.
Of course, not all advantages described above need to be achieved at the same time in the practice of any one product or method of the present application.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a flowchart of a data identification method according to an embodiment of the present application;
FIG. 2 is a flow chart of generating a target recognition model in data recognition according to an embodiment of the present application;
FIG. 3 is a flow chart of another method for generating a target recognition model in data recognition according to an embodiment of the present application;
FIG. 4 is a flow chart of another method for generating a target recognition model in data recognition according to an embodiment of the present application;
FIG. 5 is a flow chart of generating candidate model parameters during data identification according to an embodiment of the present disclosure;
fig. 6 is a block diagram of a data recognition apparatus according to an embodiment of the present application;
fig. 7 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the related art, a large amount of sample data is needed to achieve high recognition accuracy; that is, if little sample data is available for the current recognition scene, a sufficiently accurate recognition model cannot be obtained, so recognition accuracy is low.
In order to solve the above problem, an embodiment of the present application provides a data identification method, and referring to fig. 1, fig. 1 is a flowchart of the data identification method provided by the embodiment of the present application, and the method may include the following steps:
s101: and acquiring the voice data to be recognized of the current recognition scene.
S102: and inputting the voice data to be recognized into a pre-constructed target recognition model to obtain a recognition result of the voice data to be recognized.
The recognition result includes text information contained in the to-be-recognized speech data. The model parameters of the target recognition model are determined based on candidate model parameters of a plurality of pre-trained candidate recognition models. The plurality of candidate recognition models have the same model structure as the target recognition model. The plurality of candidate recognition models are trained, in a meta-learning manner, based on sample data of candidate recognition scenes other than the current recognition scene, and the training sets adopted by the plurality of candidate recognition models are different.
The data identification method provided by the embodiment of the application is based on a meta-learning mode, so that the alternative identification model obtained by training sample data of the alternative identification scene can be suitable for other identification scenes. Furthermore, the target recognition model determined by combining the candidate model parameters of the multiple candidate recognition models can effectively recognize the voice data to be recognized of the current recognition scene without training based on the sample voice data of the current recognition scene, so that even if the sample voice data of the current recognition scene is less, the target recognition model with higher precision can be obtained, and the recognition accuracy is improved.
For step S101, the current recognition scene may be a scene in which speech data is recognized; for example, it may be a scene in which the fortress language is recognized, or a scene in which the Tibetan language is recognized. A candidate recognition scene differs from the current recognition scene, that is, the sample data corresponding to a candidate recognition scene differs from the sample data corresponding to the current recognition scene. A candidate recognition scene may be a scene in which speech data is recognized or a scene in which image data is recognized, but is not limited thereto.
In one embodiment, referring to FIG. 2, the process of constructing the target recognition model may include the steps of:
s201: and calculating the weighted sum of the candidate model parameters of each candidate recognition model as the target model parameter based on the preset weight.
S202: based on the target model parameters, a target recognition model is determined.
Wherein the preset weight can be set by a technician according to experience.
In the embodiment of the present application, after candidate recognition models are obtained by training based on corresponding sample data in advance, model parameters (i.e., candidate model parameters) of each candidate recognition model may be obtained.
Furthermore, the weighted sum of the candidate model parameters of each candidate recognition model can be calculated according to the preset weight to obtain the target model parameter, and the target recognition model is determined based on the target model parameter.
It is understood that the model structures of the candidate recognition models are the same, and the candidate model parameters of each candidate recognition model can be multiple. Therefore, when calculating the weighted sum of the candidate model parameters of each candidate recognition model, for each model parameter in the candidate model parameters, the weighted sum of the numerical values of the model parameter in each candidate recognition model can be calculated, and further, the target model parameter can be obtained.
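The weighted combination in steps S201–S202 can be sketched as follows (a minimal pure-Python illustration; the parameter name `fc.weight` and the list-of-values representation are hypothetical stand-ins for real model tensors):

```python
def weighted_target_params(candidate_params, weights):
    """Combine candidate model parameters into target model parameters.

    candidate_params: list of dicts, one per candidate recognition model,
        each mapping a parameter name to a list of values (same structure
        for every model, since all models share one architecture).
    weights: preset weights, one per candidate model (set empirically).
    """
    target = {}
    for name in candidate_params[0]:
        # Weighted sum of this parameter's values across all candidate models.
        target[name] = [
            sum(w * params[name][i] for w, params in zip(weights, candidate_params))
            for i in range(len(candidate_params[0][name]))
        ]
    return target

# Two toy candidate models with one parameter vector each.
candidates = [{"fc.weight": [1.0, 2.0]}, {"fc.weight": [3.0, 6.0]}]
target = weighted_target_params(candidates, weights=[0.5, 0.5])
```

The combination is done parameter-by-parameter, which is only meaningful because all candidate models share the same model structure, as the text notes.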
In one embodiment, referring to fig. 3, the step S202 may include the following steps:
s2021: a model having target model parameters is determined as a target recognition model.
In the embodiment of the present application, after the target model parameters are determined, the model with the target model parameters may be directly determined as the target recognition model, that is, the model parameters of the target recognition model are the target model parameters.
In one embodiment, in order to further improve the accuracy of the target recognition model, referring to fig. 4, the step S202 may include the following steps:
s2022: and determining the model with the target model parameters as the identification model to be corrected.
S2023: and training the recognition model to be corrected based on the sample voice data of the current recognition scene to obtain the target recognition model.
In the embodiment of the present application, after the target model parameters are determined, a model with the target model parameters may be determined as the recognition model to be corrected, that is, the model parameters of the recognition model to be corrected are the target model parameters.
Then, sample voice data of the current recognition scene can be obtained, and the recognition model to be corrected is trained according to the obtained sample voice data until convergence, so that the target recognition model is obtained.
Based on the processing, the target recognition model can be converged only by sample voice data with few current recognition scenes, so that the target recognition model with high recognition precision is obtained, and the recognition accuracy can be improved.
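As an illustration of this correction step, the toy fine-tuning loop below (pure Python; the single-weight model, learning rate, and sample values are all hypothetical) mimics training the to-be-corrected recognition model on a handful of current-scene samples:

```python
def fine_tune(weight, samples, lr=0.1, epochs=50):
    """Fine-tune a single weight w of the toy model y = w * x by gradient
    descent on squared error, using the (few) current-scene samples."""
    for _ in range(epochs):
        for x, y in samples:
            pred = weight * x
            grad = 2 * (pred - y) * x  # d/dw of (w*x - y)^2
            weight -= lr * grad
    return weight

# The weight from the weighted combination serves as the starting point;
# only a handful of current-scene samples are needed to converge.
w0 = 0.5                             # weight of the to-be-corrected model
samples = [(1.0, 2.0), (2.0, 4.0)]   # current-scene samples: y = 2x
w = fine_tune(w0, samples)
```

Because the starting parameters already encode knowledge from the candidate scenes, the loop converges with far fewer samples than training from scratch would require.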
In one embodiment, referring to fig. 5, the process of obtaining the alternative model parameters may include the steps of:
s501: and acquiring sample data corresponding to each candidate identification scene.
S502: and dividing the sample data corresponding to the candidate identification scene into a plurality of sample data groups according to the number of a plurality of preset identification tasks corresponding to the candidate identification scene, and respectively using the sample data groups as the sample data groups corresponding to the plurality of preset identification tasks.
S503: and aiming at each preset recognition task, training an initial recognition model of a preset model structure based on the corresponding sample data set.
S504: and when the number of the training rounds reaches a preset number, obtaining the model parameters of the initial recognition model after training as the parameters of the model to be processed.
S505: and adjusting the original model parameters of the initial recognition model based on the model parameter difference values corresponding to the preset recognition tasks to obtain the alternative model parameters of the alternative recognition model corresponding to the alternative recognition scene.
The model parameter difference corresponding to a preset recognition task represents the difference between the to-be-processed model parameters corresponding to that preset recognition task and the original model parameters.
In the embodiment of the present application, for each candidate recognition scenario, sample data corresponding to the candidate recognition scenario may be acquired. Then, according to the number of the plurality of preset identification tasks corresponding to the candidate identification scene, sample data can be divided to obtain a plurality of sample data groups.
Each sample data set corresponds to a preset identification task, and further, for each preset identification task, the initial identification model of the preset model structure can be trained respectively based on the corresponding sample data set, so that corresponding model parameters to be processed are obtained.
For example, for a speech recognition scene, the preset recognition task may be a speech keyword recognition task, a continuous speech recognition task, an isolated word recognition task, or the like; for the scene of image recognition, the preset recognition task can be a target detection task, a gesture recognition task and the like.
After the initial recognition model is trained based on each sample data set, original model parameters of the initial recognition model can be adjusted by combining to-be-processed model parameters corresponding to each preset recognition task to obtain alternative model parameters, so that the recognition model with the alternative model parameters can effectively recognize voice data corresponding to the alternative recognition scene, and further, the recognition accuracy of the target recognition model can be improved.
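The procedure of steps S501–S505 can be sketched as the following meta-update (pure Python; `toy_train` is a hypothetical stand-in for K rounds of training on one sample data group, since the actual model and data are not specified here):

```python
def candidate_params(original, task_groups, alpha, train_k_rounds):
    """Compute candidate model parameters for one candidate scene.

    original: list of original model parameters of the initial model.
    task_groups: one sample data group per preset recognition task.
    alpha: learning rate (meta step size).
    train_k_rounds: callable that trains the initial model for K rounds
        on one sample data group and returns the to-be-processed params.
    """
    m = len(task_groups)
    # Per-task parameter differences: to-be-processed minus original.
    diffs = [
        [t - o for t, o in zip(train_k_rounds(original, group), original)]
        for group in task_groups
    ]
    # Adjust the original parameters by the averaged differences.
    return [o + alpha * sum(d[i] for d in diffs) / m
            for i, o in enumerate(original)]

# Toy stand-in: "training" nudges each parameter halfway to the group mean.
def toy_train(params, group):
    mean = sum(group) / len(group)
    return [p + 0.5 * (mean - p) for p in params]

theta = candidate_params([0.0, 0.0], task_groups=[[1.0, 3.0], [2.0, 6.0]],
                         alpha=1.0, train_k_rounds=toy_train)
```

Moving the original parameters toward the average of the per-task results is what lets one set of candidate parameters serve several recognition tasks of the same scene.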
In an implementation manner, the candidate model parameters of the candidate recognition model corresponding to the candidate recognition scene may be calculated based on a preset formula.
The preset formula is as follows:

θ' = θ + α · (1/m) · Σ_{i=1}^{m} (θ_i^(K) - θ)

where θ' denotes the candidate model parameters of the candidate recognition model corresponding to the candidate recognition scene, θ denotes the original model parameters of the initial recognition model, α denotes the learning rate, m denotes the number of preset recognition tasks, and θ_i^(K) denotes the model parameters obtained by training the initial recognition model for K rounds, based on the corresponding sample data group, for the i-th preset recognition task.
In one embodiment, the initial recognition model with the preset model structure may include: an input layer, convolutional layers 1 to 4, and fully connected layers 1 and 2.
Each convolutional layer may include 64 convolution kernels of size 3 × 3, with a convolution stride of 2. The convolution result of each convolutional layer is processed with BN (Batch Normalization) and then activated with the ReLU function. Fully connected layers 1 and 2 each consist of a single neuron with a sigmoid activation function.
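To make the layer stack concrete, the sketch below traces feature-map sizes through the four stride-2 convolutional layers (pure Python; the 32 × 32 input size, single input channel, and "same" padding are assumptions for illustration and are not stated in the text):

```python
import math

def conv_output_size(size, kernel=3, stride=2):
    """Spatial output size of one conv layer, assuming 'same' padding."""
    return math.ceil(size / stride)

def trace_shapes(input_size, channels=64, num_conv_layers=4):
    """Trace (spatial_size, channels) through the conv stack described
    above: each layer has 64 kernels of size 3x3 and stride 2."""
    shapes = [(input_size, 1)]  # assume a single-channel input feature map
    size = input_size
    for _ in range(num_conv_layers):
        size = conv_output_size(size)
        shapes.append((size, channels))
    return shapes

shapes = trace_shapes(32)
# Each stride-2 layer halves the spatial size: 32 -> 16 -> 8 -> 4 -> 2
```

With these assumptions, the flattened output fed to fully connected layer 1 would be 2 × 2 × 64 values; the actual input size and padding would change these numbers.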
The candidate recognition scenes include: speech recognition based on an isolated-word data set of the Va language, speech recognition based on the time isolated-word data set, and speech recognition based on a daily speech data set recorded by users.
The number of preset recognition tasks is 5 for the Va language isolated-word data set, 5 for the time isolated-word data set, and 5 for the daily speech data set recorded by users.
Then, the Va language isolated-word data set can be divided into 5 sample data groups, and for each preset recognition task an initial recognition model with the preset model structure is trained based on the corresponding sample data group. Further, the candidate model parameters (denoted θ'_1) corresponding to the scene of speech recognition based on the Va language isolated-word data set can be calculated based on the preset formula.
Specifically, the number of training iterations may be 500, the optimizer used to adjust the model parameters during training may be SGD (Stochastic Gradient Descent), and in the above preset formula, m is 5, K is 20, and α is 0.95.
Similarly, the time isolated-word data set can be divided into 5 sample data groups, and for each preset recognition task an initial recognition model with the preset model structure is trained based on the corresponding sample data group. Then, the candidate model parameters (denoted θ'_2) corresponding to the scene of speech recognition based on the time isolated-word data set can be calculated based on the preset formula.
Specifically, the number of iterations in the training may be 1500, the optimizer that adjusts the model parameters in the training may use SGD, and in the preset formula, m is 5, K is 15, and α is 0.95.
Similarly, the daily speech data set recorded by users can be divided into 5 sample data groups, and for each preset recognition task an initial recognition model with the preset model structure is trained based on the corresponding sample data group. Then, the candidate model parameters (denoted θ'_3) corresponding to the scene of speech recognition based on the daily speech data set recorded by users can be calculated based on the preset formula.
Specifically, the number of training iterations may be 2000, the optimizer for adjusting the model parameters during training may be SGD, and in the preset formula, m is 5, K is 25, and α is 1.
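The preset formula is reproduced only as an image in the original publication, so the exact update rule is not recoverable from the text. The described procedure (m tasks per scene, SGD inner training, a scaling factor α, and adjustment of the original parameters by per-task parameter differences, as restated in the candidate model parameter acquisition module later) matches a Reptile-style meta-update; the sketch below assumes that interpretation, and all function and variable names (including treating K as the number of inner steps) are illustrative, not from the patent:

```python
import numpy as np

def candidate_params(theta0, task_groups, inner_steps, lr_inner, alpha, grad_fn):
    """Reptile-style sketch: train a copy of the original parameters theta0 on
    each task's sample group with a few SGD steps, then move theta0 by alpha
    times the mean of the per-task model-parameter differences."""
    diffs = []
    for data in task_groups:                 # m preset recognition tasks
        theta = theta0.copy()
        for _ in range(inner_steps):         # inner SGD steps (K assumed here)
            theta = theta - lr_inner * grad_fn(theta, data)
        diffs.append(theta - theta0)         # model parameter difference value
    return theta0 + alpha * np.mean(diffs, axis=0)

# Toy example: per-task quadratic loss (theta - c)^2, gradient 2*(theta - c)
grad = lambda th, c: 2.0 * (th - c)
theta0 = np.zeros(2)
tasks = [np.array([1.0, 1.0]) for _ in range(5)]   # m = 5 toy tasks
theta_scene = candidate_params(theta0, tasks,
                               inner_steps=20, lr_inner=0.1,
                               alpha=0.95, grad_fn=grad)
```

Here each toy "task" is just the optimum of a quadratic loss; in the patent's setting the inner loop would instead run the stated number of iterations on the scene's speech sample groups.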
Then, the target model parameters may be calculated based on equation (1):

θ* = A·θ_A + B·θ_B + C·θ_C  (1)

where θ_A, θ_B, and θ_C are the candidate model parameters of the three scenes above, A, B, and C represent their respective weights, and θ* represents the target model parameters. For example, A may be 0.3, B may be 0.3, and C may be 0.4.
The target recognition model with the target model parameters θ* can be used to recognize speech data of other scenes; alternatively, a target recognition model obtained by adjusting the values of A, B, and C may be used for other types of recognition tasks, for example, recognizing image data.
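As a minimal numeric illustration of the weighted combination in equation (1) — the candidate parameter values below are made up; only the weights 0.3, 0.3, and 0.4 come from the example above:

```python
import numpy as np

def combine_candidates(candidates, weights):
    """Target model parameters as the weighted sum of candidate model
    parameters: theta* = A*theta_A + B*theta_B + C*theta_C."""
    return sum(w * p for w, p in zip(weights, candidates))

theta_a = np.array([1.0, 2.0])   # made-up candidate parameters
theta_b = np.array([3.0, 4.0])
theta_c = np.array([5.0, 6.0])
theta_star = combine_candidates([theta_a, theta_b, theta_c], [0.3, 0.3, 0.4])
# theta_star == [3.2, 4.2]
```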
Based on the same inventive concept, an embodiment of the present application further provides a data identification device, referring to fig. 6, where fig. 6 is a structural diagram of the data identification device provided in the embodiment of the present application, and the device includes:
a to-be-recognized voice data obtaining module 601, configured to obtain to-be-recognized voice data of a current recognition scene;
the recognition module 602 is configured to input the to-be-recognized speech data into a pre-constructed target recognition model, so as to obtain a recognition result of the to-be-recognized speech data; the recognition result comprises text information contained in the voice data to be recognized;
wherein the model parameters of the target recognition model are determined based on candidate model parameters of a plurality of pre-trained candidate recognition models; the plurality of candidate recognition models have the same model structure as the target recognition model; and the plurality of candidate recognition models are obtained through meta-learning-based training on sample data of candidate recognition scenes other than the current recognition scene, with each candidate recognition model trained on a different training set.
In one embodiment, the apparatus further comprises:
the target model parameter acquisition module is used for calculating, based on preset weights, the weighted sum of the candidate model parameters of each candidate recognition model as the target model parameter;
and the target recognition model acquisition module is used for determining a target recognition model based on the target model parameters.
In an embodiment, the target recognition model obtaining module is specifically configured to determine a model having the target model parameters as a target recognition model.
In one embodiment, the target recognition model obtaining module includes:
the identification model to be corrected acquisition module is used for determining a model with the target model parameters as the identification model to be corrected;
and the target recognition model obtaining submodule is used for training the recognition model to be corrected based on the sample voice data of the current recognition scene to obtain the target recognition model.
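A rough sketch of the correction step this submodule describes: start from the combined target parameters as the model to be corrected and continue training on sample speech data of the current recognition scene. The gradient function and toy quadratic loss are illustrative assumptions, not the patent's actual training objective:

```python
import numpy as np

def fine_tune(theta_star, scene_data, steps, lr, grad_fn):
    """Treat the combined parameters theta_star as the 'recognition model to
    be corrected' and take further SGD steps on current-scene sample data."""
    theta = theta_star.copy()
    for _ in range(steps):
        theta = theta - lr * grad_fn(theta, scene_data)
    return theta

grad = lambda th, c: 2.0 * (th - c)   # toy quadratic loss gradient
theta_corrected = fine_tune(np.array([0.0]), np.array([1.0]),
                            steps=50, lr=0.1, grad_fn=grad)
```

With enough steps the corrected parameters converge toward the current scene's optimum, which is the intent of training the to-be-corrected model on current-scene data.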
In one embodiment, the apparatus further comprises:
the sample data acquisition module is used for acquiring sample data corresponding to each alternative identification scene;
a sample data group acquisition module, configured to divide sample data corresponding to the candidate identification scene into a plurality of sample data groups according to the number of the plurality of preset identification tasks corresponding to the candidate identification scene, where the sample data groups are respectively used as sample data groups corresponding to the plurality of preset identification tasks;
the training module is used for training an initial recognition model of a preset model structure based on the corresponding sample data group aiming at each preset recognition task;
the model parameter acquisition module to be processed is used for acquiring the model parameters of the initial recognition model after training as the model parameters to be processed when the number of rounds of training reaches the preset number;
the candidate model parameter acquisition module is used for adjusting the original model parameters of the initial recognition model based on the model parameter difference values corresponding to the preset recognition tasks to obtain the candidate model parameters of the candidate recognition model corresponding to the candidate recognition scene; wherein, the model parameter difference value corresponding to one preset identification task represents: and the difference value between the model parameter to be processed corresponding to the preset identification task and the original model parameter.
An embodiment of the present application further provides an electronic device, as shown in fig. 7, including a memory 701 and a processor 702;
a memory 701 for storing a computer program;
the processor 702 is configured to implement the data identification method provided in the embodiment of the present application when executing the program stored in the memory 701.
Specifically, the data identification method includes:
acquiring voice data to be recognized of a current recognition scene;
inputting the voice data to be recognized into a pre-constructed target recognition model to obtain a recognition result of the voice data to be recognized; the recognition result comprises text information contained in the voice data to be recognized;
wherein the model parameters of the target recognition model are determined based on candidate model parameters of a plurality of pre-trained candidate recognition models; the plurality of candidate recognition models have the same model structure as the target recognition model; and the plurality of candidate recognition models are obtained through meta-learning-based training on sample data of candidate recognition scenes other than the current recognition scene, with each candidate recognition model trained on a different training set.
It should be noted that other implementation manners of the data identification method are the same as those of the foregoing method embodiment, and are not described herein again.
The electronic device may be provided with a communication interface for realizing communication between the electronic device and another device.
The processor, the communication interface, and the memory are configured to communicate with each other through a communication bus, where the communication bus may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus. The communication bus may be divided into an address bus, a data bus, a control bus, etc.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, and includes a Central Processing Unit (CPU), a Network Processor (NP), and the like; the device can also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, or a discrete hardware component.
In yet another embodiment provided by the present application, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any of the above data identification methods.
In yet another embodiment provided by the present application, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform any of the data recognition methods of the above embodiments.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via a wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) connection. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that incorporates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus, system, electronic device, computer-readable storage medium, and computer program product embodiments, the description is relatively simple as it is substantially similar to the method embodiments, and reference may be made to some descriptions of the method embodiments for relevant points.
The above description is only for the preferred embodiment of the present application and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application are included in the protection scope of the present application.

Claims (12)

1. A method of data identification, the method comprising:
acquiring voice data to be recognized of a current recognition scene;
inputting the voice data to be recognized into a pre-constructed target recognition model to obtain a recognition result of the voice data to be recognized; the recognition result comprises text information contained in the voice data to be recognized;
wherein the model parameters of the target recognition model are determined based on candidate model parameters of a plurality of pre-trained candidate recognition models; the plurality of candidate recognition models have the same model structure as the target recognition model; and the plurality of candidate recognition models are obtained through meta-learning-based training on sample data of candidate recognition scenes other than the current recognition scene, with each candidate recognition model trained on a different training set.
2. The method of claim 1, wherein the constructing of the object recognition model comprises:
based on the preset weight, calculating the weighted sum of the candidate model parameters of each candidate recognition model as a target model parameter;
and determining a target recognition model based on the target model parameters.
3. The method of claim 2, wherein determining an object recognition model based on the object model parameters comprises:
and determining the model with the target model parameters as a target recognition model.
4. The method of claim 2, wherein determining an object recognition model based on the object model parameters comprises:
determining a model with the target model parameters as a recognition model to be corrected;
and training the recognition model to be corrected based on the sample voice data of the current recognition scene to obtain a target recognition model.
5. The method of claim 1, wherein the obtaining of the candidate model parameters of the candidate recognition models comprises:
acquiring sample data corresponding to each alternative identification scene;
dividing sample data corresponding to the candidate identification scene into a plurality of sample data groups according to the number of a plurality of preset identification tasks corresponding to the candidate identification scene, and respectively taking the sample data groups as the sample data groups corresponding to the plurality of preset identification tasks;
aiming at each preset recognition task, training an initial recognition model of a preset model structure based on a corresponding sample data set;
when the number of rounds of training reaches a preset number, acquiring model parameters of the initial recognition model after training as model parameters to be processed;
adjusting the original model parameters of the initial recognition model based on the model parameter difference values corresponding to the preset recognition tasks to obtain alternative model parameters of the alternative recognition model corresponding to the alternative recognition scene; wherein, the model parameter difference value corresponding to one preset identification task represents: and the difference value between the model parameter to be processed corresponding to the preset identification task and the original model parameter.
6. A data recognition apparatus, the apparatus comprising:
the voice data to be recognized acquiring module is used for acquiring the voice data to be recognized of the current recognition scene;
the recognition module is used for inputting the voice data to be recognized into a pre-constructed target recognition model to obtain a recognition result of the voice data to be recognized; the recognition result comprises text information contained in the voice data to be recognized;
wherein the model parameters of the target recognition model are determined based on candidate model parameters of a plurality of pre-trained candidate recognition models; the plurality of candidate recognition models have the same model structure as the target recognition model; and the plurality of candidate recognition models are obtained through meta-learning-based training on sample data of candidate recognition scenes other than the current recognition scene, with each candidate recognition model trained on a different training set.
7. The apparatus of claim 6, further comprising:
the target model parameter acquisition module is used for calculating, based on preset weights, the weighted sum of the candidate model parameters of each candidate recognition model as the target model parameter;
and the target recognition model acquisition module is used for determining a target recognition model based on the target model parameters.
8. The apparatus according to claim 7, wherein the object recognition model obtaining module is specifically configured to determine a model having the object model parameters as an object recognition model.
9. The apparatus of claim 7, wherein the target recognition model obtaining module comprises:
the identification model to be corrected acquisition module is used for determining a model with the target model parameters as the identification model to be corrected;
and the target recognition model obtaining submodule is used for training the recognition model to be corrected based on the sample voice data of the current recognition scene to obtain the target recognition model.
10. The apparatus of claim 6, further comprising:
the sample data acquisition module is used for acquiring sample data corresponding to each alternative identification scene;
a sample data group acquisition module, configured to divide sample data corresponding to the candidate identification scene into a plurality of sample data groups according to the number of the plurality of preset identification tasks corresponding to the candidate identification scene, where the sample data groups are respectively used as sample data groups corresponding to the plurality of preset identification tasks;
the training module is used for training an initial recognition model of a preset model structure based on the corresponding sample data group aiming at each preset recognition task;
the model parameter acquisition module to be processed is used for acquiring the model parameters of the initial recognition model after training as the model parameters to be processed when the number of rounds of training reaches the preset number;
the candidate model parameter acquisition module is used for adjusting the original model parameters of the initial recognition model based on the model parameter difference values corresponding to the preset recognition tasks to obtain the candidate model parameters of the candidate recognition model corresponding to the candidate recognition scene; wherein, the model parameter difference value corresponding to one preset identification task represents: and the difference value between the model parameter to be processed corresponding to the preset identification task and the original model parameter.
11. An electronic device comprising a memory and a processor;
the memory is used for storing a computer program;
the processor, when executing the program stored in the memory, implementing the method steps of any of claims 1-5.
12. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of the claims 1-5.
CN202110319650.2A 2021-03-25 2021-03-25 Data identification method, device, electronic equipment and computer readable storage medium Active CN113066486B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110319650.2A CN113066486B (en) 2021-03-25 2021-03-25 Data identification method, device, electronic equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN113066486A true CN113066486A (en) 2021-07-02
CN113066486B CN113066486B (en) 2023-06-09

Family

ID=76561818

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110319650.2A Active CN113066486B (en) 2021-03-25 2021-03-25 Data identification method, device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113066486B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6026359A (en) * 1996-09-20 2000-02-15 Nippon Telegraph And Telephone Corporation Scheme for model adaptation in pattern recognition based on Taylor expansion
CN111508479A (en) * 2020-04-16 2020-08-07 重庆农村商业银行股份有限公司 Voice recognition method, device, equipment and storage medium
CN111613212A (en) * 2020-05-13 2020-09-01 携程旅游信息技术(上海)有限公司 Speech recognition method, system, electronic device and storage medium
CN111797854A (en) * 2019-04-09 2020-10-20 Oppo广东移动通信有限公司 Scene model establishing method and device, storage medium and electronic equipment
CN112434717A (en) * 2019-08-26 2021-03-02 杭州海康威视数字技术股份有限公司 Model training method and device
CN112489637A (en) * 2020-11-03 2021-03-12 北京百度网讯科技有限公司 Speech recognition method and device
WO2021047201A1 (en) * 2019-09-12 2021-03-18 上海依图信息技术有限公司 Speech recognition method and device

Also Published As

Publication number Publication date
CN113066486B (en) 2023-06-09

Similar Documents

Publication Publication Date Title
KR102204286B1 (en) Batch normalization layers
CN109948149B (en) Text classification method and device
CN108197652B (en) Method and apparatus for generating information
US20220092416A1 (en) Neural architecture search through a graph search space
CN111027428B (en) Training method and device for multitasking model and electronic equipment
CN111626340B (en) Classification method, device, terminal and computer storage medium
CN110135681A (en) Risk subscribers recognition methods, device, readable storage medium storing program for executing and terminal device
WO2020151175A1 (en) Method and device for text generation, computer device, and storage medium
CN116822651A (en) Large model parameter fine adjustment method, device, equipment and medium based on incremental learning
EP4343616A1 (en) Image classification method, model training method, device, storage medium, and computer program
CN114168318A (en) Training method of storage release model, storage release method and equipment
CN111159481B (en) Edge prediction method and device for graph data and terminal equipment
CN113449840A (en) Neural network training method and device and image classification method and device
CN111178082A (en) Sentence vector generation method and device and electronic equipment
CN110807476A (en) Password security level classification method and device and electronic equipment
CN109783769B (en) Matrix decomposition method and device based on user project scoring
CN113011532A (en) Classification model training method and device, computing equipment and storage medium
CN112949855A (en) Face recognition model training method, recognition method, device, equipment and medium
US7457788B2 (en) Reducing number of computations in a neural network modeling several data sets
CN113066486B (en) Data identification method, device, electronic equipment and computer readable storage medium
CN113269259B (en) Target information prediction method and device
CN106297807A (en) The method and apparatus of training Voiceprint Recognition System
CN114970732A (en) Posterior calibration method and device for classification model, computer equipment and medium
CN113420699A (en) Face matching method and device and electronic equipment
CN112906909A (en) Deep learning model training method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240527

Address after: No.006, 6th floor, building 4, No.33 yard, middle Xierqi Road, Haidian District, Beijing 100085

Patentee after: BEIJING KINGSOFT CLOUD NETWORK TECHNOLOGY Co.,Ltd.

Country or region after: China

Patentee after: Wuxi Jinyun Zhilian Technology Co.,Ltd.

Address before: No.006, 6th floor, building 4, No.33 yard, middle Xierqi Road, Haidian District, Beijing 100085

Patentee before: BEIJING KINGSOFT CLOUD NETWORK TECHNOLOGY Co.,Ltd.

Country or region before: China
