CN113674736A - Classifier integration-based teacher classroom instruction identification method and system - Google Patents


Info

Publication number
CN113674736A
CN113674736A
Authority
CN
China
Prior art keywords
classifier
voice
recognized
teacher
base
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110740889.7A
Other languages
Chinese (zh)
Inventor
赵军
颜庆国
董勤伟
查显光
吴俊�
何泽家
赵新冬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Jiangsu Electric Power Co Ltd
Electric Power Research Institute of State Grid Jiangsu Electric Power Co Ltd
Original Assignee
State Grid Jiangsu Electric Power Co Ltd
Electric Power Research Institute of State Grid Jiangsu Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Jiangsu Electric Power Co Ltd, Electric Power Research Institute of State Grid Jiangsu Electric Power Co Ltd filed Critical State Grid Jiangsu Electric Power Co Ltd
Priority to CN202110740889.7A priority Critical patent/CN113674736A/en
Publication of CN113674736A publication Critical patent/CN113674736A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/08: Speech classification or search
    • G10L15/16: Speech classification or search using artificial neural networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10: Services
    • G06Q50/20: Education
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/04: Segmentation; Word boundary detection
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/26: Speech to text systems
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223: Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Tourism & Hospitality (AREA)
  • Educational Technology (AREA)
  • Marketing (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Educational Administration (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a classifier integration-based method and system for identifying teacher classroom instructions. The method and system enable accurate recognition of a teacher's voice instructions in the classroom, which helps to better assess the teacher's classroom teaching effect and to measure the teacher's degree of classroom control and the classroom atmosphere.

Description

Classifier integration-based teacher classroom instruction identification method and system
Technical Field
The invention relates to the technical field of classroom teaching, and in particular to a classifier integration-based method and system for identifying teacher classroom instructions.
Background
In intelligent classroom teaching, teaching evaluation is a very important link for understanding, evaluating, adjusting and improving teaching. Teaching assessment serves as the basis for carrying out teaching activities: it directly reflects the teaching level of teachers and also directly influences the learning effect of students. Among its aspects, the assessment of teacher-student interaction is becoming increasingly important. The teacher-student interaction effect refers to how students respond to a teacher's instructions: for example, when the teacher emphasizes that a certain piece of knowledge is important, whether the students record the content in time; or when the teacher asks a question, whether the students promptly think about and answer it. Evaluating the teacher-student interaction effect is important, and that evaluation derives from the issuing of teacher classroom instructions. Analyzing the teacher's classroom-instruction behavior during teaching therefore makes it possible to assess, from an overall perspective, the teacher's degree of classroom control and the classroom atmosphere, which is of great significance for improving the teaching mode, raising teaching quality and creating a good classroom atmosphere.
Currently, in the field of voice-instruction signal processing, the text corresponding to a voice instruction is usually obtained first, and the instruction text is then classified. Traditional machine-learning text classification methods mainly include latent Dirichlet allocation, the K-nearest-neighbor method, support vector machines and the like. These methods are mature, but their classification performance depends heavily on the extracted features and on model parameter tuning, and the whole process is time-consuming and labor-intensive. With the popularization of neural networks, many researchers have applied them to natural language processing, for example applying convolutional neural networks to sentence classification, or combining recurrent neural networks with convolutional neural networks for text classification. For short-text classification with a single model, these models are highly complex; although they achieve good results, the room for further improvement is limited.
Therefore, how to accurately recognize a teacher's voice instructions in the classroom is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
The purpose of the invention is to provide a classifier integration-based method and system for identifying teacher classroom instructions, which can accurately recognize the teacher's voice instructions in the classroom, help to better evaluate the teacher's classroom teaching effect, and measure the teacher's degree of classroom control and the classroom atmosphere.
In order to achieve the above object, an aspect of the present invention provides a method for identifying a teacher classroom instruction based on classifier integration, including:
acquiring to-be-recognized voice of a teacher in a classroom teaching process;
preprocessing the voice to be recognized to obtain a preprocessed audio clip set;
performing content identification on the audio clip set to obtain an identified text set;
extracting content attributes of the text set to obtain an extracted feature vector set;
inputting the feature vector set into a preset integrated classifier to obtain an instruction classification result corresponding to the voice to be recognized, wherein the integrated classifier is integrated from a plurality of base classifiers and is trained using the feature vector set of voice training data as training samples and the instruction classification results corresponding to the voice training data as sample labels.
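The five steps above can be sketched end-to-end as follows. This is a minimal illustrative sketch: every function body and the two toy base classifiers are placeholders, not the patent's actual denoising, ASR, embedding or neural-network models.

```python
# Illustrative sketch of the five-step recognition pipeline described above.
# All function bodies are simplified placeholders; a real implementation would
# use wavelet denoising, speech recognition, word2vec features, and trained
# neural-network base classifiers.

def preprocess(raw_audio):
    # Denoise and split the continuous recording into sentence-level clips
    # (here, the "recording" is a string and '|' marks sentence boundaries).
    return [clip for clip in raw_audio.split("|") if clip]

def recognize(clips):
    # Convert each audio clip to text (placeholder: clips are already text).
    return [clip.strip() for clip in clips]

def extract_features(texts):
    # Map each sentence to a feature vector (placeholder: word lengths).
    return [[len(w) for w in t.split()] for t in texts]

def classify(features, ensemble):
    # Score each feature vector with every base classifier and combine the
    # scores by the ensemble's weights (weighted-sum integration).
    results = []
    for vec in features:
        score = sum(w * clf(vec) for w, clf in ensemble)
        results.append("instruction" if score > 0.5 else "non-instruction")
    return results

# Two toy "base classifiers" with equal weights.
ensemble = [(0.5, lambda v: 1.0 if sum(v) > 10 else 0.0),
            (0.5, lambda v: 1.0 if len(v) > 3 else 0.0)]

raw = "please open your books|good morning everyone"
labels = classify(extract_features(recognize(preprocess(raw))), ensemble)
print(labels)
```

The chain of calls mirrors steps S110 through S150; only the weighted-sum combination in `classify` corresponds directly to the integration scheme described later.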
Preferably, the preprocessing the speech to be recognized to obtain a preprocessed audio segment set includes:
carrying out noise reduction processing on the voice to be recognized by utilizing wavelet transformation;
and cutting and segmenting the voice to be recognized after the noise reduction treatment to obtain a plurality of audio fragment sets in sentence forms.
Preferably, the content recognition of the audio segment set to obtain a recognized text set includes:
and performing content recognition of fixed-form sentences on the audio clip set by utilizing a speech recognition technology, and converting the audio clip set into texts to obtain a recognized text set.
Preferably, the integration mode of the plurality of base classifiers is specifically as follows:
subjecting the plurality of base classifiers to an attention mechanism to generate a weight coefficient of each base classifier;
and integrating a plurality of base classifiers by using a weighted summation mode based on the weight coefficients.
Preferably, the plurality of base classifiers comprises: a convolutional neural network base classifier, a bidirectional long short-term memory network base classifier, a convolutional long short-term memory network base classifier, and a region-based convolutional neural network base classifier.
In another aspect, the present invention provides a system for identifying classroom instructions for teachers based on classifier integration, including:
the acquisition module is used for acquiring the voice to be recognized of the teacher in the classroom teaching process;
the preprocessing module is used for preprocessing the voice to be recognized to obtain a preprocessed audio clip set;
the recognition module is used for carrying out content recognition on the audio clip set to obtain a recognized text set;
the extraction module is used for extracting the content attribute of the text set to obtain an extracted feature vector set;
the output module is used for inputting the feature vector set into a preset integrated classifier to obtain an instruction classification result corresponding to the voice to be recognized, wherein the integrated classifier is integrated from a plurality of base classifiers and is trained using the feature vector set of voice training data as training samples and the instruction classification results corresponding to the voice training data as sample labels.
Preferably, the preprocessing module comprises:
the noise reduction processing unit is used for performing noise reduction processing on the voice to be recognized by utilizing wavelet transformation;
and the cutting and segmenting unit is used for cutting and segmenting the voice to be recognized after the noise reduction processing to obtain a plurality of audio fragment sets in sentence forms.
Preferably,
the recognition module is specifically configured to perform content recognition of fixed-form sentences on the audio segment set by using a speech recognition technology, convert the audio segment set into a text, and obtain a recognized text set.
Preferably, the integration mode of the plurality of base classifiers is specifically as follows:
subjecting the plurality of base classifiers to an attention mechanism to generate a weight coefficient of each base classifier;
and integrating a plurality of base classifiers by using a weighted summation mode based on the weight coefficients.
Preferably, the plurality of base classifiers comprises: a convolutional neural network base classifier, a bidirectional long short-term memory network base classifier, a convolutional long short-term memory network base classifier, and a region-based convolutional neural network base classifier.
The invention has at least the following beneficial effects:
the invention obtains the voice to be recognized of a teacher in the classroom teaching process, preprocesses the obtained voice to be recognized to obtain an audio fragment set, then carries out content recognition on the audio fragment set to obtain a text set, then carries out content attribute extraction on the text set to obtain a characteristic vector set, and finally inputs the characteristic vector set into a preset integrated classifier to obtain an instruction classification result corresponding to the voice to be recognized, wherein the preset integrated classifier is integrated by a plurality of base classifiers, the characteristic vector set of voice training data is taken as a training sample, the instruction classification result corresponding to the voice training data is taken as a sample label for training to obtain the instruction classification result, and due to the adoption of the idea of integration of a plurality of base classifiers, when carrying out classification prediction on the characteristic vector set of the voice to be recognized, more information can be captured and a real hypothesis space can be learned as much as possible, the instruction identification accuracy is improved, so that accurate identification of the teacher voice instruction in a classroom is realized, better assessment of classroom teaching effects of the teacher is facilitated, and classroom control strength and classroom atmosphere conditions of the teacher are measured.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from them without creative effort.
FIG. 1 is a flow chart of a teacher class instruction identification method based on classifier integration according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating an integration of the integration classifier according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a teacher classroom instruction identification system based on classifier integration according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. It is obvious that the described embodiments are only a part of the embodiments of the present invention, not all of them.
It will be understood that when an element is referred to as being "secured to" another element, it can be directly on the other element or intervening elements may also be present. When an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may also be present. The terms "vertical," "horizontal," "left," "right," and the like as used herein are for illustrative purposes only and do not represent the only embodiments.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
Referring to fig. 1, in one aspect, an embodiment of the present invention provides a method for identifying a teacher classroom instruction based on classifier integration, including:
and S110, acquiring the voice to be recognized of the teacher in the classroom teaching process.
In the embodiment of the invention, high-precision digital monitoring sound pickups are arranged on both sides of the teacher's platform and connected to a host computer. While the teacher conducts classroom teaching, the pickups collect the teacher's voice data in real time and store it, in time-based segments, as wav-format audio files to serve as the speech to be recognized.
And S120, preprocessing the voice to be recognized to obtain a preprocessed audio clip set.
In the embodiment of the invention, to avoid interference from external noise, after the teacher's speech to be recognized is acquired it is preprocessed by noise reduction, enhancement and the like to obtain cleaner voice data, and an audio segmentation operation then splits the continuous speech into a set of segmented audio clips.
And S130, identifying the content of the audio clip set to obtain an identified text set.
In the embodiment of the invention, for interactive instruction recognition and classroom voice analysis, content recognition can be performed on the preprocessed audio segment set through speech recognition technology, translating the audio data into text data and obtaining the recognized text set. In this way, the language-based interactive behaviors of the teacher and students in the classroom, through which instructions are conveyed, can be effectively captured.
And S140, extracting the content attribute of the text set to obtain an extracted feature vector set.
In the embodiment of the invention, a vector space model represents text content as vectors in a multidimensional space, and semantic similarity is expressed through spatial similarity, which is intuitive, easy to understand, concise and efficient. Specifically, the word2vec tool generates a corpus for training word vectors, and word vectors are trained on the short texts. The trained model is then used to extract content-attribute feature vectors from the recognized text set, yielding the feature vector set corresponding to the speech to be recognized.
S150, inputting the feature vector set into a preset integrated classifier to obtain an instruction classification result corresponding to the voice to be recognized; the integrated classifier is integrated by a plurality of base classifiers, a feature vector set of voice training data is used as a training sample, and an instruction classification result corresponding to the voice training data is used as a sample label for training.
In the embodiment of the invention, a plurality of base classifiers are combined in a parallel connection mode to construct a classifier integration system. And then training each base classifier in the classifier integrated system by taking the feature vector set of the voice training data as a training sample and taking an instruction classification result corresponding to the voice training data as a sample label to obtain a plurality of trained base classifiers. And integrating the plurality of base classifiers according to the importance degree of each base classifier to obtain a final integrated classifier. And finally, inputting the feature vector set corresponding to the voice to be recognized into the integrated classifier, and performing text classification on the feature vectors to obtain an instruction classification result corresponding to the voice to be recognized.
As can be seen from the above, the classifier integration-based teacher classroom instruction identification method provided in the embodiments of the present invention acquires the speech to be recognized of a teacher during classroom teaching and preprocesses it to obtain an audio segment set; performs content recognition on the audio segment set to obtain a text set; extracts content attributes from the text set to obtain a feature vector set; and finally inputs the feature vector set into a preset integrated classifier, thereby obtaining the instruction classification result corresponding to the speech to be recognized. The preset integrated classifier is integrated from a plurality of base classifiers and is trained using the feature vector set of voice training data as training samples and the corresponding instruction classification results as sample labels. Because the idea of integrating multiple base classifiers is adopted, more information can be captured when classifying the feature vector set of the speech to be recognized, the true hypothesis space can be learned as far as possible, and instruction recognition accuracy is improved. Accurate recognition of the teacher's voice instructions in the classroom is thereby achieved, which helps to better evaluate the teacher's classroom teaching effect and to measure the teacher's degree of classroom control and the classroom atmosphere.
In an embodiment of the present invention, the above step S120 is introduced, and an implementation manner of preprocessing the speech to be recognized is described, optionally, the process may include:
and S1201, performing noise reduction processing on the voice to be recognized by utilizing wavelet transformation.
In the embodiment of the invention, wavelet transformation is used to denoise the acquired speech to be recognized, removing the blank regions where no one is speaking and the noisier regions of free discussion or class breaks, so as to obtain cleaner voice data. After wavelet decomposition, the coefficients in each layer above and below a set threshold are processed separately: wavelet coefficients below the threshold are set to zero, and the inverse transform is then applied, thereby removing the noise.
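The thresholding step can be illustrated with the sketch below, which implements a one-level Haar wavelet decomposition with hard thresholding in plain NumPy. The patent does not specify the wavelet basis, decomposition depth or threshold rule, so all three are assumptions made for the example.

```python
import numpy as np

def haar_denoise(signal, threshold):
    # One-level Haar wavelet decomposition into approximation and detail
    # coefficients (signal length must be even).
    x = np.asarray(signal, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)
    # Hard thresholding: detail coefficients below the threshold are set to
    # zero, as described in the embodiment above.
    detail = np.where(np.abs(detail) < threshold, 0.0, detail)
    # Inverse transform back to the time domain.
    out = np.empty_like(x)
    out[0::2] = (approx + detail) / np.sqrt(2)
    out[1::2] = (approx - detail) / np.sqrt(2)
    return out

# A constant "voice" signal carrying small alternating noise.
noisy = np.array([1.1, 0.9, 1.1, 0.9, 1.1, 0.9])
clean = haar_denoise(noisy, threshold=0.5)
print(np.round(clean, 6))
```

A production system would use a multi-level decomposition (e.g. via PyWavelets) and a data-driven threshold rather than a fixed constant.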
S1202, cutting and segmenting the voice to be recognized after the noise reduction processing to obtain a plurality of audio fragment sets in sentence forms.
In the embodiment of the invention, the Matlab audioread function reads in the denoised speech to be recognized, and the speech is segmented in the time domain: the start index is set to the sampling rate multiplied by the start time in seconds, and the end index to the sampling rate multiplied by the end time in seconds. The speech is thus cut into short one-sentence audio streams, giving a plurality of sentence-form audio segment sets.
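The index arithmetic (sampling rate multiplied by seconds) can be sketched in Python as follows; `split_clips`, the toy sample rate and the boundary times are illustrative, not taken from the patent.

```python
def split_clips(samples, sample_rate, boundaries):
    # boundaries: list of (start_sec, end_sec) pairs, one per sentence.
    # Each clip is sliced as sample_rate * seconds, mirroring the start/end
    # index computation described above.
    clips = []
    for start_sec, end_sec in boundaries:
        start = int(sample_rate * start_sec)
        end = int(sample_rate * end_sec)
        clips.append(samples[start:end])
    return clips

# Toy example: 1 kHz "audio" of 5 seconds, split into two sentence clips.
rate = 1000
audio = list(range(5 * rate))
clips = split_clips(audio, rate, [(0.0, 2.0), (2.5, 4.0)])
print([len(c) for c in clips])
```

In practice the sentence boundaries would come from a voice-activity or silence detector rather than being supplied by hand.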
In another embodiment of the present invention, the aforementioned step S130 is introduced, and a process of performing content recognition on the audio segment set to obtain a recognized text set is as follows:
and performing content recognition of fixed-form sentences on the audio clip set by utilizing a speech recognition technology, and converting the audio clip set into texts to obtain a recognized text set.
In the embodiment of the invention, content recognition of fixed-form sentences is performed on the audio segment set through the SAPI speech recognition technology of the Speech SDK: the text information of the audio segments is captured and the audio data is translated into text data, giving the recognized text set. The Speech SDK is a software development resource package for speech applications. It is developed on the COM standard, and its underlying protocol, in the form of COM components, is completely independent of the application layer.
Acquiring the recognized text through the SAPI speech-recognition programming interfaces mainly involves: (1) ISpRecognizer, the speech recognition engine interface, used mainly to create the recognition engine. There are two types of engine, shared and private; a private recognition engine can only be used by the application that creates it. (2) ISpRecoContext, the recognition context interface, used mainly to send and receive event messages related to speech recognition and to load and unload recognition grammar resources. (3) ISpRecoGrammar, the speech recognition grammar interface. The private speech recognition engine needs an XML grammar document; the program loads and activates the XML grammar rules through this interface, and these rules define the words and sentences to be recognized. (4) ISpRecoResult, the speech recognition result interface. Through this interface the hypothesized and recognized words for the acquired voice information can be obtained, along with information about misrecognitions and related prompts.
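The four-interface workflow (engine, context, grammar, result) can be illustrated structurally. The Python classes below are stand-ins that only mirror the roles of the SAPI COM interfaces; they are not bindings to the real interfaces, and all method names are invented for the illustration.

```python
# Structural sketch of the SAPI workflow described above. Each class mirrors
# the role of one COM interface; none of this calls the actual Speech SDK.

class Recognizer:                 # role of ISpRecognizer (private engine)
    def create_context(self):
        return RecoContext(self)

class RecoContext:                # role of ISpRecoContext
    def __init__(self, engine):
        self.engine = engine
        self.grammar = None
    def load_grammar(self, rules):
        # In SAPI the grammar would come from an XML grammar document.
        self.grammar = RecoGrammar(rules)
        return self.grammar
    def recognize(self, audio_text):
        # Only sentences defined by the active grammar are recognized.
        if self.grammar and audio_text in self.grammar.rules:
            return RecoResult(audio_text, ok=True)
        return RecoResult(audio_text, ok=False)

class RecoGrammar:                # role of ISpRecoGrammar
    def __init__(self, rules):
        self.rules = set(rules)

class RecoResult:                 # role of ISpRecoResult
    def __init__(self, text, ok):
        self.text, self.ok = text, ok

ctx = Recognizer().create_context()
ctx.load_grammar(["please sit down", "open your books"])
r = ctx.recognize("please sit down")
print(r.ok, r.text)
```

The point of the sketch is the ownership chain: the engine creates a context, the context owns a grammar, and recognition produces a result object queried for text and errors.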
In a specific implementation of step S140, when extracting content attributes from the text set to obtain the feature vector set, word vectors are trained on the short texts, and each word can then be represented by its trained word vector:
W_i = (w_1, w_2, ..., w_k)
where w_i represents the weight of the i-th dimension of the word vector and k represents the word-vector dimension obtained after word2vec training.
Each sentence can then be expressed by cascading its word vectors:
W_{1:N} = W_1 ⊕ W_2 ⊕ ... ⊕ W_N
where ⊕ represents the cascade (concatenation) operator and N represents the length of the sentence, i.e. the number of words it contains. In this way, a matrix representation of each sentence is obtained as the input data for each model.
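The word-vector cascade can be sketched as follows. The toy three-dimensional vectors stand in for embeddings that a real system would train with word2vec (e.g. via gensim); the words and values are invented for the example.

```python
import numpy as np

# Toy k=3 dimensional "word vectors"; a real system would train these with
# word2vec on a short-text corpus.
word_vectors = {
    "please": np.array([0.1, 0.2, 0.3]),
    "sit":    np.array([0.4, 0.5, 0.6]),
    "down":   np.array([0.7, 0.8, 0.9]),
}

def sentence_matrix(sentence):
    # Cascade the word vectors W_1 ... W_N into an N x k matrix, which
    # serves as the input to each base classifier model.
    return np.stack([word_vectors[w] for w in sentence.split()])

m = sentence_matrix("please sit down")
print(m.shape)  # (N, k)
```

Sentences of different lengths N would normally be padded or truncated to a fixed length before batching into the neural-network base classifiers.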
Referring to fig. 2, a schematic diagram of an integration manner of the integration classifier is shown. In another embodiment of the present invention, in conjunction with FIG. 2, the process of integrating multiple base classifiers is described:
s1501, the multiple base classifiers are subjected to an attention mechanism, and a weight coefficient of each base classifier is generated.
In the embodiment of the present invention, it is considered that different base classifiers perform differently on samples of different classes and regions. Therefore, for the trained base classifiers, an attention mechanism is used to obtain each base classifier's weight coefficient, which represents its importance. Because the attention mechanism adaptively adjusts the weights of the base classifiers, the combined weights can be chosen according to the data, exploiting the classification advantages of different base classifiers on different data, improving the recognition accuracy of text instructions and the overall classification performance. The attention mechanism weights the target data through an encoding-decoding process so that the system knows where attention should be paid.
And S1502, integrating the plurality of base classifiers by using a weighted summation mode based on the weight coefficients.
In the embodiment of the invention, the base classifiers are integrated by weighted summation according to the weight coefficient of each base classifier, yielding the final integrated classifier:
H(x) = Σ_{i=1}^{T} ω_i h_i(x)
where H(x) represents the integrated classifier, h_i(x) represents the i-th base classifier, ω_i ≥ 0 represents the weight coefficient of the i-th base classifier, and T represents the number of base classifiers.
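A minimal sketch of the attention-weighted integration is shown below. The softmax normalization of the attention scores is an assumption (it guarantees ω_i ≥ 0 and Σω_i = 1; the patent does not fix a normalization), and the toy outputs and scores are invented for the example.

```python
import numpy as np

def softmax(scores):
    # Attention-style normalization: non-negative weights that sum to 1.
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

def ensemble_predict(base_outputs, attention_scores):
    # base_outputs: (T, C) class-probability vectors from T base classifiers.
    # The integrated prediction is H(x) = sum_i w_i * h_i(x).
    weights = softmax(attention_scores)
    return weights @ np.asarray(base_outputs)

# Toy example: T=3 base classifiers over C=2 instruction classes.
outputs = [[0.9, 0.1],
           [0.6, 0.4],
           [0.2, 0.8]]
scores = [2.0, 1.0, 0.5]   # illustrative attention scores per classifier
combined = ensemble_predict(outputs, scores)
print(combined.round(3), combined.argmax())
```

Because the weights sum to 1 and each base output is a probability distribution, the combined vector is again a distribution, and its argmax gives the predicted instruction class.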
In particular, the plurality of base classifiers comprises: a convolutional neural network base classifier, a bidirectional long short-term memory network base classifier, a convolutional long short-term memory network base classifier, and a region-based convolutional neural network base classifier.
In the embodiment of the invention, the final integrated classifier consists of a CNN (Convolutional Neural Network) base classifier, a B-LSTM (Bi-directional Long Short-Term Memory network) base classifier, a C-LSTM (Convolutional Long Short-Term Memory network) base classifier, and an R-CNN (Region-based Convolutional Neural Network) base classifier. Each has its own areas of strength, and their predictions on the same text set are relatively independent of one another, which benefits the classification of teacher instructions.
In another aspect, the present invention provides a system for identifying a teacher classroom instruction based on classifier integration, which can be referred to in correspondence with the above-described method.
Referring to fig. 3, the system includes:
the obtaining module 310, configured to obtain the voice to be recognized of a teacher during classroom teaching;
the preprocessing module 320, configured to preprocess the voice to be recognized to obtain a preprocessed audio segment set;
the recognition module 330, configured to perform content recognition on the audio segment set to obtain a recognized text set;
the extraction module 340, configured to perform content attribute extraction on the text set to obtain an extracted feature vector set;
the output module 350, configured to input the feature vector set into a preset integrated classifier to obtain the instruction classification result corresponding to the voice to be recognized; the integrated classifier is integrated from a plurality of base classifiers and is trained using the feature vector set of voice training data as training samples and the instruction classification results corresponding to the voice training data as sample labels.
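The five modules above form a linear pipeline: speech → audio segments → texts → feature vectors → instruction labels. A minimal sketch of that flow follows; each stage is a hypothetical callable supplied by the caller (the patent's concrete wavelet denoiser, speech recognizer, feature extractor, and integrated classifier would be plugged in here — the lambdas in the usage example are placeholders only).

```python
class TeacherInstructionRecognizer:
    """Sketch of the module pipeline: preprocess -> recognize -> extract -> classify."""

    def __init__(self, preprocess, recognize, extract, classify):
        self.preprocess = preprocess  # voice to be recognized -> audio segment set
        self.recognize = recognize    # one audio segment -> recognized text
        self.extract = extract        # one text -> feature vector
        self.classify = classify      # one feature vector -> instruction classification

    def run(self, speech):
        segments = self.preprocess(speech)
        texts = [self.recognize(seg) for seg in segments]
        vectors = [self.extract(t) for t in texts]
        return [self.classify(v) for v in vectors]

# Toy stand-ins for the real stages, to show the data flow only
rec = TeacherInstructionRecognizer(
    preprocess=lambda s: s.split("|"),                      # "segment" on a delimiter
    recognize=lambda seg: seg.strip(),                      # identity "ASR"
    extract=lambda t: len(t),                               # trivial "feature"
    classify=lambda v: "instruction" if v > 4 else "other", # trivial "classifier"
)
labels = rec.run("open your books | yes")
```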
Optionally, in an embodiment of the present invention, the preprocessing module 320 includes:
a noise reduction processing unit 3201, configured to perform noise reduction processing on the voice to be recognized using wavelet transform;
and a cutting and segmentation unit 3202, configured to cut and segment the noise-reduced voice to be recognized, obtaining a set of sentence-level audio segments.
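The two preprocessing steps — wavelet-based noise reduction and cutting into sentence-level segments — can be sketched as below. This is a simplification, not the patent's exact method: it uses a single-level Haar wavelet with soft thresholding in place of a full wavelet decomposition, and a frame-energy splitter in place of sentence-boundary detection.

```python
import numpy as np

def haar_denoise(signal, threshold):
    """Single-level Haar wavelet denoising with soft thresholding of detail coefficients."""
    x = np.asarray(signal, dtype=float)
    n = len(x)
    if n % 2:                                      # pad to even length
        x = np.append(x, x[-1])
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)    # low-frequency coefficients
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)    # high-frequency (noise-carrying)
    detail = np.sign(detail) * np.maximum(np.abs(detail) - threshold, 0.0)  # soft threshold
    out = np.empty_like(x)                         # inverse Haar transform
    out[0::2] = (approx + detail) / np.sqrt(2.0)
    out[1::2] = (approx - detail) / np.sqrt(2.0)
    return out[:n]

def split_on_silence(signal, frame_len, energy_floor):
    """Cut the denoised waveform into segments wherever frame energy drops below a floor."""
    segments, current = [], []
    for i in range(0, len(signal), frame_len):
        frame = signal[i:i + frame_len]
        if np.mean(frame ** 2) > energy_floor:
            current.extend(frame)
        elif current:
            segments.append(np.array(current))
            current = []
    if current:
        segments.append(np.array(current))
    return segments
```

A production system would more likely use a library such as PyWavelets (`pywt.wavedec` / `pywt.threshold` / `pywt.waverec`) for multi-level denoising; the threshold value and frame parameters above are tuning choices, not values specified by the patent.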
Optionally, in an embodiment of the present invention, the recognition module 330 is specifically configured to perform content recognition of fixed-form sentences on the audio segment set using speech recognition technology and convert the audio segments into text, obtaining the recognized text set.
Optionally, in an embodiment of the present invention, an integration manner of the multiple base classifiers is specifically:
applying an attention mechanism to the plurality of base classifiers to generate a weight coefficient for each base classifier;
and integrating the plurality of base classifiers by weighted summation based on the weight coefficients.
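The patent does not spell out how the attention mechanism computes the weight coefficients; one plausible reading, sketched below, is a softmax over per-classifier relevance scores, which guarantees non-negative weights that sum to one. The scores here are hypothetical inputs (in practice they would be learned).

```python
import numpy as np

def attention_weights(scores):
    """Map per-classifier relevance scores to non-negative weight coefficients
    summing to 1, via a numerically stable softmax."""
    s = np.asarray(scores, dtype=float)
    e = np.exp(s - s.max())   # subtract the max for numerical stability
    return e / e.sum()

# Hypothetical relevance scores for four base classifiers (CNN, B-LSTM, C-LSTM, R-CNN)
w = attention_weights([1.0, 2.0, 0.5, 1.5])
```

The resulting weights satisfy the ω_i ≥ 0 constraint of the weighted-summation formula, and the classifier judged most relevant receives the largest coefficient.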
Optionally, in an embodiment of the present invention, the plurality of base classifiers includes: a convolutional neural network base classifier, a bidirectional long short-term memory neural network base classifier, a convolutional long short-term memory neural network base classifier, and a region-based convolutional neural network base classifier.
As can be seen from the above, the classifier-integration-based teacher classroom instruction recognition system according to the embodiments of the present invention obtains the voice to be recognized of a teacher during classroom teaching, preprocesses it to obtain an audio segment set, performs content recognition on the audio segment set to obtain a text set, performs content attribute extraction on the text set to obtain a feature vector set, and finally inputs the feature vector set into a preset integrated classifier to obtain the instruction classification result corresponding to the voice to be recognized. The preset integrated classifier is integrated from a plurality of base classifiers and is trained using the feature vector set of voice training data as training samples and the corresponding instruction classification results as sample labels. Because multiple base classifiers are integrated, the classifier can capture more information and better approximate the true hypothesis space when classifying the feature vector set of the voice to be recognized, improving instruction recognition accuracy. This enables accurate recognition of teacher voice instructions in the classroom, which in turn facilitates evaluating classroom teaching effectiveness and measuring the teacher's degree of classroom control and the classroom atmosphere.
For a description of the relevant parts of the classifier-integration-based teacher classroom instruction recognition system provided by the embodiment of the present invention, refer to the detailed description of the corresponding parts of the recognition method provided by the embodiment of the present invention; the system achieves the same corresponding effects as the method, and details are not repeated here.
It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are presented in the specification and drawings only to illustrate the principles of the invention; various changes and modifications may be made without departing from the spirit and scope of the invention, and all such changes and modifications fall within the scope of the claimed invention. The scope of the invention is defined by the appended claims and their equivalents.

Claims (10)

1. A teacher classroom instruction identification method based on classifier integration is characterized by comprising the following steps:
acquiring to-be-recognized voice of a teacher in a classroom teaching process;
preprocessing the voice to be recognized to obtain a preprocessed audio clip set;
performing content identification on the audio clip set to obtain an identified text set;
extracting content attributes of the text set to obtain an extracted feature vector set;
inputting the feature vector set into a preset integrated classifier to obtain an instruction classification result corresponding to the voice to be recognized; wherein the integrated classifier is integrated from a plurality of base classifiers and is trained using the feature vector set of voice training data as training samples and the instruction classification results corresponding to the voice training data as sample labels.
2. The method of claim 1, wherein the preprocessing the speech to be recognized to obtain a preprocessed audio clip set comprises:
carrying out noise reduction processing on the voice to be recognized by utilizing wavelet transformation;
and cutting and segmenting the noise-reduced voice to be recognized to obtain a set of audio segments in sentence form.
3. The method of claim 2, wherein the identifying the content of the audio clip set to obtain an identified text set comprises:
and performing content recognition of fixed-form sentences on the audio clip set by utilizing a speech recognition technology, and converting the audio clip set into texts to obtain a recognized text set.
4. The method for identifying teacher classroom instructions based on classifier integration according to claim 1, wherein the integration manner of the plurality of base classifiers is specifically:
applying an attention mechanism to the plurality of base classifiers to generate a weight coefficient for each base classifier;
and integrating the plurality of base classifiers by weighted summation based on the weight coefficients.
5. The method of claim 4, wherein the plurality of base classifiers comprises: a convolutional neural network base classifier, a bidirectional long short-term memory neural network base classifier, a convolutional long short-term memory neural network base classifier, and a region-based convolutional neural network base classifier.
6. A teacher classroom instruction identification system based on classifier integration, comprising:
the acquisition module is used for acquiring the voice to be recognized of the teacher in the classroom teaching process;
the preprocessing module is used for preprocessing the voice to be recognized to obtain a preprocessed audio clip set;
the recognition module is used for carrying out content recognition on the audio clip set to obtain a recognized text set;
the extraction module is used for extracting the content attribute of the text set to obtain an extracted feature vector set;
the output module is used for inputting the feature vector set into a preset integrated classifier to obtain an instruction classification result corresponding to the voice to be recognized; wherein the integrated classifier is integrated from a plurality of base classifiers and is trained using the feature vector set of voice training data as training samples and the instruction classification results corresponding to the voice training data as sample labels.
7. The classifier-based integrated teacher classroom instruction recognition system of claim 6, wherein said preprocessing module comprises:
the noise reduction processing unit is used for performing noise reduction processing on the voice to be recognized by utilizing wavelet transformation;
and the cutting and segmentation unit is used for cutting and segmenting the noise-reduced voice to be recognized to obtain a set of audio segments in sentence form.
8. The teacher classroom instruction recognition system based on classifier integration according to claim 7,
the recognition module is specifically configured to perform content recognition of fixed-form sentences on the audio segment set by using a speech recognition technology, convert the audio segment set into a text, and obtain a recognized text set.
9. The system of claim 6, wherein the plurality of base classifiers are integrated in a manner that:
applying an attention mechanism to the plurality of base classifiers to generate a weight coefficient for each base classifier;
and integrating the plurality of base classifiers by weighted summation based on the weight coefficients.
10. The classifier-integration-based teacher classroom instruction recognition system of claim 9, wherein the plurality of base classifiers includes: a convolutional neural network base classifier, a bidirectional long short-term memory neural network base classifier, a convolutional long short-term memory neural network base classifier, and a region-based convolutional neural network base classifier.
CN202110740889.7A 2021-06-30 2021-06-30 Classifier integration-based teacher classroom instruction identification method and system Pending CN113674736A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110740889.7A CN113674736A (en) 2021-06-30 2021-06-30 Classifier integration-based teacher classroom instruction identification method and system


Publications (1)

Publication Number Publication Date
CN113674736A true CN113674736A (en) 2021-11-19

Family

ID=78538535


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111027584A (en) * 2019-10-23 2020-04-17 宋飞 Classroom behavior identification method and device
CN111274401A (en) * 2020-01-20 2020-06-12 华中师范大学 Classroom utterance classification method and device based on multi-feature fusion
US20200286396A1 (en) * 2017-11-17 2020-09-10 Shenzhen Eaglesoul Audio Technologies CO.,Ltd. Following teaching system having voice evaluation function
WO2020216064A1 (en) * 2019-04-24 2020-10-29 京东方科技集团股份有限公司 Speech emotion recognition method, semantic recognition method, question-answering method, computer device and computer-readable storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination