CN115358300A - Student cognitive recognition method, device and equipment based on voice and text classification

Info

Publication number
CN115358300A
Authority
CN
China
Prior art keywords
sample
classroom
data
voice data
cognitive
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210905870.8A
Other languages
Chinese (zh)
Inventor
曾康
李靖延
唐小煜
温溢舒
邱淑辉
何俊杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China Normal University
Original Assignee
South China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China Normal University filed Critical South China Normal University
Priority to CN202210905870.8A priority Critical patent/CN115358300A/en
Publication of CN115358300A publication Critical patent/CN115358300A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques
    • G10L17/02Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques
    • G10L17/04Training, enrolment or model building
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques
    • G10L17/06Decision making techniques; Pattern matching strategies
    • G10L17/08Use of distortion metrics or a particular distance between probe pattern and reference templates
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques
    • G10L17/18Artificial neural networks; Connectionist approaches
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Game Theory and Decision Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical field of computers, in particular to a method, a device, equipment and a storage medium for recognizing student cognition based on voice and text classification, wherein the method comprises the following steps: acquiring a voice data set of each student in a discussion classroom, wherein the voice data set comprises a plurality of voice data at different time moments; inputting the voice data set of each student into a preset voiceprint recognition model, obtaining voiceprint recognition data corresponding to the voice data at different time moments, and obtaining student identity identifications corresponding to the voice data at different time moments according to the voiceprint recognition data and a preset voiceprint feature library; inputting the voice data sets of students into a preset text conversion model to obtain classroom text data corresponding to the voice data at different time moments; and inputting the classroom text data corresponding to the voice data at different time moments into a preset cognitive recognition model, and acquiring cognitive grade data corresponding to the voice data at different time moments.

Description

Student cognitive recognition method, device and equipment based on voice and text classification
Technical Field
The invention relates to the technical field of computers, in particular to a student cognitive recognition method, device, equipment and storage medium based on voice and text classification.
Background
Discussion-based teaching is a teaching mode widely used in middle school classrooms. In a discussion centered on the students, the students can exercise their own initiative and actively construct their own understanding through a series of thinking activities. Classroom discussion can also build a closer relationship between students and teachers, improve classroom performance, help students explore their potential, foster students' healthy development, and allow teaching and learning to advance together.
However, discussion-based teaching still has many shortcomings to be addressed. First, a large disparity between the numbers of teachers and students is common around the world, so during a discussion the teacher cannot keep track of the discussion status of every group member in time and cannot give timely help. Second, because the members of discussion groups are assigned at random, the cognitive levels of the group members affect how much the students participate and how deeply they think, and the discussion lesson may ultimately have little effect. Moreover, many other factors, such as students being in a poor state, personalities that do not get along, and reluctance to speak up, prevent in-depth communication between students.
Disclosure of Invention
Based on this, the invention aims to provide a student cognitive recognition method, device, equipment and storage medium based on voice and text classification, which apply voiceprint recognition technology and text conversion technology to a discussion classroom to obtain the identity of a student corresponding to voice data and classroom text data, and adopt a deep learning method to accurately and efficiently evaluate the cognitive situation of the student in the discussion classroom, so as to provide more comprehensive information for teachers and help to improve the teaching quality of future discussion courses.
In a first aspect, an embodiment of the present application provides a student cognitive recognition method based on speech and text classification, including the following steps:
obtaining a voice data set of each student in a discussion classroom, wherein the voice data set comprises a plurality of voice data at different time moments;
inputting the voice data sets of the students into a preset voiceprint recognition model to obtain voiceprint recognition data corresponding to the voice data at different time moments, and obtaining student identity identifications corresponding to the voice data at different time moments according to the voiceprint recognition data and a preset voiceprint feature library;
inputting the voice data sets of the students into a preset text conversion model to obtain classroom text data corresponding to the voice data at different time moments;
inputting the classroom text data corresponding to the voice data at different time moments into a preset cognitive recognition model, and acquiring cognitive grade data corresponding to the voice data at different time moments;
and acquiring the cognitive change condition of each student in the discussion classroom according to the student identity corresponding to the voice data at each different time and the cognitive grade data corresponding to the voice data at each different time.
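The final step above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `cognitive_timeline`, the record layout `(time, student_id, grade)` and the sample values are all hypothetical, and the records are assumed to be the utterance-level outputs of the preceding steps.

```python
from collections import defaultdict

def cognitive_timeline(records):
    """Group (time, student_id, grade) records into one time-ordered list per student."""
    timeline = defaultdict(list)
    for time_s, student_id, grade in records:
        timeline[student_id].append((time_s, grade))
    # Sort each student's utterances by time so the change in grade can be read off
    return {sid: sorted(items) for sid, items in timeline.items()}

# Hypothetical utterance-level results: (time in seconds, identity, cognitive grade)
records = [
    (95.0, "student_02", "B"),
    (12.5, "student_01", "C"),
    (140.0, "student_01", "A"),
]
timeline = cognitive_timeline(records)
```

Each student's list can then be plotted or tabulated to show how the cognitive grade changes over the course of the discussion.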
In a second aspect, an embodiment of the present application provides a device for recognizing cognitive levels of students based on speech and text classification, including:
the voice data acquisition module is used for acquiring a voice data set of each student in a discussion classroom, wherein the voice data set comprises a plurality of voice data at different time moments;
the identity recognition module is used for inputting the voice data sets of the students into a preset voiceprint recognition model, acquiring voiceprint recognition data corresponding to the voice data at different time moments, and acquiring student identity identifications corresponding to the voice data at different time moments according to the voiceprint recognition data and a preset voiceprint feature library;
the classroom text conversion module is used for inputting the voice data sets of the students into a preset text conversion model to obtain classroom text data corresponding to the voice data at different time moments;
the cognitive grade recognition module is used for inputting the classroom text data corresponding to the voice data at different time moments into a preset cognitive recognition model and acquiring the cognitive grade data corresponding to the voice data at different time moments;
and the display module is used for acquiring the cognitive change condition of each student in the discussion classroom according to the student identity identifications corresponding to the voice data at different time moments and the cognitive grade data corresponding to the voice data at different time moments.
In a third aspect, an embodiment of the present application provides a computer device, which includes a processor, a memory, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the processor implements the steps of the cognitive recognition method for students based on speech and text classification according to the first aspect.
In a fourth aspect, the present application provides a storage medium storing a computer program, where the computer program, when executed by a processor, implements the steps of the method for cognitive recognition of students based on speech and text classification according to the first aspect.
In the embodiment of the application, a voiceprint recognition technology and a text conversion technology are applied to a discussion classroom to obtain the identity and classroom text data of a student corresponding to voice data, a deep learning method is adopted to accurately and efficiently evaluate the cognitive conditions corresponding to a plurality of time intervals of the student in the discussion period, a corresponding cognitive level change condition chart is constructed, the performance and learning condition of the student in the discussion period can be reflected, more comprehensive information is provided for a teacher, and the teaching quality of future discussion courses is improved.
For a better understanding and practice, the present invention is described in detail below with reference to the accompanying drawings.
Drawings
Fig. 1 is a schematic flowchart of a cognitive recognition method for students based on speech and text classification according to a first embodiment of the present application;
fig. 2 is a schematic diagram of a flow of a cognitive recognition method for students based on speech and text classification according to a second embodiment of the present application;
fig. 3 is a schematic diagram of a flow of a cognitive recognition method for students based on speech and text classification according to a third embodiment of the present application;
fig. 4 is a schematic diagram of S8 in a flow of a method for cognitive recognition of students based on speech and text classification according to a third embodiment of the present application;
fig. 5 is a schematic diagram of S82 in the process of the method for cognitive recognition of students based on speech and text classification according to the third embodiment of the present application;
fig. 6 is a schematic diagram of S83 in the process of the method for cognitive recognition of students based on speech and text classification according to the third embodiment of the present application;
fig. 7 is a schematic view of a flow chart of a cognitive recognition method for students based on speech and text classification according to a fourth embodiment of the present application;
fig. 8 is a schematic structural diagram of a device for recognizing cognitive levels of students based on speech and text classification according to a fifth embodiment of the present application;
fig. 9 is a schematic structural diagram of a computer device according to a sixth embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "when", "upon", or "in response to a determination", depending on the context.
In a discussion classroom, a teacher typically gives the students a question and the students speak in turn to discuss it. Cognition here refers to how well a student understands the question, as reflected in what the student says during the discussion.
Referring to fig. 1, fig. 1 is a schematic flowchart of a cognitive recognition method for students based on speech and text classification according to a first embodiment of the present application, where the method includes the following steps:
S1: Obtain a voice data set of each student in a discussion classroom, wherein the voice data set comprises voice data at a plurality of different time moments.
The method is executed by a recognition device for the student cognitive recognition method based on voice and text classification (hereinafter referred to as the recognition device).
In this embodiment, the recognition device obtains the voice data set of each student during discussion in the discussion classroom. Specifically, a preset voice acquisition device is started before a student speaks, collects the voice data of that utterance, and records the time moment of the utterance; when the student finishes speaking, the voice acquisition device is stopped. The voice data collected at the plurality of different time moments serves as the voice data set.
In an alternative embodiment, the recognition device may also obtain the pre-stored speech data sets of the students in the discussion classroom from a preset database.
S2: Input the voice data sets of the students into a preset voiceprint recognition model to obtain voiceprint recognition data corresponding to the voice data at different time moments, and obtain student identity identifications corresponding to the voice data at different time moments according to the voiceprint recognition data and a preset voiceprint feature library.
In this embodiment, the recognition device inputs the voice data into a preset voiceprint recognition model, obtains the voiceprint recognition data corresponding to the voice data at each time moment, and performs cosine similarity matching between the voiceprint recognition data and the voiceprint feature data of each student in the voiceprint feature library, so as to obtain the identity identification of the student corresponding to the voice data.
The identity is a unique identity of each student. The identification may be the name of the student, the number of the student, and the like, and is not limited in detail herein.
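The cosine similarity matching described above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `identify_speaker`, the dictionary layout of the feature library and the three-dimensional vectors are hypothetical, and no acceptance threshold is applied because the text does not specify one.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identify_speaker(embedding, feature_library):
    """Return (student_id, similarity) of the library entry most similar to the embedding."""
    best_id, best_sim = None, -1.0
    for student_id, reference in feature_library.items():
        sim = cosine_similarity(embedding, reference)
        if sim > best_sim:
            best_id, best_sim = student_id, sim
    return best_id, best_sim

# Hypothetical voiceprint feature library: one reference vector per student
feature_library = {
    "student_01": [1.0, 0.0, 0.0],
    "student_02": [0.0, 1.0, 0.0],
}
speaker, score = identify_speaker([0.9, 0.1, 0.0], feature_library)
```

In practice the vectors would be the outputs of the voiceprint recognition model, and a minimum similarity could be required before accepting a match.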
In an optional embodiment, the recognition device constructs the voiceprint recognition model with a ResNet34 convolutional neural network structure, which comprises a convolutional layer, a pooling layer, a weight calculation layer and a fully connected layer connected in sequence. The convolutional layer calculates the feature vectors of the voice data; the pooling layer performs dimensionality reduction to prevent vanishing gradients and network degradation; and the weight calculation layer comprises a weight calculation algorithm and a weight assignment algorithm for assigning weights to the feature vectors of the voice data, wherein the weight assignment algorithm is:
$$X_c = F_{scale}(U_c, S_c)$$
where $X_c$ is the weighted feature vector of the voice data, $U_c$ is the embedding corresponding to the feature vector, and $S_c$ is the weight parameter corresponding to the feature vector.
The fully connected layer acts as a classifier and computes a weighted sum of the weighted feature vectors output by the weight calculation layer.
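The weight assignment step can be illustrated with a short sketch. It assumes that the scale function is a channel-wise multiplication of each feature map by its weight, as in squeeze-and-excitation blocks; the text does not state this explicitly, and the function name `f_scale`, the array shapes and the values are hypothetical.

```python
import numpy as np

def f_scale(u, s):
    """Scale each channel of u (channels, height, width) by its weight in s (channels,)."""
    u = np.asarray(u, dtype=float)
    s = np.asarray(s, dtype=float)
    # Broadcast one scalar weight over every position of the matching channel
    return u * s[:, None, None]

u = np.ones((2, 3, 4))       # hypothetical feature maps, one per channel
s = np.array([0.25, 0.75])   # hypothetical per-channel weights
x = f_scale(u, s)
```

Under this reading, channels with larger weights contribute more to the features passed on to the fully connected layer.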
The recognition device uses the zhaishell data set in the zhvoice Chinese corpus as the training voice data of the voiceprint recognition model, performs framing, windowing, short-time Fourier transform and standard deviation standardization on the training voice data to obtain processed voice spectrum features, and finally inputs the processed voice spectrum features into the voiceprint recognition model for training. Specifically, an SGD optimizer is used in training the voiceprint recognition model, StepLR is selected as the learning rate adjustment strategy, and cross entropy is selected as the loss function.
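As an illustration of the StepLR strategy named above, the following sketch computes a step-decay learning rate, in which the rate is multiplied by a fixed factor after every fixed number of epochs. The function name `step_lr` and the values of the initial learning rate, step size and decay factor are hypothetical; the text does not give the values actually used.

```python
def step_lr(initial_lr, epoch, step_size, gamma):
    """Learning rate at `epoch` when the rate is multiplied by gamma every step_size epochs."""
    return initial_lr * gamma ** (epoch // step_size)

# Hypothetical settings: start at 0.1 and halve the rate every 10 epochs
schedule = [step_lr(0.1, epoch, step_size=10, gamma=0.5) for epoch in (0, 9, 10, 25)]
```

The rate stays constant within each block of `step_size` epochs and drops at the block boundaries.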
Referring to fig. 2, which is a schematic flowchart of a student cognitive recognition method based on voice and text classification according to a second embodiment of the present application, the method further includes step S6, which is performed before step S2 and is specifically as follows:
S6: Preprocess the voice data of each student to obtain the preprocessed voice data of each student, wherein the preprocessing comprises framing, windowing, short-time Fourier transform and standard deviation standardization.
Because voice data is stationary only over short intervals, it must be analyzed on a short-time basis. In this embodiment, the recognition device preprocesses the voice data by framing, windowing, short-time Fourier transform and standard deviation standardization. Specifically, for framing, an observation unit consisting of 512 sampling points is preset; to avoid excessive change between two adjacent frames, an overlap region is kept between them. The overlap region is 0.001 of the sampling rate, i.e., 160 sampling points. A Hamming window is applied to each frame of the voice signal to increase the continuity at the left and right ends of each frame. A short-time Fourier transform is then performed on each windowed frame, as follows:
$$\mathrm{STFT}(t, f) = \int_{-\infty}^{\infty} x(\tau)\, h(\tau - t)\, e^{-j 2 \pi f \tau}\, d\tau$$
in the formula, $\mathrm{STFT}(t, f)$ is the frequency spectrum of the voice signal, $x(\tau)$ is the voice signal, and $h(\tau - t)$ is the Hamming window.
Standard deviation standardization is then performed on the frequency spectrum of the voice signal to obtain the preprocessed voice data.
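The preprocessing chain can be sketched in discrete form as follows. This is a minimal sketch, not the patented implementation: it uses the 512-point frame and 160-point overlap given in the text, but the function name `preprocess`, the 16 kHz sampling rate, the use of the magnitude spectrum and the standardization over the whole spectrogram are assumptions. The signal is assumed to contain at least one full frame.

```python
import numpy as np

def preprocess(signal, frame_len=512, overlap=160):
    """Frame, Hamming-window, Fourier-transform and standardize a 1-D voice signal."""
    signal = np.asarray(signal, dtype=float)
    hop = frame_len - overlap                       # distance between frame starts
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])
    windowed = frames * np.hamming(frame_len)       # taper both ends of every frame
    spectrum = np.abs(np.fft.rfft(windowed, axis=1))  # magnitude spectrum per frame
    # Standard deviation standardization: zero mean and unit standard deviation
    return (spectrum - spectrum.mean()) / spectrum.std()

sample_rate = 16000                                 # assumed sampling rate in Hz
t = np.arange(sample_rate) / sample_rate
features = preprocess(np.sin(2 * np.pi * 440.0 * t))  # one second of a test tone
```

Each row of `features` is the standardized spectrum of one frame, which is the kind of input a voiceprint recognition model would consume.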
S3: Input the voice data sets of the students into a preset text conversion model to obtain classroom text data corresponding to the voice data at different time moments.
The text conversion model is an ASR (Automatic Speech Recognition) model, which converts user voice data into corresponding text data.
In this embodiment, the recognition device inputs the voice data sets of the students into a preset text conversion model to obtain the classroom text data corresponding to the voice data at different time moments. In an optional embodiment, the recognition device may remove stop words from the classroom text data and segment it into words according to a preset jieba Chinese word segmentation library, so as to improve the accuracy and efficiency of the cognitive recognition of the students.
S4: Input the classroom text data corresponding to the voice data at different time moments into a preset cognitive recognition model, and acquire cognitive grade data corresponding to the voice data at different time moments.
The cognitive recognition model is an SVM (Support Vector Machine) classifier, a generalized linear classifier that performs binary classification on data by supervised learning.
In this embodiment, the recognition device inputs the classroom text data corresponding to the voice data at different time moments into a preset cognitive recognition model and acquires the cognitive grade data corresponding to the voice data at different time moments. The cognitive grade data may include cognitive grade A, cognitive grade B, cognitive grade C, cognitive grade D, and the like, and reflects the cognitive situation expressed in the student's voice data at each time moment. By adopting a deep learning method, the cognitive situation of the students in the discussion classroom is evaluated accurately and efficiently.
Referring to fig. 3, fig. 3 is a schematic view of a flow of a student cognitive recognition method based on speech and text classification according to a third embodiment of the present application, which further includes training the cognitive recognition model, where the training of the cognitive recognition model includes steps S7 to S8, specifically as follows:
S7: Obtain a plurality of sample classroom text data and the cognitive tag data corresponding to the sample classroom text data.
The cognitive tag data corresponds to the cognitive grade data; it is labeled manually and reflects the cognitive situation of the sample classroom text data.
In this embodiment, the recognition device obtains the sample classroom text data input by the user and the cognitive tag data corresponding to the sample classroom text data, where the sample classroom text data includes sample words.
S8: Input the sample classroom text data into a preset sentence vector representation calculation model to obtain the sentence vector representation of the sample classroom text data, and input the sentence vector representation of the sample classroom text data and the corresponding cognitive tag data into a neural network model to be trained to obtain the cognitive recognition model.
The sentence vector representation calculation model comprises an algorithm for calculating the sentence vector representation of the sample classroom text data.
The neural network model to be trained is an SVM classifier and comprises a penalty coefficient and a kernel function coefficient. In this embodiment, the recognition device inputs the sample classroom text data into the preset sentence vector representation calculation model, obtains the sentence vector representation of the sample classroom text data, inputs the sentence vector representation and the corresponding cognitive tag data into the neural network model to be trained, and iteratively trains the model until the optimal penalty coefficient and kernel function coefficient are obtained; the trained model is taken as the cognitive recognition model.
Referring to fig. 4, the sentence vector representation calculation model includes a word embedding vector calculation module, a fused word vector calculation module, and a sentence vector calculation module, and fig. 4 is a schematic diagram of S8 in a flow of the student cognitive recognition method based on speech and text classification according to the third embodiment of the present application, and includes steps S81 to S83:
S81: Acquire, according to the plurality of sample classroom text data and the word embedding vector calculation module, the multi-dimensional word embedding vector representation of a plurality of sample words in the plurality of sample classroom text data output by the word embedding vector calculation module.
The word embedding vector calculation module adopts a Word2Vec word vector model. In an optional embodiment, the recognition device obtains a pre-trained Word2Vec word vector model, which was trained by researchers at Beijing Normal University and other universities on media corpora such as Weibo (microblog) and Sohu News. The recognition device inputs the plurality of sample classroom text data into the pre-trained word vector model for transfer learning, and uses the resulting model as the word embedding vector calculation module.
In this embodiment, the recognition device obtains, according to the plurality of sample classroom text data and the word embedding vector calculation module, a multi-dimensional word embedding vector representation of a plurality of sample words in the plurality of sample classroom text data output by the word embedding vector calculation module according to a dimension preset in the word embedding vector calculation module.
S82: Acquire the fused word vector representation of a plurality of sample words of the plurality of sample classroom text data according to the multi-dimensional word embedding vector representation of the plurality of sample words in the plurality of sample classroom text data and the fused word vector calculation module.
In this embodiment, the recognition device acquires the fused word vector representation of a plurality of sample words of the plurality of sample classroom text data according to the multi-dimensional word embedded vector representation of the plurality of sample words in the plurality of sample classroom text data and the fused word vector calculation module, and the fused word vector representation is used for training the cognitive recognition model, so that the cognitive situation corresponding to each time interval of the student is more comprehensively analyzed.
Referring to fig. 5, fig. 5 is a schematic diagram of S82 in a flow of a cognitive recognition method for students based on speech and text classification according to a third embodiment of the present application, including steps S821 to S823, as follows:
s821: and obtaining word frequency-inverse document frequency values corresponding to a plurality of sample words of the plurality of sample classroom text data according to a plurality of sample words in the plurality of sample classroom text data and a preset word frequency-inverse document frequency calculation algorithm.
The word frequency-inverse document frequency calculation algorithm comprises the following steps:
Figure BDA0003772410900000081
in the formula, C i Word frequency-inverse document frequency value, TF, for the ith sample term i,j The word frequency of the jth corpus text data in the preset corpus is used as the ith sample word; IDF i The frequency of the inverse document of the ith sample word is obtained; n is i,j For the number of times the ith sample word appears in the jth corpus text data, Σ k n k,j K is the total number of occurrences of all the corpus words of the jth corpus text data, and k is the number of different sample words in the sample classroom text data; d is the total number of corpus text data, | j: t i ∈d j I is a word corpus containing a sample word t i The number of corpus text data;
In this embodiment, the recognition device obtains the word frequency-inverse document frequency values corresponding to the sample words of the plurality of sample classroom text data according to a preset corpus, the sample words of the plurality of sample classroom text data, and the preset word frequency-inverse document frequency calculation algorithm.
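The calculation in S821 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `tf_idf`, the representation of the corpus as a list of tokenized texts, and the use of the natural logarithm are all assumptions, since the text does not state the logarithm base or how the corpus is stored.

```python
import math
from collections import Counter


def tf_idf(word, doc_index, corpus):
    """Return C_i = TF_{i,j} * IDF_i for `word` in corpus[doc_index].

    `corpus` is a list of tokenized texts (each a list of words).
    The natural logarithm is an assumption; the source does not give the base.
    """
    doc = corpus[doc_index]
    if not doc:
        return 0.0
    # TF_{i,j} = n_{i,j} / sum_k n_{k,j}
    tf = Counter(doc)[word] / len(doc)
    # |{j : t_i in d_j}|: number of corpus texts that contain the word
    containing = sum(1 for text in corpus if word in text)
    if containing == 0:
        return 0.0
    # IDF_i = log(D / |{j : t_i in d_j}|), with D the number of corpus texts
    idf = math.log(len(corpus) / containing)
    return tf * idf


corpus = [["force", "mass", "force"], ["mass", "energy"], ["energy", "work"]]
print(tf_idf("force", 0, corpus))  # frequent here, rare elsewhere: larger value
print(tf_idf("mass", 0, corpus))   # appears in two texts: smaller value
```

A word that is frequent in one text but rare across the corpus receives a larger value, which is the weighting the later fusion step relies on.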
S822: acquiring the fusion word vector representation of a plurality of sample words of the plurality of sample classroom text data according to the multi-dimensional word embedding vector representation of the plurality of sample words in the plurality of sample classroom text data, the corresponding word frequency-inverse document frequency value and a preset fusion word vector calculation algorithm.
The fusion word vector calculation algorithm is as follows:
$$\text{W2V-TFIDF}_i = C_i \cdot \text{W2V}_i$$
in the formula, $\text{W2V-TFIDF}_i$ is the fused word vector of the i-th sample word, $C_i$ is the word frequency-inverse document frequency value of the i-th sample word, and $\text{W2V}_i$ is the multi-dimensional word embedding vector representation of the i-th sample word.
In this embodiment, the recognition device obtains the fused word vector representation of a plurality of sample words of the plurality of sample classroom text data according to the multidimensional word embedded vector representation of the plurality of sample words in the plurality of sample classroom text data, the corresponding word frequency-inverse document frequency value and a preset fused word vector calculation algorithm, and obtains the corresponding fused word vector representation by combining the word frequency-inverse document frequency value and the multidimensional word embedded vector representation, so as to improve the accuracy and the universality of the cognitive recognition model.
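The fusion in S822 is a scalar-times-vector product and can be sketched as below. The function names and the dictionary-based lookups are hypothetical; the embeddings are assumed to be plain lists of floats produced by whatever word embedding module is used.

```python
def fuse_word_vector(tfidf_value, embedding):
    """W2V-TFIDF_i = C_i * W2V_i: scale every dimension of the embedding."""
    return [tfidf_value * component for component in embedding]


def fuse_sentence(words, embeddings, tfidf_values):
    """Fuse every word of one text; both lookups are keyed by the word."""
    return [fuse_word_vector(tfidf_values[w], embeddings[w]) for w in words]


embeddings = {"force": [0.2, -0.4, 1.0], "mass": [1.0, 0.0, 2.0]}
tfidf_values = {"force": 0.5, "mass": 0.25}
fused = fuse_sentence(["force", "mass"], embeddings, tfidf_values)
print(fused)
```

Scaling by the TF-IDF value keeps the direction of each embedding and changes only its magnitude, so more distinctive words contribute more when the vectors are later combined.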
S823: Obtain the sentence vector representations of the plurality of sample classroom text data according to the fused word vector representations of the sample words of the plurality of sample classroom text data and the sentence vector calculation module.
In this embodiment, the recognition device obtains the sentence vector representations of the plurality of sample classroom text data according to the fused word vector representations of the plurality of sample words of the plurality of sample classroom text data and the sentence vector calculation module.
Referring to fig. 6, fig. 6 is a schematic diagram of S83 in a process of a student cognitive recognition method based on speech and text classification according to a third embodiment of the present application, including step S831, which is as follows:
S831: Obtain the sentence vector representations of the plurality of sample classroom text data according to the fused word vector representations of the sample words of the plurality of sample classroom text data and a preset sentence vector calculation algorithm.
The sentence vector calculation algorithm is as follows:
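$$SV_p = \frac{1}{n}\sum_k \text{W2V-TFIDF}_k$$
in the formula, $SV_p$ is the sentence vector representation of the p-th sample classroom text data, $\sum_k \text{W2V-TFIDF}_k$ is the sum of the fused word vectors of the sample words in the p-th sample classroom text data, and $n$ is the dimension of the fused word vector.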
In this embodiment, according to the fused word vector representations of the sample words of the plurality of sample classroom text data and the preset sentence vector calculation algorithm, the recognition device averages, dimension by dimension, the fused word vectors of the sample words contained in each sample classroom text data, and obtains the sentence vector representation of that sample classroom text data.
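The averaging described above can be sketched as follows. One point is an assumption and is labeled as such in the code: the divisor is taken to be the number of word vectors in the text, which is what a per-dimension average implies, even though the symbol definition describes n as the dimension of the fused word vector. The function name is hypothetical.

```python
def sentence_vector(fused_vectors):
    """Average the fused word vectors of one text, dimension by dimension.

    Assumption: the divisor is the number of word vectors in the text.
    """
    if not fused_vectors:
        raise ValueError("at least one fused word vector is required")
    count = len(fused_vectors)
    dim = len(fused_vectors[0])
    if any(len(vec) != dim for vec in fused_vectors):
        raise ValueError("all fused word vectors must share one dimension")
    return [sum(vec[d] for vec in fused_vectors) / count for d in range(dim)]


print(sentence_vector([[1.0, 2.0], [3.0, 6.0]]))  # [2.0, 4.0]
```

The result has the same dimension as each word vector regardless of how many words the text contains, so texts of different lengths can be fed to the same classifier.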
S5: Acquire the cognitive change condition of each student in the discussion classroom according to the student identity corresponding to the voice data at each different time and the cognitive grade data corresponding to the voice data at each different time.
In this embodiment, the recognition device obtains the cognitive change condition of each student in the discussion classroom according to the student identity corresponding to the voice data at each different time and the cognitive level data corresponding to the voice data at each different time.
Specifically, the recognition device may group the cognitive level data by student identity identifier, combining the cognitive level data corresponding to the voice data at the different time instants associated with the same student identity identifier into a cognitive level data set for that student. The device then sorts the pieces of cognitive level data in each student's data set in chronological order and, from the sorted data set, constructs a cognitive level change chart or a cognitive level change table for each student. In this way the cognitive change condition of each student in the discussion classroom is obtained and stored in a preset storage space.
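The grouping and sorting in S5 can be sketched as below. The record layout `(student_id, timestamp, level)`, the function name, and the use of an in-memory dictionary as the "preset storage space" are illustrative assumptions only.

```python
from collections import defaultdict


def build_cognitive_timelines(records):
    """Group (student_id, timestamp, level) records by student, sorted by time.

    Returns a dict mapping each student identifier to a list of
    (timestamp, level) pairs in chronological order.
    """
    grouped = defaultdict(list)
    for student_id, timestamp, level in records:
        grouped[student_id].append((timestamp, level))
    return {
        student_id: sorted(items, key=lambda item: item[0])
        for student_id, items in grouped.items()
    }


records = [("s1", 30, 2), ("s2", 10, 1), ("s1", 5, 1)]
print(build_cognitive_timelines(records))
# {'s1': [(5, 1), (30, 2)], 's2': [(10, 1)]}
```

Each per-student list is already in the form needed for a change table, and plotting level against timestamp gives the corresponding chart.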
Referring to fig. 7, fig. 7 is a schematic view of a flow of a student cognitive recognition method based on speech and text classification according to a fourth embodiment of the present application, further including step S9, which is specifically as follows:
S9: In response to a display instruction that includes the student identity identifier of a student to be displayed, acquire the cognitive change condition of that student in the discussion classroom according to the student identity identifier, return it to a preset display interface, and display and mark it.
The display instruction is sent by a user and received by the identification device.
The recognition device receives and responds to the display instruction sent by the user. According to the student identity identifier of the student to be displayed, the device retrieves that student's cognitive change condition in the discussion classroom from the preset storage space, returns it to the preset display interface, and displays and marks it. The display shows how the student's cognitive level changes over time and reflects the student's performance and learning during the discussion, which provides the teacher with more comprehensive information and helps improve the teaching quality of future discussion courses.
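One possible way to serve such a display instruction is sketched below. The text does not specify what "marking" consists of, so the marks used here ("start", "up", "down", "same", comparing each level with the previous one) are purely an assumed convention, as are the function name and the dictionary standing in for the preset storage space.

```python
def render_cognitive_changes(student_id, store):
    """Return display rows (timestamp, level, mark) for one student.

    `store` maps student identifiers to time-sorted (timestamp, level) pairs.
    The mark compares each level with the previous one (assumed convention).
    """
    rows = []
    previous = None
    for timestamp, level in store.get(student_id, []):
        if previous is None:
            mark = "start"
        elif level > previous:
            mark = "up"
        elif level < previous:
            mark = "down"
        else:
            mark = "same"
        rows.append((timestamp, level, mark))
        previous = level
    return rows


store = {"s1": [(5, 1), (30, 2), (60, 2), (90, 1)]}
for row in render_cognitive_changes("s1", store):
    print(row)
```

An unknown identifier yields an empty list, which lets the display interface show an empty table instead of failing.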
Referring to fig. 8, fig. 8 is a schematic structural diagram of a device for recognizing cognitive level of a student based on speech and text classification according to a fifth embodiment of the present application, where the device may implement all or part of the device for recognizing cognitive level of a student based on speech and text classification through software, hardware, or a combination of the two, and the device 8 includes:
the voice data obtaining module 81 is configured to obtain a voice data set of each student in a discussion classroom, where the voice data set includes a plurality of voice data at different time instants;
the identity recognition module 82 is configured to input the voice data sets of the students into a preset voiceprint recognition model, obtain voiceprint recognition data corresponding to the voice data at different time instants, and obtain student identity identifiers corresponding to the voice data at different time instants according to the voiceprint recognition data and a preset voiceprint feature library;
the classroom text conversion module 83 is configured to input the voice data sets of the students into a preset text conversion model, and obtain classroom text data corresponding to the voice data at different time instants;
the cognitive grade recognition module 84 is configured to input the classroom text data corresponding to the voice data at each different time instant to a preset cognitive recognition model, and acquire cognitive grade data corresponding to the voice data at each different time instant;
and the display module 85 is used for acquiring the cognitive change condition of each student in the discussion classroom according to the student identity corresponding to the voice data at each different time and the cognitive grade data corresponding to the voice data at each different time.
In the embodiment of the application, a voice data set of each student in a discussion classroom is obtained through a voice data obtaining module, wherein the voice data set comprises a plurality of voice data at different time moments; inputting the voice data sets of the students into a preset voiceprint recognition model through an identity recognition module to obtain voiceprint recognition data corresponding to the voice data at different time moments, and obtaining student identity identifications corresponding to the voice data at different time moments according to the voiceprint recognition data and a preset voiceprint feature library; inputting the voice data sets of the students into a preset text conversion model through a classroom text conversion module to obtain classroom text data corresponding to the voice data at different time moments; inputting the classroom text data corresponding to the voice data at different time moments into a preset cognitive recognition model through a cognitive grade recognition module, and acquiring cognitive grade data corresponding to the voice data at different time moments; and acquiring the cognitive change condition of each student in the discussion classroom through a display module according to the student identity corresponding to the voice data at each different time moment and the cognitive grade data corresponding to the voice data at each different time moment. 
By applying the voiceprint recognition technology and the text conversion technology to the discussion classroom, the identity identification of the student corresponding to the voice data and classroom text data are obtained, a deep learning method is adopted, the cognition condition of the student in the discussion classroom is accurately and efficiently evaluated, a corresponding cognition grade change condition chart is constructed, the performance and the learning condition of the student during the discussion can be reflected, more comprehensive information is provided for a teacher, and the teaching quality of future discussion courses is improved.
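The data flow through modules 81 to 84 can be sketched as a single loop over time-stamped voice segments. The three model arguments are hypothetical callables standing in for the preset voiceprint recognition model (with its feature library lookup), the text conversion model, and the cognitive recognition model; none of their interfaces is specified in the text.

```python
def recognize_discussion(segments, identify_speaker, transcribe, classify_level):
    """Run identity recognition, text conversion and grading per voice segment.

    `segments` is an iterable of (timestamp, audio) pairs. The three callables
    are placeholders for the preset models. Returns a list of
    (student_id, timestamp, level) records.
    """
    records = []
    for timestamp, audio in segments:
        student_id = identify_speaker(audio)  # voiceprint model + feature library
        text = transcribe(audio)              # text conversion model
        level = classify_level(text)          # cognitive recognition model
        records.append((student_id, timestamp, level))
    return records


# Stub models, for illustration only.
segments = [(0, "a1"), (12, "b1"), (20, "a2")]
records = recognize_discussion(
    segments,
    identify_speaker=lambda audio: "s1" if audio.startswith("a") else "s2",
    transcribe=lambda audio: audio.upper(),
    classify_level=lambda text: 1 if text.endswith("1") else 2,
)
print(records)
# [('s1', 0, 1), ('s2', 12, 1), ('s1', 20, 2)]
```

Passing the models in as arguments keeps the loop independent of any particular speech or classification library, and the resulting records are the input the display module needs to build each student's cognitive change condition.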
Referring to fig. 9, fig. 9 is a schematic structural diagram of a computer device according to a sixth embodiment of the present application, where the computer device 9 includes: a processor 91, a memory 92 and a computer program 93 stored on the memory 92 and operable on the processor 91; the computer device may store a plurality of instructions, where the instructions are suitable for being loaded by the processor 91 and executing the method steps in the first to fifth embodiments, and specific execution processes may refer to specific descriptions of the first to fifth embodiments and are not described herein again.
The processor 91 may include one or more processing cores. The processor 91 is connected to the various parts of the server by various interfaces and lines, and performs the functions of the student cognitive grade recognition device 8 based on voice and text classification and processes data by running or executing instructions, programs, code sets or instruction sets stored in the memory 92 and by calling data in the memory 92. Optionally, the processor 91 may be implemented in at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA) and Programmable Logic Array (PLA). The processor 91 may integrate one or a combination of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs and the like; the GPU renders and draws the content to be displayed on the touch display screen; the modem handles wireless communication. It can be understood that the modem may also not be integrated into the processor 91 and may instead be implemented by a separate chip.
The memory 92 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). Optionally, the memory 92 includes a non-transitory computer-readable medium. The memory 92 may be used to store instructions, programs, code, sets of codes, or sets of instructions. The memory 92 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as touch instructions, etc.), instructions for implementing the various method embodiments described above, and the like; the data storage area may store the data referred to in the above method embodiments. Optionally, the memory 92 may also be at least one memory device located remotely from the processor 91.
The present embodiment further provides a storage medium, where the storage medium may store multiple instructions, where the instructions are suitable for being loaded by a processor and executed to perform the method steps in the first to fifth embodiments, and specific execution processes may refer to specific descriptions in the first to fifth embodiments and are not described herein again.
It should be clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional units and modules is only used for illustration, and in practical applications, the above function distribution may be performed by different functional units and modules as needed, that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the above described functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments described above may be implemented. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc.
The present invention is not limited to the above-described embodiments, and various modifications and variations of the present invention are intended to be included within the scope of the claims and the equivalent technology of the present invention if they do not depart from the spirit and scope of the present invention.

Claims (10)

1. A student cognitive recognition method based on voice and text classification is characterized by comprising the following steps:
obtaining a voice data set of each student in a discussion classroom, wherein the voice data set comprises a plurality of voice data at different time moments;
inputting the voice data sets of the students into a preset voiceprint recognition model to obtain voiceprint recognition data corresponding to the voice data at different time moments, and obtaining student identity identifications corresponding to the voice data at different time moments according to the voiceprint recognition data and a preset voiceprint feature library;
inputting the voice data sets of the students into a preset text conversion model to obtain classroom text data corresponding to the voice data at different time moments;
inputting the classroom text data corresponding to the voice data at different time moments into a preset cognitive recognition model, and acquiring cognitive grade data corresponding to the voice data at different time moments;
and acquiring the cognitive change condition of each student in the discussion classroom according to the student identity corresponding to the voice data at each different time and the cognitive grade data corresponding to the voice data at each different time.
2. The cognitive recognition method for students based on voice and text classification according to claim 1, wherein before the voice data of each student is inputted to a preset voiceprint recognition model and the voiceprint recognition data corresponding to each voice data is obtained, the method comprises the following steps:
and preprocessing the voice data of each student to obtain the preprocessed voice data of each student, wherein the preprocessing comprises framing, windowing, short-time Fourier transform and standard deviation standardization.
3. The cognitive recognition method for students based on speech and text classification as claimed in claim 1, further comprising training the cognitive recognition model, comprising the steps of:
the method comprises the steps of obtaining a plurality of sample classroom text data and cognitive label data corresponding to the sample classroom text data, wherein the sample classroom text data comprises a plurality of sample words;
inputting the sample classroom text data into a preset sentence vector representation calculation model, obtaining the sentence vector representation of the sample classroom text data, inputting the sentence vector representation of the sample classroom text data and corresponding cognitive tag data into a neural network model to be trained, and obtaining the cognitive recognition model.
4. The cognitive recognition method for students based on speech and text classification as claimed in claim 3, wherein:
the sentence vector representation calculation model comprises a word embedded vector calculation module, a fusion word vector calculation module and a sentence vector calculation module;
the method for inputting the sample classroom text data into a preset sentence vector representation calculation model to obtain the sentence vector representation of the sample classroom text data comprises the following steps:
acquiring multi-dimensional word embedding vector representation of a plurality of sample words in the plurality of sample classroom text data output by the word embedding vector calculation module according to the plurality of sample classroom text data and the word embedding vector calculation module;
acquiring fusion word vector representation of a plurality of sample words of the plurality of sample classroom text data according to the multi-dimensional word embedding vector representation of the plurality of sample words in the plurality of sample classroom text data and a fusion word vector calculation module;
and obtaining sentence vector representation of the sample classroom text data according to the fusion word vector representation of the sample words of the sample classroom text data and a sentence vector calculation module.
5. The method for cognitive recognition of students according to claim 4, wherein the step of obtaining the fused word vector representation of the sample words in the sample class text data according to the multi-dimensional word embedded vector representation of the sample words in the sample class text data and the fused word vector calculation module comprises the steps of:
obtaining word frequency-inverse document frequency values corresponding to a plurality of sample words of the plurality of sample classroom text data according to a plurality of sample words in the plurality of sample classroom text data and a preset word frequency-inverse document frequency calculation algorithm, wherein the word frequency-inverse document frequency calculation algorithm is as follows:
$$C_i = TF_{i,j} \times IDF_i, \qquad TF_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}, \qquad IDF_i = \log\frac{D}{\left|\{j : t_i \in d_j\}\right|}$$
in the formula, $C_i$ is the word frequency-inverse document frequency value of the i-th sample word; $TF_{i,j}$ is the word frequency of the i-th sample word in the j-th corpus text data of the preset corpus; $IDF_i$ is the inverse document frequency of the i-th sample word; $n_{i,j}$ is the number of times the i-th sample word appears in the j-th corpus text data; $\sum_k n_{k,j}$ is the total number of occurrences of all corpus words in the j-th corpus text data, the sum running over the k different sample words in the sample classroom text data; $D$ is the total number of corpus text data; and $\left|\{j : t_i \in d_j\}\right|$ is the number of corpus text data in the corpus that contain the sample word $t_i$;
acquiring fusion word vector representation of a plurality of sample words of the plurality of sample classroom text data according to multi-dimensional word embedding vector representation of the plurality of sample words in the plurality of sample classroom text data, corresponding word frequency-inverse document frequency values and a preset fusion word vector calculation algorithm, wherein the fusion word vector calculation algorithm is as follows:
$$\text{W2V-TFIDF}_i = C_i \cdot \text{W2V}_i$$
in the formula, $\text{W2V-TFIDF}_i$ is the fused word vector of the i-th sample word, $C_i$ is the word frequency-inverse document frequency value of the i-th sample word, and $\text{W2V}_i$ is the multi-dimensional word embedding vector representation of the i-th sample word.
6. The cognitive recognition method for students based on speech and text classification as claimed in claim 5, wherein the obtaining of sentence vector representation of the sample classroom text data according to the fusion word vector representation of the sample words of the sample classroom text data and the sentence vector calculation module comprises the steps of:
obtaining sentence vector representation of the plurality of sample classroom text data according to fusion word vector representation of a plurality of sample words of the plurality of sample classroom text data and a preset sentence vector calculation algorithm, wherein the sentence vector calculation algorithm is as follows:
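$$SV_p = \frac{1}{n}\sum_k \text{W2V-TFIDF}_k$$
in the formula, $SV_p$ is the sentence vector representation of the p-th sample classroom text data, $\sum_k \text{W2V-TFIDF}_k$ is the sum of the fused word vectors of the sample words in the p-th sample classroom text data, and $n$ is the dimension of the fused word vector.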
7. The cognitive recognition method for students based on speech and text classification as claimed in claim 1, further comprising the steps of:
responding to a display instruction, wherein the display instruction comprises student identity marks of students to be displayed, acquiring the cognitive change condition of the students to be displayed in the discussion classroom according to the student identity marks of the students to be displayed, returning to a preset display interface, and displaying and labeling.
8. A student's cognitive grade recognition device based on speech and text classification, comprising:
the voice data acquisition module is used for acquiring a voice data set of each student in a discussion classroom, wherein the voice data set comprises a plurality of voice data at different time moments;
the identity recognition module is used for inputting the voice data sets of the students into a preset voiceprint recognition model, acquiring voiceprint recognition data corresponding to the voice data at different time moments, and acquiring student identity identifications corresponding to the voice data at different time moments according to the voiceprint recognition data and a preset voiceprint feature library;
the classroom text conversion module is used for inputting the voice data sets of the students into a preset text conversion model to obtain classroom text data corresponding to the voice data at different time moments;
the cognitive grade recognition module is used for inputting the classroom text data corresponding to the voice data at different time moments into a preset cognitive recognition model and acquiring the cognitive grade data corresponding to the voice data at different time moments;
and the display module is used for acquiring the cognitive change condition of each student in the discussion classroom according to the student identity identifications corresponding to the voice data at different time moments and the cognitive grade data corresponding to the voice data at different time moments.
9. A computer device comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method for student cognitive recognition based on speech and text classification according to any one of claims 1 to 7 when executing the computer program.
10. A storage medium, characterized by: the storage medium stores a computer program which, when executed by a processor, implements the steps of the method for cognitive recognition of students based on speech and text classification according to any one of claims 1 to 7.
CN202210905870.8A 2022-07-29 2022-07-29 Student cognitive recognition method, device and equipment based on voice and text classification Pending CN115358300A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210905870.8A CN115358300A (en) 2022-07-29 2022-07-29 Student cognitive recognition method, device and equipment based on voice and text classification

Publications (1)

Publication Number Publication Date
CN115358300A true CN115358300A (en) 2022-11-18

Family

ID=84031494

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210905870.8A Pending CN115358300A (en) 2022-07-29 2022-07-29 Student cognitive recognition method, device and equipment based on voice and text classification

Country Status (1)

Country Link
CN (1) CN115358300A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116049743A (en) * 2022-12-14 2023-05-02 深圳市仰和技术有限公司 Cognitive recognition method based on multi-modal data, computer equipment and storage medium
CN116049743B (en) * 2022-12-14 2023-10-31 深圳市仰和技术有限公司 Cognitive recognition method based on multi-modal data, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110825875B (en) Text entity type identification method and device, electronic equipment and storage medium
CN104217226B (en) Conversation activity recognition methods based on deep neural network Yu condition random field
CN113724882B (en) Method, device, equipment and medium for constructing user portrait based on inquiry session
US10388177B2 (en) Cluster analysis of participant responses for test generation or teaching
CN109325106A (en) A kind of U.S. chat robots intension recognizing method of doctor and device
US20210312288A1 (en) Method for training classification model, classification method, apparatus and device
CN110263174B (en) Topic category analysis method based on focus attention
CN110046356B (en) Label-embedded microblog text emotion multi-label classification method
CN112163162A (en) Portrait recognition-based selected course recommendation method, storage medium and electronic equipment
CN111199158A (en) Method and device for scoring civil aviation customer service
CN111460101A (en) Knowledge point type identification method and device and processor
CN110019719A (en) Based on the question and answer asserted
CN110232128A (en) Topic file classification method and device
CN115358300A (en) Student cognitive recognition method, device and equipment based on voice and text classification
CN113453065A (en) Video segmentation method, system, terminal and medium based on deep learning
CN116844080B (en) Fatigue degree multi-mode fusion detection method, electronic equipment and storage medium
CN113011196A (en) Concept-enhanced representation and one-way attention-containing subjective question automatic scoring neural network model
CN116361541A (en) Test question recommendation method based on knowledge tracking and similarity analysis
CN107992482B (en) Protocol method and system for solving steps of mathematic subjective questions
US11521283B2 (en) Assigning a student to a cohort on a platform
CN116244474A (en) Learner learning state acquisition method based on multi-mode emotion feature fusion
JP2020177507A (en) Examination question prediction system and examination question prediction method
CN113704472B (en) Method and system for identifying hate and offensive language based on theme memory network
CN113468311B (en) Knowledge graph-based complex question and answer method, device and storage medium
CN111581351B (en) Dynamic element embedding method based on multi-head self-attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination