CN108920513A - Multimedia data processing method, apparatus, and electronic device - Google Patents

Multimedia data processing method, apparatus, and electronic device

Info

Publication number
CN108920513A
CN108920513A · Application CN201810549255.1A · Granted publication CN108920513B
Authority
CN
China
Prior art keywords
audio
target
information
audio section
section
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810549255.1A
Other languages
Chinese (zh)
Other versions
CN108920513B (en)
Inventor
黄佳洋
陈清才
丘宇彬
陈枫
黄文山
朱易文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Turing Robot Co Ltd
Original Assignee
Shenzhen Turing Robot Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Turing Robot Co Ltd
Priority to CN201810549255.1A
Publication of CN108920513A
Application granted
Publication of CN108920513B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/26: Speech to text systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00: Administration; Management
    • G06Q 10/06: Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q 10/063: Operations research, analysis or management
    • G06Q 10/0639: Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q 10/06395: Quality analysis or management
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q 50/10: Services
    • G06Q 50/20: Education
    • G06Q 50/205: Education administration or guidance

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Educational Administration (AREA)
  • Economics (AREA)
  • Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Development Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Marketing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Technology (AREA)
  • Multimedia (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Game Theory and Decision Science (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

Embodiments of the invention disclose a multimedia data processing method, including: obtaining first audio information and second audio information belonging to a classroom environment; searching the first audio information for audio segments of the teacher type as target first audio segments, and searching the second audio information for target second audio segments, where the time information of the target second audio segments does not overlap with that of the target first audio segments; converting the target first audio segments into target first text information, and identifying, from the target first text information, first teaching-type information corresponding to the target first audio segments; and computing, according to the first teaching-type information, the time information of the target first audio segments, and the time information, audio types, and audio features of the target second audio segments, key-link information of the classroom environment. Applying the invention can improve the efficiency and accuracy of classroom teaching quality analysis.

Description

Multimedia data processing method, apparatus, and electronic device
Technical field
The present invention relates to the field of computer technology, and in particular to a multimedia data processing method, apparatus, and electronic device.
Background art
Classroom instruction is carried out under a teaching model in which the teacher leads and the students are the main body. The teaching quality of the teacher and the degree of student participation during a lesson directly affect how well students understand theoretical knowledge and professional skills. To improve teaching quality, effective management of the teaching process is therefore essential; only by supervising and scientifically analyzing the performance of teachers and students during teaching can the teaching process be managed comprehensively and effectively.
To analyze the classroom comprehensively, existing quality supervision of classroom teaching relies on manual observation: a supervisor monitors the teacher's lecturing and the students' learning on site. The observed classroom situation is then analyzed and tallied to evaluate classroom teaching quality. However, analyzing teaching quality through manual observation is subjective and unstable, and cannot yield an accurate analysis of classroom teaching quality.
As can be seen from the above, manual analysis of classroom teaching quality is inefficient and cannot analyze classroom teaching quality efficiently and reasonably.
Summary of the invention
The technical problem to be solved by embodiments of the invention is to provide a multimedia data processing method, apparatus, and electronic device that can improve the efficiency and accuracy of classroom teaching quality analysis.
A first aspect of the embodiments of the invention provides a multimedia data processing method, including:
obtaining first audio information and second audio information belonging to a classroom environment, where the first audio information and the second audio information are respectively collected by two sound pickup devices located at different positions;
searching the first audio information for audio segments belonging to the teacher type as target first audio segments, and searching the second audio information for target second audio segments, where the time information of the target second audio segments does not overlap with the time information corresponding to the target first audio segments;
converting the target first audio segments into target first text information, and identifying, from the target first text information, first teaching-type information corresponding to the target first audio segments;
computing, according to the first teaching-type information, the time information corresponding to the target first audio segments, the time information corresponding to the target second audio segments, and the audio types and audio features corresponding to the target second audio segments, key-link information of the classroom environment, where the key-link information includes a teacher behavior analysis result and a student behavior analysis result.
In an embodiment, the searching the first audio information for audio segments belonging to the teacher type as target first audio segments includes:
segmenting the first audio information into a plurality of unit first audio segments according to audio features, where each unit first audio segment is either an audio segment containing continuous sound or a silent audio segment;
inputting each unit first audio segment into an audio classification model, and respectively identifying the audio type matching each unit first audio segment;
determining the unit first audio segments in the first audio information whose audio type is the teacher type as the target first audio segments.
In an embodiment, the searching the second audio information for target second audio segments includes:
segmenting the second audio information into a plurality of unit second audio segments according to audio features, where each unit second audio segment is either an audio segment containing continuous sound or a silent audio segment;
inputting each unit second audio segment into the audio classification model, and respectively identifying the audio type matching each unit second audio segment;
taking the unit second audio segments in the second audio information whose audio type is not the teacher type as audio segments to be adjusted;
if the time information of an audio segment to be adjusted does not overlap with the time information of the target first audio segments, determining that audio segment to be adjusted as a target second audio segment;
if the time information of an audio segment to be adjusted overlaps with the time information of the target first audio segments, determining the portion of that audio segment to be adjusted that does not correspond to the overlap as a target second audio segment.
In an embodiment, the identifying, from the target first text information, the first teaching-type information corresponding to the target first audio segment includes:
inputting the target first text information into a first text classification model to obtain teaching-means information matching the target first audio segment;
inputting the target first text information into a second text classification model to obtain teaching-content structure matching the target first audio segment;
determining the teaching-means information corresponding to the target first audio segment and the teaching-content structure corresponding to the target first audio segment as the first teaching-type information corresponding to the target first audio segment.
In an embodiment, the method further includes:
converting target second audio segments whose audio type is the single-student type or the multi-student type into target second text information, and inputting the target first text information adjacent to the target second text information, together with the target second text information, into the first text classification model to obtain teaching-means information matching the target second audio segment;
inputting the target first text information adjacent to the target second text information, together with the target second text information, into the second text classification model to obtain teaching-content structure matching the target second audio segment;
determining the teaching-means information corresponding to the target second audio segment and the teaching-content structure corresponding to the target second audio segment as second teaching-type information corresponding to the target second audio segment.
In an embodiment, the method further includes:
obtaining an index model matching the classroom environment, where the index model is a knowledge-point model of the classroom constructed from the lesson plan;
extracting knowledge-point text from the index model, determining, in the target first text information, target timestamps of text information matching the knowledge-point text, and adding keyword tags at the positions of the target timestamps in the first audio information, where the text in a keyword tag is identical to the knowledge-point text;
when a search keyword is obtained, retrieving, in the first audio information, the keyword tag matching the search keyword as a target keyword tag;
outputting, in the first audio information, the audio information corresponding to the timestamp of the target keyword tag, and determining the course information of the audio information corresponding to the target keyword tag.
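As a rough illustration of the keyword-tagging and retrieval steps above, a minimal Python sketch follows; the data structures, function names, and substring-matching rule are assumptions for illustration and are not taken from the patent.

    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class KeywordTag:
        text: str         # identical to the knowledge-point text
        timestamp: float  # position of the matching text in the first audio information

    def build_keyword_tags(knowledge_points: List[str],
                           transcript: List[Tuple[float, str]]) -> List[KeywordTag]:
        """transcript: (timestamp, text) pairs from the target first text information."""
        tags = []
        for ts, text in transcript:
            for kp in knowledge_points:
                if kp in text:                      # assumed matching rule
                    tags.append(KeywordTag(text=kp, timestamp=ts))
        return tags

    def retrieve(tags: List[KeywordTag], search_keyword: str) -> List[float]:
        """Return the timestamps of the keyword tags matching the search keyword."""
        return [t.timestamp for t in tags if t.text == search_keyword]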
In an embodiment, the computing, according to the first teaching-type information, the time information corresponding to the target first audio segments, the time information corresponding to the target second audio segments, and the audio types and audio features corresponding to the target second audio segments, the key-link information of the classroom environment includes:
generating the teacher behavior analysis result according to the first teaching-type information and the time information corresponding to the target first audio segments;
generating the student behavior analysis result according to the second teaching-type information, the time information corresponding to the target second audio segments, and the audio types and audio features corresponding to the target second audio segments;
determining the teacher behavior analysis result and the student behavior analysis result as the key-link information of the classroom environment.
In an embodiment, the generating the teacher behavior analysis result according to the first teaching-type information and the time information corresponding to the target first audio segments includes:
counting, according to the teaching-means information in the first teaching-type information and the time information corresponding to the target first audio segments, the total duration of target first audio segments having the same teaching-means information, to obtain a first teacher behavior analysis parameter corresponding to each kind of teaching-means information;
counting, according to the teaching-content structure in the first teaching-type information and the time information corresponding to the target first audio segments, the total duration of target first audio segments having the same teaching-content structure, to obtain a second teacher behavior analysis parameter corresponding to each kind of teaching-content structure;
counting, according to the teaching-means information in the first teaching-type information, the number of target first audio segments having the same teaching-means information, to obtain a third teacher behavior analysis parameter corresponding to each kind of teaching-means information;
counting, according to the teaching-content structure in the first teaching-type information, the number of target first audio segments having the same teaching-content structure, to obtain a fourth teacher behavior analysis parameter corresponding to each kind of teaching-content structure;
generating the teacher behavior analysis result according to the first, second, third, and fourth teacher behavior analysis parameters.
In an embodiment, the generating the student behavior analysis result according to the second teaching-type information, the time information corresponding to the target second audio segments, and the audio types and audio features corresponding to the target second audio segments includes:
counting the number of target second audio segments having the same audio features and audio type, as a first student behavior analysis parameter;
counting, according to the teaching-means information in the second teaching-type information and the time information corresponding to the target second audio segments, the total duration of target second audio segments having the same teaching-means information, to obtain a second student behavior analysis parameter corresponding to each kind of teaching-means information;
counting, according to the teaching-content structure in the second teaching-type information and the time information corresponding to the target second audio segments, the total duration of target second audio segments having the same teaching-content structure, to obtain a third student behavior analysis parameter corresponding to each kind of teaching-content structure;
counting, according to the teaching-means information in the second teaching-type information, the number of target second audio segments having the same teaching-means information, to obtain a fourth student behavior analysis parameter corresponding to each kind of teaching-means information;
counting, according to the teaching-content structure in the second teaching-type information, the number of target second audio segments having the same teaching-content structure, to obtain a fifth student behavior analysis parameter corresponding to each kind of teaching-content structure;
generating the student behavior analysis result according to the first, second, third, fourth, and fifth student behavior analysis parameters.
A second aspect of the embodiments of the invention provides a multimedia data processing apparatus, including:
a first obtaining module, configured to obtain first audio information and second audio information belonging to a classroom environment, where the first audio information and the second audio information are respectively collected by two sound pickup devices located at different positions;
a first searching module, configured to search the first audio information for audio segments belonging to the teacher type as target first audio segments;
a second searching module, configured to search the second audio information for target second audio segments, where the time information of the target second audio segments does not overlap with the time information corresponding to the target first audio segments;
a conversion module, configured to convert the target first audio segments into target first text information;
an identification module, configured to identify, from the target first text information, first teaching-type information corresponding to the target first audio segments;
a statistics module, configured to compute, according to the first teaching-type information, the time information corresponding to the target first audio segments, the time information corresponding to the target second audio segments, and the audio types and audio features corresponding to the target second audio segments, key-link information of the classroom environment, where the key-link information includes a teacher behavior analysis result and a student behavior analysis result.
In an embodiment, the first searching module includes:
a first segmentation unit, configured to segment the first audio information into a plurality of unit first audio segments according to audio features, where each unit first audio segment is either an audio segment containing continuous sound or a silent audio segment;
a first identification unit, configured to input each unit first audio segment into an audio classification model and respectively identify the audio type matching each unit first audio segment;
a first determination unit, configured to determine the unit first audio segments in the first audio information whose audio type is the teacher type as the target first audio segments.
In an embodiment, the second searching module includes:
a second segmentation unit, configured to segment the second audio information into a plurality of unit second audio segments according to audio features, where each unit second audio segment is either an audio segment containing continuous sound or a silent audio segment;
a second identification unit, configured to input each unit second audio segment into the audio classification model and respectively identify the audio type matching each unit second audio segment;
a second determination unit, configured to take the unit second audio segments in the second audio information whose audio type is not the teacher type as audio segments to be adjusted;
the second determination unit being further configured to, if the time information of an audio segment to be adjusted does not overlap with the time information of the target first audio segments, determine that audio segment to be adjusted as a target second audio segment;
the second determination unit being further configured to, if the time information of an audio segment to be adjusted overlaps with the time information of the target first audio segments, determine the portion of that audio segment to be adjusted that does not correspond to the overlap as a target second audio segment.
In an embodiment, the identification module includes:
an obtaining unit, configured to input the target first text information into a first text classification model to obtain teaching-means information matching the target first audio segment;
the obtaining unit being further configured to input the target first text information into a second text classification model to obtain teaching-content structure matching the target first audio segment;
a third determination unit, configured to determine the teaching-means information corresponding to the target first audio segment and the teaching-content structure corresponding to the target first audio segment as the first teaching-type information corresponding to the target first audio segment.
In an embodiment, the apparatus further includes:
the conversion module being further configured to convert target second audio segments whose audio type is the single-student type or the multi-student type into target second text information, and to input the target first text information adjacent to the target second text information, together with the target second text information, into the first text classification model to obtain teaching-means information matching the target second audio segment;
the first obtaining module being further configured to input the target first text information adjacent to the target second text information, together with the target second text information, into the second text classification model to obtain teaching-content structure matching the target second audio segment;
a determination module, configured to determine the teaching-means information corresponding to the target second audio segment and the teaching-content structure corresponding to the target second audio segment as second teaching-type information corresponding to the target second audio segment.
In an embodiment, the apparatus further includes:
a second obtaining module, configured to obtain an index model matching the classroom environment, where the index model is a knowledge-point model of the classroom constructed from the lesson plan;
an adding module, configured to extract knowledge-point text from the index model, determine, in the target first text information, target timestamps of text information matching the knowledge-point text, and add keyword tags at the positions of the target timestamps in the first audio information, where the text in a keyword tag is identical to the knowledge-point text;
a retrieval module, configured to, when a search keyword is obtained, retrieve, in the first audio information, the keyword tag matching the search keyword as a target keyword tag;
an output module, configured to output, in the first audio information, the audio information corresponding to the timestamp of the target keyword tag, and determine the course information of the audio information corresponding to the target keyword tag.
In an embodiment, the statistics module includes:
a first generation unit, configured to generate the teacher behavior analysis result according to the first teaching-type information and the time information corresponding to the target first audio segments;
a second generation unit, configured to generate the student behavior analysis result according to the second teaching-type information, the time information corresponding to the target second audio segments, and the audio types and audio features corresponding to the target second audio segments;
a fourth determination unit, configured to determine the teacher behavior analysis result and the student behavior analysis result as the key-link information of the classroom environment.
In an embodiment, the first generation unit includes:
a first counting subunit, configured to count, according to the teaching-means information in the first teaching-type information and the time information corresponding to the target first audio segments, the total duration of target first audio segments having the same teaching-means information, to obtain a first teacher behavior analysis parameter corresponding to each kind of teaching-means information;
the first counting subunit being further configured to count, according to the teaching-content structure in the first teaching-type information and the time information corresponding to the target first audio segments, the total duration of target first audio segments having the same teaching-content structure, to obtain a second teacher behavior analysis parameter corresponding to each kind of teaching-content structure;
the first counting subunit being further configured to count, according to the teaching-means information in the first teaching-type information, the number of target first audio segments having the same teaching-means information, to obtain a third teacher behavior analysis parameter corresponding to each kind of teaching-means information;
the first counting subunit being further configured to count, according to the teaching-content structure in the first teaching-type information, the number of target first audio segments having the same teaching-content structure, to obtain a fourth teacher behavior analysis parameter corresponding to each kind of teaching-content structure;
a first generation subunit, configured to generate the teacher behavior analysis result according to the first, second, third, and fourth teacher behavior analysis parameters.
In an embodiment, the second generation unit includes:
a second counting subunit, configured to count the number of target second audio segments having the same audio features and audio type, as a first student behavior analysis parameter;
the second counting subunit being further configured to count, according to the teaching-means information in the second teaching-type information and the time information corresponding to the target second audio segments, the total duration of target second audio segments having the same teaching-means information, to obtain a second student behavior analysis parameter corresponding to each kind of teaching-means information;
the second counting subunit being further configured to count, according to the teaching-content structure in the second teaching-type information and the time information corresponding to the target second audio segments, the total duration of target second audio segments having the same teaching-content structure, to obtain a third student behavior analysis parameter corresponding to each kind of teaching-content structure;
the second counting subunit being further configured to count, according to the teaching-means information in the second teaching-type information, the number of target second audio segments having the same teaching-means information, to obtain a fourth student behavior analysis parameter corresponding to each kind of teaching-means information;
the second counting subunit being further configured to count, according to the teaching-content structure in the second teaching-type information, the number of target second audio segments having the same teaching-content structure, to obtain a fifth student behavior analysis parameter corresponding to each kind of teaching-content structure;
a second generation subunit, configured to generate the student behavior analysis result according to the first, second, third, fourth, and fifth student behavior analysis parameters.
A third aspect of the embodiments of the invention provides an electronic device, including a processor and a memory connected to each other, where the memory is configured to store program code supporting the electronic device in executing the method of the first aspect of the embodiments of the invention, and the processor is configured to execute the method of the first aspect of the embodiments of the invention.
A fourth aspect of the embodiments of the invention provides a computer storage medium storing a computer program, where the computer program includes program instructions that, when executed by a processor, perform the method of the first aspect of the embodiments of the invention.
In the embodiments of the invention, first audio information and second audio information collected by two sound pickup devices at different positions in a classroom environment are obtained; audio segments belonging to the teacher type are searched for in the first audio information as target first audio segments, and audio segments whose time information does not overlap with that of the target first audio segments are searched for in the second audio information as target second audio segments; the target first audio segments are converted into target first text information, and first teaching-type information corresponding to the target first audio segments is identified from the target first text information; and key-link information of the classroom environment is computed according to the first teaching-type information, the time information of the target first audio segments, and the audio types and audio features of the target second audio segments. Because the audio information is processed into classroom key-link information without manual participation, the target audio segments can be located and their teaching-type information identified automatically, and the key-link information of the classroom environment can be computed programmatically, the tedious steps of manually analyzing classroom key-link information are avoided and the efficiency of analyzing classroom teaching quality is improved. Moreover, using the complementary analysis of the first audio information and the second audio information, classroom quality can be analyzed reasonably and accurately.
Brief description of the drawings
To explain the technical solutions in the embodiments of the invention or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Apparently, the drawings described below are only some embodiments of the invention, and a person of ordinary skill in the art may obtain other drawings from these drawings without creative effort.
Fig. 1a is a schematic flowchart of a multimedia data processing method according to an embodiment of the invention;
Fig. 1b is a schematic diagram of determining target first audio segments according to an embodiment of the invention;
Fig. 2 is a schematic flowchart of obtaining audio information according to an embodiment of the invention;
Fig. 3 is a schematic flowchart of identifying teaching-type information according to an embodiment of the invention;
Fig. 4 is a schematic flowchart of another multimedia data processing method according to an embodiment of the invention;
Fig. 5 is a schematic flowchart of another multimedia data processing method according to an embodiment of the invention;
Fig. 6 is a schematic structural diagram of a multimedia data processing apparatus according to an embodiment of the invention;
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described below clearly and completely with reference to the drawings in the embodiments of the invention. Apparently, the described embodiments are only some rather than all of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the invention without creative effort shall fall within the protection scope of the invention.
The terms "comprising" and "having" in the description, claims, and drawings of this specification, and any variants thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, product, or device that contains a series of steps or units is not limited to the listed steps or units, but optionally further includes steps or units that are not listed, or optionally further includes other steps or units inherent to the process, method, product, or device.
The execution of the multimedia data processing method described in the embodiments of the invention depends on a computer program and can run on a computer system based on the von Neumann architecture. The computer program may be integrated into an application, or may run as an independent tool-type application. The method is described in detail below.
Referring to Fig. 1a, which is a schematic flowchart of a multimedia data processing method according to an embodiment of the invention, as shown in Fig. 1a, the multimedia data processing method at least includes the following steps.
Step S101: obtain first audio information and second audio information belonging to a classroom environment, where the first audio information and the second audio information are respectively collected by two sound pickup devices located at different positions.
Specifically, the first audio information and the second audio information of the classroom environment are extracted from an audio database. The first audio information is the teacher's audio collected by a directional microphone, which can always clearly pick up the teacher's voice in the classroom environment; the second audio information is the complete audio of the classroom environment collected by an omnidirectional microphone. The directional microphone and the omnidirectional microphone are placed at different positions in the classroom; for example, the directional microphone may be placed on the lectern at the front of the classroom to collect the teacher's audio, and the omnidirectional microphone may be placed on a wall in the middle of the classroom to collect all audio in the classroom environment. The first audio information and the second audio information are stored in the format of a two-channel Pulse Code Modulation (PCM) stream, and are separated according to the sample arrangement of the data, namely an LLRR (Left Left Right Right) layout. The first audio information may include the teacher's audio, the students' audio, noise audio, and silent audio in the classroom environment; the second audio information may likewise include the teacher's audio (at a lower volume than in the first audio information), the students' audio (at a higher volume than in the first audio information), noise audio, and silent audio in the classroom environment. Naturally, the teacher's audio is clearest in the first audio information, and the students' audio is clearest in the second audio information. Because the first audio information and the second audio information are collected in the same classroom environment but facing different objects, their durations are equal and the audio type corresponding to each moment (for example, the teacher audio type or the student audio type) is consistent between them; in other words, the content of the first audio information and the content of the second audio information complement each other.
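As an illustration of separating the two-channel PCM stream described above, a minimal Python sketch follows; the 16-bit sample width, the four-sample LLRR block size, and the file name are assumptions for illustration, not details given by the patent.

    import numpy as np

    def split_llrr_pcm(pcm_bytes: bytes) -> tuple:
        """Split a two-channel PCM stream stored in LLRR blocks into the
        directional-microphone channel (first audio information) and the
        omnidirectional-microphone channel (second audio information)."""
        samples = np.frombuffer(pcm_bytes, dtype=np.int16)          # assumes 16-bit PCM
        samples = samples[: len(samples) // 4 * 4].reshape(-1, 4)   # blocks of L L R R
        first_audio = samples[:, 0:2].reshape(-1)                   # L L ...
        second_audio = samples[:, 2:4].reshape(-1)                  # R R ...
        return first_audio, second_audio

    # Usage (hypothetical file name):
    # with open("classroom.pcm", "rb") as f:
    #     teacher_channel, room_channel = split_llrr_pcm(f.read())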
Step S102: search the first audio information for audio segments belonging to the teacher type as target first audio segments, and search the second audio information for target second audio segments, where the time information of the target second audio segments does not overlap with the time information corresponding to the target first audio segments.
Specifically, the first audio information may contain audio of multiple audio types (for example, the teacher type, the single-student type, the silent type, and so on). Therefore, the classification capability of an audio classification model is first used to search the first audio information for the audio segments belonging to the teacher type, as the target first audio segments. An audio segment of the teacher type is an audio segment in which the teacher's speech occupies the major part; for example, the audio segment in the first audio information in which the teacher explains trigonometric functions is a target first audio segment. The second audio information may further be searched for audio segments whose time information does not overlap with that of the target first audio segments, as the target second audio segments, where the time information of an audio segment refers to its start timestamp and end timestamp. It can be seen that a target first audio segment contains only audio of the teacher type, while the target second audio segments contain audio of multiple audio types (for example, the single-student type, the multi-student type, the silent type, and so on). Through step S102, the first audio information and the second audio information can be divided into multiple target first audio segments and multiple target second audio segments.
The detailed process of searching for the target first audio segments and the target second audio segments is as follows. The segments of continuous sound and the silent segments in the first audio information are taken as unit first audio segments; in other words, a unit first audio segment is either an audio segment of continuous sound or a silent audio segment. Based on the audio classification model, the audio type of each unit first audio segment is identified; the audio types may include the teacher type, the single-student type, the multi-student type, the silent type, and so on. The unit first audio segments identified as belonging to the teacher type are taken as the target first audio segments, so the audio type corresponding to a target first audio segment is the teacher type. Similarly, the segments of continuous sound and the silent segments in the second audio information are taken as unit second audio segments; a unit second audio segment is likewise either an audio segment of continuous sound or a silent audio segment. Based on the same audio classification model, the audio type of each unit second audio segment is identified, and the unit second audio segments identified as not belonging to the teacher type are taken as audio segments to be adjusted. If the time information of an audio segment to be adjusted does not overlap with the time information of the target first audio segments, that audio segment to be adjusted is determined as a target second audio segment; if the time information of an audio segment to be adjusted overlaps with the time information of the target first audio segments, the portion of that audio segment to be adjusted that does not correspond to the overlap is determined as a target second audio segment.
Taking a unit first audio segment as an example, the following explains how its audio type is identified based on the audio classification model; the corresponding audio type of any other unit first audio segment or unit second audio segment can be obtained by the same method. With a window length of 20 ms and an overlap of 1/2 between adjacent windows, the unit first audio segment is transformed from time-domain data to frequency-domain data using the Fourier transform; low-frequency enhancement is applied through a logarithmic function to convert it to a Mel spectrum, and the converted frequency-domain data is mapped onto a 64-dimensional space, so that the unit first audio segment is converted into a 64 x 501 matrix. The generated matrix is input into the trained audio classification model, and the audio type matching the unit first audio segment is obtained from the audio classification model. It should be understood that the audio classification model is trained in advance: in the training process, one audio segment and its corresponding audio type constitute one training sample, and the audio classification model is built from multiple audio segments and their corresponding audio types (multiple training samples). After the model is built, when a unit first audio segment is input into the audio classification model, the matching audio type can be output by the audio classification model. For example, if a unit first audio segment is "For this problem, I think option A is correct", inputting this unit first audio segment into the audio classification model yields the matching audio type: the single-student type.
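A minimal Python sketch of the log-Mel feature matrix described above (20 ms windows, 1/2 overlap, 64 Mel bands, a 64 x 501 matrix, which under these settings covers roughly five seconds of audio) is shown below; the use of librosa, the padding strategy, and the audio_classifier object are assumptions for illustration, not the patent's implementation.

    import numpy as np
    import librosa

    SR = 16000
    WIN = int(0.020 * SR)   # 20 ms window  -> 320 samples
    HOP = WIN // 2          # 1/2 overlap   -> 160 samples (10 ms)
    N_MELS = 64
    N_FRAMES = 501          # fixed frame count for the 64 x 501 matrix

    def segment_features(segment: np.ndarray) -> np.ndarray:
        """Convert one unit audio segment into a 64 x 501 log-Mel matrix."""
        mel = librosa.feature.melspectrogram(
            y=segment.astype(np.float32), sr=SR,
            n_fft=WIN, win_length=WIN, hop_length=HOP, n_mels=N_MELS)
        log_mel = np.log(mel + 1e-6)            # low-frequency enhancement via log
        if log_mel.shape[1] < N_FRAMES:         # pad or truncate to a fixed size
            log_mel = np.pad(log_mel, ((0, 0), (0, N_FRAMES - log_mel.shape[1])))
        return log_mel[:, :N_FRAMES]

    # audio_type = audio_classifier.predict(segment_features(unit_segment))  # hypothetical model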
Referring also to Fig. 1b, which is a schematic diagram of determining target first audio segments according to an embodiment of the invention: as shown in Fig. 1b, a first audio information 10a of duration 2 minutes 30 seconds is divided into 5 unit first audio segments (with time information 00:00-00:20, 00:20-01:10, 01:10-01:35, 01:35-02:05, and 02:05-02:30), and a second audio information 10b of duration 2 minutes 30 seconds is divided into 4 unit second audio segments (with time information 00:00-00:45, 00:45-01:35, 01:35-02:05, and 02:05-02:30). The audio types of the 5 unit first audio segments are identified respectively: the unit first audio segments over 00:00-00:20, 01:10-01:35, and 02:05-02:30 are of the teacher type, so these 3 unit first audio segments are the target first audio segments. The audio types of the 4 unit second audio segments are identified respectively: the unit second audio segment over 00:00-00:45 is of the single-student type, the one over 00:45-01:35 is of the multi-student type, the one over 01:35-02:05 is of the silent type, and the one over 02:05-02:30 is of the teacher type. For the unit second audio segment over 00:00-00:45, since the segment over 00:00-00:20 is a target first audio segment, the remaining portion over 00:20-00:45 is of the single-student type. For the unit second audio segment over 00:45-01:35, since the segment over 01:10-01:35 is a target first audio segment, the remaining portion over 00:45-01:10 is of the multi-student type. For the unit second audio segment over 01:35-02:05, the whole segment is of the silent type. In summary, the segments over 00:20-00:45, 00:45-01:10, and 01:35-02:05 are the target second audio segments.
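To make the overlap handling in the Fig. 1b example concrete, a minimal Python sketch follows; the segment representation and function names are illustrative assumptions.

    from typing import List, Tuple

    Seg = Tuple[float, float, str]   # (start_s, end_s, audio_type)

    def target_second_segments(second_units: List[Seg],
                               target_first: List[Tuple[float, float]]) -> List[Seg]:
        """Keep only the parts of non-teacher unit second segments that do not
        overlap any target first audio segment."""
        result = []
        for start, end, a_type in second_units:
            if a_type == "teacher":
                continue                            # segments to be adjusted are non-teacher only
            pieces = [(start, end)]
            for f_start, f_end in target_first:
                next_pieces = []
                for s, e in pieces:
                    if e <= f_start or s >= f_end:  # no overlap with this target first segment
                        next_pieces.append((s, e))
                    else:                           # cut out the overlapping part
                        if s < f_start:
                            next_pieces.append((s, f_start))
                        if e > f_end:
                            next_pieces.append((f_end, e))
                pieces = next_pieces
            result.extend((s, e, a_type) for s, e in pieces if e > s)
        return result

    # Fig. 1b example (times in seconds):
    first = [(0, 20), (70, 95), (125, 150)]
    second = [(0, 45, "single_student"), (45, 95, "multi_student"),
              (95, 125, "silent"), (125, 150, "teacher")]
    print(target_second_segments(second, first))
    # -> [(20, 45, 'single_student'), (45, 70, 'multi_student'), (95, 125, 'silent')]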
Step S103: convert the target first audio segment into target first text information, and identify, from the target first text information, teaching-type information corresponding to the target first audio segment.
Specifically, the target first audio segment is recognized to obtain the text information corresponding to the target first audio segment, referred to as the target first text information. If there are multiple target first audio segments, each is converted into text information; to improve the accuracy of the teaching-type information, after the conversion, text information that is continuous in time is combined into one piece of target first text information in timestamp order, and the remaining text information is likewise taken as target first text information. The target first text information is input into a first text classification model, and the classification capability of the first text classification model is used to obtain teaching-means information corresponding to the target first audio segment; the target first text information is input into a second text classification model, and the classification capability of the second text classification model is used to obtain teaching-content structure corresponding to the target first audio segment. The teaching-means information and teaching-content structure obtained above and corresponding to the target first audio segment are taken as the first teaching-type information. Teaching-means information describes the strategy by which teaching information is conveyed between teacher and students, and may include: reading, thinking, discussion, individual questioning, group questioning, and so on. Teaching-content structure describes the classroom link within the teacher's lesson, and may include: evaluation and feedback, learning new knowledge, review and consolidation, summary, and so on. For example, suppose the target first audio segment is: "Please read the third natural paragraph on page 5 of the textbook, and think about the two questions on the blackboard." This target first audio segment is first converted into target first text information. The target first text information is input into the first text classification model, which corresponds to teaching-means information, and the teaching-means information identified from the first text classification model for this target first audio segment is: reading. The target first text information is input into the second text classification model, which corresponds to teaching-content structure, and the teaching-content structure identified from the second text classification model for this target first audio segment is: learning new knowledge.
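The two-classifier arrangement described above can be illustrated with the following Python sketch; the patent elsewhere refers to convolutional neural network models, so the TF-IDF plus logistic-regression choice here, like the toy training data, is only a simplified stand-in, not the patent's implementation.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Toy training data (illustrative only).
    train_texts = [
        "Please read the third paragraph on page 5 of the textbook",
        "Let us discuss this question in groups",
    ]
    train_means_labels = ["reading", "discussion"]
    train_structure_labels = ["learning new knowledge", "review and consolidation"]

    # Two independent classifiers: one for teaching means, one for teaching-content structure.
    means_model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    structure_model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    means_model.fit(train_texts, train_means_labels)
    structure_model.fit(train_texts, train_structure_labels)

    def first_teaching_type(target_first_text: str) -> dict:
        return {
            "teaching_means": means_model.predict([target_first_text])[0],
            "content_structure": structure_model.predict([target_first_text])[0],
        }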
Optionally, speech recognition is performed on target second audio segments whose audio type is the single-student type or the multi-student type, to obtain the text information corresponding to the target second audio segment, referred to as the target second text information. The target first text information adjacent to the target second text information and the target second text information are input together into the above first text classification model, from which the teaching-means information matching the target second audio segment is obtained; the target first text information adjacent to the target second text information and the target second text information are input together into the above second text classification model, from which the teaching-content structure matching the target second audio segment is obtained. The teaching-means information and teaching-content structure obtained above and corresponding to the target second audio segment are taken as the second teaching-type information. In a classroom environment the teacher is generally in the leading position, so before or after a target second audio segment of the single-student or multi-student type there is bound to be teacher audio. Using the target first text and the target second text together to analyze the second teaching-type information of the target second audio segment is therefore reasonable, and the analysis result is more accurate.
Step S104: compute, according to the first teaching-type information, the time information corresponding to the target first audio segments, the time information corresponding to the target second audio segments, and the audio types and audio features corresponding to the target second audio segments, the key-link information of the classroom environment, where the key-link information includes a teacher behavior analysis result and a student behavior analysis result.
Specifically, the teacher behavior analysis result in the key-link information of the classroom environment is computed according to the teaching-means information and teaching-content structure in the first teaching-type information and the time information corresponding to the target first audio segments; the student behavior analysis result in the key-link information is computed according to the second teaching-type information, the audio types and time information corresponding to the target second audio segments, and the audio features. The teacher behavior analysis result and the student behavior analysis result are used to evaluate classroom quality. An audio feature refers to the emotion type identified from the pitch and loudness of the audio; for example, audio with a high pitch corresponds to a high-arousal feature type, and audio with a low loudness corresponds to a low-arousal feature type. The teacher behavior analysis result is obtained by analyzing all of the teacher's teaching behavior in the classroom and may include: the teacher's lecturing duration, the question-and-answer duration, the number of times the teacher's emotion is aroused, and so on. The student behavior analysis result is obtained by analyzing the students' participation behavior in the classroom and may include: the number of group answers by students, the students' thinking duration, the number of times students' emotions are aroused in class, and so on. For example, the total duration over which the audio feature of the target first audio segments is the high-arousal feature type is counted as the teacher emotional arousal duration; this teacher emotional arousal duration belongs to the teacher behavior analysis result.
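The duration and count statistics described above (enumerated in the summary as the first through fourth teacher behavior analysis parameters) can be sketched as follows; the dictionary-based segment representation is an assumption for illustration.

    from collections import defaultdict

    def teacher_behavior_parameters(target_first_segments):
        """target_first_segments: iterable of dicts with keys
        'start', 'end', 'teaching_means', 'content_structure' (times in seconds)."""
        duration_by_means = defaultdict(float)      # first parameter
        duration_by_structure = defaultdict(float)  # second parameter
        count_by_means = defaultdict(int)           # third parameter
        count_by_structure = defaultdict(int)       # fourth parameter
        for seg in target_first_segments:
            length = seg["end"] - seg["start"]
            duration_by_means[seg["teaching_means"]] += length
            duration_by_structure[seg["content_structure"]] += length
            count_by_means[seg["teaching_means"]] += 1
            count_by_structure[seg["content_structure"]] += 1
        return duration_by_means, duration_by_structure, count_by_means, count_by_structure

The student behavior analysis parameters are aggregated the same way over the target second audio segments, with an additional count per (audio feature, audio type) pair.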
Further, referring to Fig. 2, which is a schematic flowchart of obtaining audio information according to an embodiment of the invention: as shown in Fig. 2, steps S201 to S204 are a specific description of step S101 in the embodiment corresponding to Fig. 1a; that is, steps S201 to S204 describe a specific process of obtaining audio information according to an embodiment of the invention, which may include the following steps.
Step S201: segment the original first audio information into a plurality of unit original first audio information, and segment the original second audio information into a plurality of unit original second audio information.
Specifically, the original first audio information obtained from the directional microphone is segmented into multiple unit original first audio information of 30 ms each, where each unit original first audio information contains 480 sampling points at a sampling rate of 16000 Hz. Similarly, the original second audio information obtained from the omnidirectional microphone is segmented into multiple unit original second audio information of 30 ms each, where each unit original second audio information contains 480 sampling points at a sampling rate of 16000 Hz.
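A minimal sketch of the 30 ms framing described above, assuming the audio is already available as a NumPy array of samples:

    import numpy as np

    FRAME_LEN = 480   # 30 ms at 16000 Hz

    def frame_units(original_audio: np.ndarray) -> list:
        """Split the original audio stream into 30 ms unit frames of 480 samples each."""
        n = len(original_audio) // FRAME_LEN
        return [original_audio[i * FRAME_LEN:(i + 1) * FRAME_LEN] for i in range(n)]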
Step S202: calculate the noise probability coefficients of the sub-bands in the unit original first audio information.
Specifically, first including the original first audio-frequency information frequency reducing of unit of 480 sampled points, 16000Hz by each For original first audio-frequency information of unit of 240 sampled points, 8000Hz.80Hz- in original first audio-frequency information of unit of account The noise probability of 250Hz, 250Hz-500Hz, 500Hz-1000Hz, 1000Hz-2000Hz, 2000Hz-3000Hz6 subbands Coefficient, calculation method be mainly pass through Gauss Markov model calculate each subband noise probability coefficent and voice it is general Rate coefficient.The speech probability P of each subband is calculated first with formula (1.1)speech(x) and formula (1.2) calculating is each The noise probability P of a subbandnoisy(x),
where σ_speech is the speech variance corresponding to the subband, together with the corresponding speech mean of the subband, and σ_noisy is the noise variance corresponding to the subband, together with the corresponding noise mean of the subband. The speech variance, speech mean, noise variance and noise mean are all initialized in advance. The index i is an integer from 1 to 6 and indexes the speech probability and noise probability corresponding to each subband.
After the speech probability P_speech(x) and the noise probability P_noisy(x) of each subband have been calculated, formula (1.3) is used to calculate the noise probability coefficient L_s(x_i) of each subband in the unit of original first audio information, where L_s(x_i) denotes the noise probability coefficient of the i-th subband in the unit of original first audio information.
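Since formulas (1.1)-(1.3) are referred to but not reproduced in the text, the following sketch assumes single-Gaussian speech and noise densities per subband and a log-likelihood ratio as the noise probability coefficient; it is an illustration of the idea, not the exact formulas of this embodiment.

```python
# Hedged illustration of the per-subband computation: Gaussian densities for speech
# and noise (assumed forms of formulas (1.1) and (1.2)) and a log ratio as the noise
# probability coefficient (assumed form of formula (1.3)).
import numpy as np

def gaussian_pdf(x: float, mean: float, var: float) -> float:
    return float(np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var))

def noise_probability_coefficient(subband_energy: float,
                                  speech_mean: float, speech_var: float,
                                  noise_mean: float, noise_var: float) -> float:
    """L_s(x_i) for one subband: larger values indicate more noise-like energy."""
    p_speech = gaussian_pdf(subband_energy, speech_mean, speech_var)  # assumed formula (1.1)
    p_noisy = gaussian_pdf(subband_energy, noise_mean, noise_var)     # assumed formula (1.2)
    return float(np.log(p_noisy + 1e-12) - np.log(p_speech + 1e-12))  # assumed formula (1.3)
```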
Step S203: if the noise probability coefficient is greater than the probability threshold, delete the unit of original first audio information, and delete the unit of original second audio information whose time information overlaps that of the unit of original first audio information;
Specifically, as long as the noise probability coefficient L_s(x_i) of any one subband in a unit of original first audio information is greater than the probability threshold, that is, as long as any of the six subband noise probability coefficients exceeds the probability threshold, it is determined that the noise probability coefficient of the corresponding unit of original first audio information is greater than the probability threshold (the corresponding unit of original first audio information is of the noise type). The units of original first audio information exceeding the probability threshold are deleted from the original first audio information, so that only the teacher audio information, the student audio information and the silent audio information are retained in the original first audio information; this benefits both the accuracy of the subsequent speech-to-text conversion and the accuracy of the teaching type information obtained from the convolutional neural network model. To keep the duration of the original second audio information equal to the duration of the original first audio information after the deletion, it is only necessary to delete the units of original second audio information that overlap in time with the noise-type units of original first audio information, without calculating the subband noise probability coefficients of the units of original second audio information. For example, the original first audio information may be cut into unit original first audio information A, unit original first audio information B and unit original first audio information C. If the noise probability coefficient of unit original first audio information B is greater than the threshold, unit B is deleted from the original first audio information; meanwhile, in the original second audio information, the unit original second audio information D that overlaps in time with unit original first audio information B is deleted.
Step S204: merge the remaining units of original first audio information in timestamp order to obtain the first audio information, and merge the remaining units of original second audio information in timestamp order to obtain the second audio information.
Specifically, after the units of original first audio information whose noise probability coefficient exceeds the probability threshold have been deleted, the remaining units of original first audio information are merged according to the timestamps in the audio information to form the first audio information; similarly, the remaining units of original second audio information are merged according to the timestamps in the audio information to form the second audio information. By deleting the noise-type units of original first audio information, only the non-noise audio portions (for example, teacher audio information, student audio information and silent audio information) are retained in the first audio information and the second audio information.
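Steps S203-S204 together amount to dropping the noise-type units from the first stream, dropping the time-overlapping units from the second stream, and re-joining the survivors in timestamp order. The following Python sketch illustrates this under the assumption that the two streams share aligned unit timestamps; the Unit record layout is hypothetical.

```python
# Hedged sketch of steps S203-S204. The Unit layout and the assumption that the two
# streams have exactly aligned 30 ms timestamps are illustrative choices only.
from dataclasses import dataclass
import numpy as np

@dataclass
class Unit:
    timestamp: float          # start time of the 30 ms unit, in seconds
    samples: np.ndarray
    noise_coefficient: float  # largest subband coefficient from step S202 (first stream only)

def filter_and_merge(first_units: list[Unit], second_units: list[Unit],
                     threshold: float) -> tuple[np.ndarray, np.ndarray]:
    noisy_times = {u.timestamp for u in first_units if u.noise_coefficient > threshold}
    kept_first = [u for u in first_units if u.timestamp not in noisy_times]
    kept_second = [u for u in second_units if u.timestamp not in noisy_times]
    # Re-join the surviving units in timestamp order to form the two audio streams.
    first_audio = np.concatenate([u.samples for u in sorted(kept_first, key=lambda u: u.timestamp)])
    second_audio = np.concatenate([u.samples for u in sorted(kept_second, key=lambda u: u.timestamp)])
    return first_audio, second_audio
```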
Further, refer to Fig. 3, which is a schematic flowchart of identifying teaching type information according to an embodiment of the present invention. As shown in Fig. 3, steps S301-S303 are a specific description of step S103 in the embodiment corresponding to Fig. 1a; that is, steps S301-S303 are a specific process of identifying teaching type information provided by an embodiment of the present invention, and may include the following steps:
Step S301: convert the target first audio section into target first text information, and input the target first text information into the first text classification model to obtain the teaching means information matching the target first audio section;
Specifically, speech recognition is performed on the target first audio section to obtain the target first text information. After the conversion, the target first text information is input into the first text classification model, which may be a convolutional neural network model, and the teaching means information matching the target first audio section is obtained from the first text classification model. It should be understood that the first text classification model is built in advance. In the building process, one piece of text information together with one piece of teaching means information forms a training sample; the word vector matrix is randomly initialized with a normal distribution over [-1, 1] and has 128 dimensions. The initialized training text is then passed to the convolution layers for convolution; there are three convolution layers, whose kernels are 3 × 3, 5 × 5 and 8 × 8 respectively. After convolution, the data is passed to the pooling layer, where max pooling reduces the magnitude of the convolved features; the outputs of the kernels are finally concatenated into a vector and passed to a fully connected layer, thereby constructing a first text classification model that maps text information to teaching means information. Once the first text classification model has been built, inputting text information into it yields the corresponding teaching means information.
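The convolutional text classifier described above might look roughly like the following PyTorch sketch (128-dimensional word vectors, three kernel sizes, max pooling, a fully connected output). The parallel-branch layout, vocabulary size and number of teaching means categories are assumptions for illustration, not the embodiment's exact architecture.

```python
# Hedged sketch of a convolutional text classifier of the kind described above.
# Vocabulary size, channel count and class count are illustrative assumptions.
import torch
import torch.nn as nn

class TeachingMeansClassifier(nn.Module):
    def __init__(self, vocab_size: int = 20000, embed_dim: int = 128, num_classes: int = 8):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        nn.init.normal_(self.embedding.weight)           # randomly initialised word-vector matrix
        self.embedding.weight.data.clamp_(-1.0, 1.0)     # kept within [-1, 1] as in the text
        self.convs = nn.ModuleList([
            nn.Conv1d(embed_dim, 64, kernel_size=k) for k in (3, 5, 8)  # the three kernel sizes
        ])
        self.fc = nn.Linear(64 * 3, num_classes)         # fully connected layer over concatenated features

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len); sequences are assumed to be at least 8 tokens long.
        x = self.embedding(token_ids).transpose(1, 2)    # (batch, embed_dim, seq_len)
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]  # max pooling per kernel
        return self.fc(torch.cat(pooled, dim=1))         # logits over teaching means categories
```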
Step S302: input the target first text information into the second text classification model to obtain the content of courses structure matching the target first audio section;
Specifically, after the target first audio section has been converted into the target first text information, the converted target first text information is input into the second text classification model, which may also be a convolutional neural network model, and the content of courses structure matching the target first audio section is obtained from the second text classification model. The second text classification model is likewise built in advance; its building process may refer to the process of building the first text classification model in step S301 of Fig. 3. Once the second text classification model has been built, inputting text information into it yields the corresponding content of courses structure. Obtaining the teaching means information and obtaining the content of courses structure from the target first text information are two mutually independent steps; that is, step S301 in Fig. 3 may be executed before step S302, after step S302, or simultaneously with step S302.
Step S303: determine the teaching means information corresponding to the target first audio section and the content of courses structure corresponding to the target first audio section as the first type of teaching information corresponding to the target first audio section.
Specifically, the teaching means information obtained from the first text classification model and the content of courses structure obtained from the second text classification model are determined as the type of teaching information corresponding to the target first audio section. If there are multiple target first audio sections, the above steps are executed for each target first audio section to obtain the type of teaching information corresponding to each target first audio section.
In the embodiments of the present application, the first audio information and the second audio information collected by sound pickup devices at two different positions in the classroom environment are obtained; an audio section belonging to the teacher type is searched for in the first audio information as the target first audio section, and an audio section whose time information does not overlap that of the target first audio section is searched for in the second audio information as the target second audio section; the target first audio section is converted into target first text information, and the first type of teaching information corresponding to the target first audio section is identified from the target first text information; the key link information in the classroom environment is then counted according to the first type of teaching information, the time information of the target first audio section, and the audio type and audio feature of the target second audio section. Because the target audio sections are located and their type of teaching information identified automatically during the processing of the audio information, without manual participation, and the key link information in the classroom environment is counted programmatically, the tedious steps caused by manually analyzing classroom key link information can be avoided, which improves the efficiency of analyzing classroom teaching quality. Meanwhile, by using the alternating analysis between the first audio information and the second audio information, classroom teaching quality can be analyzed reasonably and accurately.
Further, refer to Fig. 4, which is a schematic flowchart of another multimedia data processing method according to an embodiment of the present invention. As shown in Fig. 4, the multimedia data processing method may include:
Step S401: obtain the first audio information and the second audio information belonging to the classroom environment;
Step S402: search the first audio information for an audio section belonging to the teacher type as the target first audio section, and search the second audio information for the target second audio section; the time information corresponding to the target second audio section does not overlap that of the target first audio section;
Step S403: convert the target first audio section into target first text information, and identify the first type of teaching information corresponding to the target first audio section according to the target first text information;
Step S404: count the key link information in the classroom environment according to the first type of teaching information, the time information corresponding to the target first audio section, the time information corresponding to the target second audio section, and the audio type and audio feature corresponding to the target second audio section; the key link information includes the teacher behavior analysis result and the student behavior analysis result;
The specific process of steps S401-S404 may refer to the description of steps S101-S104 in Fig. 1a above; the specific process of obtaining audio information may refer to the description of steps S201-S204 in Fig. 2 above; and the specific process of identifying the first type of teaching information may refer to the description of steps S301-S303 in Fig. 3 above. They are not repeated here.
Step S405: obtain the index model matching the classroom environment, the index model being a knowledge point model of the classroom built from the teaching notes;
Specifically, the index model that has been built in advance and matches the first audio information is obtained. The index model is a knowledge point model of the classroom built from the teaching notes, so the index model contains knowledge points. For example, if the index model matching the first audio information is an index model about single vowels, the knowledge points contained in that index model are: tone, the first tone, the second tone, the third tone and the fourth tone of standard Chinese pronunciation.
Step S406: extract the knowledge point text in the index model, determine, in the target first text information, the target timestamp of the text information matching the knowledge point text, and add a keyword tag at the position of the target timestamp in the first audio information; the text in the keyword tag is identical to the knowledge point text;
Specifically, the knowledge point texts in the index model are extracted one by one, and each knowledge point text is matched, based on string edit distance, against the words in the target first text information. If a word matching the knowledge point text exists in the target first text information, the timestamp of the matched word is extracted as the target timestamp, and a keyword tag whose text is identical to the knowledge point text is added at the position of the target timestamp in the first audio information, thereby establishing an index from a knowledge point to the teacher audio information. The specific matching process is as follows: calculate the edit distance between the knowledge point text and a word in the target text information, and take whichever of the knowledge point text and the word has the shorter text length as the benchmark text. If the ratio of the edit distance to the text length of the benchmark text is less than a preset fraction threshold (in general, the fraction threshold is 25% or less), the word is determined to match the knowledge point text. For example, the edit distance between knowledge point text A and word B in the target text information is 1, the text length of knowledge point text A is 5, and the text length of word B is 6; the benchmark text is therefore knowledge point text A, with text length 5. Since the ratio of the edit distance to the text length of knowledge point text A is less than the 25% fraction threshold (1/5 = 20% < 25%), knowledge point text A matches word B.
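The matching rule above can be illustrated with a short Python sketch: a word matches a knowledge point text when the edit distance divided by the length of the shorter (benchmark) text is below the fraction threshold. The Levenshtein implementation below is a generic one assumed here for illustration.

```python
# Generic Levenshtein distance plus the benchmark-text ratio test described above.
def edit_distance(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def matches_knowledge_point(word: str, knowledge_text: str, threshold: float = 0.25) -> bool:
    benchmark_len = min(len(word), len(knowledge_text))   # the shorter text is the benchmark
    if benchmark_len == 0:
        return False
    return edit_distance(word, knowledge_text) / benchmark_len < threshold

# The worked example from the text: distance 1, benchmark length 5 -> 0.2 < 0.25, a match.
```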
Step S407: when a query keyword is obtained, retrieve, in the first audio information, the keyword tag matching the query keyword as the target keyword tag;
Specifically, the user may input a query keyword through the input module according to the knowledge points in the index model. When the query keyword input by the user is obtained, the keyword tag matching the query keyword is retrieved in the first audio information as the target keyword tag. The retrieval process may likewise calculate the edit distance between the input query keyword and the keywords in the keyword tags; when the edit distance is less than a preset distance threshold, the query keyword is determined to match the keyword tag. If no keyword tag matching the query keyword exists in the first audio information, a prompt message indicating that no corresponding audio information exists is sent to the user.
Step S408: output, from the first audio information, the audio information corresponding to the timestamp of the target keyword tag, and determine the course information of the audio information corresponding to the target keyword tag.
Specifically, the audio information corresponding to the timestamp of the target keyword tag is searched for in the first audio information and output; at the same time, the course information of the audio information corresponding to the target keyword tag may be output, for example, the teacher's name, professional title and the course hours. If there is more than one target keyword tag, the corresponding audio information is output according to the degree of matching between the target keyword tags and the input query keyword; that is, the higher the degree of matching between a target keyword tag and the input query keyword, the earlier the audio information corresponding to that target keyword tag is ranked in the output.
In the embodiments of the present application, the first audio information and the second audio information collected by sound pickup devices at two different positions in the classroom environment are obtained; an audio section belonging to the teacher type is searched for in the first audio information as the target first audio section, and an audio section whose time information does not overlap that of the target first audio section is searched for in the second audio information as the target second audio section; the target first audio section is converted into target first text information, and the first type of teaching information corresponding to the target first audio section is identified from the target first text information; the key link information in the classroom environment is then counted according to the first type of teaching information, the time information of the target first audio section, and the audio type and audio feature of the target second audio section. Because the target audio sections are located and their type of teaching information identified automatically during the processing of the audio information, without manual participation, and the key link information in the classroom environment is counted programmatically, the tedious steps caused by manually analyzing classroom key link information can be avoided, which improves the efficiency of analyzing classroom teaching quality. Meanwhile, by using the alternating analysis between the first audio information and the second audio information, classroom teaching quality can be analyzed reasonably and accurately; and establishing the index relationship between the knowledge points in the teaching notes and the first audio information makes it possible to quickly locate the audio information matching the input keyword, which improves the efficiency of audio retrieval in the classroom environment.
Further, refer to Fig. 5, which is a schematic flowchart of another multimedia data processing method according to an embodiment of the present invention. As shown in Fig. 5, the multimedia data processing method may include the following steps:
Step S501: obtain the first audio information and the second audio information belonging to the classroom environment;
Step S502: search the first audio information for an audio section belonging to the teacher type as the target first audio section, and search the second audio information for the target second audio section;
Step S503: convert the target first audio section into target first text information, and identify the first type of teaching information corresponding to the target first audio section according to the target first text information;
The specific process of steps S501-S503 may refer to the description of steps S101-S103 in Fig. 1a above; the specific process of obtaining audio information may refer to the description of steps S201-S204 in Fig. 2 above; and the specific process of identifying the first type of teaching information may refer to the description of steps S301-S303 in Fig. 3 above. They are not repeated here.
Step S504: generate the teacher behavior analysis result according to the first type of teaching information and the time information corresponding to the target first audio section;
Specifically, within the first type of teaching information, the total duration of the target first audio sections having the same teaching means information is counted as the first teacher behavior analysis parameter corresponding to that teaching means information (for example, the total duration of the "seminar" teaching means information is counted as a first teacher behavior analysis parameter); the total duration of the target first audio sections having the same content of courses structure is counted as the second teacher behavior analysis parameter corresponding to that content of courses structure (for example, the total duration of the "studying new knowledge" content of courses structure is counted as a second teacher behavior analysis parameter); the number of target first audio sections having the same teaching means information is counted as the third teacher behavior analysis parameter corresponding to that teaching means information (for example, the number of target first audio sections with the "seminar" teaching means information is counted as a third teacher behavior analysis parameter); and the number of target first audio sections having the same content of courses structure is counted as the fourth teacher behavior analysis parameter corresponding to that content of courses structure (for example, the number of target first audio sections with the "studying new knowledge" content of courses structure is counted as a fourth teacher behavior analysis parameter). The teacher behavior analysis result is generated from the first, second, third and fourth teacher behavior analysis parameters, and may include: the teacher's lecturing duration, the classroom review duration, the seminar duration, the questioning duration while waiting for answers, the teacher's emotional arousal duration, and so on.
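The four statistics of step S504 are simple group-by aggregations over the target first audio sections. The following Python sketch illustrates them; the Section record layout is an assumption for illustration only.

```python
# Hedged sketch of step S504: total duration and section count, grouped by teaching
# means information and by content of courses structure.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Section:
    teaching_means: str       # e.g. "seminar", "lecture"
    content_structure: str    # e.g. "studying new knowledge", "classroom summary"
    duration: float           # seconds, from the section's time information

def teacher_behavior_parameters(sections: list[Section]) -> dict:
    by_means_duration = defaultdict(float)    # first parameter
    by_content_duration = defaultdict(float)  # second parameter
    by_means_count = defaultdict(int)         # third parameter
    by_content_count = defaultdict(int)       # fourth parameter
    for s in sections:
        by_means_duration[s.teaching_means] += s.duration
        by_content_duration[s.content_structure] += s.duration
        by_means_count[s.teaching_means] += 1
        by_content_count[s.content_structure] += 1
    return {"duration_by_means": dict(by_means_duration),
            "duration_by_content": dict(by_content_duration),
            "count_by_means": dict(by_means_count),
            "count_by_content": dict(by_content_count)}
```

The student behavior parameters of step S505 below follow the same pattern over the target second audio sections, with an additional count grouped by audio type and audio feature.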
Step S505: generate the student behavior analysis result according to the second type of teaching information, the time information corresponding to the target second audio section, and the audio type and audio feature corresponding to the target second audio section;
Specifically, within the second type of teaching information, the number of target second audio sections having the same audio feature and audio type is counted as the first student behavior analysis parameter (for example, the number of target second audio sections of the "single student" audio type with the "emotionally aroused" audio feature is counted as a first student behavior analysis parameter); the total duration of the target second audio sections having the same teaching means information is counted as the second student behavior analysis parameter corresponding to that teaching means information (for example, the total duration of the "individual questioning" teaching means information is counted as a second student behavior analysis parameter); the total duration of the target second audio sections having the same content of courses structure is counted as the third student behavior analysis parameter corresponding to that content of courses structure (for example, the total duration of the "classroom summary" content of courses structure is counted as a third student behavior analysis parameter); the number of target second audio sections having the same teaching means information is counted as the fourth student behavior analysis parameter corresponding to that teaching means information (for example, the number of audio sections with the "individual questioning" teaching means information is counted as a fourth student behavior analysis parameter); and the number of target second audio sections having the same content of courses structure is counted as the fifth student behavior analysis parameter corresponding to that content of courses structure (for example, the number of audio sections with the "classroom summary" content of courses structure is counted as a fifth student behavior analysis parameter). The student behavior analysis result is generated from the first, second, third, fourth and fifth student behavior analysis parameters, and may include: the number of times students are emotionally aroused, the review duration, the seminar duration, the number of group answers by students, and so on.
Step S506: determine the teacher behavior analysis result and the student behavior analysis result as the key link information in the classroom environment.
Specifically, the teacher behavior analysis result and the student behavior analysis result generated in the above steps are determined as the key link information in the classroom environment.
In the embodiments of the present application, the first audio information and the second audio information collected by sound pickup devices at two different positions in the classroom environment are obtained; an audio section belonging to the teacher type is searched for in the first audio information as the target first audio section, and an audio section whose time information does not overlap that of the target first audio section is searched for in the second audio information as the target second audio section; the target first audio section is converted into target first text information, and the first type of teaching information corresponding to the target first audio section is identified from the target first text information; the key link information in the classroom environment is then counted according to the first type of teaching information, the time information of the target first audio section, and the audio type and audio feature of the target second audio section. Because the target audio sections are located and their type of teaching information identified automatically during the processing of the audio information, without manual participation, and the key link information in the classroom environment is counted programmatically, the tedious steps caused by manually analyzing classroom key link information can be avoided, which improves the efficiency of analyzing classroom teaching quality. Meanwhile, by using the alternating analysis between the first audio information and the second audio information, classroom teaching quality can be analyzed reasonably and accurately; and establishing the index relationship between the knowledge points in the teaching notes and the first audio information makes it possible to quickly locate the audio information matching the input keyword, which improves the efficiency of audio retrieval in the classroom environment.
Further, Fig. 6 is referred to, is a kind of structure of apparatus for processing multimedia data provided in an embodiment of the present invention Schematic diagram.As shown in fig. 6, the apparatus for processing multimedia data 1 includes at least:First obtains module 11, the first searching module 12, the second searching module 13, conversion module 14, identification module 15, statistical module 16;
First obtains module 11, for obtaining the first audio-frequency information and the second audio-frequency information that belong in classroom environment;Institute It states the first audio-frequency information and second audio-frequency information is that the radio reception device for being located at different location by two respectively collects;
First searching module 12, for searching the audio section for belonging to teacher's type in first audio-frequency information, as The first audio section of target;
Second searching module 13, for searching target second audio section in second audio-frequency information;The target Two audio sections temporal information corresponding with first audio section of target does not overlap;
Conversion module 14, for first audio section of target to be converted to the first text information of target;
Identification module 15, for according to target text information identification corresponding with first audio section of target the One type of teaching information;
Statistical module 16, for according to the first type of teaching information, the first audio section of the target corresponding time Information, the corresponding temporal information of the target second audio section, the corresponding audio types of the target second audio section and audio Feature counts the key link information in the classroom environment;The key link information include teacher's behavioural analysis result and Students ' behavior analyzes result.
Wherein, first module 11, the first searching module 12, the second searching module 13, conversion module 14, identification mould are obtained Block 15, statistical module 16 specific implementation can be found in embodiment corresponding to above-mentioned Fig. 1 a to step S101- step S104 is not discussed here.
Further, Fig. 6 is referred to, apparatus for processing multimedia data 1 can also include:Determining module 17;
The conversion module 14 is also used to convert a target second audio section whose audio type is the single-student type or the multi-student type into target second text information, and to input the target first text information adjacent to the target second text information, together with the target second text information, into the first text classification model to obtain the teaching means information matching the target second audio section;
The first obtaining module 11 is also used to input the target first text information adjacent to the target second text information, together with the target second text information, into the second text classification model to obtain the content of courses structure matching the target second audio section;
The determining module 17 is used to determine the teaching means information corresponding to the target second audio section and the content of courses structure corresponding to the target second audio section as the second type of teaching information corresponding to the target second audio section.
Wherein, first acquisition module 11, conversion module 14, determining module 17 specific implementation can be found in above-mentioned figure To step S103 in embodiment corresponding to 1a, it is not discussed here.
Further, Fig. 6 is referred to, apparatus for processing multimedia data 1 may include:First acquisition module 11, first are looked into Look for module 12, the second searching module 13, conversion module 14, identification module 15, statistical module 16, determining module 17;It can also wrap It includes:Second obtains module 18, adding module 19, retrieval module 20, output module 21.
Second obtains module 18, for obtaining the index model to match with the classroom environment;The index model is The knowledge-point models in classroom constructed by teaching notes;
Adding module 19, for extracting the knowledge point text in the index model, determining, in the target first text information, the target timestamp of the text information matching the knowledge point text, and adding a keyword tag at the position of the target timestamp in the first audio information; the text in the keyword tag is identical to the knowledge point text;
Retrieval module 20, for retrieving in first audio-frequency information and being looked into described when getting searching keyword The Keyword Tag for asking Keywords matching, as target keywords label;
Output module 21, for exporting the timestamp with the target keywords label in first audio-frequency information Corresponding audio-frequency information, and determine the curriculum information of audio-frequency information corresponding with the target keywords label.
Wherein, the second acquisition module 18, adding module 19, retrieval module 20, the specific implementation of output module 21 can Referring to, to step S405- step S408, being not discussed here in embodiment corresponding to above-mentioned Fig. 4.
As shown in fig. 6, the first searching module 12 may include:First cutting unit 121, the first recognition unit 122, One determination unit 123;
First cutting unit 121 is used to according to audio frequency characteristics be multiple units first by the first audio-frequency information cutting Audio section;Each the first audio section of unit is the audio section comprising continuous sound or is mute audio section;
First recognition unit 122, for each first audio section of unit to be input in audio classification model, point It Shi Bie not be with the matched audio types of each first audio section of unit;
First determination unit 123, for being teacher's type by the audio types in first audio-frequency information The first audio section of unit, be determined as first audio section of target.
Wherein, the first cutting unit 121, the first recognition unit 122, the first determination unit 123 specific implementation can Referring to, to step S102, being not discussed here in embodiment corresponding to above-mentioned Fig. 1 a.
As shown in fig. 6, the second searching module 13 may include:Second cutting unit 131, the second recognition unit 132, Two determination units 133;
Second cutting unit 131 is used to according to audio frequency characteristics be multiple the second audios of unit by the second audio-frequency information cutting Section;Each the second audio section of unit is the audio section comprising continuous sound or is mute audio section;
Second recognition unit 132, for each second audio section of unit to be input to the audio classification model In, it identifies and the matched audio types of each second audio section of unit respectively;
Second determination unit 133, for taking the unit second audio sections in the second audio information whose audio type is not the teacher type as audio sections to be adjusted;
The second determination unit 133 is also used to determine an audio section to be adjusted as a target second audio section if its time information does not overlap the time information of the target first audio section;
The second determination unit 133 is also used, if the time information of an audio section to be adjusted overlaps the time information of the target first audio section, to determine the non-overlapping part of the audio section to be adjusted as the target second audio section.
Wherein, the second cutting unit 131, the second recognition unit 132, the second determination unit 133 specific implementation can Referring to, to step S102, being not discussed here in embodiment corresponding to above-mentioned Fig. 1 a.
As shown in fig. 6, identification module 15 may include:Acquiring unit 151, third determination unit 152;
Acquiring unit 151, for first text information of target to be inputted the first textual classification model, acquisition and institute State the matched teaching means information of the first audio section of target;
The acquiring unit 151 is obtained for first text information of target to be inputted the second textual classification model With the matched content of courses structure of the first audio section of target;
Third determination unit 152 is used for the corresponding teaching means information of the first audio section of target and the target The corresponding content of courses structure determination of first audio section is the first type of teaching corresponding with first audio section of target letter Breath.
Wherein, acquiring unit 151, the specific implementation of third determination unit 152 can be found in real corresponding to above-mentioned Fig. 3 It applies in example to step S301- step S303, is not discussed here.
As shown in fig. 6, statistical module 16 may include:First generation unit 161, the second generation unit the 162, the 4th are really Order member 163;
First generation unit 161, for corresponding according to the first type of teaching information, the first audio section of the target Temporal information, generate teacher's behavioural analysis result;
Second generation unit 162, for corresponding according to the second type of teaching information, the target second audio section Temporal information, the corresponding audio types of the target second audio section and audio frequency characteristics, generate students ' behavior analyze result;
4th determination unit 163, for determining teacher's behavioural analysis result and students ' behavior analysis result For the key link information in the classroom environment.
Wherein, the first generation unit 161, the second generation unit 162, the specific implementation of the 4th determination unit 163 can Referring to, to step S504- step S506, being not discussed here in embodiment corresponding to above-mentioned Fig. 5.
As shown in fig. 6, the first generation unit 161 may include:First statistics subelement 1611, first generates subelement 1612;
First statistics subelement 1611, for according in the first type of teaching information teaching means information and institute The corresponding temporal information of the first audio section of target is stated, the total of first audio section of target with identical teaching means information is counted Duration obtains corresponding the first parameter of teacher's behavioural analysis of each teaching means information;
The first statistics subelement 1611 is also used to count, according to the content of courses structure in the first type of teaching information and the time information corresponding to the target first audio section, the total duration of the target first audio sections having the same content of courses structure, obtaining the second teacher behavior analysis parameter corresponding to each content of courses structure;
The first statistics subelement 1611 is also used to count, according to the teaching means information in the first type of teaching information, the number of target first audio sections having the same teaching means information, obtaining the third teacher behavior analysis parameter corresponding to each teaching means information;
The first statistics subelement 1611 is also used to count, according to the content of courses structure in the first type of teaching information, the number of target first audio sections having the same content of courses structure, obtaining the fourth teacher behavior analysis parameter corresponding to each content of courses structure;
First generates subelement 1612, for according to first parameter of teacher's behavioural analysis, teacher's behavioural analysis second Parameter, teacher's behavioural analysis third parameter, the 4th parameter of teacher's behavioural analysis, generate teacher's behavioural analysis result.
Wherein, the specific implementation that the first statistics subelement 1611, first generates subelement 1612 can be found in above-mentioned figure To step S504 in embodiment corresponding to 5, it is not discussed here.
As shown in fig. 6, the second generation unit 162 may include:Second statistics subelement 1621, second generates subelement 1622;
Second statistics subelement 1621, for counting the target second audio with identical audio frequency characteristics and audio types The quantity of section analyzes the first parameter as students ' behavior;
The second statistics subelement 1621 is also used to according to the teaching means letter in the second type of teaching information Temporal information corresponding with the target second audio section is ceased, the target second audio with identical teaching means information is counted The total duration of section obtains the corresponding students ' behavior of each teaching means information and analyzes the second parameter;
The second statistics subelement 1621, is also used to according to the content of courses knot in the second type of teaching information Structure and the corresponding temporal information of the target second audio section count the target second audio with identical content of courses structure The total duration of section obtains the corresponding students ' behavior analysis third parameter of each content of courses structure;
The second statistics subelement 1621 is also used to count, according to the teaching means information in the second type of teaching information, the number of target second audio sections having the same teaching means information, obtaining the fourth student behavior analysis parameter corresponding to each teaching means information;
The second statistics subelement 1621 is also used to count, according to the content of courses structure in the second type of teaching information, the number of target second audio sections having the same content of courses structure, obtaining the fifth student behavior analysis parameter corresponding to each content of courses structure;
Second generates subelement 1622, and for analyzing the first parameter according to the students ' behavior, the students ' behavior is analyzed Second parameter, students ' behavior analysis third parameter, the students ' behavior analyze the 4th parameter, students ' behavior analysis 5th parameter generates the students ' behavior analysis result.
Wherein, the specific implementation that the second statistics subelement 1621, second generates subelement 1622 can be found in above-mentioned figure To step S505 in embodiment corresponding to 5, it is not discussed here.
In the embodiments of the present application, the first audio information and the second audio information collected by sound pickup devices at two different positions in the classroom environment are obtained; an audio section belonging to the teacher type is searched for in the first audio information as the target first audio section, and an audio section whose time information does not overlap that of the target first audio section is searched for in the second audio information as the target second audio section; the target first audio section is converted into target first text information, and the first type of teaching information corresponding to the target first audio section is identified from the target first text information; the key link information in the classroom environment is then counted according to the first type of teaching information, the time information of the target first audio section, and the audio type and audio feature of the target second audio section. Because the target audio sections are located and their type of teaching information identified automatically during the processing of the audio information, without manual participation, and the key link information in the classroom environment is counted programmatically, the tedious steps caused by manually analyzing classroom key link information can be avoided, which improves the efficiency of analyzing classroom teaching quality. Meanwhile, by using the alternating analysis between the first audio information and the second audio information, classroom teaching quality can be analyzed reasonably and accurately; and establishing the index relationship between the knowledge points in the teaching notes and the first audio information makes it possible to quickly locate the audio information matching the input keyword, which improves the efficiency of audio retrieval in the classroom environment.
Further, refer to Fig. 7, which is a schematic structural diagram of an electronic device according to an embodiment of the present invention. As shown in Fig. 7, the electronic device 1000 may include: a processor 1001, a network interface 1004 and a memory 1005; in addition, the electronic device 1000 may also include: a user interface 1003 and at least one communication bus 1002. The communication bus 1002 is used to implement connection and communication between these components. The user interface 1003 may include a display (Display) and a keyboard (Keyboard), and optionally may also include standard wired and wireless interfaces. The network interface 1004 may optionally include standard wired and wireless interfaces (such as a WI-FI interface). The memory 1005 may be a high-speed RAM memory, or may be a non-volatile memory, for example, at least one magnetic disk memory. The memory 1005 may optionally also be at least one storage device located remotely from the aforementioned processor 1001. As shown in Fig. 7, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module and a device control application program.
In the electronic device 1000 shown in Fig. 7, the network interface 1004 may provide a network communication function, the user interface 1003 is mainly used to provide an input interface for the user, and the processor 1001 may be used to invoke the device control application program stored in the memory 1005 so as to realize:
Obtain the first audio-frequency information and the second audio-frequency information belonged in classroom environment;First audio-frequency information and institute Stating the second audio-frequency information is that the radio reception device for being located at different location by two respectively collects;
The audio section for belonging to teacher's type is searched in first audio-frequency information, as the first audio section of target, and Target second audio section is searched in second audio-frequency information;The target second audio section and first audio section of target Corresponding temporal information does not overlap;
First audio section of target is converted into the first text information of target, according to first text information of target Identify the first type of teaching information corresponding with first audio section of target;
According to the first type of teaching information, the corresponding temporal information of the first audio section of the target, the target The corresponding temporal information of two audio sections, the corresponding audio types of the target second audio section and audio frequency characteristics, count the class Key link information in hall environment;The key link information includes teacher's behavioural analysis result and students ' behavior analysis knot Fruit.
In one embodiment, the processor 1001 is searched in first audio-frequency information in execution belongs to teacher The audio section of type specifically executes following steps when as the first audio section of target:
According to audio frequency characteristics by the first audio-frequency information cutting be multiple the first audio sections of unit;Each unit first Audio section is the audio section comprising continuous sound or is mute audio section;
Each first audio section of unit is input in audio classification model, respectively identification and each unit The matched audio types of first audio section;
determine the unit first audio sections in the first audio information whose audio type is the teacher type as the target first audio section.
In one embodiment, the processor 1001 is executing the lookup target second in second audio-frequency information When audio section, following steps are specifically executed:
According to audio frequency characteristics by the second audio-frequency information cutting be multiple the second audio sections of unit;Each the second audio of unit Section is the audio section comprising continuous sound or is mute audio section;
Each second audio section of unit is input in the audio classification model, is identified respectively and described each The matched audio types of the second audio section of unit;
take the unit second audio sections in the second audio information whose audio type is not the teacher type as audio sections to be adjusted;
if the time information of an audio section to be adjusted does not overlap the time information of the target first audio section, determine the audio section to be adjusted as a target second audio section;
if the time information of an audio section to be adjusted overlaps the time information of the target first audio section, determine the non-overlapping part of the audio section to be adjusted as the target second audio section.
In one embodiment, the processor 1001 is being executed according to first text information of target identification and institute When stating the corresponding first type of teaching information of the first audio section of target, following steps are specifically executed:
First text information of target is inputted into the first textual classification model, is obtained and first audio section of target Matched teaching means information;
First text information of target is inputted into the second textual classification model, is obtained and first audio section of target Matched content of courses structure;
By the corresponding teaching means information of first audio section of target and the corresponding teaching of the first audio section of the target Content structure is determined as the first type of teaching information corresponding with first audio section of target.
In one embodiment, the processor 1001 also executes following steps:
convert a target second audio section whose audio type is the single-student type or the multi-student type into target second text information, and input the target first text information adjacent to the target second text information, together with the target second text information, into the first text classification model to obtain the teaching means information matching the target second audio section;
input the target first text information adjacent to the target second text information, together with the target second text information, into the second text classification model to obtain the content of courses structure matching the target second audio section;
determine the teaching means information corresponding to the target second audio section and the content of courses structure corresponding to the target second audio section as the second type of teaching information corresponding to the target second audio section.
In one embodiment, the processor 1001 also executes following steps:
Obtain the index model to match with the classroom environment;The index model is the classroom constructed by teaching notes In knowledge-point models;
The knowledge point text in the index model is extracted, in first text information of target, determination is known with described Know the object time stamp of the text information of point text matches, and the position of the object time stamp in first audio-frequency information Add Keyword Tag;Text in the Keyword Tag is identical as the knowledge point text;
When getting searching keyword, retrieved in first audio-frequency information matched with the searching keyword Keyword Tag, as target keywords label;
In first audio-frequency information, audio-frequency information corresponding with the timestamp of the target keywords label is exported, And determine the curriculum information of audio-frequency information corresponding with the target keywords label.
In one embodiment, the processor 1001 is being executed according to the first type of teaching information, the target The corresponding temporal information of first audio section, the corresponding temporal information of the target second audio section, the target second audio section Corresponding audio types and audio frequency characteristics, it is specific to execute following step when counting the key link information in the classroom environment Suddenly:
According to the first type of teaching information, the corresponding temporal information of the first audio section of the target, teacher's row is generated To analyze result;
According to the second type of teaching information, the corresponding temporal information of the target second audio section, the target The corresponding audio types of two audio sections and audio frequency characteristics generate students ' behavior and analyze result;
Teacher's behavioural analysis result and students ' behavior analysis result are determined as the pass in the classroom environment Key link information.
In one embodiment, the processor 1001 is being executed according to the first type of teaching information, the target First audio section corresponding temporal information specifically executes following steps when generating teacher's behavioural analysis result:
According in the first type of teaching information teaching means information and the first audio section of the target it is corresponding Temporal information counts the total duration with the first audio section of target of identical teaching means information, obtains each teaching means letter Cease corresponding the first parameter of teacher's behavioural analysis;
according to the content of courses structure in the first type of teaching information and the time information corresponding to the target first audio section, count the total duration of the target first audio sections having the same content of courses structure to obtain the second teacher behavior analysis parameter corresponding to each content of courses structure;
according to the teaching means information in the first type of teaching information, count the number of target first audio sections having the same teaching means information to obtain the third teacher behavior analysis parameter corresponding to each teaching means information;
according to the content of courses structure in the first type of teaching information, count the number of target first audio sections having the same content of courses structure to obtain the fourth teacher behavior analysis parameter corresponding to each content of courses structure;
According to first parameter of teacher's behavioural analysis, the second parameter of teacher's behavioural analysis, teacher's behavioural analysis third ginseng Several, described the 4th parameter of teacher's behavioural analysis, generates teacher's behavioural analysis result.
In one embodiment, when the processor 1001 generates the student behavior analysis result according to the second teaching type information, the time information corresponding to the target second audio sections, and the audio type and audio features corresponding to the target second audio sections, it specifically executes the following steps:
the number of target second audio sections with identical audio features and audio type is counted, as a first student behavior analysis parameter;
according to the teaching means information in the second teaching type information and the time information corresponding to the target second audio sections, the total duration of the target second audio sections with identical teaching means information is counted, to obtain a second student behavior analysis parameter corresponding to each item of teaching means information;
according to the teaching content structure in the second teaching type information and the time information corresponding to the target second audio sections, the total duration of the target second audio sections with an identical teaching content structure is counted, to obtain a third student behavior analysis parameter corresponding to each teaching content structure;
according to the teaching means information in the second teaching type information, the number of target second audio sections with identical teaching means information is counted, to obtain a fourth student behavior analysis parameter corresponding to each item of teaching means information;
according to the teaching content structure in the second teaching type information, the number of target second audio sections with an identical teaching content structure is counted, to obtain a fifth student behavior analysis parameter corresponding to each teaching content structure;
the student behavior analysis result is generated according to the first, second, third, fourth, and fifth student behavior analysis parameters.
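The student behavior analysis mirrors the teacher analysis, with one extra parameter: a count of target second audio sections that share the same audio features and audio type (for example, single-student versus multiple-student speech). A short sketch follows, again with illustrative field names only rather than names from the patent.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class StudentSection:
    start: float
    end: float
    audio_type: str            # e.g. "single_student" or "multi_student"
    audio_feature: str         # a coarse feature label for the section
    teaching_means: str        # from the second teaching type information
    content_structure: str     # from the second teaching type information

def student_behavior_analysis(sections):
    count_by_feature_type = defaultdict(int)     # first parameter
    duration_by_means = defaultdict(float)       # second parameter
    duration_by_structure = defaultdict(float)   # third parameter
    count_by_means = defaultdict(int)            # fourth parameter
    count_by_structure = defaultdict(int)        # fifth parameter

    for s in sections:
        length = s.end - s.start
        count_by_feature_type[(s.audio_feature, s.audio_type)] += 1
        duration_by_means[s.teaching_means] += length
        duration_by_structure[s.content_structure] += length
        count_by_means[s.teaching_means] += 1
        count_by_structure[s.content_structure] += 1

    return {
        "count_by_feature_and_type": dict(count_by_feature_type),
        "duration_by_teaching_means": dict(duration_by_means),
        "duration_by_content_structure": dict(duration_by_structure),
        "count_by_teaching_means": dict(count_by_means),
        "count_by_content_structure": dict(count_by_structure),
    }
```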
The first audio information and the second audio information are obtained from two sound pickup devices at different positions in a classroom environment; the audio sections belonging to the teacher type are searched in the first audio information as target first audio sections, and the audio sections whose time information does not overlap with that of the target first audio sections are searched in the second audio information as target second audio sections; the target first audio sections are converted into target first text information, and the first teaching type information corresponding to the target first audio sections is identified according to the target first text information; according to the first teaching type information, the time information of the target first audio sections, and the audio types and audio features of the target second audio sections, the key link information in the classroom environment is counted. Because the target audio sections are located and their teaching type information is identified automatically, and the key link information in the classroom environment is then counted programmatically, the classroom key link information is obtained from the audio information without manual participation; the tedious steps of analyzing classroom key links by hand are avoided, which improves the efficiency of analyzing classroom teaching quality. Meanwhile, the alternating analysis between the first audio information and the second audio information allows classroom teaching quality to be analyzed reasonably and accurately, and the index relationship established between the knowledge points in the teaching plan and the first audio information allows the audio information matching an input keyword to be located quickly, improving the retrieval efficiency of audio in the classroom environment.
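A key step in this pipeline is selecting the target second audio sections: candidate (non-teacher) sections in the second audio information are kept only where they do not overlap in time with the teacher sections found in the first audio information, and partially overlapping candidates are trimmed to their non-overlapping portion. The following Python sketch shows one way to perform that interval subtraction; it assumes sections are represented as (start, end) pairs in seconds and is an illustrative sketch, not code from the patent.

```python
def subtract_intervals(candidate, teacher_sections):
    """Return the parts of a candidate (start, end) interval that do not overlap
    any teacher section; these parts become target second audio sections."""
    pieces = [candidate]
    for t_start, t_end in sorted(teacher_sections):
        next_pieces = []
        for c_start, c_end in pieces:
            if t_end <= c_start or t_start >= c_end:    # no overlap: keep as-is
                next_pieces.append((c_start, c_end))
                continue
            if c_start < t_start:                        # keep the part before the teacher section
                next_pieces.append((c_start, t_start))
            if t_end < c_end:                            # keep the part after the teacher section
                next_pieces.append((t_end, c_end))
        pieces = next_pieces
    return pieces

def select_target_second_sections(candidates, teacher_sections):
    targets = []
    for cand in candidates:                              # non-teacher sections from the second audio
        targets.extend(subtract_intervals(cand, teacher_sections))
    return targets

# e.g. teacher sections [(0, 30), (60, 90)] and a candidate (25, 70)
# yield the target second audio sections [(30, 60)]
```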
In addition, it should be noted that an embodiment of the present invention also provides a computer storage medium, in which the computer program executed by the aforementioned multimedia data processing device 1 is stored. The computer program includes program instructions, and when the processor executes the program instructions, it can perform the multimedia data processing method described in the embodiments corresponding to Fig. 1a to Fig. 5 above, which is therefore not repeated here. Likewise, the beneficial effects of using the same method are not repeated. For technical details not disclosed in the computer storage medium embodiment of the present invention, please refer to the description of the method embodiments of the present invention.
The terms used in the embodiments of the present application are only for the purpose of describing particular embodiments and are not intended to limit the application. The singular forms "a", "the", and "said" used in the embodiments of the present application and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" used herein refers to and includes any or all possible combinations of one or more of the associated listed items.
Through the above description of the embodiments, those skilled in the art can clearly understand that, for convenience and brevity of description, only the division of the above functional modules is given as an example. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. For the specific working processes of the devices and units described above, reference may be made to the corresponding processes in the aforementioned method embodiments, which are not repeated here.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above are only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any person familiar with the art can easily think of changes or substitutions within the technical scope disclosed in the present application, which shall all be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (20)

1. A multimedia data processing method, characterized by comprising:
obtaining first audio information and second audio information belonging to a classroom environment; the first audio information and the second audio information are respectively collected by two sound pickup devices located at different positions;
searching the first audio information for audio sections belonging to a teacher type as target first audio sections, and searching the second audio information for target second audio sections; the time information corresponding to the target second audio sections does not overlap with the time information corresponding to the target first audio sections;
converting the target first audio sections into target first text information, and identifying, according to the target first text information, first teaching type information corresponding to the target first audio sections;
counting key link information in the classroom environment according to the first teaching type information, the time information corresponding to the target first audio sections, the time information corresponding to the target second audio sections, and the audio type and audio features corresponding to the target second audio sections; the key link information includes a teacher behavior analysis result and a student behavior analysis result.
2. The method according to claim 1, characterized in that searching the first audio information for audio sections belonging to a teacher type as target first audio sections comprises:
segmenting the first audio information into a plurality of unit first audio sections according to audio features; each unit first audio section is either an audio section containing continuous sound or a silent audio section;
inputting each unit first audio section into an audio classification model, and respectively identifying the audio type matching each unit first audio section;
determining the unit first audio sections whose audio type is the teacher type in the first audio information as the target first audio sections.
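The segmentation in claim 2 produces unit audio sections that are either continuous sound or silence. One common way to obtain such sections, sketched below in Python, is a simple short-time energy gate; the frame length, threshold, and the classify_section hook are illustrative assumptions, not parameters specified by the patent.

```python
import numpy as np

def split_into_unit_sections(samples: np.ndarray, sr: int,
                             frame_ms: int = 30, energy_threshold: float = 1e-4):
    """Split a mono float waveform into unit audio sections: each section is a maximal
    run of frames that are all voiced (continuous sound) or all silent."""
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    voiced = [float(np.mean(samples[i*frame_len:(i+1)*frame_len] ** 2)) > energy_threshold
              for i in range(n_frames)]

    sections = []   # (start_sec, end_sec, is_voiced)
    start = 0
    for i in range(1, n_frames + 1):
        if i == n_frames or voiced[i] != voiced[start]:
            sections.append((start * frame_len / sr, i * frame_len / sr, voiced[start]))
            start = i
    return sections

# Each voiced section would then be passed to an audio classification model, e.g.:
#   audio_type = classify_section(samples, start_sec, end_sec)   # "teacher", "single_student", ...
# and the sections whose type is "teacher" become the target first audio sections.
```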
3. The method according to claim 2, characterized in that searching the second audio information for target second audio sections comprises:
segmenting the second audio information into a plurality of unit second audio sections according to audio features; each unit second audio section is either an audio section containing continuous sound or a silent audio section;
inputting each unit second audio section into the audio classification model, and respectively identifying the audio type matching each unit second audio section;
taking the unit second audio sections whose audio type is not the teacher type in the second audio information as audio sections to be adjusted;
if the time information of an audio section to be adjusted does not overlap with the time information of the target first audio sections, determining the audio section to be adjusted as a target second audio section;
if the time information of an audio section to be adjusted overlaps with the time information of the target first audio sections, determining the portion of the audio section to be adjusted that does not correspond to the overlap as the target second audio section.
4. The method according to claim 3, characterized in that identifying, according to the target first text information, the first teaching type information corresponding to the target first audio sections comprises:
inputting the target first text information into a first text classification model to obtain teaching means information matching the target first audio sections;
inputting the target first text information into a second text classification model to obtain a teaching content structure matching the target first audio sections;
determining the teaching means information corresponding to the target first audio sections and the teaching content structure corresponding to the target first audio sections as the first teaching type information corresponding to the target first audio sections.
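Claims 4 and 5 rely on two text classifiers: one that maps transcript text to teaching means information (for example, lecturing or questioning) and one that maps it to a teaching content structure (for example, introduction or summary). A minimal sketch using scikit-learn follows; the label sets and training examples are placeholders, and the patent does not prescribe this particular model family.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder training data: (transcript text, label) pairs prepared from annotated lessons.
means_texts = ["now please open your textbooks", "who can answer this question"]
means_labels = ["lecturing", "questioning"]
structure_texts = ["today we will learn quadratic equations", "to sum up what we covered today"]
structure_labels = ["introduction", "summary"]

# First text classification model: teaching means information.
means_model = make_pipeline(TfidfVectorizer(), LogisticRegression())
means_model.fit(means_texts, means_labels)

# Second text classification model: teaching content structure.
structure_model = make_pipeline(TfidfVectorizer(), LogisticRegression())
structure_model.fit(structure_texts, structure_labels)

def first_teaching_type(target_first_text: str):
    """Combine the two predictions into the first teaching type information."""
    return {
        "teaching_means": means_model.predict([target_first_text])[0],
        "content_structure": structure_model.predict([target_first_text])[0],
    }
```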
5. The method according to claim 4, characterized by further comprising:
converting the target second audio sections whose audio type is a single-student type or a multiple-student type into target second text information, and inputting the target first text information adjacent to the target second text information together with the target second text information into the first text classification model to obtain teaching means information matching the target second audio sections;
inputting the target first text information adjacent to the target second text information together with the target second text information into the second text classification model to obtain a teaching content structure matching the target second audio sections;
determining the teaching means information corresponding to the target second audio sections and the teaching content structure corresponding to the target second audio sections as second teaching type information corresponding to the target second audio sections.
6. The method according to claim 1, characterized by further comprising:
obtaining an index model matching the classroom environment; the index model is a classroom knowledge-point model constructed from a teaching plan;
extracting the knowledge point text in the index model, determining, in the target first text information, the target timestamp of the text information matching the knowledge point text, and adding a keyword tag at the position of the target timestamp in the first audio information; the text in the keyword tag is identical to the knowledge point text;
when a search keyword is obtained, retrieving the keyword tag matching the search keyword from the first audio information as a target keyword tag;
in the first audio information, outputting the audio information corresponding to the timestamp of the target keyword tag, and determining the course information of the audio information corresponding to the target keyword tag.
7. The method according to claim 5, characterized in that counting the key link information in the classroom environment according to the first teaching type information, the time information corresponding to the target first audio sections, the time information corresponding to the target second audio sections, and the audio type and audio features corresponding to the target second audio sections comprises:
generating a teacher behavior analysis result according to the first teaching type information and the time information corresponding to the target first audio sections;
generating a student behavior analysis result according to the second teaching type information, the time information corresponding to the target second audio sections, and the audio type and audio features corresponding to the target second audio sections;
determining the teacher behavior analysis result and the student behavior analysis result as the key link information in the classroom environment.
8. The method according to claim 7, characterized in that generating the teacher behavior analysis result according to the first teaching type information and the time information corresponding to the target first audio sections comprises:
counting, according to the teaching means information in the first teaching type information and the time information corresponding to the target first audio sections, the total duration of the target first audio sections with identical teaching means information, to obtain a first teacher behavior analysis parameter corresponding to each item of teaching means information;
counting, according to the teaching content structure in the first teaching type information and the time information corresponding to the target first audio sections, the total duration of the target first audio sections with an identical teaching content structure, to obtain a second teacher behavior analysis parameter corresponding to each teaching content structure;
counting, according to the teaching means information in the first teaching type information, the number of target first audio sections with identical teaching means information, to obtain a third teacher behavior analysis parameter corresponding to each item of teaching means information;
counting, according to the teaching content structure in the first teaching type information, the number of target first audio sections with an identical teaching content structure, to obtain a fourth teacher behavior analysis parameter corresponding to each teaching content structure;
generating the teacher behavior analysis result according to the first, second, third, and fourth teacher behavior analysis parameters.
9. The method according to claim 7, characterized in that generating the student behavior analysis result according to the second teaching type information, the time information corresponding to the target second audio sections, and the audio type and audio features corresponding to the target second audio sections comprises:
counting the number of target second audio sections with identical audio features and audio type, as a first student behavior analysis parameter;
counting, according to the teaching means information in the second teaching type information and the time information corresponding to the target second audio sections, the total duration of the target second audio sections with identical teaching means information, to obtain a second student behavior analysis parameter corresponding to each item of teaching means information;
counting, according to the teaching content structure in the second teaching type information and the time information corresponding to the target second audio sections, the total duration of the target second audio sections with an identical teaching content structure, to obtain a third student behavior analysis parameter corresponding to each teaching content structure;
counting, according to the teaching means information in the second teaching type information, the number of target second audio sections with identical teaching means information, to obtain a fourth student behavior analysis parameter corresponding to each item of teaching means information;
counting, according to the teaching content structure in the second teaching type information, the number of target second audio sections with an identical teaching content structure, to obtain a fifth student behavior analysis parameter corresponding to each teaching content structure;
generating the student behavior analysis result according to the first, second, third, fourth, and fifth student behavior analysis parameters.
10. A multimedia data processing device, characterized by comprising:
a first obtaining module, configured to obtain first audio information and second audio information belonging to a classroom environment; the first audio information and the second audio information are respectively collected by two sound pickup devices located at different positions;
a first searching module, configured to search the first audio information for audio sections belonging to a teacher type as target first audio sections;
a second searching module, configured to search the second audio information for target second audio sections; the time information corresponding to the target second audio sections does not overlap with the time information corresponding to the target first audio sections;
a conversion module, configured to convert the target first audio sections into target first text information;
an identification module, configured to identify, according to the target first text information, first teaching type information corresponding to the target first audio sections;
a statistics module, configured to count key link information in the classroom environment according to the first teaching type information, the time information corresponding to the target first audio sections, the time information corresponding to the target second audio sections, and the audio type and audio features corresponding to the target second audio sections; the key link information includes a teacher behavior analysis result and a student behavior analysis result.
11. The device according to claim 10, characterized in that the first searching module comprises:
a first segmentation unit, configured to segment the first audio information into a plurality of unit first audio sections according to audio features; each unit first audio section is either an audio section containing continuous sound or a silent audio section;
a first recognition unit, configured to input each unit first audio section into an audio classification model and respectively identify the audio type matching each unit first audio section;
a first determination unit, configured to determine the unit first audio sections whose audio type is the teacher type in the first audio information as the target first audio sections.
12. The device according to claim 11, characterized in that the second searching module comprises:
a second segmentation unit, configured to segment the second audio information into a plurality of unit second audio sections according to audio features; each unit second audio section is either an audio section containing continuous sound or a silent audio section;
a second recognition unit, configured to input each unit second audio section into the audio classification model and respectively identify the audio type matching each unit second audio section;
a second determination unit, configured to take the unit second audio sections whose audio type is not the teacher type in the second audio information as audio sections to be adjusted;
the second determination unit is further configured to determine an audio section to be adjusted as a target second audio section if its time information does not overlap with the time information of the target first audio sections;
the second determination unit is further configured to, if the time information of an audio section to be adjusted overlaps with the time information of the target first audio sections, determine the portion of the audio section to be adjusted that does not correspond to the overlap as the target second audio section.
13. The device according to claim 12, characterized in that the identification module comprises:
an obtaining unit, configured to input the target first text information into a first text classification model to obtain teaching means information matching the target first audio sections;
the obtaining unit is further configured to input the target first text information into a second text classification model to obtain a teaching content structure matching the target first audio sections;
a third determination unit, configured to determine the teaching means information corresponding to the target first audio sections and the teaching content structure corresponding to the target first audio sections as the first teaching type information corresponding to the target first audio sections.
14. The device according to claim 13, characterized by further comprising:
the conversion module is further configured to convert the target second audio sections whose audio type is a single-student type or a multiple-student type into target second text information, and to input the target first text information adjacent to the target second text information together with the target second text information into the first text classification model, to obtain teaching means information matching the target second audio sections;
the first obtaining module is further configured to input the target first text information adjacent to the target second text information together with the target second text information into the second text classification model, to obtain a teaching content structure matching the target second audio sections;
a determination module, configured to determine the teaching means information corresponding to the target second audio sections and the teaching content structure corresponding to the target second audio sections as second teaching type information corresponding to the target second audio sections.
15. The device according to claim 10, characterized by further comprising:
a second obtaining module, configured to obtain an index model matching the classroom environment; the index model is a classroom knowledge-point model constructed from a teaching plan;
an adding module, configured to extract the knowledge point text in the index model, determine, in the target first text information, the target timestamp of the text information matching the knowledge point text, and add a keyword tag at the position of the target timestamp in the first audio information; the text in the keyword tag is identical to the knowledge point text;
a retrieval module, configured to, when a search keyword is obtained, retrieve the keyword tag matching the search keyword from the first audio information as a target keyword tag;
an output module, configured to output, in the first audio information, the audio information corresponding to the timestamp of the target keyword tag, and to determine the course information of the audio information corresponding to the target keyword tag.
16. The device according to claim 14, characterized in that the statistics module comprises:
a first generation unit, configured to generate a teacher behavior analysis result according to the first teaching type information and the time information corresponding to the target first audio sections;
a second generation unit, configured to generate a student behavior analysis result according to the second teaching type information, the time information corresponding to the target second audio sections, and the audio type and audio features corresponding to the target second audio sections;
a fourth determination unit, configured to determine the teacher behavior analysis result and the student behavior analysis result as the key link information in the classroom environment.
17. The device according to claim 16, characterized in that the first generation unit comprises:
a first statistics subunit, configured to count, according to the teaching means information in the first teaching type information and the time information corresponding to the target first audio sections, the total duration of the target first audio sections with identical teaching means information, to obtain a first teacher behavior analysis parameter corresponding to each item of teaching means information;
the first statistics subunit is further configured to count, according to the teaching content structure in the first teaching type information and the time information corresponding to the target first audio sections, the total duration of the target first audio sections with an identical teaching content structure, to obtain a second teacher behavior analysis parameter corresponding to each teaching content structure;
the first statistics subunit is further configured to count, according to the teaching means information in the first teaching type information, the number of target first audio sections with identical teaching means information, to obtain a third teacher behavior analysis parameter corresponding to each item of teaching means information;
the first statistics subunit is further configured to count, according to the teaching content structure in the first teaching type information, the number of target first audio sections with an identical teaching content structure, to obtain a fourth teacher behavior analysis parameter corresponding to each teaching content structure;
a first generation subunit, configured to generate the teacher behavior analysis result according to the first, second, third, and fourth teacher behavior analysis parameters.
18. The device according to claim 16, characterized in that the second generation unit comprises:
a second statistics subunit, configured to count the number of target second audio sections with identical audio features and audio type, as a first student behavior analysis parameter;
the second statistics subunit is further configured to count, according to the teaching means information in the second teaching type information and the time information corresponding to the target second audio sections, the total duration of the target second audio sections with identical teaching means information, to obtain a second student behavior analysis parameter corresponding to each item of teaching means information;
the second statistics subunit is further configured to count, according to the teaching content structure in the second teaching type information and the time information corresponding to the target second audio sections, the total duration of the target second audio sections with an identical teaching content structure, to obtain a third student behavior analysis parameter corresponding to each teaching content structure;
the second statistics subunit is further configured to count, according to the teaching means information in the second teaching type information, the number of target second audio sections with identical teaching means information, to obtain a fourth student behavior analysis parameter corresponding to each item of teaching means information;
the second statistics subunit is further configured to count, according to the teaching content structure in the second teaching type information, the number of target second audio sections with an identical teaching content structure, to obtain a fifth student behavior analysis parameter corresponding to each teaching content structure;
a second generation subunit, configured to generate the student behavior analysis result according to the first, second, third, fourth, and fifth student behavior analysis parameters.
19. An electronic device, characterized by comprising: a processor and a memory, the processor being connected to the memory, wherein the memory is configured to store program code and the processor is configured to call the program code to execute the method according to any one of claims 1-9.
20. A computer storage medium, characterized in that the computer storage medium stores a computer program, the computer program comprising program instructions which, when executed by a processor, perform the method according to any one of claims 1-9.
CN201810549255.1A 2018-05-31 2018-05-31 Multimedia data processing method and device and electronic equipment Active CN108920513B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810549255.1A CN108920513B (en) 2018-05-31 2018-05-31 Multimedia data processing method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810549255.1A CN108920513B (en) 2018-05-31 2018-05-31 Multimedia data processing method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN108920513A true CN108920513A (en) 2018-11-30
CN108920513B CN108920513B (en) 2022-03-15

Family

ID=64410012

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810549255.1A Active CN108920513B (en) 2018-05-31 2018-05-31 Multimedia data processing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN108920513B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101833584A (en) * 2010-05-20 2010-09-15 无敌科技(西安)有限公司 System and method for searching teaching video contents in embedded equipment
CN103956166A (en) * 2014-05-27 2014-07-30 华东理工大学 Multimedia courseware retrieval system based on voice keyword recognition
JP2016177045A (en) * 2015-03-19 2016-10-06 株式会社レイトロン Voice recognition device and voice recognition program
CN107492375A (en) * 2016-06-13 2017-12-19 深圳市巨龙科教高技术股份有限公司 A kind of classroom behavior analysis method and system

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109816423A (en) * 2018-12-14 2019-05-28 深圳壹账通智能科技有限公司 Product programming method and server based on speech recognition
CN111107442B (en) * 2019-11-25 2022-07-12 北京大米科技有限公司 Method and device for acquiring audio and video files, server and storage medium
CN111107442A (en) * 2019-11-25 2020-05-05 北京大米科技有限公司 Method and device for acquiring audio and video files, server and storage medium
CN111210127A (en) * 2019-12-27 2020-05-29 广东德诚科教有限公司 Teaching data analysis method and device, computer equipment and storage medium
CN111681143A (en) * 2020-04-27 2020-09-18 平安国际智慧城市科技股份有限公司 Multi-dimensional analysis method, device, equipment and storage medium based on classroom voice
CN113038259A (en) * 2021-03-05 2021-06-25 深圳市广程杰瑞科技有限公司 Lesson quality feedback method and system for internet education
CN113038259B (en) * 2021-03-05 2023-09-08 河南校信通教育科技有限公司 Method and system for feeding back class quality of Internet education
CN113284378B (en) * 2021-04-12 2022-06-21 东营职业学院 Intelligent mathematical teaching auxiliary method and device and intelligent teaching system
CN113284378A (en) * 2021-04-12 2021-08-20 东营职业学院 Intelligent mathematical teaching auxiliary method and device and intelligent teaching system
CN113360657A (en) * 2021-06-30 2021-09-07 安徽商信政通信息技术股份有限公司 Intelligent document distribution and handling method and device and computer equipment
CN113360657B (en) * 2021-06-30 2023-10-24 安徽商信政通信息技术股份有限公司 Intelligent document distribution handling method and device and computer equipment
CN113408957A (en) * 2021-07-20 2021-09-17 北京师范大学 Classroom teaching evaluation method based on combined empowerment method
CN113487213A (en) * 2021-07-20 2021-10-08 贵州大学 Vocational education teaching evaluation method based on big data
CN113487213B (en) * 2021-07-20 2022-02-01 贵州大学 Vocational education teaching evaluation method based on big data
CN114005079A (en) * 2021-12-31 2022-02-01 北京金茂教育科技有限公司 Multimedia stream processing method and device
WO2024077511A1 (en) * 2022-10-12 2024-04-18 广州视源电子科技股份有限公司 Interaction counting method, apparatus, device, and system, and storage medium

Also Published As

Publication number Publication date
CN108920513B (en) 2022-03-15

Similar Documents

Publication Publication Date Title
CN108920513A (en) A kind of multimedia data processing method, device and electronic equipment
CN107222865B (en) Communication swindle real-time detection method and system based on suspicious actions identification
CN109360550B (en) Testing method, device, equipment and storage medium of voice interaction system
CN106372113B (en) The method for pushing and system of news content
CN107818164A (en) A kind of intelligent answer method and its system
CN108711420A (en) Multilingual hybrid model foundation, data capture method and device, electronic equipment
CN109815491B (en) Answer scoring method, device, computer equipment and storage medium
CN109885810A (en) Man-machine question answering method, apparatus, equipment and storage medium based on semantic parsing
CN110032630A (en) Talk about art recommendation apparatus, method and model training equipment
CN106991161A (en) A kind of method for automatically generating open-ended question answer
CN107240047A (en) The credit appraisal procedure and device of a kind of instructional video
CN108549658A (en) A kind of deep learning video answering method and system based on the upper attention mechanism of syntactic analysis tree
CN111182162A (en) Telephone quality inspection method, device, equipment and storage medium based on artificial intelligence
CN110069776B (en) Customer satisfaction evaluation method and device and computer readable storage medium
CN111681143A (en) Multi-dimensional analysis method, device, equipment and storage medium based on classroom voice
CN110164217A (en) It a kind of online question and answer and reviews from surveying tutoring system
CN110992988B (en) Speech emotion recognition method and device based on domain confrontation
CN114913729A (en) Question selection method and device, computer equipment and storage medium
CN112925888A (en) Method and device for training question-answer response and small sample text matching model
CN114186983B (en) Video interview multidimensional scoring method, system, computer equipment and storage medium
CN106708827A (en) Quality testing method and apparatus
CN117520522B (en) Intelligent dialogue method and device based on combination of RPA and AI and electronic equipment
CN112052686B (en) Voice learning resource pushing method for user interactive education
CN112767940B (en) Voice training recognition method, system, equipment and storage medium
CN113205717A (en) Deep learning-based oral English training method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant