CN112905748A - Speech effect evaluation system - Google Patents

Speech effect evaluation system

Info

Publication number
CN112905748A
CN112905748A
Authority
CN
China
Prior art keywords
speech
information
guest
performance
evaluation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110252324.4A
Other languages
Chinese (zh)
Inventor
黄海
刘堃
聂镭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Longma Zhixin Zhuhai Hengqin Technology Co ltd
Original Assignee
Longma Zhixin Zhuhai Hengqin Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Longma Zhixin Zhuhai Hengqin Technology Co ltd filed Critical Longma Zhixin Zhuhai Hengqin Technology Co ltd
Priority to CN202110252324.4A priority Critical patent/CN112905748A/en
Publication of CN112905748A publication Critical patent/CN112905748A/en
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/263 Language identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06395 Quality analysis or management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/06 Decision making techniques; Pattern matching strategies

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • Acoustics & Sound (AREA)
  • General Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The application belongs to the technical field of artificial intelligence and provides a speech effect evaluation system, including: a collection device, configured to collect voice information in a preset area and send the voice information to an evaluation device; the evaluation device, connected with the collection device, configured to determine a target guest speaker according to the voice information and generate a collection instruction; the collection device is further configured to receive the collection instruction from the evaluation device and, in response to the collection instruction, collect speech performance information of the target guest speaker; and the evaluation device is further configured to process the speech performance information of the target guest speaker to obtain a speech effect score of the target guest speaker. Thus, in the application, on one hand, the speech effect of the guest speaker is evaluated automatically while the conference is held, so that a reasonable appearance fee can subsequently be determined for the guest speaker according to the speech effect; on the other hand, the performance of the guest speaker during the speech is considered from multiple dimensions, so that the speech effect of the guest speaker is obtained accurately.

Description

Speech effect evaluation system
Technical Field
The application belongs to the technical field of artificial intelligence, and particularly relates to a speech effect evaluation system.
Background
When a training conference is held, the appearance fee that the organizer pays a guest speaker is estimated simply according to the speaker's background (such as historical speech information and industry reputation), without taking the actual speech effect into account, so the appearance fee paid to the guest speaker may be unreasonable. Evaluating the speech effect of a guest speaker during the speech therefore becomes especially important for paying a reasonable appearance fee. In the prior art, the conference effect of a guest speaker is only roughly evaluated according to the number of participants, and a relatively accurate way to evaluate the speech effect is lacking.
Disclosure of Invention
The embodiments of the present application provide a speech effect evaluation system, which can solve the problem that the conference effect of a guest speaker is evaluated inaccurately in the prior art.
The embodiments of the present application provide a speech effect evaluation system, including:
a collection device, configured to collect voice information in a preset area and send the voice information to an evaluation device;
the evaluation device, connected with the collection device and configured to acquire the voice information from the collection device, determine a target guest speaker according to the voice information, generate a collection instruction, and send the collection instruction to the collection device;
the collection device is further configured to receive the collection instruction from the evaluation device and, in response to the collection instruction, collect speech performance information of the target guest speaker, where the speech performance information includes speech content information, speech etiquette information, speech skill information, and speech time control information;
the evaluation device is further configured to process the speech performance information of the target guest speaker to obtain a speech effect score of the target guest speaker.
In a possible implementation, the collection device is specifically configured to: acquire a wake-up instruction and send the wake-up instruction to the evaluation device;
the evaluation device is specifically configured to:
acquire the wake-up instruction from the collection device, where the wake-up instruction is used to start the process of identifying the target guest speaker from candidate guest speakers;
in response to the wake-up instruction, acquire feature information of the candidate users within a preset time;
and determine the target guest speaker among the candidate users according to the feature information of the candidate users.
In a possible implementation, the evaluation device is specifically configured to:
score the speech performance information to obtain a speech performance score;
and calculate the speech effect score according to the speech performance score.
In a possible implementation, the evaluation device is specifically configured to:
acquire a comprehensive weight value corresponding to the speech performance information, where the comprehensive weight value includes a first weight value corresponding to the speech content information, a second weight value corresponding to the speech etiquette information, a third weight value corresponding to the speech skill information, and a fourth weight value corresponding to the speech time control information;
encode the speech performance information to obtain a comprehensive coded value, where the comprehensive coded value includes a first coded value, a second coded value, a third coded value, and a fourth coded value;
and substitute the comprehensive coded value into the following formula to calculate the speech performance score:

S = Σ_{i=1}^{n} Z_i · M_i

where S denotes the speech performance score of the target guest speaker; i indexes the speech performance dimensions, with i = 1 denoting the speech content information, i = 2 the speech etiquette information, i = 3 the speech skill information, and i = 4 the speech time control information; n denotes the number of speech performance dimensions; Z_i denotes the coded value corresponding to dimension i, where Z_1 is the first coded value for the speech content information, Z_2 the second coded value for the speech etiquette information, Z_3 the third coded value for the speech skill information, and Z_4 the fourth coded value for the speech time control information; and M_i denotes the weight value corresponding to dimension i, where M_1 is the first weight value for the speech content information, M_2 the second weight value for the speech etiquette information, M_3 the third weight value for the speech skill information, and M_4 the fourth weight value for the speech time control information.
In a possible implementation, the evaluation device is specifically configured to:
substitute the speech performance score into the following formula to calculate the speech effect score:

[Formula: P_ni calculated from S, C, D, and H]

where P_ni denotes the speech effect score obtained by the guest speaker with sequence number n at the conference with sequence number i, S denotes the speech performance score of the guest speaker, C denotes the conference property level, D denotes the preset evaluation difficulty, and H denotes the historical speech performance level of the guest speaker.
Compared with the prior art, the embodiments of the present application have the following advantages:
In the embodiments of the present application, the evaluation system uses the evaluation device to evaluate the speech effect of the guest speaker according to the speech performance information collected by the collection device. On one hand, the speech effect of the guest speaker is evaluated automatically while the conference is held, so that a reasonable appearance fee can subsequently be determined for the guest speaker according to the speech effect; on the other hand, the performance of the guest speaker during the speech is considered from multiple dimensions, so that the speech effect of the guest speaker is obtained accurately.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a speech effect evaluation system provided in an embodiment of the present application;
fig. 2 is a schematic flowchart of a speech effect evaluation method provided in an embodiment of the present application;
fig. 3 is a schematic flowchart of a specific implementation of step S201 in fig. 2 according to an embodiment of the present application;
fig. 4 is a schematic flowchart of a specific implementation of step S203 in fig. 2 according to an embodiment of the present application;
fig. 5 is a block diagram illustrating a structure of a speech effect evaluation apparatus according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an evaluation device provided in an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]", or "in response to detecting [the described condition or event]".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
In order to explain the technical solution described in the present application, the following description will be given by way of specific examples.
Referring to fig. 1, which is a schematic structural diagram of a speech effect evaluation system provided in an embodiment of the present application, the system includes a collection device 10 and an evaluation device 20 connected with the collection device 10. The collection device may be an audio collection device and a camera arranged within a preset range of a preset area; the evaluation device may be a server or a terminal device, where the server may be a computing device such as a cloud server, and the terminal device may be a computing device such as a desktop computer, a notebook computer, or a palmtop computer.
The collection device is configured to collect voice information in the preset area and send the voice information to the evaluation device.
The preset area may be a lecture area, such as a podium.
The evaluation device, connected with the collection device, is configured to acquire the voice information from the collection device, determine the target guest speaker according to the voice information, generate a collection instruction, and send the collection instruction to the collection device.
The collection device is further configured to receive the collection instruction from the evaluation device and, in response to the collection instruction, collect speech performance information of the target guest speaker, where the speech performance information includes speech content information, speech etiquette information, speech skill information, and speech time control information.
The evaluation device is further configured to process the speech performance information of the target guest speaker to obtain a speech effect score of the target guest speaker.
In a possible implementation, the collection device is specifically configured to: acquire a wake-up instruction and send the wake-up instruction to the evaluation device;
the evaluation device is specifically configured to:
acquire the wake-up instruction from the collection device, where the wake-up instruction is used to start the process of identifying the target guest speaker from candidate guest speakers;
in response to the wake-up instruction, acquire feature information of the candidate users within a preset time;
and determine the target guest speaker among the candidate users according to the feature information of the candidate users.
It is to be understood that the wake-up instruction may be triggered by a preset keyword in the transcribed speech of the host, for example a phrase with which the host hands the floor over to the speaker. In a specific application, the wake-up instruction may be obtained as follows: before the conference begins, the voice of the host is recorded in advance and input into a voiceprint recognition model to obtain the voiceprint features of the host; after the conference begins, the evaluation device acquires voice information from an audio collection device arranged in the preset area (such as the lecture area), and once the voice information is recognized as the host's voice by the voiceprint recognition model, the host's voice is converted into text by speech recognition, the preset keyword is located in the text by text recognition, and the wake-up instruction is generated.
In a possible implementation, the evaluation device is specifically configured to:
score the speech performance information to obtain a speech performance score;
and calculate the speech effect score according to the speech performance score.
In a possible implementation, the evaluation device is specifically configured to:
acquire a comprehensive weight value corresponding to the speech performance information, where the comprehensive weight value includes a first weight value corresponding to the speech content information, a second weight value corresponding to the speech etiquette information, a third weight value corresponding to the speech skill information, and a fourth weight value corresponding to the speech time control information;
encode the speech performance information to obtain a comprehensive coded value, where the comprehensive coded value includes a first coded value, a second coded value, a third coded value, and a fourth coded value;
and substitute the comprehensive coded value into the following formula to calculate the speech performance score:

S = Σ_{i=1}^{n} Z_i · M_i

where S denotes the speech performance score of the target guest speaker; i indexes the speech performance dimensions, with i = 1 denoting the speech content information, i = 2 the speech etiquette information, i = 3 the speech skill information, and i = 4 the speech time control information; n denotes the number of speech performance dimensions; Z_i denotes the coded value corresponding to dimension i, where Z_1 is the first coded value for the speech content information, Z_2 the second coded value for the speech etiquette information, Z_3 the third coded value for the speech skill information, and Z_4 the fourth coded value for the speech time control information; and M_i denotes the weight value corresponding to dimension i, where M_1 is the first weight value for the speech content information, M_2 the second weight value for the speech etiquette information, M_3 the third weight value for the speech skill information, and M_4 the fourth weight value for the speech time control information.
In a possible implementation, the evaluation device is specifically configured to:
substitute the speech performance score into the following formula to calculate the speech effect score:

[Formula: P_ni calculated from S, C, D, and H]

where P_ni denotes the speech effect score obtained by the guest speaker with sequence number n at the conference with sequence number i, S denotes the speech performance score of the guest speaker, C denotes the conference property level, D denotes the preset evaluation difficulty, and H denotes the historical speech performance level of the guest speaker.
In the embodiments of the present application, the evaluation system uses the evaluation device to evaluate the speech effect of the guest speaker according to the speech performance information collected by the collection device. On one hand, the speech effect of the guest speaker is evaluated automatically while the conference is held, so that a reasonable appearance fee can subsequently be determined for the guest speaker according to the speech effect; on the other hand, the performance of the guest speaker during the speech is considered from multiple dimensions, so that the speech effect of the guest speaker is obtained accurately.
The workflow on the evaluation device side is described below.
Referring to fig. 2, which is a schematic flowchart of a speech effect evaluation method provided in an embodiment of the present application. The method is applied to an evaluation device, where the evaluation device is a terminal device or a server; the terminal device may be a computing device such as a mobile phone or a notebook computer, and the server may be a computing device such as a cloud server. The method includes the following steps:
step S201, determining a target speaking guest.
It can be understood that a conference involves many roles, for example, organizer staff, the conference host, guest speakers, and the audience, so the identity of the target guest speaker needs to be determined in order to facilitate the subsequent speech effect evaluation of that speaker.
In a specific application, as shown in fig. 3, which is a schematic flow diagram of a specific implementation of step S201 in fig. 2 of the speech effect evaluation method provided in the embodiment of the present application, determining the target guest speaker includes:
and S301, acquiring a wake-up instruction.
The awakening instruction is used for starting a determining process for identifying the target speaking guests from the candidate target speaking guests.
For example, the wake-up instruction may be a preset keyword in the voice text corresponding to the moderator, such as "below. In a specific application, the implementation manner of obtaining the wake-up instruction may be: before a conference begins, pre-recording voice information of a host, and inputting a body shadow of the host into a voiceprint recognition model to obtain voiceprint characteristics of the host; after the conference begins, the assessment device acquires voice information from an audio acquisition device arranged in a preset area (such as a speech area), and after the voice information is recognized as voice information of a host according to a voiceprint recognition model, the voice information of the host is converted into a voice text through a voice recognition technology, and then a preset keyword in the voice text is found out through a text recognition technology, so that a wake-up instruction is generated.
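A minimal sketch of this wake-up flow, assuming hypothetical voiceprint and speech-recognition model interfaces, an assumed keyword list, and an assumed similarity threshold of 0.8 (the application names no concrete models, keywords, or thresholds):

# Minimal sketch of the wake-up pipeline described above. The model
# interfaces, the KEYWORDS list, and the 0.8 threshold are hypothetical
# placeholders, not part of the application.

KEYWORDS = ["please welcome our speaker"]   # assumed preset keywords

class WakeupDetector:
    def __init__(self, voiceprint_model, asr_model, host_audio):
        self.voiceprint_model = voiceprint_model
        self.asr_model = asr_model
        # Enroll the host's voiceprint before the conference begins.
        self.host_embedding = voiceprint_model.embed(host_audio)

    def check(self, audio_chunk):
        # Step 1: is this the host speaking? (voiceprint match)
        emb = self.voiceprint_model.embed(audio_chunk)
        if self.voiceprint_model.similarity(emb, self.host_embedding) < 0.8:
            return False
        # Step 2: transcribe the host's speech and spot a preset keyword.
        text = self.asr_model.transcribe(audio_chunk)
        return any(kw in text for kw in KEYWORDS)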
Step S302, in response to the wake-up instruction, acquiring feature information of the candidate users within a preset time.
The feature information of a candidate user may be voiceprint information, face information, and the like.
For example, the evaluation device may obtain the voice information of all participants (e.g., organizer staff, the conference host, guest speakers, and the audience) through an audio collection device arranged at the conference entrance and input the voice information of all participants into the voiceprint recognition model; at the same time, it may obtain the face information of all participants through a camera arranged at the conference entrance and input the face information of all participants into a face recognition model.
Step S303, determining the target guest speaker among the candidate users according to the feature information of the candidate users.
Illustratively, the evaluation device acquires voice information through the audio collection device arranged in the preset area (e.g., the lecture area) and identifies the current speaker as a guest speaker by the voiceprint recognition model; it also acquires face information through the camera arranged in the preset area and uses the face information to confirm the identity of the guest speaker.
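A minimal sketch of this two-modality determination, continuing the hypothetical model interfaces above (the min-fusion rule and the 0.8 threshold are assumptions):

# Sketch: the voiceprint proposes a candidate and the face confirms it.
# `enrolled` maps candidate name -> (voice_embedding, face_embedding),
# collected at the conference entrance as described above.

def determine_target_speaker(audio, frame, voiceprint_model, face_model,
                             enrolled, threshold=0.8):
    v = voiceprint_model.embed(audio)
    f = face_model.embed(frame)
    best_name, best_score = None, -1.0
    for name, (v_ref, f_ref) in enrolled.items():
        # Require agreement of both modalities by taking the weaker match.
        score = min(voiceprint_model.similarity(v, v_ref),
                    face_model.similarity(f, f_ref))
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None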
Step S202, obtaining the speech performance information of the target guest speaker.
The speech performance information includes speech content information, speech etiquette information, speech skill information, and speech time control information.
It can be understood that, in the embodiments of the present application, the performance of the target guest speaker during the speech is mainly considered from the following dimensions:
(1) speech content, including the content itself, the structure of the speech, and the wording, for example, whether the subject is clear, the structure is well organized, and the wording is accurate and reasonable;
(2) speech etiquette, including body movements, facial expressions, and the like, for example, whether gestures are used appropriately and expressions are rich;
(3) speech skills, including speech rate, intonation, volume, and the like, for example, whether the voice is loud and clear and the delivery rises and falls with a well-modulated cadence;
(4) speech time control, for example, whether the timing is controlled appropriately and matches the duration planned in the speech outline.
In a specific application, the speech content information may be obtained as follows: an audio collection device arranged in the preset area (for example, the lecture area) obtains the voice information of the target guest speaker, the voice information is converted into text by speech recognition, and the speech content, speech structure, and wording are then extracted from the text by natural language processing.
The speech etiquette information may be obtained by having the camera arranged in the preset area (e.g., the lecture area) perform gesture recognition and expression recognition on the target guest speaker during the speech.
The speech time control information is obtained by measuring the speech duration of the target guest speaker and checking whether it matches the duration planned in the speech outline.
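For the time-control dimension, a minimal sketch; the 10% relative tolerance is an assumption, not something the application specifies:

def check_time_control(actual_seconds, planned_seconds, tolerance=0.10):
    # Return 1 if the measured speech duration matches the outline within
    # the assumed relative tolerance, else 0, matching the 0/1 indicator
    # convention used for the other dimensions below.
    return int(abs(actual_seconds - planned_seconds)
               <= tolerance * planned_seconds)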
Step S203, identifying the speech performance information to obtain the speech effect score of the target guest speaker.
In a specific application, as shown in fig. 4, which is a schematic flow diagram of a specific implementation of step S203 in fig. 2 of the speech effect evaluation method provided in the embodiment of the present application, identifying the speech performance information to obtain the speech effect score of the target guest speaker includes:
Step S401, scoring the speech performance information to obtain a speech performance score.
Illustratively, scoring the speech performance information to obtain the speech performance score includes:
First, obtaining the comprehensive weight value corresponding to the speech performance information, where the comprehensive weight value includes a first weight value corresponding to the speech content information, a second weight value corresponding to the speech etiquette information, a third weight value corresponding to the speech skill information, and a fourth weight value corresponding to the speech time control information.
It should be noted that the first, second, third, and fourth weight values may be preset fixed values, for example set according to the organizer's respective requirements on the speech content, speech etiquette, speech skill, and speech time control.
Preferably, the first weight value and the third weight value can be adjusted according to the behavior of the audience during the guest speaker's speech. It can be understood that when evaluating the speech performance, not only the behavior of the guest speaker but also the feedback of the audience is considered.
Illustratively, the evaluation device acquires the facial expressions, body movements, and the like of the audience through a camera at the conference site, and a pre-trained neural network model then recognizes whether positive emotions (e.g., concentration, engagement, interest, confidence, trust, pleasure) and specific positive actions (e.g., applause, nodding) appear in the audience, whereupon a corresponding amount is added to the first weight value and a corresponding amount is added to the third weight value.
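A minimal sketch of this audience-driven adjustment, assuming a hypothetical audience-analysis model that returns sets of emotion and action labels, and an assumed bonus of 0.05 per positive signal (the application fixes neither):

# Sketch: raise the content (M1) and skill (M3) weights when the audience
# shows positive emotions or actions. Model interface and bonus are assumed.

POSITIVE_EMOTIONS = {"concentration", "engagement", "pleasure"}
POSITIVE_ACTIONS = {"applause", "nodding"}

def adjust_weights(weights, audience_frames, audience_model, bonus=0.05):
    m1, m2, m3, m4 = weights          # content, etiquette, skill, time
    for frame in audience_frames:
        emotions, actions = audience_model.analyze(frame)
        if emotions & POSITIVE_EMOTIONS:
            m1 += bonus               # reward the speech content weight
        if actions & POSITIVE_ACTIONS:
            m3 += bonus               # reward the speech skill weight
    return (m1, m2, m3, m4)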
Second, encoding the speech performance information to obtain the comprehensive coded value, where the comprehensive coded value includes a first coded value, a second coded value, a third coded value, and a fourth coded value.
Specifically, the speech content information may be encoded into the first coded value as follows:
the speech content, speech structure, and wording extracted from the text by natural language processing are each checked against a preset standard (1 if the standard is met, 0 otherwise), and a 3-dimensional one-hot-style indicator vector such as [1, 1, 0] is obtained.
Specifically, the speech etiquette information may be encoded into the second coded value as follows:
the gestures and expressions of the guest speaker are recognized and checked against the standard (1 if met, 0 otherwise), and a 2-dimensional vector such as [1, 1] is obtained.
Specifically, the speech skill information may be encoded into the third coded value as follows:
whether the speech rate, intonation, volume, and the like of the guest speaker meet the preset standards is identified (1 if met, 0 otherwise), and a 3-dimensional vector such as [1, 0, 1] is obtained.
Specifically, the speech time control information may be encoded into the fourth coded value as follows: whether the speech duration of the guest speaker matches the duration in the speech outline is identified, and a 1-dimensional vector such as [1] is obtained.
Finally, all the vectors are brought to a unified dimension, taking the highest dimension as the common standard.
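Putting the encoding and unification steps together, a minimal sketch; zero-padding to the highest dimension is an assumption, since the application only states that the highest dimension is taken as the unified standard:

# Sketch of the encoding step. Each argument is a list of 0/1 indicator
# results produced by the recognizers described above.

def encode_performance(content_checks, etiquette_checks,
                       skill_checks, time_checks):
    vectors = [list(content_checks),    # e.g. [1, 1, 0]
               list(etiquette_checks),  # e.g. [1, 1]
               list(skill_checks),      # e.g. [1, 0, 1]
               list(time_checks)]       # e.g. [1]
    dim = max(len(v) for v in vectors)  # highest dimension as the standard
    return [v + [0] * (dim - len(v)) for v in vectors]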
Third, substituting the comprehensive coded value into the following formula to calculate the speech performance score:

S = Σ_{i=1}^{n} Z_i · M_i

where S denotes the speech performance score of the target guest speaker; i indexes the speech performance dimensions, with i = 1 denoting the speech content information, i = 2 the speech etiquette information, i = 3 the speech skill information, and i = 4 the speech time control information; n denotes the total number of speech performance dimensions; Z_i denotes the coded value corresponding to dimension i, where Z_1 is the first coded value for the speech content information, Z_2 the second coded value for the speech etiquette information, Z_3 the third coded value for the speech skill information, and Z_4 the fourth coded value for the speech time control information; and M_i denotes the weight value corresponding to dimension i, where M_1 is the first weight value for the speech content information, M_2 the second weight value for the speech etiquette information, M_3 the third weight value for the speech skill information, and M_4 the fourth weight value for the speech time control information.
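Continuing the sketch, the performance score then follows as the weighted sum; summing each vector's entries before applying its weight is an assumption about how the vector-valued coded values enter the scalar formula:

def performance_score(encoded, weights):
    # encoded: the four padded 0/1 vectors Z_1..Z_4 from encode_performance
    # weights: the four weight values M_1..M_4
    return sum(sum(z) * m for z, m in zip(encoded, weights))

For example, with encoded vectors [[1, 1, 0], [1, 1, 0], [1, 0, 1], [1, 0, 0]] and weights (0.4, 0.2, 0.3, 0.1), the score is 2·0.4 + 2·0.2 + 2·0.3 + 1·0.1 = 1.9.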
Step S402, calculating the speech effect score according to the speech performance score.
Illustratively, calculating the speech effect score according to the speech performance score includes:
substituting the speech performance score into the following formula to calculate the speech effect score:

[Formula: P_ni calculated from S, C, D, and H]

where P_ni denotes the speech effect score obtained by the guest speaker with sequence number n at the conference with sequence number i; S denotes the speech performance score of the guest speaker; C denotes the conference property level (the conference property level is classified as a small, medium, large, or super-large conference); D denotes the preset evaluation difficulty (the evaluation difficulty is a fixed value and can be set according to the organizer's requirements); and H denotes the historical speech performance level of the guest speaker (the historical speech performance level is the difference between the guest speaker's historical performance and a standard performance threshold).
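The application does not disclose the functional form of this formula (the original formula image is not reproduced), so the sketch below, which scales the performance score by the conference level and evaluation difficulty and offsets it by the historical level, is purely an illustrative assumption:

def effect_score(s, conference_level, difficulty, history_level):
    # s: speech performance score S
    # conference_level: C, e.g. 1 (small) .. 4 (super-large)
    # difficulty: D, a preset fixed value
    # history_level: H, historical performance minus the standard threshold
    # The combination below is assumed; only the inputs come from the text.
    return s * conference_level * difficulty - history_level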
It can be understood that in the embodiments of the present application, the calculation of the speech effect of the guest speaker considers not only the speaker's performance ability but also the conference property level, the preset evaluation difficulty, and the historical speech performance level of the guest speaker, so that the evaluation result is more accurate.
In the embodiments of the present application, the speech effect of the guest speaker is evaluated by the evaluation device (hardware such as a terminal device or a server) according to the speech performance information of the guest speaker. On one hand, the speech effect of the guest speaker is evaluated automatically while the conference is held, so that a reasonable appearance fee can subsequently be determined for the guest speaker according to the speech effect; on the other hand, the performance of the guest speaker during the speech is considered from multiple dimensions, so that the speech effect of the guest speaker is obtained accurately.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Fig. 5 shows a block diagram of a speech effect evaluation apparatus provided in the embodiment of the present application, which corresponds to the speech effect evaluation method described in the foregoing embodiment; for convenience of description, only the parts relevant to the embodiment of the present application are shown.
Referring to fig. 5, the apparatus includes:
a determining module 51, configured to determine the target guest speaker;
an obtaining module 52, configured to obtain the speech performance information of the target guest speaker, where the speech performance information includes speech content information, speech etiquette information, speech skill information, and speech time control information;
and an identifying module 53, configured to identify the speech performance information to obtain the speech effect score of the target guest speaker.
In one possible implementation, the determining module includes:
an acquisition unit, configured to acquire a wake-up instruction, where the wake-up instruction is used to start the process of identifying the target guest speaker from candidate guest speakers;
a response unit, configured to acquire, in response to the wake-up instruction, feature information of the candidate users within a preset time;
and a determining unit, configured to determine the target guest speaker among the candidate users according to the feature information of the candidate users.
In one possible implementation, the identification module includes:
a scoring unit, configured to score the speech performance information to obtain a speech performance score;
and a calculating unit, configured to calculate the speech effect score according to the speech performance score.
In one possible implementation, the scoring unit includes:
an acquisition subunit, configured to acquire the comprehensive weight value corresponding to the speech performance information, where the comprehensive weight value includes a first weight value corresponding to the speech content information, a second weight value corresponding to the speech etiquette information, a third weight value corresponding to the speech skill information, and a fourth weight value corresponding to the speech time control information;
a processing subunit, configured to encode the speech performance information to obtain the comprehensive coded value, where the comprehensive coded value includes a first coded value, a second coded value, a third coded value, and a fourth coded value;
a first calculating subunit, configured to substitute the comprehensive coded value into the following formula to calculate the speech performance score:

S = Σ_{i=1}^{n} Z_i · M_i

where S denotes the speech performance score of the target guest speaker; i indexes the speech performance dimensions, with i = 1 denoting the speech content information, i = 2 the speech etiquette information, i = 3 the speech skill information, and i = 4 the speech time control information; n denotes the number of speech performance dimensions; Z_i denotes the coded value corresponding to dimension i, where Z_1 is the first coded value for the speech content information, Z_2 the second coded value for the speech etiquette information, Z_3 the third coded value for the speech skill information, and Z_4 the fourth coded value for the speech time control information; and M_i denotes the weight value corresponding to dimension i, where M_1 is the first weight value for the speech content information, M_2 the second weight value for the speech etiquette information, M_3 the third weight value for the speech skill information, and M_4 the fourth weight value for the speech time control information.
In one possible implementation, the computing unit includes:
a second calculating subunit, configured to substitute the speech performance score into the following formula to calculate the speech effect score:

[Formula: P_ni calculated from S, C, D, and H]

where P_ni denotes the speech effect score obtained by the guest speaker with sequence number n at the conference with sequence number i, S denotes the speech performance score of the guest speaker, C denotes the conference property level, D denotes the preset evaluation difficulty, and H denotes the historical speech performance level of the guest speaker.
It should be noted that, for the information interaction, execution process, and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as those of the embodiment of the method of the present application, and specific reference may be made to the part of the embodiment of the method, which is not described herein again.
Fig. 6 is a schematic structural diagram of an evaluation apparatus provided in an embodiment of the present application. As shown in fig. 6, the evaluation device 6 of this embodiment includes: at least one processor 60, a memory 61 and a computer program 62 stored in the memory 61 and executable on the at least one processor 60, the processor 60 implementing the steps in any of the above-described method embodiments when executing the computer program 62.
The evaluation device 6 may be a terminal device or a server; the terminal device may be a computing device such as a desktop computer, a notebook computer, or a palmtop computer, and the server may be a computing device such as a cloud server. The evaluation device 6 may include, but is not limited to, the processor 60 and the memory 61. Those skilled in the art will appreciate that fig. 6 is merely an example of the evaluation device 6 and does not constitute a limitation on the evaluation device 6, which may include more or fewer components than those shown, or combine some components, or use different components, such as an input-output device, a network access device, and the like.
The Processor 60 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 61 may in some embodiments be an internal storage unit of the evaluation device 6, such as a hard disk or a memory of the evaluation device 6. The memory 61 may also be an external storage device of the evaluation device 6 in other embodiments, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the evaluation device 6. Further, the memory 61 may also comprise both an internal memory unit of the evaluation device 6 and an external memory device. The memory 61 is used for storing an operating system, an application program, a BootLoader (BootLoader), data, and other programs, such as program codes of the computer program. The memory 61 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The embodiment of the present application further provides a readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps that can be implemented in the above method embodiments.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other ways. For example, the above-described apparatus/network device embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implementing, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (5)

1. A speech effect evaluation system, comprising:
a collection device, configured to collect voice information in a preset area and send the voice information to an evaluation device;
the evaluation device, connected with the collection device and configured to acquire the voice information from the collection device, determine a target guest speaker according to the voice information, generate a collection instruction, and send the collection instruction to the collection device;
the collection device is further configured to receive the collection instruction from the evaluation device and, in response to the collection instruction, collect speech performance information of the target guest speaker, wherein the speech performance information comprises speech content information, speech etiquette information, speech skill information, and speech time control information;
the evaluation device is further configured to process the speech performance information of the target guest speaker to obtain a speech effect score of the target guest speaker.
2. The speech effect evaluation system according to claim 1, wherein the collection device is specifically configured to: acquire a wake-up instruction and send the wake-up instruction to the evaluation device;
the evaluation device is specifically configured to:
acquire the wake-up instruction from the collection device, wherein the wake-up instruction is used to start the process of identifying the target guest speaker from candidate guest speakers;
in response to the wake-up instruction, acquire feature information of the candidate users within a preset time;
and determine the target guest speaker among the candidate users according to the feature information of the candidate users.
3. The speech effect evaluation system according to claim 1, wherein the evaluation device is specifically configured to:
score the speech performance information to obtain a speech performance score;
and calculate the speech effect score according to the speech performance score.
4. The speech effect evaluation system according to claim 3, wherein the evaluation device is specifically configured to:
acquire a comprehensive weight value corresponding to the speech performance information, wherein the comprehensive weight value comprises a first weight value corresponding to the speech content information, a second weight value corresponding to the speech etiquette information, a third weight value corresponding to the speech skill information, and a fourth weight value corresponding to the speech time control information;
encode the speech performance information to obtain a comprehensive coded value, wherein the comprehensive coded value comprises a first coded value, a second coded value, a third coded value, and a fourth coded value;
and substitute the comprehensive coded value into the following formula to calculate the speech performance score:

S = Σ_{i=1}^{n} Z_i · M_i

where S denotes the speech performance score of the target guest speaker; i indexes the speech performance dimensions, with i = 1 denoting the speech content information, i = 2 the speech etiquette information, i = 3 the speech skill information, and i = 4 the speech time control information; n denotes the number of speech performance dimensions; Z_i denotes the coded value corresponding to dimension i, where Z_1 is the first coded value for the speech content information, Z_2 the second coded value for the speech etiquette information, Z_3 the third coded value for the speech skill information, and Z_4 the fourth coded value for the speech time control information; and M_i denotes the weight value corresponding to dimension i, where M_1 is the first weight value for the speech content information, M_2 the second weight value for the speech etiquette information, M_3 the third weight value for the speech skill information, and M_4 the fourth weight value for the speech time control information.
5. The speech effect evaluation system according to claim 3, wherein the evaluation device is specifically configured to:
substitute the speech performance score into the following formula to calculate the speech effect score:

[Formula: P_ni calculated from S, C, D, and H]

where P_ni denotes the speech effect score obtained by the guest speaker with sequence number n at the conference with sequence number i, S denotes the speech performance score of the guest speaker, C denotes the conference property level, D denotes the preset evaluation difficulty, and H denotes the historical speech performance level of the guest speaker.
CN202110252324.4A 2021-03-08 2021-03-08 Speech effect evaluation system Withdrawn CN112905748A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110252324.4A CN112905748A (en) 2021-03-08 2021-03-08 Speech effect evaluation system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110252324.4A CN112905748A (en) 2021-03-08 2021-03-08 Speech effect evaluation system

Publications (1)

Publication Number Publication Date
CN112905748A true CN112905748A (en) 2021-06-04

Family

ID=76107962

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110252324.4A Withdrawn CN112905748A (en) 2021-03-08 2021-03-08 Speech effect evaluation system

Country Status (1)

Country Link
CN (1) CN112905748A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117437824A (en) * 2023-12-13 2024-01-23 江西拓世智能科技股份有限公司 Lecture training method and related device
CN117437824B (en) * 2023-12-13 2024-05-14 江西拓世智能科技股份有限公司 Lecture training method and related device

Similar Documents

Publication Publication Date Title
CN110457432B (en) Interview scoring method, interview scoring device, interview scoring equipment and interview scoring storage medium
CN110069608B (en) Voice interaction method, device, equipment and computer storage medium
CN107481720B (en) Explicit voiceprint recognition method and device
CN110457457B (en) Training method of dialogue generation model, dialogue generation method and device
CN112233698B (en) Character emotion recognition method, device, terminal equipment and storage medium
CN114127849A (en) Speech emotion recognition method and device
CN111832308A (en) Method and device for processing consistency of voice recognition text
CN111159358A (en) Multi-intention recognition training and using method and device
CN112468659A (en) Quality evaluation method, device, equipment and storage medium applied to telephone customer service
JP7178394B2 (en) Methods, apparatus, apparatus, and media for processing audio signals
CN109785846A (en) The role recognition method and device of the voice data of monophonic
CN111383138B (en) Restaurant data processing method, device, computer equipment and storage medium
CN111243604B (en) Training method for speaker recognition neural network model supporting multiple awakening words, speaker recognition method and system
CN111126084B (en) Data processing method, device, electronic equipment and storage medium
CN113361396A (en) Multi-modal knowledge distillation method and system
CN110647613A (en) Courseware construction method, courseware construction device, courseware construction server and storage medium
CN111444321B (en) Question answering method, device, electronic equipment and storage medium
CN109961152B (en) Personalized interaction method and system of virtual idol, terminal equipment and storage medium
CN114138960A (en) User intention identification method, device, equipment and medium
CN113903338A (en) Surface labeling method and device, electronic equipment and storage medium
CN112905748A (en) Speech effect evaluation system
CN111522937B (en) Speaking recommendation method and device and electronic equipment
CN110610697B (en) Voice recognition method and device
CN110781329A (en) Image searching method and device, terminal equipment and storage medium
CN111680514A (en) Information processing and model training method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication
Application publication date: 20210604