CN110619035B

CN110619035B - Method, device, equipment and storage medium for identifying keywords in interview video

Info

Publication number: CN110619035B
Application number: CN201910706481.0A
Authority: CN
Inventors: 金戈; 徐亮
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2019-08-01
Filing date: 2019-08-01
Publication date: 2023-07-25
Anticipated expiration: 2039-08-01
Also published as: CN110619035A; WO2021017296A1

Abstract

The application relates to the field of neural networks, and provides a method, a device, equipment and a storage medium for identifying keywords in interview videos, wherein the method comprises the following steps: training the multi-view self-training neural network model by using a plurality of training texts, and converting the collected voice signals into texts to be identified; extracting a plurality of words to be recognized from a text to be recognized, wherein the words to be recognized are tagged words to be recognized fed back by an interviewer; generating prompt information, displaying the prompt information and the words to be identified, wherein the prompt information is used for prompting an interviewer to mark the words to be identified; calculating keyword probability of each word to be identified as a keyword by using a multi-view self-training neural network model; when the probability of the keywords is within the range of the probability threshold, marking the words to be identified within the range of the probability threshold as keywords; the keywords and notification messages are sent to at least one interview server. By adopting the scheme, the accuracy of recognizing the keywords in the text can be improved.

Description

Method, device, equipment and storage medium for identifying keywords in interview video

Technical Field

The present disclosure relates to the field of neural networks, and in particular, to a method, an apparatus, a device, and a storage medium for identifying keywords in an interview video.

Background

With the rapid development of information technology, AI technology has been applied to various industries. The human resource field is a typical field which is widely used. The fact that the AI technology is rapidly developed is that the image module and the voice module are both the image module and the voice module, particularly in the voice module, various voice systems are endless, voice recognition, voice conversion, voice interaction, voice synthesis and the like are gradually mature, and unprecedented opportunities are brought to the development of the voice technology.

However, the existing voice technology is only in a stage of roughly estimating voice semantics, and although partial keywords can be identified by comparing the similarity of the voice signal and the visual lip of the speaker to acquire keywords in the voice signal, the detailed parts in the keywords can not be accurately positioned, so that precise information can not be acquired in many cases, and therefore, the voice technology can not be popularized and used in more fields, particularly in the field of video interviews.

Disclosure of Invention

The application provides a method, a device, equipment and a storage medium for identifying keywords in interview videos, which can solve the problem that the accuracy of acquiring keywords of a speaker in a voice signal is not high in the prior art.

In a first aspect, the present application provides a method for identifying keywords in an interview video, the method comprising:

inputting a plurality of acquired training texts into a multi-view self-training neural network model to train the multi-view self-training neural network model, wherein the training texts are used for training the multi-view self-training neural network model;

collecting a voice signal, calling a voice recognition system, and converting the voice signal into a text to be recognized;

extracting a plurality of words to be recognized from the text to be recognized, wherein the words to be recognized are tagged words to be recognized fed back by interviewees;

generating prompt information, and displaying the prompt information and the words to be identified, wherein the prompt information is used for prompting an interviewer to mark the words to be identified;

inputting the multiple words to be recognized into the multi-view self-training neural network model, and calculating the keyword probability of each word to be recognized as a keyword;

comparing the keyword probability with a probability threshold, and marking words to be recognized in the range of the probability threshold as keywords when the keyword probability is in the range of the probability threshold;

and sending the keyword and the notification message to at least one interview server in the interview server list according to a preset interview server name, wherein the notification message is used for prompting the interview server to upload a final interview result in time.

In some possible designs, the inputting the collected plurality of training texts into the multi-view self-training neural network model includes:

dividing the training texts into a first training text and a second training text according to the labels, wherein the first training text is the training text with the labels, and the second training text is the training text without the labels;

converting the first training text into a first word vector according to the coding rule, and converting the second training text into a second word vector;

extracting first features of the first word vector and extracting second features of the second word vector;

inputting the first feature and the second feature into a multi-view self-training neural network model, and respectively calculating to obtain a first keyword probability corresponding to the first feature and a second keyword probability corresponding to the second feature;

comparing the first keyword probability and the second keyword probability with a preset probability threshold respectively;

when any first loss probability or second loss probability is lower than the lower limit of a preset probability threshold, setting the first loss probability or the second loss probability as the lower limit of a new probability threshold;

when any one of the first loss probability or the second loss probability is higher than the upper limit of the preset probability threshold, the first loss probability or the second loss probability is set as the upper limit of the new probability threshold.

In some possible designs, the extracting the first feature of the first word vector and the extracting the second feature of the second word vector includes:

respectively inputting the first word vector and the second word vector into a GRU encoder;

and respectively converting and extracting the first word vector and the second word vector in the GRU coder to obtain the first feature in the first word vector and the second feature in the second word vector.

In some possible designs, the converting and feature extracting operations performed on the first word vector and the second word vector in the GRU encoder to obtain the first feature in the first word vector and the second feature in the second word vector respectively include:

calculating a reset gate and an update gate according to the hidden layer at the time t-1, the first word vector at the time t and the second word vector at the time t;

calculating a candidate hidden layer according to the reset gate, the first word vector at the moment t and the second word vector at the moment t;

respectively calculating hidden layer information in a first word vector and a second word vector according to the candidate hidden layers;

extracting a first feature of a first word vector and a second feature of a second word vector according to the hidden layer information; wherein the formula for extracting the first feature and the second feature is as follows:

Wherein,,for the first feature, the second feature, +.>For hiding layer information->Is a characteristic weight matrix, which is a preset matrix,>for calculating the function, the first feature and the second feature obtained by calculation are features of the keywords.

In some possible designs, the inputting the first feature and the second feature into the multi-view self-training neural network model, respectively calculating to obtain a first keyword probability corresponding to the first feature and a second keyword probability corresponding to the second feature, includes:

calculating a first probability of a first feature from a main view angle through SoftMax by adopting a first probability formula, wherein the first probability formula is as follows:the method comprises the steps of carrying out a first treatment on the surface of the Wherein (1)>For the first probability->Is->First feature of time of day->To activate the function +.>、/>Is a probability matrix, which is a preset matrix, < ->Is a key probability parameter, is a preset constant, is used for compensating the error of the first key probability calculation, +.>Is a calculation function;

and adjusting the first probability by using a loss function from an auxiliary view angle to obtain the first keyword probability, wherein the first keyword probability is as follows:the method comprises the steps of carrying out a first treatment on the surface of the Wherein (1)>For the first keyword probability,/a>As a first feature of the method, a first feature,for the first probability- >For the number of first features +.>As a loss function.

and comprehensively calculating a second probability from the main view angle and the auxiliary view angle by adopting a second probability formula, wherein the second probability formula is as follows:

wherein,,for the second probability of the previous moment, +.>For a second probability of the latter moment, +.>For a second probability of future time, +.>For a second probability of past time, +.>Is->A second characteristic of the moment of time is that,calculation function of the second probability for the previous moment,/->Calculation function for the second probability of the following moment,/->Computing a function for a second probability of future time, < +.>For the calculation function of the past time +.>Is thatA second feature of the moment in time. />、/>、/>、/>From left to rightRight, arranging according to time sequence;

and adjusting the second probability by using a loss function corresponding to the four auxiliary view angles to obtain the second keyword probability, wherein a calculation formula of the second keyword probability is as follows:

wherein,, For the second keyword probability,/a>For four auxiliary viewing angles, including->、/>、、/>，/>Second probability corresponding to four auxiliary views, < ->For the number of second features +.>Is a loss function corresponding to the four auxiliary views.

In some possible designs, the extracting a plurality of words to be recognized from the text to be recognized includes:

dividing a text to be recognized into words and making part-of-speech identification on the words after word division, reserving the part-of-speech identification as words of nouns, verbs, adjectives and adverbs, and taking each word as a node;

calculating weight values of all nodes in the text to be identified, wherein the weight value calculation formula is as follows:

wherein,,for node->Weight value in text to be recognized, < +.>Is a damping coefficient, is a preset constant, < ->For node->And node->Weights between->For node->Set of pointed nodes, node->For node->Directed node +.>For node->And node->Weights between->For node->A weight value in the text to be identified;

dividing the weight value of each node by the maximum weight value in the weight value set to obtain a normalized weight value of each node in the text to be identified, and taking the word corresponding to the node with the normalized weight value larger than the preset weight value threshold in the text to be identified as the word to be identified.

In a second aspect, the present application provides an apparatus for identifying keywords in an interview video, having a function of implementing a method for identifying keywords in an interview video corresponding to the first aspect. The functions may be implemented by hardware, or may be implemented by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions described above, which may be software and/or hardware.

In one possible design, the means for identifying keywords in the interview video includes:

the input/output module is used for inputting a plurality of acquired training texts into the multi-view self-training neural network model so as to train the multi-view self-training neural network model, and the training texts are used for training the multi-view self-training neural network model;

the processing module is used for collecting voice signals, calling the voice recognition system and converting the voice signals into texts to be recognized; extracting a plurality of words to be recognized from the text to be recognized, wherein the words to be recognized are tagged words to be recognized fed back by interviewees; generating prompt information;

the display module is used for displaying the prompt information and the words to be identified, and the prompt information is used for prompting an interviewer to mark the words to be identified;

The processing module is also used for inputting the plurality of words to be identified into the multi-view self-training neural network model through the input/output module, and calculating the keyword probability of each word to be identified as a keyword; comparing the keyword probability with a probability threshold, and marking words to be recognized in the range of the probability threshold as keywords when the keyword probability is in the range of the probability threshold; and sending the keyword and the notification message to at least one interview server in the interview server list through the input and output module according to a preset interview server list, wherein the notification message is used for prompting the interview server to upload a final interview result in time.

In some possible designs, the processing module is specifically configured to:

Inputting the first feature and the second feature into a multi-view self-training neural network model through the input/output module, and respectively calculating to obtain a first keyword probability corresponding to the first feature and a second keyword probability corresponding to the second feature;

In some possible designs, the processing module is specifically configured to:

wherein,,for the first feature, the second feature, +.>For hiding layer information->Is a characteristic weight matrix, which is a preset matrix,>to calculate a function. The first feature and the second feature obtained through calculation are features of keywords.

In some possible designs, the processing module is specifically configured to:

calculating a first probability of a first feature from a main view angle through SoftMax by adopting a first probability formula, wherein the first probability formula is as follows:the method comprises the steps of carrying out a first treatment on the surface of the Wherein (1)>For the first probability->Is->First feature of time of day->To activate the function +. >、/>Is a probability matrix, which is a preset matrix, < ->Is a key probability parameter, is a preset constant, is used for compensating the error of the first key probability calculation, +.>Is a calculation function;

and adjusting the first probability by using a loss function from an auxiliary view angle to obtain the first keyword probability, wherein the first keyword probability is as follows:the method comprises the steps of carrying out a first treatment on the surface of the Wherein (1)>For the first keyword probability,/a>As a first feature of the method, a first feature,for the first probability->For the number of first features +.>As a loss function.

In some possible designs, the processing module is specifically configured to:

wherein,,for the second probability of the previous moment, +.>For a second probability of the latter moment, +.>For a second probability of future time, +.>For a second probability of past time, +.>Is->A second characteristic of the moment of time is that,calculation function of the second probability for the previous moment,/->Calculation function for the second probability of the following moment,/->Computing a function for a second probability of future time, < +.>For the calculation function of the past time +.>Is thatA second feature of the moment in time. />、/>、/>、/>From left to right, arranged in time sequence;

Wherein,,for the second keyword probability,/a>For four auxiliary viewing angles, including->、/>、、/>，/>Second probability corresponding to four auxiliary views, < ->For the number of second features +.>Is a loss function corresponding to the four auxiliary views.

In some possible designs, the processing module is specifically configured to:

In yet another aspect, the present application provides a computer device comprising at least one connected processor, a memory and a transceiver, wherein the memory is configured to store program code, and the processor is configured to invoke the program code in the memory to perform the method according to the first aspect.

A further aspect of the present application provides a computer storage medium comprising instructions which, when run on a computer, cause the computer to perform the method of the first aspect described above.

Compared with the prior art, in the scheme provided by the application, the multi-view self-training model is trained, the voice signal is converted into the text to be identified, the keywords in the text to be identified are identified based on the multi-view self-training model, namely, the neural network model is trained from the main view and the auxiliary views respectively, so that the accuracy and the hit rate of identifying the keywords can be improved, and the identification precision of the neural network model is improved. In addition, the purpose of accurately positioning the keywords in the text can be achieved by improving the text feature extraction mode of the GRU encoder, and the keyword extraction accuracy of the text is further improved.

Drawings

FIG. 1 is a schematic flow chart of a method for identifying keywords in an interview video according to an embodiment of the present application;

FIG. 2 is a schematic structural diagram of an apparatus for identifying keywords in an interview video according to an embodiment of the present application;

fig. 3 is a schematic structural diagram of a computer device in an embodiment of the present application.

The realization, functional characteristics and advantages of the present application will be further described with reference to the embodiments, referring to the attached drawings.

Detailed Description

It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application. The terms first, second and the like in the description and in the claims of the present application and in the above-described figures, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or modules is not necessarily limited to those listed or explicitly listed or inherent to such process, method, article, or apparatus, but may include other steps or modules that may not be listed or inherent to such process, method, article, or apparatus, the partitioning of such modules by the present application may include only one logical partitioning, and may be implemented in another manner by such that a plurality of modules may be combined or integrated in another system, or such that certain features may be omitted or not implemented.

The application provides a method, a device, equipment and a storage medium for identifying keywords in interview videos, which can be used for video interviews or voice interviews and also can be used for emotion analysis of a speaker, and the application scene of the scheme is not limited.

Referring to fig. 1, a method for identifying keywords in an interview video according to an embodiment of the present application is described below, where the method includes:

101. the acquired plurality of training texts are input into a multi-view self-training neural network model to train the multi-view self-training neural network model.

The training text is used for training the multi-view self-training neural network model.

The multi-view self-training neural network model

The training texts are obtained from a text database provided by a service demand party, the text database is a preset text storage library, the service demand party provides the text storage library, and a plurality of training texts are stored in the text storage library. The training text includes keywords that meet the interview requirements for work ability and quality.

In some embodiments, the inputting the collected plurality of training texts into the multi-view self-training neural network model comprises:

Therefore, all the first keyword probabilities and the second keyword probabilities calculated through the multi-view self-training neural network model are respectively compared with a preset probability threshold, and the range of the probability threshold is adjusted, so that a more accurate keyword recognition range can be obtained.

In some embodiments, the extracting the first feature of the first word vector and the extracting the second feature of the second word vector includes:

inputting the first word vector and the second word vector into a gating loop unit (Gated Recurrent Unit, GRU) encoder, respectively;

102. And collecting a voice signal, calling a voice recognition system, and converting the voice signal into a text to be recognized.

Specifically, the voice signal can be detected in real time through the voice receiving device, the voice signal is interview voice sent by the interview operator in the interview environment, and the voice signal triggers the voice recognition system to convert the voice signal into the text to be recognized and is used as the basis of the recognition keywords. The main purpose of step 102 is to convert the speech signal into text to be recognized, which reduces the difficulty of speech recognition and facilitates easier recognition of keywords.

The voice recognition system is a preset system, specifically, hundred-degree voice recognition, signal flight voice recognition, or alicloud voice recognition can be selected, and the application is not limited.

103. Extracting a plurality of words to be recognized from the text to be recognized, wherein the words to be recognized are tagged words to be recognized fed back by interviewee.

In some embodiments, the extracting a plurality of words to be recognized from the text to be recognized includes:

104. Generating prompt information, displaying the prompt information and words to be recognized, inputting the words to be recognized into the multi-view self-training neural network model, and calculating keyword probability that each word to be recognized is a keyword.

The prompt information is used for prompting interviewee staff to mark words to be identified.

The keyword refers to a word with a weight value higher than a preset weight value threshold in the text to be identified, and the weight value threshold of the keyword can be specifically set according to dimensions such as word frequency, part of speech, text subject and the like, which is not limited in the application.

The keyword probability refers to the probability that the word in the text to be recognized is a keyword.

It should be noted that, in the embodiment of the present application, whether the interviewer marks the words to be identified may be freely selected by the interviewer, and the interviewer may mark the words in text, or may not mark the words, which is not limited in the present application.

The keyword probabilities of the corresponding words to be identified as keywords are also classified according to labels according to whether the texts to be identified are marked texts or not, and particularly the keyword probabilities of the words to be identified as keywords in the method can be classified into a first keyword probability and a second keyword probability.

Optionally, in some embodiments of the present application, the multi-view self-training neural network model calculates the first keyword probability and the second keyword probability from a main view, an auxiliary view. The main view angle refers to the current time, the auxiliary view angle comprises a future time, a past time, a previous time and a next time, the future time does not comprise the next time, and the past time does not comprise the previous time.

The following describes a procedure for calculating the first keyword probability and the second keyword probability from the viewpoint of the types of the features for the first feature and the second feature, respectively.

(1) If the first feature is the first feature, the process of calculating the probability of the first keyword is as follows:

and adjusting the first probability by using a loss function from an auxiliary view angle to obtain the first keyword probability, wherein the first keyword probability is as follows: The method comprises the steps of carrying out a first treatment on the surface of the Wherein (1)>For the first keyword probability,/a>For the first feature->For the first probability->For the number of first features +.>As a loss function.

(2) If the second feature is the second feature, the process of calculating the probability of the second keyword is as follows:

wherein,,for the second keyword probability,/a>For four auxiliary viewing angles, including->、/>、、/>，/>Second probability corresponding to four auxiliary views, < - >For the number of second features +.>Is a loss function corresponding to the four auxiliary views.

In this embodiment, the probability threshold range for determining whether the text to be identified is a keyword in the process of steps 102 to 105 may be continuously adjusted by calculating the keyword probability of the keyword.

105. And comparing the keyword probability with a probability threshold, and marking the words to be recognized in the range of the probability threshold as keywords when the keyword probability is in the range of the probability threshold.

In some embodiments, when the labeling instruction of the interviewer for the word to be recognized is detected, the keyword probability of the word to be recognized as the keyword is calculated by adopting the calculation method for calculating the first keyword probability in step 104.

In other embodiments, when the candidate word is detected not to be labeled by the interviewer, the keyword probability of the candidate word as the keyword is calculated by using the calculation method for calculating the second keyword probability in step 104.

106. And sending the keyword and the notification message to at least one interview server in the interview server list according to a preset interview server name, wherein the notification message is used for prompting the interview server to upload a final interview result in time.

Compared with the existing mechanism, in the embodiment of the application, the multi-view self-training model is trained, the voice signal is converted into the text to be identified, the keywords in the text to be identified are identified based on the multi-view self-training model, namely, the neural network model is trained from the main view and the auxiliary views respectively, so that the accuracy and hit rate of identifying the keywords can be improved, namely, the identification precision of the neural network model is improved.

Optionally, in some embodiments of the present application, the converting and feature extracting operations performed on the first word vector and the second word vector in the GRU encoder to obtain the first feature in the first word vector and the second feature in the second word vector respectively include:

Therefore, the purpose of accurately positioning the keywords in the text can be achieved by improving the text feature extraction mode of the GRU encoder, and the keyword extraction accuracy rate of the text is further improved.

The technical features mentioned in the embodiment or implementation corresponding to fig. 1 are also applicable to the embodiments corresponding to fig. 2 and 3 in the present application, and the details of the similar parts will not be described in detail.

The method for identifying keywords in the interview video is described above, and a device for executing the method for identifying keywords in the interview video is described below.

A schematic structure of an apparatus 20 for recognizing keywords in interview videos is shown in fig. 2, which is applicable to video interviews. The apparatus 20 in the embodiment of the present application can implement the steps corresponding to the method for identifying keywords in interview videos performed in the embodiment corresponding to fig. 1 described above. The functions implemented by the apparatus 20 may be implemented by hardware, or may be implemented by executing corresponding software by hardware. The hardware or software includes one or more modules corresponding to the functions described above, which may be software and/or hardware. The apparatus 20 may include an input/output module 201, a processing module 202, and a display module 203, where the functional implementation of the processing module 202, the input/output module 201, and the display module 203 may refer to operations performed in the embodiment corresponding to fig. 1, which are not described herein. The processing module 202 may be configured to control input and output operations of the input and output module 201, and to control display operations of the display module 203.

In some embodiments, the input/output module 201 may be configured to input the collected plurality of training texts into a multi-view self-training neural network model to train the multi-view self-training neural network model, where the training texts are used to train the multi-view self-training neural network model;

the processing module 202 may be configured to collect a voice signal, invoke a voice recognition system, and convert the voice signal into a text to be recognized; extracting a plurality of words to be recognized from the text to be recognized, wherein the words to be recognized are tagged words to be recognized fed back by interviewees; generating prompt information;

the display module 203 may be configured to display the prompt information and the word to be identified, where the prompt information is used to prompt the interviewer to mark the word to be identified;

the processing module 202 is further configured to input the plurality of words to be identified to the multi-view self-training neural network model through the input/output module 201, and calculate a keyword probability that each word to be identified is a keyword; comparing the keyword probability with a probability threshold, and marking words to be recognized in the range of the probability threshold as keywords when the keyword probability is in the range of the probability threshold; and sending the keyword and the notification message to at least one interview server in the interview server list through the input/output module 201 according to a preset interview server list, wherein the notification message is used for prompting the interview server to upload a final interview result in time.

Compared with the existing mechanism, in the embodiment of the application, the processing module 202 trains the multi-view self-training model, converts the voice signal into the text to be identified, and identifies the keywords in the text to be identified based on the multi-view self-training model, namely trains the neural network model from the main view and the plurality of auxiliary views respectively, so that the accuracy and hit rate of identifying the keywords can be improved, namely the identification precision of the neural network model is improved.

In some embodiments, the processing module 202 is specifically configured to:

inputting the first feature and the second feature into a multi-view self-training neural network model through the input/output module 201, and respectively calculating to obtain a first keyword probability corresponding to the first feature and a second keyword probability corresponding to the second feature;

In some embodiments, the processing module 202 is specifically configured to:

inputting the first word vector and the second word vector into a GRU encoder through the input-output module 201, respectively;

In some embodiments, the processing module 202 is specifically configured to:

wherein,,for the first feature, the second feature, +.>For hiding layer information->Is a characteristic weight matrix, which is a preset matrix,>to calculate a function. The first feature and the second feature obtained through calculation are features of keywords. />

In some embodiments, the processing module 202 is specifically configured to:

and adjusting the first probability by using a loss function from an auxiliary view angle to obtain the first keyword probability, wherein the first keyword probability is as follows:the method comprises the steps of carrying out a first treatment on the surface of the Wherein (1) >For the first keyword probability,/a>For the first feature->For the first probability->For the number of first features +.>As a loss function.

In some embodiments, the processing module 202 is specifically configured to:

wherein,,for the second keyword probability,/a>For four auxiliary viewing angles, including->、/>、、/>，/>Second probability corresponding to four auxiliary views, < ->For the number of second features +. >Is a loss function corresponding to the four auxiliary views.

In some embodiments, the processing module 202 is specifically configured to:

The physical device corresponding to the input/output module 201 shown in fig. 2 is an input/output unit shown in fig. 3, and the input/output unit can implement part or all of the functions of the input/output module 201, or implement the same or similar functions as the input/output module 201.

The physical device corresponding to the processing module 202 shown in fig. 2 is a processor shown in fig. 3, which can implement part or all of the functions of the processing module 202, or implement the same or similar functions as the processing module 202.

The physical device corresponding to the display module 203 shown in fig. 2 is a processor shown in fig. 3, and the processor can implement a part or all of the functions of the display module 203, or implement the same or similar functions as the display module 203.

The foregoing describes 20 of the embodiments of the present application from the perspective of a modular functional entity, and a computer device from the perspective of hardware, as shown in fig. 3, includes: a processor, a memory, a transceiver (which may also be an input-output unit, not identified in fig. 3) and a computer program stored in the memory and executable on the processor. For example, the computer program may be a program corresponding to the method for identifying keywords in the interview video in the embodiment corresponding to fig. 1. For example, when the computer apparatus implements the functions of the device 20 shown in fig. 2, the processor, when executing the computer program, implements the steps in the method for identifying keywords in an interview video performed by the device 20 in the embodiment corresponding to fig. 2; alternatively, the processor may implement the functions of the modules in the apparatus 20 according to the embodiment corresponding to fig. 2 when executing the computer program. For another example, the computer program may be a program corresponding to the method for identifying keywords in the interview video in the embodiment corresponding to fig. 1.

The processor may be a central processing unit (Central Processing Unit, CPU), other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), off-the-shelf programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like that is a control center of the computer device, connecting various parts of the overall computer device using various interfaces and lines.

The memory may be used to store the computer program and/or modules, and the processor may implement various functions of the computer device by running or executing the computer program and/or modules stored in the memory, and invoking data stored in the memory. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required for at least one function, and the like; the storage data area may store data (such as audio data, video data, etc.) created according to the use of the cellular phone, etc. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, memory, plug-in hard disk, smart Media Card (SMC), secure Digital (SD) Card, flash Card (Flash Card), at least one disk storage device, flash memory device, or other volatile solid-state storage device.

The transceiver may also be replaced by a receiver and a transmitter, which may be the same or different physical entities. Which are the same physical entities, may be collectively referred to as transceivers. The transceiver may be an input-output unit.

The memory may be integrated in the processor or may be provided separately from the processor.

From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM), comprising several instructions for causing a terminal (which may be a mobile phone, a computer, a server or a network device, etc.) to perform the method described in the embodiments of the present application.

The embodiments of the present application have been described in connection with the accompanying drawings, but the present application is not limited to the specific embodiments described above, which are intended to be exemplary only, and not to be limiting, and many modifications may be made by one of ordinary skill in the art without departing from the spirit and scope of the application and the appended claims, which are to be accorded the full scope of the present application, using the equivalent structures or equivalent flow transformations of the present application and the contents of the accompanying drawings, or using them directly or indirectly in other related technical fields.

Claims

1. A method of identifying keywords in an interview video, the method comprising:

sending the keyword and the notification message to at least one interview server in the interview server list according to a preset interview server name, wherein the notification message is used for prompting the interview server to upload a final interview result in time;

Inputting the plurality of words to be recognized into the multi-view self-training neural network model, and calculating the keyword probability that each word to be recognized is a keyword, wherein the method comprises the following steps:

dividing the plurality of words to be identified according to whether the words to be identified are marked and extracting the features to obtain first features and/or second features, wherein the first features are keyword features corresponding to the words to be identified with marks, and the second features are keyword features corresponding to the words to be identified without marks;

calculating a first probability of a first feature from a main view angle through SoftMax by adopting a first probability formula, wherein the first probability formula is as follows:the method comprises the steps of carrying out a first treatment on the surface of the Wherein (1)>For the first probability of being a first one,is->First feature of time of day->To activate the function +.>、/>Is a probability matrix, which is a preset matrix, < ->Is a key probability parameter, is a preset constant, is used for compensating the error of the first key probability calculation, +.>Is a calculation function;

and adjusting the first probability by using a loss function from an auxiliary view angle to obtain a first keyword probability, wherein the first keyword probability is as follows:the method comprises the steps of carrying out a first treatment on the surface of the Wherein (1)>For the first keyword probability,/a>For the first feature->For the first probability->For the number of first features +.>As a loss function;

wherein,,for the second probability of the previous moment, +.>For a second probability of the latter moment, +.>For a second probability of future time, +.>For a second probability of past time, +.>Is->Second feature of time of day->Calculation function of the second probability for the previous moment,/->Calculation function for the second probability of the following moment,/->Computing a function for a second probability of future time, < +.>For the calculation function of the past time +.>Is->Second feature of time of day->、/>、/>、/>From left to right, arranged in time sequence;

and adjusting the second probability by using a loss function corresponding to the four auxiliary view angles to obtain a second keyword probability, wherein a calculation formula of the second keyword probability is as follows:

wherein,,for the second keyword probability,/a>For four auxiliary viewing angles, including->、/>、/>、，/>Second probability corresponding to four auxiliary views, < ->For the number of second features +.>Is a loss function corresponding to the four auxiliary views.

2. The method of claim 1, wherein the inputting the collected plurality of training texts into the multi-view self-training neural network model comprises:

3. The method of claim 2, wherein extracting the first feature of the first word vector and extracting the second feature of the second word vector comprises:

4. The method of claim 3, wherein converting and feature extracting the first word vector and the second word vector in the GRU encoder to obtain the first feature in the first word vector and the second feature in the second word vector, respectively, comprises:

5. The method of claim 1, wherein the extracting a plurality of words to be recognized from the text to be recognized comprises:

wherein,,for node->Weight value in text to be recognized, < +.>Is a damping coefficient, is a preset constant, < ->For node->And node->Weights between->For node->Set of pointed nodes, node->For node->Directed node +.>For node->And node->Weights between- >For node->A weight value in the text to be identified;

6. An apparatus for identifying keywords in an interview video, wherein the apparatus performs the method of any one of claims 1-5, the apparatus comprising:

7. A computer device, the computer device comprising:

at least one processor, memory, and transceiver;

wherein the memory is for storing program code and the processor is for invoking the program code stored in the memory to perform the method of any of claims 1-5.

8. A computer storage medium comprising instructions which, when run on a computer, cause the computer to perform the method of any of claims 1-5.