CN110619035B - Method, device, equipment and storage medium for identifying keywords in interview video


Info

Publication number: CN110619035B
Application number: CN201910706481.0A
Authority: CN (China)
Prior art keywords: probability, feature, keyword, words, word vector
Legal status: Active (the legal status is an assumption and is not a legal conclusion; no legal analysis has been performed)
Other languages: Chinese (zh)
Other versions: CN110619035A (en)
Inventors: 金戈 (Jin Ge), 徐亮 (Xu Liang)
Original and current assignee: Ping An Technology Shenzhen Co Ltd
Priority applications: CN201910706481.0A; PCT/CN2019/117928 (WO2021017296A1)
Publications: CN110619035A (application), CN110619035B (grant)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/33: Querying
    • G06F 16/3331: Query processing
    • G06F 16/334: Query execution
    • G06F 16/3347: Query execution using vector based model
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR SUCH PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00: Administration; Management
    • G06Q 10/10: Office automation; Time management
    • G06Q 10/105: Human resources
    • G06Q 10/1053: Employment or hiring
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/26: Speech to text systems

Abstract

The application relates to the field of neural networks and provides a method, apparatus, device, and storage medium for identifying keywords in interview videos. The method comprises the following steps: training a multi-view self-training neural network model with a plurality of training texts, and converting a collected voice signal into a text to be recognized; extracting a plurality of words to be recognized from the text to be recognized, the words to be recognized being tagged words fed back by the interviewer; generating prompt information and displaying it together with the words to be recognized, the prompt information being used to prompt the interviewer to mark the words to be recognized; calculating, with the multi-view self-training neural network model, the keyword probability that each word to be recognized is a keyword; when a keyword probability falls within the probability threshold range, marking the corresponding word to be recognized as a keyword; and sending the keywords and a notification message to at least one interview server. This scheme improves the accuracy of recognizing keywords in text.

Description

Method, device, equipment and storage medium for identifying keywords in interview video
Technical Field
The present disclosure relates to the field of neural networks, and in particular, to a method, an apparatus, a device, and a storage medium for identifying keywords in an interview video.
Background
With the rapid development of information technology, AI technology has been applied across industries, and human resources is a typical field of wide application. Driving AI's rapid development are its image and speech modules; in the speech module in particular, new speech systems emerge constantly, and speech recognition, speech conversion, speech interaction, speech synthesis, and the like are gradually maturing, bringing unprecedented opportunities to the development of speech technology.
However, existing speech technology remains at the stage of roughly estimating speech semantics. Although some keywords can be identified by comparing the similarity between the voice signal and the speaker's visible lip movements, the details within those keywords cannot be accurately located, so precise information often cannot be obtained. As a result, speech technology cannot yet be popularized in more fields, particularly video interviews.
Disclosure of Invention
The application provides a method, apparatus, device, and storage medium for identifying keywords in interview videos, which address the prior-art problem that a speaker's keywords are acquired from a voice signal with low accuracy.
In a first aspect, the present application provides a method for identifying keywords in an interview video, the method comprising:
inputting a plurality of acquired training texts into a multi-view self-training neural network model to train the multi-view self-training neural network model, wherein the training texts are used for training the multi-view self-training neural network model;
collecting a voice signal, calling a voice recognition system, and converting the voice signal into a text to be recognized;
extracting a plurality of words to be recognized from the text to be recognized, wherein the words to be recognized are tagged words to be recognized fed back by interviewees;
generating prompt information, and displaying the prompt information and the words to be identified, wherein the prompt information is used for prompting an interviewer to mark the words to be identified;
inputting the multiple words to be recognized into the multi-view self-training neural network model, and calculating the keyword probability of each word to be recognized as a keyword;
comparing the keyword probability with a probability threshold, and marking words to be recognized in the range of the probability threshold as keywords when the keyword probability is in the range of the probability threshold;
and sending the keyword and the notification message to at least one interview server in the interview server list according to a preset interview server list, wherein the notification message is used for prompting the interview server to upload a final interview result in time.
In some possible designs, the inputting the collected plurality of training texts into the multi-view self-training neural network model includes:
dividing the training texts into a first training text and a second training text according to the labels, wherein the first training text is the training text with the labels, and the second training text is the training text without the labels;
converting the first training text into a first word vector according to the coding rule, and converting the second training text into a second word vector;
extracting first features of the first word vector and extracting second features of the second word vector;
inputting the first feature and the second feature into a multi-view self-training neural network model, and respectively calculating to obtain a first keyword probability corresponding to the first feature and a second keyword probability corresponding to the second feature;
comparing the first keyword probability and the second keyword probability with a preset probability threshold respectively;
when any first loss probability or second loss probability is lower than the lower limit of the preset probability threshold, setting that probability as the new lower limit of the probability threshold;
when any first loss probability or second loss probability is higher than the upper limit of the preset probability threshold, setting that probability as the new upper limit of the probability threshold.
In some possible designs, the extracting the first feature of the first word vector and the extracting the second feature of the second word vector includes:
respectively inputting the first word vector and the second word vector into a GRU encoder;
and performing conversion and feature extraction on the first word vector and the second word vector respectively in the GRU encoder to obtain the first feature in the first word vector and the second feature in the second word vector.
In some possible designs, the converting and feature extracting operations performed on the first word vector and the second word vector in the GRU encoder to obtain the first feature in the first word vector and the second feature in the second word vector respectively include:
calculating a reset gate and an update gate according to the hidden layer at the time t-1, the first word vector at the time t and the second word vector at the time t;
calculating a candidate hidden layer according to the reset gate, the first word vector at the moment t and the second word vector at the moment t;
respectively calculating hidden layer information in a first word vector and a second word vector according to the candidate hidden layers;
extracting a first feature of the first word vector and a second feature of the second word vector according to the hidden layer information, wherein the formula for extracting the first feature and the second feature is as follows:

$f_t = g(W_f \cdot h_t)$

where $f_t$ is the first feature or the second feature, $h_t$ is the hidden layer information, $W_f$ is a feature weight matrix (a preset matrix), and $g(\cdot)$ is the calculation function; the first feature and the second feature obtained by this calculation are the features of the keywords.
In some possible designs, the inputting the first feature and the second feature into the multi-view self-training neural network model, respectively calculating to obtain a first keyword probability corresponding to the first feature and a second keyword probability corresponding to the second feature, includes:
calculating a first probability of the first feature from the main view through SoftMax, where the first probability formula has the form

$p_t = \mathrm{SoftMax}(W_2\,\sigma(W_1 f_t) + b)$

where $p_t$ is the first probability, $f_t$ is the first feature at time $t$, $\sigma$ is the activation function, $W_1$ and $W_2$ are probability matrices (preset matrices), $b$ is a key probability parameter (a preset constant used to compensate for the error of the first keyword probability calculation), and $\mathrm{SoftMax}$ is the calculation function;

and adjusting the first probability with a loss function from the auxiliary views to obtain the first keyword probability:

$P_1 = \frac{1}{N}\sum_{i=1}^{N} L(f_i, p_i)$

where $P_1$ is the first keyword probability, $f_i$ is a first feature, $p_i$ is the corresponding first probability, $N$ is the number of first features, and $L$ is the loss function.
In some possible designs, the inputting the first feature and the second feature into the multi-view self-training neural network model, respectively calculating to obtain a first keyword probability corresponding to the first feature and a second keyword probability corresponding to the second feature, includes:
and comprehensively calculating a second probability from the main view and the auxiliary views, where the second probability formulas have the form

$p^{\mathrm{past}}_t = g_{\mathrm{past}}(f_{<t}), \quad p^{\mathrm{prev}}_t = g_{\mathrm{prev}}(f_{t-1}), \quad p^{\mathrm{next}}_t = g_{\mathrm{next}}(f_{t+1}), \quad p^{\mathrm{fut}}_t = g_{\mathrm{fut}}(f_{>t})$

where $p^{\mathrm{prev}}_t$ is the second probability of the previous moment, $p^{\mathrm{next}}_t$ is the second probability of the following moment, $p^{\mathrm{fut}}_t$ is the second probability of the future time, $p^{\mathrm{past}}_t$ is the second probability of the past time, $f_t$ is the second feature at time $t$, and $g_{\mathrm{past}}$, $g_{\mathrm{prev}}$, $g_{\mathrm{next}}$ and $g_{\mathrm{fut}}$ are the corresponding calculation functions; the four probabilities above are arranged from left to right in time order;
and adjusting the second probabilities with the loss functions corresponding to the four auxiliary views to obtain the second keyword probability, whose calculation formula is:

$P_2 = \frac{1}{M}\sum_{v \in V}\sum_{j=1}^{M} L_v\big(p^{(v)}_j\big)$

where $P_2$ is the second keyword probability, $V$ is the set of four auxiliary views (the previous moment, the following moment, the past time and the future time), $p^{(v)}_j$ is the second probability corresponding to auxiliary view $v$, $M$ is the number of second features, and $L_v$ is the loss function corresponding to each auxiliary view.
In some possible designs, the extracting a plurality of words to be recognized from the text to be recognized includes:
segmenting the text to be recognized into words and tagging each segmented word with its part of speech, retaining only the words whose part-of-speech tags are noun, verb, adjective, or adverb, and taking each retained word as a node;
calculating the weight value of each node in the text to be recognized, wherein the weight value calculation formula is as follows:

$WS(V_i) = (1-d) + d \times \sum_{V_j \in \mathrm{In}(V_i)} \dfrac{w_{ji}}{\sum_{V_k \in \mathrm{Out}(V_j)} w_{jk}}\, WS(V_j)$

where $WS(V_i)$ is the weight value of node $V_i$ in the text to be recognized, $d$ is a damping coefficient (a preset constant), $w_{ji}$ is the weight between node $V_j$ and node $V_i$, $\mathrm{In}(V_i)$ is the set of nodes pointing to node $V_i$, $\mathrm{Out}(V_j)$ is the set of nodes pointed to by node $V_j$, $w_{jk}$ is the weight between node $V_j$ and node $V_k$, and $WS(V_j)$ is the weight value of node $V_j$ in the text to be recognized;
dividing the weight value of each node by the maximum weight value in the weight value set to obtain a normalized weight value of each node in the text to be identified, and taking the word corresponding to the node with the normalized weight value larger than the preset weight value threshold in the text to be identified as the word to be identified.
In a second aspect, the present application provides an apparatus for identifying keywords in an interview video, having a function of implementing a method for identifying keywords in an interview video corresponding to the first aspect. The functions may be implemented by hardware, or may be implemented by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions described above, which may be software and/or hardware.
In one possible design, the means for identifying keywords in the interview video includes:
the input/output module is used for inputting a plurality of acquired training texts into the multi-view self-training neural network model so as to train the multi-view self-training neural network model, and the training texts are used for training the multi-view self-training neural network model;
the processing module is used for collecting voice signals, calling the voice recognition system and converting the voice signals into texts to be recognized; extracting a plurality of words to be recognized from the text to be recognized, wherein the words to be recognized are tagged words to be recognized fed back by interviewees; generating prompt information;
the display module is used for displaying the prompt information and the words to be identified, and the prompt information is used for prompting an interviewer to mark the words to be identified;
The processing module is also used for inputting the plurality of words to be identified into the multi-view self-training neural network model through the input/output module, and calculating the keyword probability of each word to be identified as a keyword; comparing the keyword probability with a probability threshold, and marking words to be recognized in the range of the probability threshold as keywords when the keyword probability is in the range of the probability threshold; and sending the keyword and the notification message to at least one interview server in the interview server list through the input and output module according to a preset interview server list, wherein the notification message is used for prompting the interview server to upload a final interview result in time.
In some possible designs, the processing module is specifically configured to:
dividing the training texts into a first training text and a second training text according to the labels, wherein the first training text is the training text with the labels, and the second training text is the training text without the labels;
converting the first training text into a first word vector according to the coding rule, and converting the second training text into a second word vector;
extracting first features of the first word vector and extracting second features of the second word vector;
Inputting the first feature and the second feature into a multi-view self-training neural network model through the input/output module, and respectively calculating to obtain a first keyword probability corresponding to the first feature and a second keyword probability corresponding to the second feature;
comparing the first keyword probability and the second keyword probability with a preset probability threshold respectively;
when any first loss probability or second loss probability is lower than the lower limit of the preset probability threshold, setting that probability as the new lower limit of the probability threshold;
when any first loss probability or second loss probability is higher than the upper limit of the preset probability threshold, setting that probability as the new upper limit of the probability threshold.
In some possible designs, the processing module is specifically configured to:
respectively inputting the first word vector and the second word vector into a GRU encoder;
and performing conversion and feature extraction on the first word vector and the second word vector respectively in the GRU encoder to obtain the first feature in the first word vector and the second feature in the second word vector.
In some possible designs, the processing module is specifically configured to:
calculating a reset gate and an update gate according to the hidden layer at the time t-1, the first word vector at the time t and the second word vector at the time t;
calculating a candidate hidden layer according to the reset gate, the first word vector at the moment t and the second word vector at the moment t;
respectively calculating hidden layer information in a first word vector and a second word vector according to the candidate hidden layers;
extracting a first feature of the first word vector and a second feature of the second word vector according to the hidden layer information, wherein the formula for extracting the first feature and the second feature is as follows:

$f_t = g(W_f \cdot h_t)$

where $f_t$ is the first feature or the second feature, $h_t$ is the hidden layer information, $W_f$ is a feature weight matrix (a preset matrix), and $g(\cdot)$ is the calculation function; the first feature and the second feature obtained by this calculation are the features of the keywords.
In some possible designs, the processing module is specifically configured to:
calculating a first probability of the first feature from the main view through SoftMax, where the first probability formula has the form

$p_t = \mathrm{SoftMax}(W_2\,\sigma(W_1 f_t) + b)$

where $p_t$ is the first probability, $f_t$ is the first feature at time $t$, $\sigma$ is the activation function, $W_1$ and $W_2$ are probability matrices (preset matrices), $b$ is a key probability parameter (a preset constant used to compensate for the error of the first keyword probability calculation), and $\mathrm{SoftMax}$ is the calculation function;

and adjusting the first probability with a loss function from the auxiliary views to obtain the first keyword probability:

$P_1 = \frac{1}{N}\sum_{i=1}^{N} L(f_i, p_i)$

where $P_1$ is the first keyword probability, $f_i$ is a first feature, $p_i$ is the corresponding first probability, $N$ is the number of first features, and $L$ is the loss function.
In some possible designs, the processing module is specifically configured to:
and comprehensively calculating a second probability from the main view and the auxiliary views, where the second probability formulas have the form

$p^{\mathrm{past}}_t = g_{\mathrm{past}}(f_{<t}), \quad p^{\mathrm{prev}}_t = g_{\mathrm{prev}}(f_{t-1}), \quad p^{\mathrm{next}}_t = g_{\mathrm{next}}(f_{t+1}), \quad p^{\mathrm{fut}}_t = g_{\mathrm{fut}}(f_{>t})$

where $p^{\mathrm{prev}}_t$ is the second probability of the previous moment, $p^{\mathrm{next}}_t$ is the second probability of the following moment, $p^{\mathrm{fut}}_t$ is the second probability of the future time, $p^{\mathrm{past}}_t$ is the second probability of the past time, $f_t$ is the second feature at time $t$, and $g_{\mathrm{past}}$, $g_{\mathrm{prev}}$, $g_{\mathrm{next}}$ and $g_{\mathrm{fut}}$ are the corresponding calculation functions; the four probabilities above are arranged from left to right in time order;
and adjusting the second probabilities with the loss functions corresponding to the four auxiliary views to obtain the second keyword probability, whose calculation formula is:

$P_2 = \frac{1}{M}\sum_{v \in V}\sum_{j=1}^{M} L_v\big(p^{(v)}_j\big)$

where $P_2$ is the second keyword probability, $V$ is the set of four auxiliary views (the previous moment, the following moment, the past time and the future time), $p^{(v)}_j$ is the second probability corresponding to auxiliary view $v$, $M$ is the number of second features, and $L_v$ is the loss function corresponding to each auxiliary view.
In some possible designs, the processing module is specifically configured to:
segmenting the text to be recognized into words and tagging each segmented word with its part of speech, retaining only the words whose part-of-speech tags are noun, verb, adjective, or adverb, and taking each retained word as a node;
calculating the weight value of each node in the text to be recognized, wherein the weight value calculation formula is as follows:

$WS(V_i) = (1-d) + d \times \sum_{V_j \in \mathrm{In}(V_i)} \dfrac{w_{ji}}{\sum_{V_k \in \mathrm{Out}(V_j)} w_{jk}}\, WS(V_j)$

where $WS(V_i)$ is the weight value of node $V_i$ in the text to be recognized, $d$ is a damping coefficient (a preset constant), $w_{ji}$ is the weight between node $V_j$ and node $V_i$, $\mathrm{In}(V_i)$ is the set of nodes pointing to node $V_i$, $\mathrm{Out}(V_j)$ is the set of nodes pointed to by node $V_j$, $w_{jk}$ is the weight between node $V_j$ and node $V_k$, and $WS(V_j)$ is the weight value of node $V_j$ in the text to be recognized;
dividing the weight value of each node by the maximum weight value in the weight value set to obtain a normalized weight value of each node in the text to be identified, and taking the word corresponding to the node with the normalized weight value larger than the preset weight value threshold in the text to be identified as the word to be identified.
In yet another aspect, the present application provides a computer device comprising at least one connected processor, a memory and a transceiver, wherein the memory is configured to store program code, and the processor is configured to invoke the program code in the memory to perform the method according to the first aspect.
A further aspect of the present application provides a computer storage medium comprising instructions which, when run on a computer, cause the computer to perform the method of the first aspect described above.
Compared with the prior art, in the scheme provided by the application, the multi-view self-training model is trained, the voice signal is converted into the text to be identified, the keywords in the text to be identified are identified based on the multi-view self-training model, namely, the neural network model is trained from the main view and the auxiliary views respectively, so that the accuracy and the hit rate of identifying the keywords can be improved, and the identification precision of the neural network model is improved. In addition, the purpose of accurately positioning the keywords in the text can be achieved by improving the text feature extraction mode of the GRU encoder, and the keyword extraction accuracy of the text is further improved.
Drawings
FIG. 1 is a schematic flow chart of a method for identifying keywords in an interview video according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of an apparatus for identifying keywords in an interview video according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a computer device in an embodiment of the present application.
The realization, functional characteristics and advantages of the present application will be further described with reference to the embodiments, referring to the attached drawings.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application. The terms "first", "second", and the like in the description, the claims, and the drawings are used to distinguish similar objects and do not necessarily describe a particular sequence or chronological order. It is to be understood that data so used may be interchanged where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described. Furthermore, the terms "comprises" and "comprising", and any variations thereof, are intended to cover a non-exclusive inclusion: a process, method, system, article, or apparatus that comprises a list of steps or modules is not necessarily limited to those steps or modules expressly listed, but may include other steps or modules that are not expressly listed or that are inherent to such a process, method, article, or apparatus. The division into modules in the present application is merely a logical division; in actual implementation, multiple modules may be combined or integrated into another system, or certain features may be omitted or not implemented.
The application provides a method, a device, equipment and a storage medium for identifying keywords in interview videos, which can be used for video interviews or voice interviews and also can be used for emotion analysis of a speaker, and the application scene of the scheme is not limited.
Referring to fig. 1, a method for identifying keywords in an interview video according to an embodiment of the present application is described below, where the method includes:
101. The acquired plurality of training texts are input into a multi-view self-training neural network model to train the multi-view self-training neural network model.
The training text is used for training the multi-view self-training neural network model.
The training texts are obtained from a text database provided by the service demander; the text database is a preset text repository in which a plurality of training texts are stored. The training texts include keywords that meet the interview requirements for working ability and quality.
In some embodiments, the inputting the collected plurality of training texts into the multi-view self-training neural network model comprises:
dividing the training texts into a first training text and a second training text according to the labels, wherein the first training text is the training text with the labels, and the second training text is the training text without the labels;
Converting the first training text into a first word vector according to the coding rule, and converting the second training text into a second word vector;
extracting first features of the first word vector and extracting second features of the second word vector;
inputting the first feature and the second feature into a multi-view self-training neural network model, and respectively calculating to obtain a first keyword probability corresponding to the first feature and a second keyword probability corresponding to the second feature;
comparing the first keyword probability and the second keyword probability with a preset probability threshold respectively;
when any first loss probability or second loss probability is lower than the lower limit of the preset probability threshold, setting that probability as the new lower limit of the probability threshold;
when any first loss probability or second loss probability is higher than the upper limit of the preset probability threshold, setting that probability as the new upper limit of the probability threshold.
Therefore, all the first keyword probabilities and the second keyword probabilities calculated through the multi-view self-training neural network model are respectively compared with a preset probability threshold, and the range of the probability threshold is adjusted, so that a more accurate keyword recognition range can be obtained.
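Purely as an illustration, the fragment below sketches this threshold-range adjustment; the initial bounds (0.4 and 0.9) and all function and variable names are assumptions for the example, not values fixed by the application.

```python
# Sketch of the probability-threshold adjustment described above.
# The initial bounds are illustrative assumptions, not values from this application.
def adjust_threshold_range(probabilities, lower=0.4, upper=0.9):
    """Widen the threshold range to cover every first/second keyword
    probability observed while training the model."""
    for p in probabilities:
        if p < lower:      # below the current lower limit: it becomes the new lower limit
            lower = p
        elif p > upper:    # above the current upper limit: it becomes the new upper limit
            upper = p
    return lower, upper

# Example: probabilities computed for a batch of first and second features
print(adjust_threshold_range([0.35, 0.62, 0.95]))  # (0.35, 0.95)
```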
In some embodiments, the extracting the first feature of the first word vector and the extracting the second feature of the second word vector includes:
inputting the first word vector and the second word vector into a gated recurrent unit (Gated Recurrent Unit, GRU) encoder, respectively;
and respectively converting and extracting the first word vector and the second word vector in the GRU coder to obtain the first feature in the first word vector and the second feature in the second word vector.
102. Collecting a voice signal, calling a voice recognition system, and converting the voice signal into a text to be recognized.
Specifically, the voice signal can be detected in real time by a voice receiving device; the voice signal is the interview speech uttered in the interview environment, and it triggers the voice recognition system to convert it into the text to be recognized, which serves as the basis for identifying keywords. The main purpose of step 102 is to convert the speech signal into text to be recognized, which reduces the difficulty of speech recognition and makes keywords easier to identify.
The voice recognition system is a preset system; specifically, Baidu speech recognition, iFlytek speech recognition, or Alibaba Cloud speech recognition may be selected, and the application is not limited in this respect.
103. Extracting a plurality of words to be recognized from the text to be recognized, wherein the words to be recognized are tagged words to be recognized fed back by the interviewee.
In some embodiments, the extracting a plurality of words to be recognized from the text to be recognized includes:
segmenting the text to be recognized into words and tagging each segmented word with its part of speech, retaining only the words whose part-of-speech tags are noun, verb, adjective, or adverb, and taking each retained word as a node;
calculating the weight value of each node in the text to be recognized, wherein the weight value calculation formula is as follows:

$WS(V_i) = (1-d) + d \times \sum_{V_j \in \mathrm{In}(V_i)} \dfrac{w_{ji}}{\sum_{V_k \in \mathrm{Out}(V_j)} w_{jk}}\, WS(V_j)$

where $WS(V_i)$ is the weight value of node $V_i$ in the text to be recognized, $d$ is a damping coefficient (a preset constant), $w_{ji}$ is the weight between node $V_j$ and node $V_i$, $\mathrm{In}(V_i)$ is the set of nodes pointing to node $V_i$, $\mathrm{Out}(V_j)$ is the set of nodes pointed to by node $V_j$, $w_{jk}$ is the weight between node $V_j$ and node $V_k$, and $WS(V_j)$ is the weight value of node $V_j$ in the text to be recognized;
dividing the weight value of each node by the maximum weight value in the weight value set to obtain a normalized weight value of each node in the text to be identified, and taking the word corresponding to the node with the normalized weight value larger than the preset weight value threshold in the text to be identified as the word to be identified.
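For illustration, a minimal sketch of this weighting step is shown below, assuming an undirected co-occurrence graph built with a small sliding window; the window size, the damping coefficient d = 0.85, the iteration count, and the 0.5 cut-off are common defaults assumed for the example, not values fixed by the application.

```python
from collections import defaultdict

def textrank_weights(words, window=3, d=0.85, iterations=30):
    """Compute normalized node weights for candidate words via the
    weighted TextRank recurrence given above."""
    weight = defaultdict(float)      # weight[(j, i)]: edge weight between nodes
    neighbors = defaultdict(set)
    for i, wi in enumerate(words):   # co-occurrence within a sliding window
        for wj in words[i + 1:i + window]:
            if wi != wj:
                weight[(wi, wj)] += 1.0
                weight[(wj, wi)] += 1.0
                neighbors[wi].add(wj)
                neighbors[wj].add(wi)
    ws = {w: 1.0 for w in set(words)}
    for _ in range(iterations):
        ws = {
            vi: (1 - d) + d * sum(
                weight[(vj, vi)]
                / sum(weight[(vj, vk)] for vk in neighbors[vj])
                * ws[vj]
                for vj in neighbors[vi]
            )
            for vi in ws
        }
    top = max(ws.values())
    return {w: v / top for w, v in ws.items()}   # divide by the maximum weight

words = ["neural", "network", "keyword", "network", "keyword", "probability"]
scores = textrank_weights(words)
words_to_recognize = [w for w, s in scores.items() if s > 0.5]
```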
104. Generating prompt information, displaying the prompt information and words to be recognized, inputting the words to be recognized into the multi-view self-training neural network model, and calculating keyword probability that each word to be recognized is a keyword.
The prompt information is used for prompting interviewee staff to mark words to be identified.
The keyword refers to a word with a weight value higher than a preset weight value threshold in the text to be identified, and the weight value threshold of the keyword can be specifically set according to dimensions such as word frequency, part of speech, text subject and the like, which is not limited in the application.
The keyword probability refers to the probability that the word in the text to be recognized is a keyword.
It should be noted that, in the embodiment of the present application, the interviewer is free to choose whether to mark the words to be recognized: the words may be marked in the text or left unmarked, and the application does not limit this.
Depending on whether the text to be recognized is marked, the keyword probabilities of the corresponding words to be recognized are likewise divided by label; specifically, the keyword probability that a word to be recognized is a keyword can be divided into a first keyword probability and a second keyword probability.
Optionally, in some embodiments of the present application, the multi-view self-training neural network model calculates the first keyword probability and the second keyword probability from a main view and auxiliary views. The main view refers to the current moment; the auxiliary views comprise a future time, a past time, a previous moment, and a following moment, where the future time does not include the following moment and the past time does not include the previous moment.
The procedures for calculating the first keyword probability and the second keyword probability are described below by feature type, for the first feature and the second feature respectively.
(1) For a first feature, the process of calculating the first keyword probability is as follows:
calculating a first probability of the first feature from the main view through SoftMax, where the first probability formula has the form

$p_t = \mathrm{SoftMax}(W_2\,\sigma(W_1 f_t) + b)$

where $p_t$ is the first probability, $f_t$ is the first feature at time $t$, $\sigma$ is the activation function, $W_1$ and $W_2$ are probability matrices (preset matrices), $b$ is a key probability parameter (a preset constant used to compensate for the error of the first keyword probability calculation), and $\mathrm{SoftMax}$ is the calculation function;

and adjusting the first probability with a loss function from the auxiliary views to obtain the first keyword probability:

$P_1 = \frac{1}{N}\sum_{i=1}^{N} L(f_i, p_i)$

where $P_1$ is the first keyword probability, $f_i$ is a first feature, $p_i$ is the corresponding first probability, $N$ is the number of first features, and $L$ is the loss function.
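Purely as an illustration, the sketch below mirrors this two-stage calculation with NumPy; the tanh activation, the cross-entropy-style loss (which, unlike $L(f_i, p_i)$ above, ignores the feature itself), and all shapes and names are assumptions used to make the example concrete.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def first_probability(f_t, W1, W2, b):
    # main view: p_t = SoftMax(W2 . sigma(W1 . f_t) + b), with sigma = tanh assumed
    return softmax(W2 @ np.tanh(W1 @ f_t) + b)

def first_keyword_probability(features, probs):
    # auxiliary-view adjustment: average an assumed cross-entropy-style
    # loss over the N first features and their first probabilities
    return float(np.mean([-np.log(p.max()) for p in probs]))

rng = np.random.default_rng(0)
W1, W2, b = rng.normal(size=(8, 16)), rng.normal(size=(2, 8)), 0.1
first_features = [rng.normal(size=16) for _ in range(4)]   # f_i at each moment
probs = [first_probability(f, W1, W2, b) for f in first_features]
print(first_keyword_probability(first_features, probs))
```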
(2) For a second feature, the process of calculating the second keyword probability is as follows:
and comprehensively calculating a second probability from the main view and the auxiliary views, where the second probability formulas have the form

$p^{\mathrm{past}}_t = g_{\mathrm{past}}(f_{<t}), \quad p^{\mathrm{prev}}_t = g_{\mathrm{prev}}(f_{t-1}), \quad p^{\mathrm{next}}_t = g_{\mathrm{next}}(f_{t+1}), \quad p^{\mathrm{fut}}_t = g_{\mathrm{fut}}(f_{>t})$

where $p^{\mathrm{prev}}_t$ is the second probability of the previous moment, $p^{\mathrm{next}}_t$ is the second probability of the following moment, $p^{\mathrm{fut}}_t$ is the second probability of the future time, $p^{\mathrm{past}}_t$ is the second probability of the past time, $f_t$ is the second feature at time $t$, and $g_{\mathrm{past}}$, $g_{\mathrm{prev}}$, $g_{\mathrm{next}}$ and $g_{\mathrm{fut}}$ are the corresponding calculation functions; the four probabilities above are arranged from left to right in time order;
and adjusting the second probabilities with the loss functions corresponding to the four auxiliary views to obtain the second keyword probability, whose calculation formula is:

$P_2 = \frac{1}{M}\sum_{v \in V}\sum_{j=1}^{M} L_v\big(p^{(v)}_j\big)$

where $P_2$ is the second keyword probability, $V$ is the set of four auxiliary views (the previous moment, the following moment, the past time and the future time), $p^{(v)}_j$ is the second probability corresponding to auxiliary view $v$, $M$ is the number of second features, and $L_v$ is the loss function corresponding to each auxiliary view.
In this embodiment, the probability threshold range used in steps 102 to 105 to determine whether a word to be recognized is a keyword can be continuously adjusted by calculating the keyword probabilities.
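Again for illustration only, the sketch below combines the four auxiliary views used for the second keyword probability above; the per-view calculation functions (context averages passed through a sigmoid) and the negative-log loss are placeholder assumptions, since the application only specifies that each view has its own calculation function and loss.

```python
import numpy as np

VIEWS = ("past", "previous", "next", "future")   # left to right in time order

def view_probability(features, t, view):
    # placeholder calculation functions g_view for each auxiliary view
    if view == "previous":
        ctx = features[max(t - 1, 0)]
    elif view == "next":
        ctx = features[min(t + 1, len(features) - 1)]
    elif view == "past":          # past time excludes the previous moment
        ctx = np.mean(features[:t - 1], axis=0) if t > 1 else features[0]
    else:                         # future time excludes the following moment
        ctx = np.mean(features[t + 2:], axis=0) if t + 2 < len(features) else features[-1]
    return 1.0 / (1.0 + np.exp(-ctx.mean()))     # sigmoid score (assumed)

def second_keyword_probability(features):
    m = len(features)
    total = sum(-np.log(view_probability(features, j, v))
                for v in VIEWS for j in range(m))   # assumed per-view loss L_v
    return total / (len(VIEWS) * m)

second_features = [np.random.default_rng(i).normal(size=8) for i in range(5)]
print(second_keyword_probability(second_features))
```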
105. Comparing the keyword probability with the probability threshold, and when the keyword probability is within the probability threshold range, marking the words to be recognized within that range as keywords.
In some embodiments, when a labeling instruction of the interviewer for a word to be recognized is detected, the keyword probability that the word is a keyword is calculated with the first-keyword-probability calculation method of step 104.
In other embodiments, when it is detected that the interviewer has not labeled a word to be recognized, the keyword probability that the word is a keyword is calculated with the second-keyword-probability calculation method of step 104.
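Putting steps 104 and 105 together, the fragment below sketches the labeled/unlabeled dispatch and the threshold-range marking; first_kw_prob and second_kw_prob stand for the two calculations sketched above, and the bounds are the illustrative ones from the training sketch.

```python
def keyword_probability(word_features, is_marked, first_kw_prob, second_kw_prob):
    # marked candidate words take the first-keyword-probability path,
    # unmarked ones the second (see step 104)
    return first_kw_prob(word_features) if is_marked else second_kw_prob(word_features)

def mark_keywords(words, probabilities, lower=0.4, upper=0.9):
    # step 105: keep candidates whose probability falls in the threshold range
    return [w for w, p in zip(words, probabilities) if lower <= p <= upper]

print(mark_keywords(["python", "teamwork", "um"], [0.72, 0.55, 0.12]))
# ['python', 'teamwork']
```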
106. Sending the keyword and the notification message to at least one interview server in the interview server list according to a preset interview server list, wherein the notification message is used for prompting the interview server to upload the final interview result in time.
Compared with the existing mechanism, in the embodiment of the application, the multi-view self-training model is trained, the voice signal is converted into the text to be identified, the keywords in the text to be identified are identified based on the multi-view self-training model, namely, the neural network model is trained from the main view and the auxiliary views respectively, so that the accuracy and hit rate of identifying the keywords can be improved, namely, the identification precision of the neural network model is improved.
Optionally, in some embodiments of the present application, the converting and feature extracting operations performed on the first word vector and the second word vector in the GRU encoder to obtain the first feature in the first word vector and the second feature in the second word vector respectively include:
calculating a reset gate and an update gate according to the hidden layer at the time t-1, the first word vector at the time t and the second word vector at the time t;
calculating a candidate hidden layer according to the reset gate, the first word vector at the moment t and the second word vector at the moment t;
respectively calculating hidden layer information in a first word vector and a second word vector according to the candidate hidden layers;
extracting a first feature of the first word vector and a second feature of the second word vector according to the hidden layer information, wherein the formula for extracting the first feature and the second feature is as follows:

$f_t = g(W_f \cdot h_t)$

where $f_t$ is the first feature or the second feature, $h_t$ is the hidden layer information, $W_f$ is a feature weight matrix (a preset matrix), and $g(\cdot)$ is the calculation function; the first feature and the second feature obtained by this calculation are the features of the keywords.
Therefore, the purpose of accurately positioning the keywords in the text can be achieved by improving the text feature extraction mode of the GRU encoder, and the keyword extraction accuracy rate of the text is further improved.
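As a sketch only, the GRU cell below follows the four steps above: reset and update gates from the hidden layer at time t-1 and the word vector at time t, a candidate hidden layer, the hidden layer information, and finally the feature projection f_t = g(W_f h_t). All weight shapes and the choice of tanh for g are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_extract_features(word_vectors, Wz, Wr, Wh, Wf):
    """Run a GRU over word vectors and project each hidden state to a feature."""
    h = np.zeros(Wz.shape[0])                     # hidden layer at time t-1
    features = []
    for x in word_vectors:                        # x: word vector at time t
        xh = np.concatenate([x, h])
        z = sigmoid(Wz @ xh)                      # update gate
        r = sigmoid(Wr @ xh)                      # reset gate
        h_cand = np.tanh(Wh @ np.concatenate([x, r * h]))   # candidate hidden layer
        h = (1 - z) * h + z * h_cand              # hidden layer information
        features.append(np.tanh(Wf @ h))          # f_t = g(W_f . h_t)
    return features

rng = np.random.default_rng(0)
dx, dh = 16, 8
Wz, Wr, Wh = (rng.normal(size=(dh, dx + dh)) for _ in range(3))
Wf = rng.normal(size=(dh, dh))
word_vectors = [rng.normal(size=dx) for _ in range(5)]
features = gru_extract_features(word_vectors, Wz, Wr, Wh, Wf)
```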
The technical features mentioned in the embodiment or implementation corresponding to fig. 1 are also applicable to the embodiments corresponding to fig. 2 and 3 in the present application, and the details of the similar parts will not be described in detail.
The method for identifying keywords in the interview video is described above, and a device for executing the method for identifying keywords in the interview video is described below.
A schematic structure of an apparatus 20 for recognizing keywords in interview videos is shown in fig. 2, which is applicable to video interviews. The apparatus 20 in the embodiment of the present application can implement the steps corresponding to the method for identifying keywords in interview videos performed in the embodiment corresponding to fig. 1 described above. The functions implemented by the apparatus 20 may be implemented by hardware, or may be implemented by executing corresponding software by hardware. The hardware or software includes one or more modules corresponding to the functions described above, which may be software and/or hardware. The apparatus 20 may include an input/output module 201, a processing module 202, and a display module 203, where the functional implementation of the processing module 202, the input/output module 201, and the display module 203 may refer to operations performed in the embodiment corresponding to fig. 1, which are not described herein. The processing module 202 may be configured to control input and output operations of the input and output module 201, and to control display operations of the display module 203.
In some embodiments, the input/output module 201 may be configured to input the collected plurality of training texts into a multi-view self-training neural network model to train the multi-view self-training neural network model, where the training texts are used to train the multi-view self-training neural network model;
the processing module 202 may be configured to collect a voice signal, invoke a voice recognition system, and convert the voice signal into a text to be recognized; extracting a plurality of words to be recognized from the text to be recognized, wherein the words to be recognized are tagged words to be recognized fed back by interviewees; generating prompt information;
the display module 203 may be configured to display the prompt information and the word to be identified, where the prompt information is used to prompt the interviewer to mark the word to be identified;
the processing module 202 is further configured to input the plurality of words to be identified to the multi-view self-training neural network model through the input/output module 201, and calculate a keyword probability that each word to be identified is a keyword; comparing the keyword probability with a probability threshold, and marking words to be recognized in the range of the probability threshold as keywords when the keyword probability is in the range of the probability threshold; and sending the keyword and the notification message to at least one interview server in the interview server list through the input/output module 201 according to a preset interview server list, wherein the notification message is used for prompting the interview server to upload a final interview result in time.
Compared with the existing mechanism, in the embodiment of the application, the processing module 202 trains the multi-view self-training model, converts the voice signal into the text to be identified, and identifies the keywords in the text to be identified based on the multi-view self-training model, namely trains the neural network model from the main view and the plurality of auxiliary views respectively, so that the accuracy and hit rate of identifying the keywords can be improved, namely the identification precision of the neural network model is improved.
In some embodiments, the processing module 202 is specifically configured to:
dividing the training texts into a first training text and a second training text according to the labels, wherein the first training text is the training text with the labels, and the second training text is the training text without the labels;
converting the first training text into a first word vector according to the coding rule, and converting the second training text into a second word vector;
extracting first features of the first word vector and extracting second features of the second word vector;
inputting the first feature and the second feature into a multi-view self-training neural network model through the input/output module 201, and respectively calculating to obtain a first keyword probability corresponding to the first feature and a second keyword probability corresponding to the second feature;
Comparing the first keyword probability and the second keyword probability with a preset probability threshold respectively;
when any first loss probability or second loss probability is lower than the lower limit of the preset probability threshold, setting that probability as the new lower limit of the probability threshold;
when any first loss probability or second loss probability is higher than the upper limit of the preset probability threshold, setting that probability as the new upper limit of the probability threshold.
In some embodiments, the processing module 202 is specifically configured to:
inputting the first word vector and the second word vector into a GRU encoder through the input-output module 201, respectively;
and performing conversion and feature extraction on the first word vector and the second word vector respectively in the GRU encoder to obtain the first feature in the first word vector and the second feature in the second word vector.
In some embodiments, the processing module 202 is specifically configured to:
calculating a reset gate and an update gate according to the hidden layer at the time t-1, the first word vector at the time t and the second word vector at the time t;
calculating a candidate hidden layer according to the reset gate, the first word vector at the moment t and the second word vector at the moment t;
Respectively calculating hidden layer information in a first word vector and a second word vector according to the candidate hidden layers;
extracting a first feature of the first word vector and a second feature of the second word vector according to the hidden layer information, wherein the formula for extracting the first feature and the second feature is as follows:

$f_t = g(W_f \cdot h_t)$

where $f_t$ is the first feature or the second feature, $h_t$ is the hidden layer information, $W_f$ is a feature weight matrix (a preset matrix), and $g(\cdot)$ is the calculation function; the first feature and the second feature obtained by this calculation are the features of the keywords.
In some embodiments, the processing module 202 is specifically configured to:
calculating a first probability of the first feature from the main view through SoftMax, where the first probability formula has the form

$p_t = \mathrm{SoftMax}(W_2\,\sigma(W_1 f_t) + b)$

where $p_t$ is the first probability, $f_t$ is the first feature at time $t$, $\sigma$ is the activation function, $W_1$ and $W_2$ are probability matrices (preset matrices), $b$ is a key probability parameter (a preset constant used to compensate for the error of the first keyword probability calculation), and $\mathrm{SoftMax}$ is the calculation function;

and adjusting the first probability with a loss function from the auxiliary views to obtain the first keyword probability:

$P_1 = \frac{1}{N}\sum_{i=1}^{N} L(f_i, p_i)$

where $P_1$ is the first keyword probability, $f_i$ is a first feature, $p_i$ is the corresponding first probability, $N$ is the number of first features, and $L$ is the loss function.
In some embodiments, the processing module 202 is specifically configured to:
and comprehensively calculating a second probability from the main view and the auxiliary views, where the second probability formulas have the form

$p^{\mathrm{past}}_t = g_{\mathrm{past}}(f_{<t}), \quad p^{\mathrm{prev}}_t = g_{\mathrm{prev}}(f_{t-1}), \quad p^{\mathrm{next}}_t = g_{\mathrm{next}}(f_{t+1}), \quad p^{\mathrm{fut}}_t = g_{\mathrm{fut}}(f_{>t})$

where $p^{\mathrm{prev}}_t$ is the second probability of the previous moment, $p^{\mathrm{next}}_t$ is the second probability of the following moment, $p^{\mathrm{fut}}_t$ is the second probability of the future time, $p^{\mathrm{past}}_t$ is the second probability of the past time, $f_t$ is the second feature at time $t$, and $g_{\mathrm{past}}$, $g_{\mathrm{prev}}$, $g_{\mathrm{next}}$ and $g_{\mathrm{fut}}$ are the corresponding calculation functions; the four probabilities above are arranged from left to right in time order;
and adjusting the second probabilities with the loss functions corresponding to the four auxiliary views to obtain the second keyword probability, whose calculation formula is:

$P_2 = \frac{1}{M}\sum_{v \in V}\sum_{j=1}^{M} L_v\big(p^{(v)}_j\big)$

where $P_2$ is the second keyword probability, $V$ is the set of four auxiliary views (the previous moment, the following moment, the past time and the future time), $p^{(v)}_j$ is the second probability corresponding to auxiliary view $v$, $M$ is the number of second features, and $L_v$ is the loss function corresponding to each auxiliary view.
In some embodiments, the processing module 202 is specifically configured to:
segmenting the text to be recognized into words and tagging each segmented word with its part of speech, retaining only the words whose part-of-speech tags are noun, verb, adjective, or adverb, and taking each retained word as a node;
calculating the weight value of each node in the text to be recognized, wherein the weight value calculation formula is as follows:

$WS(V_i) = (1-d) + d \times \sum_{V_j \in \mathrm{In}(V_i)} \dfrac{w_{ji}}{\sum_{V_k \in \mathrm{Out}(V_j)} w_{jk}}\, WS(V_j)$

where $WS(V_i)$ is the weight value of node $V_i$ in the text to be recognized, $d$ is a damping coefficient (a preset constant), $w_{ji}$ is the weight between node $V_j$ and node $V_i$, $\mathrm{In}(V_i)$ is the set of nodes pointing to node $V_i$, $\mathrm{Out}(V_j)$ is the set of nodes pointed to by node $V_j$, $w_{jk}$ is the weight between node $V_j$ and node $V_k$, and $WS(V_j)$ is the weight value of node $V_j$ in the text to be recognized;
dividing the weight value of each node by the maximum weight value in the weight value set to obtain a normalized weight value of each node in the text to be identified, and taking the word corresponding to the node with the normalized weight value larger than the preset weight value threshold in the text to be identified as the word to be identified.
The physical device corresponding to the input/output module 201 shown in fig. 2 is an input/output unit shown in fig. 3, and the input/output unit can implement part or all of the functions of the input/output module 201, or implement the same or similar functions as the input/output module 201.
The physical device corresponding to the processing module 202 shown in fig. 2 is a processor shown in fig. 3, which can implement part or all of the functions of the processing module 202, or implement the same or similar functions as the processing module 202.
The physical device corresponding to the display module 203 shown in fig. 2 is a processor shown in fig. 3, and the processor can implement a part or all of the functions of the display module 203, or implement the same or similar functions as the display module 203.
The foregoing describes the apparatus 20 of the embodiments of the present application from the perspective of modular functional entities; the following describes a computer device from the perspective of hardware. As shown in fig. 3, the computer device includes: a processor, a memory, a transceiver (which may also be an input-output unit; not labeled in fig. 3), and a computer program stored in the memory and executable on the processor. For example, the computer program may be a program corresponding to the method for identifying keywords in an interview video in the embodiment corresponding to fig. 1. When the computer device implements the functions of the apparatus 20 shown in fig. 2, the processor, in executing the computer program, implements the steps of the method for identifying keywords in an interview video performed by the apparatus 20 in the embodiment corresponding to fig. 2; alternatively, in executing the computer program, the processor implements the functions of the modules of the apparatus 20 in the embodiment corresponding to fig. 2.
The processor may be a central processing unit (Central Processing Unit, CPU), other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), off-the-shelf programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like that is a control center of the computer device, connecting various parts of the overall computer device using various interfaces and lines.
The memory may be used to store the computer program and/or modules, and the processor implements the various functions of the computer device by running or executing the computer program and/or modules stored in the memory and invoking data stored in the memory. The memory may mainly include a program storage area and a data storage area: the program storage area may store an operating system and the application programs required for at least one function (such as a sound playing function or an image playing function), while the data storage area may store data created according to use of the device (such as audio data or video data). In addition, the memory may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, memory, plug-in hard disk, Smart Media Card (SMC), Secure Digital (SD) card, Flash Card, at least one disk storage device, flash memory device, or other volatile solid-state storage device.
The transceiver may also be replaced by a receiver and a transmitter, which may be the same or different physical entities. When they are the same physical entity, they may be collectively referred to as a transceiver. The transceiver may be an input/output unit.
The memory may be integrated in the processor or may be provided separately from the processor.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware alone, although in many cases the former is preferred. Based on such an understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g. ROM/RAM), comprising several instructions for causing a terminal (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present application.
The embodiments of the present application have been described above in connection with the accompanying drawings, but the present application is not limited to the specific embodiments described, which are intended to be illustrative rather than restrictive. Those of ordinary skill in the art may make many modifications, using equivalent structures or equivalent flow transformations of the specification and drawings of the present application, whether directly or indirectly in other related technical fields, without departing from the spirit of the application and the scope of the appended claims, and all such modifications fall within the protection scope of the present application.

Claims (8)

1. A method of identifying keywords in an interview video, the method comprising:
inputting a plurality of acquired training texts into a multi-view self-training neural network model to train the multi-view self-training neural network model, wherein the training texts are used for training the multi-view self-training neural network model;
collecting a voice signal, calling a voice recognition system, and converting the voice signal into a text to be recognized;
extracting a plurality of words to be recognized from the text to be recognized, wherein the words to be recognized are tagged words to be recognized fed back by interviewees;
generating prompt information, and displaying the prompt information and the words to be identified, wherein the prompt information is used for prompting an interviewer to mark the words to be identified;
inputting the multiple words to be recognized into the multi-view self-training neural network model, and calculating the keyword probability of each word to be recognized as a keyword;
comparing the keyword probability with a probability threshold, and marking words to be recognized in the range of the probability threshold as keywords when the keyword probability is in the range of the probability threshold;
sending the keyword and a notification message to at least one interview server in an interview server list according to a preset interview server list, wherein the notification message is used for prompting the interview server to upload a final interview result in time;
Inputting the plurality of words to be recognized into the multi-view self-training neural network model, and calculating the keyword probability that each word to be recognized is a keyword, wherein the method comprises the following steps:
dividing the plurality of words to be recognized according to whether the words to be recognized are marked, and extracting features to obtain first features and/or second features, wherein the first features are keyword features corresponding to the marked words to be recognized, and the second features are keyword features corresponding to the unmarked words to be recognized;
calculating a first probability of a first feature from a main view angle through SoftMax by adopting a first probability formula, wherein the first probability formula is as follows:

$p_1 = \mathrm{SoftMax}\big(U\,\sigma(W h_t) + b\big)$

wherein $p_1$ is the first probability, $h_t$ is the first feature at moment $t$, $\sigma$ is the activation function, $W$ and $U$ are probability matrices, which are preset matrices, $b$ is a keyword probability parameter, a preset constant used for compensating the error of the first keyword probability calculation, and $\mathrm{SoftMax}$ is the calculation function;
and adjusting the first probability by using a loss function from an auxiliary view angle to obtain a first keyword probability, wherein the first keyword probability is:

$P_1 = \frac{1}{m}\sum_{i=1}^{m} L\big(h_i,\, p_1^{(i)}\big)$

wherein $P_1$ is the first keyword probability, $h_i$ is the $i$-th first feature, $p_1^{(i)}$ is the first probability corresponding to $h_i$, $m$ is the number of first features, and $L$ is the loss function;
and comprehensively calculating a second probability from the main view angle and the auxiliary view angles by adopting a second probability formula, wherein the second probability formula is as follows:

$p_{\mathrm{prev}} = f_{\mathrm{prev}}(h_{t-1}),\quad p_{\mathrm{next}} = f_{\mathrm{next}}(h_{t+1}),\quad p_{\mathrm{fut}} = f_{\mathrm{fut}}(h_{t+1}),\quad p_{\mathrm{past}} = f_{\mathrm{past}}(h_{t-1})$

wherein $p_{\mathrm{prev}}$ is the second probability of the previous moment, $p_{\mathrm{next}}$ is the second probability of the latter moment, $p_{\mathrm{fut}}$ is the second probability of the future time, $p_{\mathrm{past}}$ is the second probability of the past time, $h_{t-1}$ is the second feature at moment $t-1$, $h_{t+1}$ is the second feature at moment $t+1$, $f_{\mathrm{prev}}$, $f_{\mathrm{next}}$, $f_{\mathrm{fut}}$ and $f_{\mathrm{past}}$ are the calculation functions of the corresponding second probabilities, and $p_{\mathrm{past}}$, $p_{\mathrm{prev}}$, $p_{\mathrm{next}}$, $p_{\mathrm{fut}}$ are arranged from left to right in time sequence;
and adjusting the second probability by using the loss functions corresponding to the four auxiliary view angles to obtain a second keyword probability, wherein the calculation formula of the second keyword probability is as follows:

$P_2 = \frac{1}{n}\sum_{v \in V}\sum_{i=1}^{n} L_v\big(h_i,\, p_v^{(i)}\big)$

wherein $P_2$ is the second keyword probability, $V$ denotes the four auxiliary view angles, comprising the previous moment, the latter moment, the future time and the past time, $p_v^{(i)}$ is the second probability corresponding to auxiliary view angle $v$, $n$ is the number of second features, and $L_v$ is the loss function corresponding to the four auxiliary view angles.
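A minimal sketch of the keyword-probability calculation recited in claim 1, assuming NumPy; the names softmax, main_view_probability and adjust_with_loss are hypothetical, and tanh stands in for the activation function that the claim leaves abstract:

import numpy as np

def softmax(z):
    # numerically stable SoftMax over a score vector
    e = np.exp(z - z.max())
    return e / e.sum()

def main_view_probability(h_t, W, U, b):
    # first probability: SoftMax of an activated linear map of the feature h_t,
    # with W and U as the preset probability matrices and b as the preset
    # compensation constant
    return softmax(U @ np.tanh(W @ h_t) + b)

def adjust_with_loss(features, probabilities, loss):
    # keyword probability: loss-adjusted average over the m features
    m = len(features)
    return sum(loss(h, p) for h, p in zip(features, probabilities)) / m

Under this reading, the second keyword probability repeats adjust_with_loss once per auxiliary view angle (previous moment, latter moment, future time, past time) and sums the four results.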
2. The method of claim 1, wherein the inputting the collected plurality of training texts into the multi-view self-training neural network model comprises:
dividing the training texts into a first training text and a second training text according to the labels, wherein the first training text is the training text with the labels, and the second training text is the training text without the labels;
converting the first training text into a first word vector according to the coding rule, and converting the second training text into a second word vector;
extracting first features of the first word vector and extracting second features of the second word vector;
inputting the first feature and the second feature into a multi-view self-training neural network model, and respectively calculating to obtain a first keyword probability corresponding to the first feature and a second keyword probability corresponding to the second feature;
comparing the first keyword probability and the second keyword probability with a preset probability threshold respectively;
when any first keyword probability or second keyword probability is lower than the lower limit of the preset probability threshold, setting that probability as the lower limit of a new probability threshold;
when any first keyword probability or second keyword probability is higher than the upper limit of the preset probability threshold, setting that probability as the upper limit of a new probability threshold.
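A sketch of the threshold adaptation recited in claim 2, assuming the probability threshold is kept as a (lower, upper) pair; update_threshold is a hypothetical name:

def update_threshold(window, probability):
    # a probability below the lower limit becomes the new lower limit;
    # one above the upper limit becomes the new upper limit
    lower, upper = window
    if probability < lower:
        lower = probability
    elif probability > upper:
        upper = probability
    return (lower, upper)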
3. The method of claim 2, wherein extracting the first feature of the first word vector and extracting the second feature of the second word vector comprises:
respectively inputting the first word vector and the second word vector into a GRU encoder;
and respectively converting and extracting the first word vector and the second word vector in the GRU coder to obtain the first feature in the first word vector and the second feature in the second word vector.
4. The method of claim 3, wherein converting and feature extracting the first word vector and the second word vector in the GRU encoder to obtain the first feature in the first word vector and the second feature in the second word vector, respectively, comprises:
calculating a reset gate and an update gate according to the hidden layer at the time t-1, the first word vector at the time t and the second word vector at the time t;
calculating a candidate hidden layer according to the reset gate, the first word vector at the moment t and the second word vector at the moment t;
respectively calculating hidden layer information in a first word vector and a second word vector according to the candidate hidden layers;
extracting a first feature of a first word vector and a second feature of a second word vector according to the hidden layer information; wherein the formula for extracting the first feature and the second feature is as follows:
$x = f(W_c\, s)$

wherein $x$ denotes the first feature or the second feature, $s$ is the hidden layer information, $W_c$ is a characteristic weight matrix, which is a preset matrix, and $f$ is the calculation function; the first feature and the second feature obtained by the calculation are features of the keywords.
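A compact sketch of the GRU conversion and feature extraction recited in claims 3 and 4, assuming NumPy; the parameter dictionary layout and the use of tanh as the calculation function are assumptions:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(h_prev, x_t, p):
    # reset gate and update gate from the hidden layer at time t-1
    # and the word vector at time t
    r = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h_prev)
    z = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h_prev)
    # candidate hidden layer from the reset gate and the word vector at time t
    h_cand = np.tanh(p["Wh"] @ x_t + p["Uh"] @ (r * h_prev))
    # hidden layer information
    return (1.0 - z) * h_prev + z * h_cand

def extract_feature(h_t, w_c):
    # feature extraction: the preset characteristic weight matrix applied
    # to the hidden layer information, through the calculation function
    return np.tanh(w_c @ h_t)

Running gru_step over the word vectors of a sentence and applying extract_feature to the final hidden state would yield the first or second feature, depending on whether the input word vectors carry labels.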
5. The method of claim 1, wherein the extracting a plurality of words to be recognized from the text to be recognized comprises:
segmenting the text to be recognized into words and tagging each segmented word with a part-of-speech identifier, retaining only the words whose part-of-speech identifier is noun, verb, adjective or adverb, and taking each retained word as a node;
calculating weight values of all nodes in the text to be identified, wherein the weight value calculation formula is as follows:
$WS(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}}\, WS(V_j)$

wherein $WS(V_i)$ is the weight value of node $V_i$ in the text to be recognized, $d$ is a damping coefficient, which is a preset constant, $w_{ji}$ is the weight between node $V_j$ and node $V_i$, $In(V_i)$ is the set of nodes pointing to node $V_i$, $Out(V_j)$ is the set of nodes pointed to by node $V_j$, $w_{jk}$ is the weight between node $V_j$ and node $V_k$, and $WS(V_j)$ is the weight value of node $V_j$ in the text to be recognized;
dividing the weight value of each node by the maximum weight value in the weight value set to obtain a normalized weight value of each node in the text to be identified, and taking the word corresponding to the node with the normalized weight value larger than the preset weight value threshold in the text to be identified as the word to be identified.
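The weight iteration recited in claim 5 is the weighted-TextRank recursion; a sketch, assuming the word graph is given as dictionaries of edge weights, in-links and out-links (all names hypothetical):

def textrank_weights(nodes, w, in_links, out_links, d=0.85, iterations=50):
    # iterate WS(Vi) = (1 - d) + d * sum over Vj in In(Vi) of
    # w[j, i] / (sum over Vk in Out(Vj) of w[j, k]) * WS(Vj)
    ws = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        ws = {
            i: (1.0 - d) + d * sum(
                w[(j, i)] / sum(w[(j, k)] for k in out_links[j]) * ws[j]
                for j in in_links[i]
            )
            for i in nodes
        }
    # normalise by the maximum weight value, as in the final step of claim 5
    top = max(ws.values())
    return {v: score / top for v, score in ws.items()}

Words whose normalised weight exceeds the preset weight threshold would then be taken as the words to be recognized.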
6. An apparatus for identifying keywords in an interview video, wherein the apparatus performs the method of any one of claims 1-5, the apparatus comprising:
the input/output module is used for inputting a plurality of acquired training texts into the multi-view self-training neural network model so as to train the multi-view self-training neural network model, and the training texts are used for training the multi-view self-training neural network model;
the processing module is used for collecting voice signals, calling the voice recognition system and converting the voice signals into texts to be recognized; extracting a plurality of words to be recognized from the text to be recognized, wherein the words to be recognized are tagged words to be recognized fed back by interviewees; generating prompt information;
the display module is used for displaying the prompt information and the words to be identified, and the prompt information is used for prompting an interviewer to mark the words to be identified;
The processing module is also used for inputting the plurality of words to be identified into the multi-view self-training neural network model through the input/output module, and calculating the keyword probability of each word to be identified as a keyword; comparing the keyword probability with a probability threshold, and marking words to be recognized in the range of the probability threshold as keywords when the keyword probability is in the range of the probability threshold; and sending the keyword and the notification message to at least one interview server in the interview server list through the input and output module according to a preset interview server list, wherein the notification message is used for prompting the interview server to upload a final interview result in time.
7. A computer device, the computer device comprising:
at least one processor, memory, and transceiver;
wherein the memory is for storing program code and the processor is for invoking the program code stored in the memory to perform the method of any of claims 1-5.
8. A computer storage medium comprising instructions which, when run on a computer, cause the computer to perform the method of any of claims 1-5.
CN201910706481.0A 2019-08-01 2019-08-01 Method, device, equipment and storage medium for identifying keywords in interview video Active CN110619035B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910706481.0A CN110619035B (en) 2019-08-01 2019-08-01 Method, device, equipment and storage medium for identifying keywords in interview video
PCT/CN2019/117928 WO2021017296A1 (en) 2019-08-01 2019-11-13 Information recognition method, device, apparatus, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910706481.0A CN110619035B (en) 2019-08-01 2019-08-01 Method, device, equipment and storage medium for identifying keywords in interview video

Publications (2)

Publication Number Publication Date
CN110619035A CN110619035A (en) 2019-12-27
CN110619035B true CN110619035B (en) 2023-07-25

Family

ID=68921514

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910706481.0A Active CN110619035B (en) 2019-08-01 2019-08-01 Method, device, equipment and storage medium for identifying keywords in interview video

Country Status (2)

Country Link
CN (1) CN110619035B (en)
WO (1) WO2021017296A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113377965B (en) * 2021-06-30 2024-02-23 中国农业银行股份有限公司 Method and related device for sensing text keywords
CN115049372B (en) * 2022-08-15 2022-12-02 山东心法科技有限公司 Method, apparatus and medium for constructing digital infrastructure for human resource information
CN116366801B (en) * 2023-06-03 2023-10-13 深圳市小麦飞扬科技有限公司 Multi-terminal interaction system for recruitment information
CN116862318B (en) * 2023-09-04 2023-11-17 国电投华泽(天津)资产管理有限公司 New energy project evaluation method and device based on text semantic feature extraction
CN116882416B (en) * 2023-09-08 2023-11-21 江西省精彩纵横采购咨询有限公司 Information identification method and system for bidding documents

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11074495B2 (en) * 2013-02-28 2021-07-27 Z Advanced Computing, Inc. (Zac) System and method for extremely efficient image and pattern recognition and artificial intelligence platform
CN108962247B (en) * 2018-08-13 2023-01-31 南京邮电大学 Multi-dimensional voice information recognition system and method based on progressive neural network
CN109871446B (en) * 2019-01-31 2023-06-06 平安科技(深圳)有限公司 Refusing method in intention recognition, electronic device and storage medium
CN109979439B (en) * 2019-03-22 2021-01-29 泰康保险集团股份有限公司 Voice recognition method, device, medium and electronic equipment based on block chain

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103943107A (en) * 2014-04-03 2014-07-23 北京大学深圳研究生院 Audio/video keyword identification method based on decision-making level fusion
CN105740900A (en) * 2016-01-29 2016-07-06 百度在线网络技术(北京)有限公司 Information identification method and apparatus
CN108549626A (en) * 2018-03-02 2018-09-18 广东技术师范学院 A kind of keyword extracting method for admiring class
CN108419123A (en) * 2018-03-28 2018-08-17 广州市创新互联网教育研究院 A kind of virtual sliced sheet method of instructional video
CN108806668A (en) * 2018-06-08 2018-11-13 国家计算机网络与信息安全管理中心 A kind of audio and video various dimensions mark and model optimization method
CN109147763A (en) * 2018-07-10 2019-01-04 深圳市感动智能科技有限公司 A kind of audio-video keyword recognition method and device based on neural network and inverse entropy weighting
CN109697973A (en) * 2019-01-22 2019-04-30 清华大学深圳研究生院 A kind of method, the method and device of model training of prosody hierarchy mark

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Ming Sun et al., "Max-Pooling Loss Training of Long Short-Term Memory Networks for Small-Footprint Keyword Spotting", arXiv, pp. 1-7 *
Themos Stafylakis et al., "Zero-shot keyword spotting for visual speech recognition in-the-wild", Proceedings of the European Conference on Computer Vision (ECCV), pp. 513-529 *

Also Published As

Publication number Publication date
CN110619035A (en) 2019-12-27
WO2021017296A1 (en) 2021-02-04

Similar Documents

Publication Publication Date Title
CN110619035B (en) Method, device, equipment and storage medium for identifying keywords in interview video
CN109117777B (en) Method and device for generating information
AU2016256753B2 (en) Image captioning using weak supervision and semantic natural language vector space
US9858340B1 (en) Systems and methods for queryable graph representations of videos
CN110909725A (en) Method, device and equipment for recognizing text and storage medium
US20180157743A1 (en) Method and System for Multi-Label Classification
CN109872162B (en) Wind control classification and identification method and system for processing user complaint information
CN111190939A (en) User portrait construction method and device
CN113949582B (en) Network asset identification method and device, electronic equipment and storage medium
US9317887B2 (en) Similarity calculating method and apparatus
CN113434716A (en) Cross-modal information retrieval method and device
CN115130711A (en) Data processing method and device, computer and readable storage medium
CN111476189B (en) Identity recognition method and related device
CN112235470A (en) Incoming call client follow-up method, device and equipment based on voice recognition
CN116306679A (en) Semantic configurable multi-mode intelligent customer service dialogue based method and system
CN112948550B (en) Schedule creation method and device and electronic equipment
CN114390368A (en) Live video data processing method and device, equipment and readable medium
CN110265024A (en) Requirement documents generation method and relevant device
CN116701637B (en) Zero sample text classification method, system and medium based on CLIP
CN111538998B (en) Text encryption method and device, electronic equipment and computer readable storage medium
CN112333182B (en) File processing method, device, server and storage medium
CN115035453A (en) Video title and tail identification method, device and equipment and readable storage medium
CN115311664A (en) Method, device, medium and equipment for identifying text type in image
CN115019915A (en) Method, device, equipment and medium for generating flow regulation report based on semantic recognition
CN114639044A (en) Label determining method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant