CN111428488A

CN111428488A - Resume data information analyzing and matching method and device, electronic equipment and medium

Info

Publication number: CN111428488A
Application number: CN202010151399.9A
Authority: CN
Inventors: 侯丽; 周慧娟
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2020-03-06
Filing date: 2020-03-06
Publication date: 2020-07-17
Also published as: WO2021174919A1

Abstract

The invention provides a resume data information analyzing and matching method and device, an electronic device, and a medium. The method preprocesses retrieved resumes to obtain resumes to be analyzed. It then constructs a word segmentation directed acyclic graph from a pre-constructed word segmentation dictionary and uses it to segment those resumes, so that segmentation results are obtained quickly. A co-occurrence matrix is built from the segmented resume text and used to determine the keywords of the text. Word sequences are taken from the keywords and converted into word representations by a word representation model, which improves the parsing effect. The word representations are input into a resume label analysis model to obtain a resume label sequence. Finally, the similarity between each label in the resume label sequence and the labels of each post is calculated to determine the resumes that match each post, achieving fast and accurate intelligent matching of posts and resumes.

Description

Resume data information analyzing and matching method and device, electronic equipment and medium

Technical Field

The invention relates to the technical field of data processing, and in particular to a resume data information analyzing and matching method and device, an electronic device, and a medium.

Background

In the prior art, matching resumes to posts usually requires manual screening to find the resumes associated with each post, which is both labor-intensive and time-consuming.

However, current intelligent resume screening remains at an elementary stage: it only removes resumes that fail basic requirements (for example, resumes that do not meet the educational requirements) and cannot automatically match posts with resumes.

Disclosure of Invention

In view of the foregoing, it is desirable to provide a resume data information parsing and matching method and device, an electronic device, and a medium that achieve fast and accurate intelligent matching between posts and resumes.

A resume data information analyzing and matching method comprises the following steps:

retrieving the resume from the database, and preprocessing the retrieved resume to obtain the resume to be analyzed;

constructing a word segmentation directed acyclic graph according to a pre-constructed word segmentation dictionary, and segmenting the resume to be analyzed according to the constructed word segmentation directed acyclic graph to obtain a resume text after word segmentation;

constructing a co-occurrence matrix according to the resume text subjected to word segmentation processing, and determining keywords of the resume text based on the co-occurrence matrix;

acquiring a word sequence from the keywords, and processing the word sequence with a word representation model to obtain a word representation of the word sequence;

inputting the word representation into a constructed resume label analysis model to obtain a predicted resume label sequence;

and calculating the similarity between each label in the resume label sequence and the label of each position, and determining the resume matched with each position from the resume to be analyzed according to the calculated similarity.

According to a preferred embodiment of the present invention, preprocessing the retrieved resume includes:

performing stop word removal on the retrieved resume by using a stop word list filtering method.

According to a preferred embodiment of the present invention, constructing a co-occurrence matrix according to the resume text and determining the keywords of the resume text based on the co-occurrence matrix includes:

constructing the co-occurrence matrix according to the occurrence frequency of each segmented word in the resume text;

extracting the word frequency and the degree of each segmented word from the co-occurrence matrix;

calculating the score of each segmented word according to its word frequency and degree;

and sorting the segmented words in descending order of score to obtain the keywords of the resume text.

According to a preferred embodiment of the present invention, after the keywords of the resume text are obtained, the method further includes:

merging two keywords into a new keyword when the number of times they appear adjacent to each other in the same document is greater than a preset value.

According to a preferred embodiment of the present invention, performing word representation processing on the word sequence with a word representation model to obtain a word representation of the word sequence includes:

inputting a word sequence in the keywords into the word representation model, generating a first vector containing the word sequence and its preceding context by reading the word sequence in the forward direction, and generating a second vector containing the word sequence and its following context by reading the word sequence in the reverse direction;

and concatenating the first vector and the second vector to obtain a word representation containing the word sequence and its context information.

According to a preferred embodiment of the invention, the method further comprises:

acquiring resume data;

splitting the resume data to obtain a training set and a verification set;

training a CRF (conditional random field) model by using the training set, and predicting a target label sequence by using a conditional log-likelihood function and a maximum score formula;

validating the target tag sequence with the validation set;

and when the target label sequence passes verification, stopping training and obtaining the resume label analysis model.

According to a preferred embodiment of the present invention, calculating the similarity between each label in the resume label sequence and the label of each post, and determining the resume matching each post from the resumes to be parsed according to the calculated similarity, includes:

calculating the cosine distance between each label and the label of each post;

when the cosine distance between a target label and a target post is less than or equal to a preset distance, retrieving the target resume corresponding to the target label from the resumes to be parsed;

and determining that the target resume matches the target post.

A resume data information parsing and matching device, the device comprising:

the preprocessing unit is used for retrieving resumes from the database and preprocessing the retrieved resumes to obtain the resumes to be analyzed;

the construction unit is used for constructing a word segmentation directed acyclic graph according to a pre-constructed word segmentation dictionary and segmenting the resume to be analyzed according to the constructed word segmentation directed acyclic graph to obtain a resume text after word segmentation;

the determining unit is used for constructing a co-occurrence matrix according to the resume texts subjected to word segmentation processing and determining keywords of the resume texts based on the co-occurrence matrix;

the processing unit is used for acquiring a word sequence from the keywords and processing the word sequence with a word representation model to obtain a word representation of the word sequence;

the prediction unit is used for inputting the word representation into the constructed resume label analysis model to obtain a predicted resume label sequence;

the determining unit is further configured to calculate the similarity between each label in the resume label sequence and the label of each post, and to determine the resumes matching each post from the resumes to be analyzed according to the calculated similarity.

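According to a preferred embodiment of the present invention, the preprocessing unit is specifically configured to:

perform stop word removal on the retrieved resume by using a stop word list filtering method.

According to a preferred embodiment of the present invention, the determining unit constructing a co-occurrence matrix according to the resume text and determining the keywords of the resume text based on the co-occurrence matrix includes:

constructing the co-occurrence matrix according to the occurrence frequency of each segmented word in the resume text;

extracting the word frequency and the degree of each segmented word from the co-occurrence matrix;

calculating the score of each segmented word according to its word frequency and degree;

and sorting the segmented words in descending order of score to obtain the keywords of the resume text.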

According to a preferred embodiment of the invention, the apparatus further comprises:

a merging unit, used for merging two keywords into a new keyword after the keywords of the resume text are obtained, when the number of times the two keywords appear adjacent to each other in the same document is greater than a preset value.

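According to a preferred embodiment of the present invention, the processing unit is specifically configured to:

input a word sequence in the keywords into the word representation model, generate a first vector containing the word sequence and its preceding context by reading the word sequence in the forward direction, and generate a second vector containing the word sequence and its following context by reading the word sequence in the reverse direction;

and concatenate the first vector and the second vector to obtain a word representation containing the word sequence and its context information.

According to a preferred embodiment of the present invention, the apparatus further comprises:

an acquisition unit configured to acquire resume data;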

the splitting unit is used for splitting the resume data to obtain a training set and a verification set;

the training unit is used for training a CRF model by using the training set and predicting a target label sequence by using a conditional log-likelihood function and a maximum score formula;

a verification unit for verifying the target tag sequence with the verification set;

and the training unit is also used for stopping training and obtaining the resume label analysis model when the target label sequence passes verification.

According to a preferred embodiment of the present invention, the determining unit calculating the similarity between each label in the resume label sequence and the label of each post, and determining the resume matching each post from the resumes to be parsed according to the calculated similarity, includes:

calculating the cosine distance between each label and the label of each post;

when the cosine distance between a target label and a target post is less than or equal to a preset distance, retrieving the target resume corresponding to the target label from the resumes to be parsed;

and determining that the target resume matches the target post.

An electronic device, comprising:

a memory storing at least one instruction; and

a processor that executes the at least one instruction stored in the memory to implement the resume data information parsing and matching method.

A computer-readable storage medium having at least one instruction stored therein, the at least one instruction being executable by a processor in an electronic device to implement the resume data information parsing and matching method.

It can be seen from the above technical solutions that the present invention retrieves resumes from a database and preprocesses them to obtain resumes to be parsed; constructs a word segmentation directed acyclic graph from a pre-constructed word segmentation dictionary and segments the resumes to be parsed with it, so that segmentation results are obtained quickly; constructs a co-occurrence matrix from the segmented resume text and determines the keywords of the text from it; obtains word sequences from the keywords and converts them into word representations with a word representation model, which improves the parsing effect; inputs the word representations into the constructed resume label parsing model to obtain a predicted resume label sequence; and calculates the similarity between each label in the resume label sequence and the labels of each post, determining from that similarity which of the resumes to be parsed match each post. Posts and resumes are thereby matched quickly and accurately.

Drawings

FIG. 1 is a flowchart illustrating a method for parsing and matching resume data information according to a preferred embodiment of the present invention.

FIG. 2 is a functional block diagram of a preferred embodiment of the resume data information parsing and matching apparatus of the present invention.

FIG. 3 is a schematic structural diagram of an electronic device implementing a resume data information parsing and matching method according to a preferred embodiment of the invention.

FIG. 4 is a diagram of a co-occurrence matrix in a preferred embodiment of the method for parsing and matching resume data information according to the invention.

The realization of the objects, the functional features and the advantages of the present invention will be further explained with reference to the embodiments and the accompanying drawings.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.

Fig. 1 is a flowchart illustrating a method for parsing and matching resume data information according to a preferred embodiment of the present invention. The order of the steps in the flow chart may be changed and some steps may be omitted according to different needs.

The resume data information analyzing and matching method is applied to one or more electronic devices. An electronic device here is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.

The electronic device may be any electronic product capable of human-computer interaction with a user, for example, a personal computer, a tablet computer, a smartphone, a personal digital assistant (PDA), a game console, an interactive Internet Protocol television (IPTV), a smart wearable device, and the like.

The electronic device may also include a network device and/or a user device. The network device includes, but is not limited to, a single network server, a server group consisting of multiple network servers, or a cloud based on cloud computing and consisting of a large number of hosts or network servers.

The network in which the electronic device is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a virtual private network (VPN), and the like.

S10: Retrieve resumes from the database and preprocess the retrieved resumes to obtain the resumes to be analyzed.

In at least one embodiment of the present invention, the database may be a database in communication with the electronic device, or an internal database of the electronic device, and may be configured in a customized manner according to different requirements.

For example, the database may be a talent pool, from which the electronic device retrieves and organizes a large number of resumes. A resume can be summarized as a set of nouns {name, gender, date of birth, political affiliation, school, education level, major, contact information, native place, educational background, skills, …}, each of which has an expanded description and is separated from the others by a delimiter. Because job hunting is a social behavior and job seekers tend to imitate one another, many job seekers describe their own characteristics in very similar ways. From this large number of resumes with common features, the electronic device selects the resumes containing the content that resume screeners are interested in, forming a limited, approximately convergent resume set that serves as the retrieved resumes.

In at least one embodiment of the invention, because the same person may submit several resumes while job hunting, the electronic device may first remove duplicate resumes, thereby deduplicating the resumes.

Further, resumes contain redundant stop words that adversely affect parsing, so these stop words also need to be removed; this constitutes the preprocessing of the retrieved resumes.

Specifically, the electronic device preprocesses the retrieved resumes as follows:

the electronic device removes stop words from the retrieved resumes by using a stop word list filtering method.

Stop words are function words that carry no practical meaning in text data, such as common pronouns and prepositions. They have essentially no influence on how a text is classified, yet they occur very frequently, so leaving them in can reduce the accuracy of text classification.

Further, the electronic device may match the words in the retrieved resume one by one against a pre-constructed stop word list; if a word matches, it is a stop word and the electronic device deletes it.
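The deduplication and stop word list filtering described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes resumes are plain strings, that "duplicate" means identical text after whitespace normalization, and that tokens are already available; the function names and the sample stop word list are hypothetical.

```python
from typing import Iterable, List, Set


def deduplicate(resumes: Iterable[str]) -> List[str]:
    """Drop duplicate resumes, keeping the first occurrence in original order."""
    seen: Set[str] = set()
    unique: List[str] = []
    for text in resumes:
        key = " ".join(text.split())  # ignore whitespace-only differences
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique


def remove_stop_words(tokens: Iterable[str], stop_words: Set[str]) -> List[str]:
    """Match each token against the stop word list and delete it on a hit."""
    return [tok for tok in tokens if tok not in stop_words]


STOP_WORDS = {"the", "of", "and"}  # hypothetical stop word list
print(deduplicate(["Java developer", "Java  developer", "Data analyst"]))  # ['Java developer', 'Data analyst']
print(remove_stop_words(["master", "of", "science"], STOP_WORDS))  # ['master', 'science']
```

A set lookup keeps each stop word check constant-time, which matters when the stop word list is large.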

S11: Construct a word segmentation directed acyclic graph according to the pre-constructed word segmentation dictionary, and segment the resumes to be analyzed according to the constructed graph to obtain the segmented resume text.

In at least one embodiment of the present invention, the segmentation dictionary may include a prefix dictionary, a custom dictionary, or the like.

The prefix dictionary contains the prefixes of each word in a statistical dictionary. For example, the prefixes of the word "北京大学" (Peking University) are "北", "北京" and "北京大", and the prefix of the word "大学" (university) is "大". The custom dictionary, also called a proper noun dictionary, contains words that do not exist in the statistical dictionary but are specific to a particular field, such as "resume" and "work experience".

Further, the electronic device constructs the word segmentation directed acyclic graph from the pre-constructed word segmentation dictionary: each word corresponds to a directed edge in the graph and is assigned an edge length (weight). The electronic device then computes the lengths of all paths from the start point to the end point and arranges the distinct length values in strictly ascending order (that is, no two positions hold equal values; the same applies below). The paths ranked 1st, 2nd, …, i-th, …, N-th form the coarse segmentation result set. If two or more paths have the same length, they are all ranked i-th and all enter the coarse result set without affecting the ranks of the other paths, so the final coarse result set contains at least N paths. The segmented resume text is obtained from this result set.
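A minimal sketch of the DAG construction and the ranked-path selection follows. It assumes a prefix dictionary similar to the one used by the open-source jieba segmenter (every word stored with its count, every proper prefix stored with count 0) and an edge length of -log(relative word frequency); the patent does not say how edge lengths are computed, and the function names and the tiny dictionary are hypothetical. Enumerating every path is exponential, so this only illustrates the ranking on short strings.

```python
import math
from typing import Dict, Iterator, List, Tuple


def build_prefix_dict(freq: Dict[str, int]) -> Dict[str, int]:
    """Store every word with its count and each proper prefix with count 0."""
    prefix: Dict[str, int] = {}
    for word, count in freq.items():
        prefix[word] = count
        for i in range(1, len(word)):
            prefix.setdefault(word[:i], 0)
    return prefix


def build_dag(sentence: str, prefix: Dict[str, int]) -> Dict[int, List[int]]:
    """dag[i] lists every end index j such that sentence[i:j] is a dictionary word."""
    dag: Dict[int, List[int]] = {}
    for i in range(len(sentence)):
        ends: List[int] = []
        j = i + 1
        while j <= len(sentence) and sentence[i:j] in prefix:
            if prefix[sentence[i:j]] > 0:  # pure prefixes (count 0) are not edges
                ends.append(j)
            j += 1
        dag[i] = ends or [i + 1]  # unknown character: single-character edge
    return dag


def enumerate_paths(dag: Dict[int, List[int]], n: int, start: int = 0) -> Iterator[List[Tuple[int, int]]]:
    """Yield every start-to-end path as a list of (i, j) word edges."""
    if start == n:
        yield []
        return
    for end in dag[start]:
        for rest in enumerate_paths(dag, n, end):
            yield [(start, end)] + rest


def n_shortest_segmentations(sentence: str, freq: Dict[str, int], n_best: int = 2,
                             unknown_penalty: float = 20.0) -> List[List[str]]:
    prefix = build_prefix_dict(freq)
    total = sum(freq.values())
    dag = build_dag(sentence, prefix)

    def edge_length(i: int, j: int) -> float:
        count = prefix.get(sentence[i:j], 0)
        # Assumed weight: -log relative frequency; unseen pieces get a fixed penalty.
        return -math.log(count / total) if count > 0 else unknown_penalty

    scored = []
    for path in enumerate_paths(dag, len(sentence)):
        length = round(sum(edge_length(i, j) for i, j in path), 9)
        scored.append((length, [sentence[i:j] for i, j in path]))

    # Rank the distinct lengths in strictly ascending order; equal lengths share a
    # rank, so the result may hold more than n_best segmentations.
    kept = sorted({length for length, _ in scored})[:n_best]
    return [words for length, words in sorted(scored, key=lambda s: s[0]) if length in kept]


FREQ = {"北京": 50, "大学": 40, "北京大学": 30, "北": 5, "京": 5, "大": 10, "学": 10}
print(n_shortest_segmentations("北京大学", FREQ))  # [['北京大学'], ['北京', '大学']]
```

Keeping the whole set of tied paths, rather than truncating at exactly N, mirrors the rule that equally long paths share rank i.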

Through the implementation mode, the word segmentation result of the resume text can be quickly obtained by utilizing the word segmentation dictionary and the directed acyclic graph.

S12: Construct a co-occurrence matrix according to the segmented resume text, and determine the keywords of the resume text based on the co-occurrence matrix.

In at least one embodiment of the present invention, constructing the co-occurrence matrix according to the resume text and determining the keywords of the resume text based on the co-occurrence matrix includes:

the electronic device constructs the co-occurrence matrix according to the occurrence frequency of each segmented word in the resume text, extracts the word frequency (freq) and degree (deg) of each segmented word from the co-occurrence matrix, calculates the score of each segmented word according to its word frequency and degree, and outputs the segmented words in descending order of score to obtain the keywords of the resume text.

For example, the electronic device sorts the segmented words in descending order of score and takes the top n words, for instance the top 1/3 of the words, as the keywords of the resume text.
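The freq/deg scoring above resembles the RAKE keyword extraction algorithm. The patent does not give the exact formula, so this sketch assumes RAKE's rule score(w) = deg(w) / freq(w) and, as in RAKE, treats words inside the same candidate phrase as co-occurring (a fixed-size window, as described for the co-occurrence matrix, would be an alternative). The candidate phrases and function names are hypothetical; the 1/3 cut-off follows the example in the text.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def cooccurrence_by_phrase(phrases: List[List[str]]) -> Dict[str, Dict[str, int]]:
    """matrix[a][b] counts how often a and b occur in the same candidate phrase."""
    matrix: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for phrase in phrases:
        for a in phrase:
            for b in phrase:
                matrix[a][b] += 1
    return matrix


def keyword_scores(phrases: List[List[str]]) -> Dict[str, float]:
    matrix = cooccurrence_by_phrase(phrases)
    freq = Counter(word for phrase in phrases for word in phrase)
    # deg(w): row sum of the co-occurrence matrix (co-occurrences including itself)
    deg = {w: sum(row.values()) for w, row in matrix.items()}
    return {w: deg[w] / freq[w] for w in freq}  # assumed RAKE-style score


def top_keywords(phrases: List[List[str]], fraction: float = 1 / 3) -> List[str]:
    """Sort by descending score (ties broken alphabetically) and keep the top fraction."""
    scores = keyword_scores(phrases)
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[: max(1, int(len(ranked) * fraction))]


phrases = [["java", "developer"], ["senior", "java", "developer"], ["python"]]
print(keyword_scores(phrases))  # {'java': 2.5, 'developer': 2.5, 'senior': 3.0, 'python': 1.0}
print(top_keywords(phrases))    # ['senior']
```

Dividing degree by frequency favors words that appear in longer phrases, which is why RAKE tends to rank multi-word terms highly.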

The co-occurrence matrix is obtained by counting how many times words co-occur within a window of preset size; the counts of the words co-occurring around a given word form that word's vector.

For example, suppose the resume text contains the following corpus:

I am good at research. (This sentence is segmented into "I", "good at", "research" and "."; the two sentences below are segmented in the same way and are not listed one by one.)

I am good at programming.

I enjoy reading.
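For the three sample sentences above, a window-based co-occurrence count can be sketched as follows. The window of one token on each side is an assumption (the text only says "a window of preset size"), English glosses stand in for the segmented Chinese words, and the function name is hypothetical; the resulting numbers are not claimed to reproduce FIG. 4.

```python
from typing import Dict, List, Tuple


def cooccurrence_matrix(sentences: List[List[str]], window: int = 1) -> Tuple[Dict[str, int], List[List[int]]]:
    """Return (vocabulary index, symmetric matrix of neighbour counts)."""
    vocab: Dict[str, int] = {}
    for sentence in sentences:
        for token in sentence:
            vocab.setdefault(token, len(vocab))  # index words in first-seen order
    matrix = [[0] * len(vocab) for _ in vocab]
    for sentence in sentences:
        for i, token in enumerate(sentence):
            # Count every neighbour within `window` positions on either side.
            for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
                if j != i:
                    matrix[vocab[token]][vocab[sentence[j]]] += 1
    return vocab, matrix


corpus = [
    ["I", "good at", "research", "."],
    ["I", "good at", "programming", "."],
    ["I", "enjoy", "reading", "."],
]
vocab, X = cooccurrence_matrix(corpus)
print(X[vocab["I"]])  # row of "I" is its co-occurrence vector: [0, 2, 0, 0, 0, 1, 0]
```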

The co-occurrence matrix X constructed from this corpus is shown in FIG. 4.

In at least one embodiment of the present invention, after the keywords of the resume text are obtained, the method further includes:

when two keywords appear adjacent to each other in the same document more than a preset number of times, the electronic device merges the two keywords into a new keyword.

The preset value may be, for example, 2.

In this way, keywords that frequently occur together can be combined into one keyword, avoiding redundant keywords.
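The merging rule can be sketched as below, assuming the document is available as an ordered token list and that "adjacent" means consecutive tokens in the same order. The strict "greater than" test follows the "more than a preset value" wording; keeping the original keywords alongside the merged one, the function name and the separator parameter are assumptions.

```python
from collections import Counter
from typing import List, Set


def merge_adjacent_keywords(tokens: List[str], keywords: Set[str],
                            preset: int = 2, sep: str = " ") -> Set[str]:
    """Add a combined keyword for every ordered keyword pair adjacent more than `preset` times."""
    adjacent = Counter(
        (a, b) for a, b in zip(tokens, tokens[1:]) if a in keywords and b in keywords
    )
    merged = set(keywords)
    for (a, b), count in adjacent.items():
        if count > preset:  # "more than a preset value"
            merged.add(a + sep + b)  # use sep="" for unsegmented Chinese text
    return merged


doc = ["machine", "learning", "and", "machine", "learning", "or", "machine", "learning"]
print(sorted(merge_adjacent_keywords(doc, {"machine", "learning"})))
# ['learning', 'machine', 'machine learning']
```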

S13: Acquire the word sequence from the keywords, and perform word representation processing on the word sequence with a word representation model to obtain the word representation of the word sequence.

In at least one embodiment of the present invention, the electronic device processing the word sequence with the word representation model to obtain its word representation includes:

the electronic device inputs a word sequence in the keywords into the word representation model. Reading the word sequence in the forward direction generates a first vector containing the word sequence and its preceding context; reading it in the reverse direction generates a second vector containing the word sequence and its following context. The first and second vectors are concatenated to obtain a word representation containing the word sequence and its context information.

For example, for a given unstructured text resume containing n keywords, the word sequence is $Char = (char_1, char_2, \dots, char_n)$, where $char_i$ is the i-th element of the sequence. This unstructured word sequence is input into the word representation model, which models it as follows: reading the sequence in the forward direction generates a vector $CharF_i$ containing the word and its preceding context; similarly, reading it in reverse generates a vector $CharB_i$ containing the word and its following context. $CharF_i$ and $CharB_i$ are then concatenated to form a word representation containing the word sequence and its context information:

$$Wd = [CharF_i : CharB_i]$$

Accordingly, the electronic device obtains the word representation of the word sequence.
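The forward/backward reading and concatenation can be sketched with NumPy as below. This is a minimal illustration with random, untrained weights, assuming embeddings are already given as an n x d array; the single-layer LSTM cell and all function and parameter names are assumptions rather than the patent's model. Row i of the output is $[CharF_i : CharB_i]$, where $CharF_i$ has seen $char_1 \dots char_i$ and $CharB_i$ has seen $char_i \dots char_n$.

```python
import numpy as np


def sigmoid(v: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-v))


def lstm_pass(inputs: np.ndarray, W: np.ndarray, b: np.ndarray, hidden: int) -> np.ndarray:
    """Run one LSTM direction over inputs (n x d); W is (4*hidden) x (d + hidden)."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    states = []
    for x in inputs:
        z = W @ np.concatenate([x, h]) + b  # all four gates in one product
        i, f, o, g = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)  # cell state carries context along the sequence
        h = o * np.tanh(c)
        states.append(h)
    return np.stack(states)


def word_representations(emb: np.ndarray, params_fwd, params_bwd, hidden: int) -> np.ndarray:
    char_f = lstm_pass(emb, *params_fwd, hidden)              # CharF_i: preceding context
    char_b = lstm_pass(emb[::-1], *params_bwd, hidden)[::-1]  # CharB_i: following context
    return np.concatenate([char_f, char_b], axis=1)           # Wd_i = [CharF_i : CharB_i]


rng = np.random.default_rng(0)
emb = rng.normal(size=(5, 4))  # 5 words, 4-dimensional embeddings (hypothetical)
fwd = (rng.normal(size=(12, 7)), np.zeros(12))
bwd = (rng.normal(size=(12, 7)), np.zeros(12))
print(word_representations(emb, fwd, bwd, hidden=3).shape)  # (5, 6)
```

Reversing the input for the backward pass and reversing its outputs again keeps row i of both halves aligned with the same word.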

In natural language processing, symbolic information such as "words" can be expressed in a mathematical vector form using various word expression models. The vector representation of the word may be used as input to various machine learning models. Existing word representation models can include two broad categories: one is syntagmatic models and one is paradigmatic models.

Further, the electronic device may format the word representations by regular expression matching, and then parse, classify and store them in a designated database for subsequent use.

S14: Input the word representation into the constructed resume label analysis model to obtain a predicted resume label sequence.

In at least one embodiment of the present invention, the resume label analysis model is obtained by training on a large amount of resume data used as training samples and verifying the result with a verification set. The model parses the unstructured word representations and outputs the corresponding labels, which form the resume label sequence.

For example, the labels in the resume label sequence may include, but are not limited to: bachelor's degree, postgraduate, proficient in Word, and the like.

In at least one embodiment of the invention, the method further comprises:

the electronic device acquires resume data and splits it into a training set and a verification set, trains a CRF (conditional random field) model with the training set, predicts a target label sequence using a conditional log-likelihood function and a maximum score formula, and verifies the target label sequence with the verification set; when the target label sequence passes verification, training stops and the resume label analysis model is obtained.

Here, the target label sequence refers to the predicted most suitable label sequence.
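The split-then-train-until-verified loop can be sketched as below. The patent does not define what "passes verification" means, so this sketch assumes a verification score (for example, label accuracy) compared against a target; the 80/20 split ratio, the callbacks `train_epoch` and `evaluate`, and all names are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(samples: Sequence[T], train_ratio: float = 0.8, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle reproducibly, then cut into a training set and a verification set."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_ratio)
    return items[:cut], items[cut:]


def train_until_verified(
    train_epoch: Callable[[List[T]], None],
    evaluate: Callable[[List[T]], float],
    train_set: List[T],
    verify_set: List[T],
    target: float = 0.9,
    max_epochs: int = 50,
) -> int:
    """Train on the training set only; stop once the verification score reaches `target`.

    Returns the number of epochs actually run.
    """
    for epoch in range(1, max_epochs + 1):
        train_epoch(train_set)
        if evaluate(verify_set) >= target:
            return epoch
    return max_epochs
```

Keeping the verification set out of `train_epoch` is what makes the stopping check meaningful.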

Specifically, the electronic device performs the modeling with a CRF (conditional random field). Assume that the output target sequence (that is, the corresponding label sequence) for the keyword information of the unstructured text is $y = (y_1, \dots, y_n)$. To obtain the target sequence of the unstructured resume information effectively, the score of the model is defined as follows:

$$s(Wd, y) = \sum_{j=0}^{n} A_{y_j, y_{j+1}} + \sum_{j=1}^{n} P_{j, y_j}$$

where P is the output score matrix of the bidirectional LSTM (long short-term memory) network, of size n × k; k is the number of target labels used to summarize the resume, and n is the length of the word sequence. A is the transition score matrix; $y_0$ (used when j = 0) is the start-of-sequence tag and $y_{n+1}$ (used when j = n) is the end-of-sequence tag, so A is a square matrix of size (k + 2) × (k + 2).

Over all label sequences of the resume information, the probability that the CRF generates the target sequence y is:

$$p(y \mid Wd) = \frac{e^{s(Wd, y)}}{\sum_{\tilde{y} \in Y_{Wd}} e^{s(Wd, \tilde{y})}}$$

where $Y_{Wd}$ denotes all possible label sequences for the resume information sequence Wd. During training, to obtain the correct label sequence for the resume information, the conditional log-likelihood of the correct label sequence is maximized, and the most suitable label sequence is predicted with the maximum score formula:

$$\log p(y \mid Wd) = s(Wd, y) - \log \sum_{\tilde{y} \in Y_{Wd}} e^{s(Wd, \tilde{y})}$$

$$y^{*} = \arg\max_{\tilde{y} \in Y_{Wd}} s(Wd, \tilde{y})$$

through the implementation mode, the accuracy of the model can be improved by combining the conditional log-likelihood function and the maximum score formula.
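The score, log-likelihood and maximum-score decoding above can be sketched in NumPy as follows. The emission matrix P is taken as given (in the patent it comes from the bidirectional LSTM); here it is random. The sketch assumes tags 0..k-1 are real labels, index k is the start tag $y_0$ and index k+1 is the end tag $y_{n+1}$, and uses 0-based rows of P where the formula uses j = 1..n. The log-partition term uses the standard forward algorithm and the arg max uses Viterbi decoding; all function names are hypothetical.

```python
import itertools

import numpy as np


def logsumexp(x: np.ndarray, axis=None):
    """Numerically stable log(sum(exp(x))) along `axis`."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def crf_score(P: np.ndarray, A: np.ndarray, tags) -> float:
    """s(Wd, y) = sum_j A[y_j, y_{j+1}] (j = 0..n) + sum_j P[j, y_j]."""
    n, k = P.shape
    path = [k, *tags, k + 1]  # add START and END tags
    transitions = sum(A[path[j], path[j + 1]] for j in range(n + 1))
    emissions = sum(P[j, tags[j]] for j in range(n))
    return transitions + emissions


def log_partition(P: np.ndarray, A: np.ndarray) -> float:
    """log of the sum of exp(s) over all k**n label sequences (forward algorithm)."""
    n, k = P.shape
    alpha = A[k, :k] + P[0]  # START -> first label
    for j in range(1, n):
        alpha = logsumexp(alpha[:, None] + A[:k, :k], axis=0) + P[j]
    return logsumexp(alpha + A[:k, k + 1])  # last label -> END


def log_likelihood(P: np.ndarray, A: np.ndarray, tags) -> float:
    return crf_score(P, A, tags) - log_partition(P, A)


def viterbi(P: np.ndarray, A: np.ndarray) -> list:
    """y* = argmax_y s(Wd, y) by dynamic programming."""
    n, k = P.shape
    score = A[k, :k] + P[0]
    backpointers = []
    for j in range(1, n):
        candidates = score[:, None] + A[:k, :k]  # rows: previous label, cols: current label
        backpointers.append(candidates.argmax(axis=0))
        score = candidates.max(axis=0) + P[j]
    score = score + A[:k, k + 1]
    best = [int(score.argmax())]
    for bp in reversed(backpointers):
        best.append(int(bp[best[-1]]))
    return best[::-1]


rng = np.random.default_rng(0)
P, A = rng.normal(size=(3, 2)), rng.normal(size=(4, 4))  # n = 3 words, k = 2 labels
every = list(itertools.product(range(2), repeat=3))
scores = [crf_score(P, A, y) for y in every]
print(viterbi(P, A), list(every[int(np.argmax(scores))]))  # the two should agree
```

For this tiny case, brute-force enumeration of all $k^n$ sequences can confirm both the forward algorithm and the Viterbi result; real models rely on the dynamic programs because the enumeration grows exponentially.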

S15: Calculate the similarity between each label in the resume label sequence and the label of each position, and determine the resumes matching each position from the resumes to be analyzed according to the calculated similarity.

In at least one embodiment of the present invention, the electronic device calculating the similarity between each label in the resume label sequence and the label of each position, and determining the resumes matching each position from the resumes to be parsed according to the calculated similarity, includes:

the electronic device calculates the cosine distance between each label and the label of each post. When the cosine distance between a target label and a target post is less than or equal to a preset distance, the electronic device retrieves the target resume corresponding to the target label from the resumes to be analyzed and determines that the target resume matches the target post.

Specifically, cosine similarity measures the difference between two individuals by the cosine of the angle between their vectors in a vector space: the closer the cosine is to 1, the closer the angle is to 0 degrees, that is, the more similar the two vectors are. The cosine distance is commonly taken as 1 minus the cosine similarity, so a smaller distance means greater similarity.

For example, the similarity between the obtained resume label sequence X and the resume label sequence Y required by the position being recruited for is calculated with the following formula, where $X_i$ is the i-th component of X, $Y_i$ is the i-th component of Y, and m is the vector length:

$$\cos(X, Y) = \frac{\sum_{i=1}^{m} X_i Y_i}{\sqrt{\sum_{i=1}^{m} X_i^2}\,\sqrt{\sum_{i=1}^{m} Y_i^2}}$$

The resulting similarity ranges from -1 to 1: -1 means the two vectors point in exactly opposite directions, 1 means they point in exactly the same direction, 0 usually means they are independent, and values in between indicate intermediate degrees of similarity or dissimilarity. Based on this measure, resumes whose labels are highly similar to those of each position can be selected, so that resumes are matched to positions quickly.
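The cosine formula and the distance threshold can be sketched as below, comparing whole label sequences as in the formula above. How labels become numeric vectors is not specified, so the sketch assumes a hypothetical binary bag-of-labels encoding over a shared label vocabulary; the 0.3 preset distance, the label names and the function names are also assumptions.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """cos(X, Y) = sum(X_i * Y_i) / (||X|| * ||Y||); returns 0 for a zero vector."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)


def label_vector(labels: List[str], vocabulary: List[str]) -> List[float]:
    """Hypothetical encoding: 1.0 where the label is present, else 0.0."""
    present = set(labels)
    return [1.0 if label in present else 0.0 for label in vocabulary]


def match_resumes(resume_labels: Dict[str, List[str]], post_labels: Dict[str, List[str]],
                  max_distance: float = 0.3) -> Dict[str, List[str]]:
    vocabulary = sorted({lab for labels in [*resume_labels.values(), *post_labels.values()] for lab in labels})
    matches: Dict[str, List[str]] = {}
    for post, required in post_labels.items():
        post_vec = label_vector(required, vocabulary)
        matches[post] = [
            rid for rid, labels in resume_labels.items()
            # cosine distance = 1 - cosine similarity; keep resumes within the preset distance
            if 1.0 - cosine_similarity(label_vector(labels, vocabulary), post_vec) <= max_distance
        ]
    return matches


resumes = {"r1": ["postgraduate", "proficient in Word"], "r2": ["bachelor"]}
posts = {"analyst": ["postgraduate", "proficient in Word"]}
print(match_resumes(resumes, posts))  # {'analyst': ['r1']}
```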

In at least one embodiment of the present invention, the electronic device may further express the resume label sequence as a score, based on the obtained resume label sequence and the weight configured for each label (for example, one degree label may carry a weight of 0.2 in the resume score and another a weight of 0.1), and then quickly screen out the required candidates according to the score.
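A weighted label score of this kind can be sketched as a simple weighted sum. The label names, the weights, the screening threshold and the function names below are hypothetical, and counting each distinct label once is an assumption.

```python
from typing import Dict, List


def resume_score(labels: List[str], weights: Dict[str, float]) -> float:
    """Weighted sum over the distinct labels present; labels without a weight add 0."""
    return sum(weights.get(label, 0.0) for label in set(labels))


def screen_resumes(resumes: Dict[str, List[str]], weights: Dict[str, float],
                   min_score: float) -> List[str]:
    """Return the IDs of resumes scoring at least `min_score`, highest score first."""
    scores = {rid: resume_score(labels, weights) for rid, labels in resumes.items()}
    passing = [rid for rid, score in scores.items() if score >= min_score]
    return sorted(passing, key=lambda rid: -scores[rid])


WEIGHTS = {"postgraduate": 0.2, "bachelor": 0.1}  # hypothetical weights
print(screen_resumes({"r1": ["postgraduate"], "r2": ["bachelor"]}, WEIGHTS, 0.15))  # ['r1']
```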

It can be seen from the above technical solutions that, by combining dictionary-based directed acyclic graph word segmentation, co-occurrence-matrix keyword extraction, bidirectional word representations, a resume label parsing model, and label similarity calculation, the present invention determines which of the resumes to be parsed match each post, achieving fast and accurate intelligent matching of posts and resumes.

Fig. 2 is a functional block diagram of a preferred embodiment of the resume data information parsing and matching apparatus according to the present invention. The resume data information analyzing and matching device 11 includes a preprocessing unit 110, a constructing unit 111, a determining unit 112, a processing unit 113, a predicting unit 114, a merging unit 115, a training unit 116, an obtaining unit 117, a splitting unit 118, and a verifying unit 119. The module/unit referred to in the present invention refers to a series of computer program segments that can be executed by the processor 13 and that can perform a fixed function, and that are stored in the memory 12. In the present embodiment, the functions of the modules/units will be described in detail in the following embodiments.

The preprocessing unit 110 retrieves the resume from the database and preprocesses the retrieved resume to obtain the resume to be analyzed.

In at least one embodiment of the present invention, the database may be a database in communication with the electronic device or an internal database of the electronic device, and may be customized according to different requirements.

For example, the database may be a talent pool. The preprocessing unit 110 retrieves and collates resumes from the talent pool to obtain a large number of resumes. A resume can be summarized as a set of nouns {name, gender, date of birth, political affiliation, school, degree, major, contact information, native place, educational background, skills, …}, each of which has an expanded description and is separated from the others by a delimiter. Because job hunting is a social behavior in which people tend to imitate one another, many job seekers describe their own characteristics in very similar ways. From this large body of resumes with common features, the preprocessing unit 110 extracts the resumes containing the content that recruiters are interested in and care about, forming a limited, approximately convergent resume set that serves as the retrieved resumes.

In at least one embodiment of the invention, since the same person is likely to submit several resumes during a job search, duplicate resumes can first be removed, thereby deduplicating the resumes.

Specifically, preprocessing the retrieved resumes by the preprocessing unit 110 includes:

the preprocessing unit 110 removes stop words from the retrieved resumes using stop word list filtering.

Specifically, the preprocessing unit 110 may match the words in the retrieved resume one by one against a pre-constructed stop word list; if a word matches, it is a stop word and the preprocessing unit 110 deletes it.
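The two preprocessing steps described above, deduplication and stop word list filtering, can be sketched as follows. This is a minimal illustration under stated assumptions: the text does not say how duplicate resumes are detected, so exact matching after collapsing whitespace and lowercasing is assumed, the resumes are assumed to be already tokenized before stop word filtering, and the function names are hypothetical.

```python
def deduplicate(resumes):
    """Keep the first copy of each resume, comparing whitespace-normalized, lowercased text.

    The normalization rule is an assumption; the source only says duplicates are removed.
    """
    seen = set()
    unique = []
    for text in resumes:
        key = " ".join(text.split()).lower()
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique


def remove_stop_words(tokens, stop_words):
    """Match each token against the stop word list one by one and drop it on a match."""
    return [token for token in tokens if token not in stop_words]
```

For example, `remove_stop_words(["I", "enjoy", "reading"], {"I"})` keeps only `"enjoy"` and `"reading"`. Using a `set` for the stop word list makes each lookup constant time on average, which matters when filtering a large talent pool.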

The construction unit 111 constructs a word segmentation directed acyclic graph according to a pre-constructed word segmentation dictionary, and segments the resume to be analyzed according to the constructed word segmentation directed acyclic graph to obtain a resume text after word segmentation.

Specifically, the constructing unit 111 constructs the word segmentation directed acyclic graph from the pre-constructed word segmentation dictionary, in which each word corresponds to one directed edge of the graph and is assigned a corresponding edge length (weight). The constructing unit 111 then determines the length values of all paths from the start point to the end point, arranges these length values in strictly ascending order (i.e. no two positions hold equal values), and takes the paths ranked 1st, 2nd, …, i-th, …, N-th in turn as the coarse segmentation result set. If two or more paths have the same length, they share rank i and are all included in the coarse segmentation result set without affecting the ranks of the other paths, so the final coarse segmentation result set contains N or more paths. In this way, the segmented resume text is obtained.
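The coarse segmentation above can be sketched as follows. Assumptions: every edge has weight 1 (so a path's length is its word count, since the text does not specify the weights), single characters are always allowed as fallback edges, and all paths are enumerated exhaustively, which is only practical for short sentences; a production N-shortest-path segmenter would instead keep the N best partial paths per node with dynamic programming. Function names are hypothetical.

```python
from collections import defaultdict


def build_dag(text, dictionary):
    """Edge i -> j for every dictionary word text[i:j]; single characters are always allowed."""
    max_len = max((len(w) for w in dictionary), default=1)
    dag = defaultdict(list)
    for i in range(len(text)):
        for j in range(i + 1, min(len(text), i + max_len) + 1):
            if j == i + 1 or text[i:j] in dictionary:
                dag[i].append(j)
    return dag


def all_paths(dag, start, end):
    """Enumerate every start -> end path as a list of (i, j) edges (exponential in general)."""
    if start == end:
        yield []
        return
    for nxt in dag[start]:
        for rest in all_paths(dag, nxt, end):
            yield [(start, nxt)] + rest


def n_shortest_segmentations(text, dictionary, n=1):
    """Coarse result set: every path whose length is among the n smallest distinct lengths."""
    dag = build_dag(text, dictionary)
    paths = list(all_paths(dag, 0, len(text)))
    # Distinct lengths in strictly ascending order; tied paths share one rank.
    kept_lengths = sorted({len(p) for p in paths})[:n]
    kept = sorted((p for p in paths if len(p) in kept_lengths), key=len)
    return [[text[i:j] for i, j in p] for p in kept]
```

With the hypothetical dictionary `{"研究", "研究生", "生命"}`, `n_shortest_segmentations("研究生命", ...)` with `n=1` returns both two-word segmentations, `研究/生命` and `研究生/命`, which illustrates why the coarse result set can contain more than N paths when lengths tie.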

The determining unit 112 constructs a co-occurrence matrix according to the resume text, and determines a keyword of the resume text based on the co-occurrence matrix.

In at least one embodiment of the present invention, constructing the co-occurrence matrix according to the resume text and determining the keywords of the resume text based on the co-occurrence matrix includes:

the determining unit 112 constructs the co-occurrence matrix according to the occurrences of each segmented word in the resume text and extracts the word frequency (freq) and degree (deg) of each segmented word from the matrix; the determining unit 112 then calculates the score of each segmented word from its word frequency and degree, and sorts the segmented words in descending order of score to obtain the keywords of the resume text.

For example, the determining unit 112 sorts the words in descending order of score and outputs the top n words, such as the top one third, as the keywords of the resume text.

For example, suppose the resume text contains the following sentences:

I am adept at programming.

I enjoy reading.
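The text names the word frequency (freq) and degree (deg) but does not give the scoring formula. A minimal sketch, assuming the RAKE-style score deg(w) / freq(w): two segmented words co-occur when they appear in the same sentence, the diagonal of the matrix holds each word's frequency, and a word's degree is the sum of its row. The sentence-level co-occurrence window, the alphabetical tie-break, and the function names are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_matrix(sentences):
    """m[a][b] = sentence-level co-occurrences of a and b; m[w][w] = frequency of w."""
    m = defaultdict(lambda: defaultdict(int))
    for words in sentences:
        for w in words:
            m[w][w] += 1
        for a, b in combinations(words, 2):
            if a != b:
                m[a][b] += 1
                m[b][a] += 1
    return m


def keyword_scores(sentences):
    """Assumed RAKE-style score: deg(w) / freq(w), where deg(w) is the sum of w's row."""
    m = cooccurrence_matrix(sentences)
    return {w: sum(row.values()) / row[w] for w, row in m.items()}


def top_keywords(sentences, fraction=1 / 3):
    """Sort words by descending score (ties alphabetically) and keep the top fraction."""
    scores = keyword_scores(sentences)
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[: max(1, round(len(ranked) * fraction))]
```

Applied to the two example sentences after stop word removal, assumed here to give `[["adept", "programming"], ["enjoy", "reading"]]`, each word co-occurs with exactly one other word, so all four words score 2.0 and the ranking falls back to the tie-break. Words that co-occur with many others in longer phrases receive higher scores.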

After the keywords are obtained, when the number of times two keywords are adjacent in the same document is greater than a preset value, the merging unit 115 merges the two keywords into a new keyword.

The preset value may be, for example, 2.
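The merging rule above can be sketched as follows, using the example preset value of 2 and a strict "greater than" comparison as stated. Assumptions: adjacency is counted over the document's segmented token sequence, both tokens must already be keywords, and the merged keyword is formed by joining the pair with a separator (an empty separator suits Chinese text). The function name is hypothetical.

```python
from collections import Counter


def merge_adjacent_keywords(tokens, keywords, threshold=2, sep=" "):
    """Return new keywords formed from keyword pairs adjacent more than `threshold` times."""
    keyword_set = set(keywords)
    pair_counts = Counter(
        (a, b)
        for a, b in zip(tokens, tokens[1:])
        if a in keyword_set and b in keyword_set
    )
    return sorted(sep.join(pair) for pair, count in pair_counts.items() if count > threshold)
```

For instance, if `"machine"` directly precedes `"learning"` three times in one document and both are keywords, the sketch returns `["machine learning"]`; two occurrences would not exceed the preset value, so nothing would be merged.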

The processing unit 113 obtains a word sequence in the keyword, and performs word representation processing on the word sequence by using a word representation model to obtain a word representation of the word sequence.

In at least one embodiment of the present invention, processing the word sequence with the word representation model to obtain the word representation of the word sequence includes:

the processing unit 113 inputs the word sequence in the keywords into the word representation model. Reading the word sequence in the forward direction generates a first vector containing the word sequence and its preceding context information; reading it in the reverse direction generates a second vector containing the word sequence and its following context information. The processing unit 113 then concatenates the first vector and the second vector to obtain a word representation containing the word sequence and its context information.

For example, for a given unstructured text resume containing n keywords, the word sequence is $Char = (char_1, char_2, \ldots, char_n)$, where $char_i$ denotes the i-th element of the sequence. The unstructured word sequence is input into the word representation model, which models it: reading the sequence in the forward direction generates, for position i, a vector containing the word and its preceding context, denoted $CharF_i$; similarly, reading the sequence in reverse generates a vector containing the word and its following context, denoted $CharB_i$. $CharF_i$ and $CharB_i$ are then concatenated to form a word representation containing the word sequence and its context information:

$$Wd = [CharF_i : CharB_i]$$

accordingly, the processing unit 113 obtains a word representation of the word sequence.
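The forward/backward reading and concatenation can be sketched in pure Python. This is an illustration only: a plain tanh recurrent cell stands in for each LSTM direction (the training section refers to a bidirectional LSTM), the weights are random rather than trained, and each word is assumed to be already mapped to a numeric input vector. All function names are hypothetical.

```python
import math
import random


def init_rnn(hidden_size, input_size, seed):
    """Random weights for a plain tanh RNN (a simplified stand-in for one LSTM direction)."""
    rng = random.Random(seed)
    w_x = [[rng.uniform(-0.5, 0.5) for _ in range(input_size)] for _ in range(hidden_size)]
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)] for _ in range(hidden_size)]
    return w_x, w_h


def rnn_states(inputs, w_x, w_h):
    """Read the inputs in order and return the hidden state after each step."""
    h = [0.0] * len(w_h)
    states = []
    for x in inputs:
        h = [
            math.tanh(
                sum(wx * xi for wx, xi in zip(w_x[i], x))
                + sum(wh * hj for wh, hj in zip(w_h[i], h))
            )
            for i in range(len(w_h))
        ]
        states.append(h)
    return states


def bidirectional_representation(inputs, forward_weights, backward_weights):
    """Per position i, concatenate CharF_i (left-to-right) and CharB_i (right-to-left)."""
    char_f = rnn_states(inputs, *forward_weights)
    char_b = rnn_states(inputs[::-1], *backward_weights)[::-1]  # re-align to positions
    return [f + b for f, b in zip(char_f, char_b)]
```

Because the backward pass runs over the reversed sequence and is then re-aligned, the second half of each output vector summarizes the words after position i, while the first half summarizes the words up to position i.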

The prediction unit 114 inputs the word representation into the constructed resume label analysis model to obtain a predicted resume label sequence.

In at least one embodiment of the invention, training the resume label parsing model comprises:

the obtaining unit 117 obtains resume data, and the splitting unit 118 splits the resume data into a training set and a verification set. The training unit 116 then trains a CRF model with the training set and predicts a target label sequence using the conditional log-likelihood function and the maximum score formula; the verifying unit 119 verifies the target label sequence with the verification set, and when the target label sequence passes verification, the training unit 116 stops training and obtains the resume label parsing model.
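The split-then-train-until-verified flow can be sketched as a small driver. The 80/20 split ratio, the fixed shuffle seed, the epoch cap, and the two callbacks are assumptions: `train_step` and `passes_verification` are hypothetical placeholders for CRF training and for whatever verification criterion is applied to the predicted label sequences, which the text does not specify.

```python
import random


def split_dataset(samples, val_ratio=0.2, seed=0):
    """Shuffle labelled resume samples and split them into training and verification sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - val_ratio))
    return shuffled[:cut], shuffled[cut:]


def train_until_verified(train_step, passes_verification, train_set, val_set, max_epochs=50):
    """Train epoch by epoch and stop as soon as the predictions pass verification.

    train_step(train_set) -> model and passes_verification(model, val_set) -> bool
    are hypothetical callbacks; returns (model, number_of_epochs_run).
    """
    model = None
    for epoch in range(1, max_epochs + 1):
        model = train_step(train_set)
        if passes_verification(model, val_set):
            return model, epoch
    return model, max_epochs
```

Keeping the verification set disjoint from the training set is what makes the stopping check meaningful: the model is judged on resumes it has not been fitted to.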

Specifically, the training unit 116 uses a CRF (conditional random field) for modeling. Assume that the output target sequence (i.e. the corresponding tag sequence) for the keyword information of the unstructured text is $y = (y_1, \ldots, y_n)$. To effectively obtain the target sequence of the unstructured text resume information, the score formula of the model is defined as follows:

where $P$ denotes the output score matrix of the bidirectional LSTM (long short-term memory) network, of size $n \times k$; $k$ is the number of target tags, i.e. the summary labels used to characterize the resume; $n$ is the length of the word sequence; and $A$ is the transition score matrix. $y_0$ denotes the start-of-sequence marker (reached at $j = 0$) and $y_{n+1}$ denotes the end-of-sequence marker (reached at $j = n$), so $A$ is a square matrix of size $(k+2) \times (k+2)$.
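The score formula itself does not survive in this text. The definitions above (an $n \times k$ emission matrix $P$ from the bidirectional LSTM, a $(k+2) \times (k+2)$ transition matrix $A$, and the markers $y_0$ and $y_{n+1}$) match the standard BiLSTM-CRF sequence score, which is assumed here to be the intended formula:

$$s(Wd, y) = \sum_{j=0}^{n} A_{y_j, y_{j+1}} + \sum_{j=1}^{n} P_{j, y_j}$$

The first sum scores the tag-to-tag transitions, including those from the start marker and into the end marker; the second scores the tag emitted at each position of the word sequence.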

Here, $Y_{Wd}$ denotes the set of all possible tag sequences for the resume information sequence $Wd$. During training, to obtain the correct label sequence for the resume information, the training unit 116 maximizes the conditional log-likelihood of the correct label sequence and predicts the most suitable label sequence using the maximum score formula:
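A minimal pure-Python sketch of the training objective and decoding, under the standard linear-chain CRF formulation (assumed here, since the original formulas are not reproduced). The sequence score $s(Wd, y)$ sums the transition scores $A_{y_j, y_{j+1}}$, including the start and end markers, and the emission scores $P_{j, y_j}$; the probability of a tag sequence is $p(y \mid Wd) = \exp(s(Wd, y)) / \sum_{y' \in Y_{Wd}} \exp(s(Wd, y'))$; training maximizes $\log p(y \mid Wd)$ for the correct sequence; and prediction takes $y^* = \arg\max_{y' \in Y_{Wd}} s(Wd, y')$. The sum over $Y_{Wd}$ is computed with the forward algorithm and the argmax with Viterbi decoding instead of enumeration. Tags are indexed 0..k-1 with k and k+1 reserved for the start and end markers, and positions are 0-based; these indexing choices are assumptions.

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def crf_score(P, A, y, start, end):
    """s(Wd, y): transitions start -> y_1 -> ... -> y_n -> end plus per-position emissions."""
    path = [start] + list(y) + [end]
    transitions = sum(A[path[j]][path[j + 1]] for j in range(len(path) - 1))
    emissions = sum(P[j][tag] for j, tag in enumerate(y))
    return transitions + emissions


def log_partition(P, A, start, end):
    """log of the sum over all tag sequences of exp(s(Wd, y')), via the forward algorithm."""
    k = len(P[0])
    alpha = [A[start][t] + P[0][t] for t in range(k)]
    for j in range(1, len(P)):
        alpha = [logsumexp([alpha[s] + A[s][t] for s in range(k)]) + P[j][t] for t in range(k)]
    return logsumexp([alpha[t] + A[t][end] for t in range(k)])


def log_likelihood(P, A, y, start, end):
    """Conditional log-likelihood log p(y | Wd), the quantity maximized during training."""
    return crf_score(P, A, y, start, end) - log_partition(P, A, start, end)


def viterbi(P, A, start, end):
    """Maximum-score tag sequence, recovered with back-pointers."""
    k = len(P[0])
    delta = [A[start][t] + P[0][t] for t in range(k)]
    back = []
    for j in range(1, len(P)):
        best_prev = [max(range(k), key=lambda s: delta[s] + A[s][t]) for t in range(k)]
        delta = [delta[best_prev[t]] + A[best_prev[t]][t] + P[j][t] for t in range(k)]
        back.append(best_prev)
    path = [max(range(k), key=lambda t: delta[t] + A[t][end])]
    for best_prev in reversed(back):
        path.append(best_prev[path[-1]])
    return path[::-1]
```

Both the forward algorithm and Viterbi run in $O(n k^2)$ time, whereas enumerating $Y_{Wd}$ directly would take $O(k^n)$; the two only differ in replacing log-sum-exp with max.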

The determining unit 112 calculates the similarity between each label in the resume label sequence and the label of each post, and determines the resume matching each post from the resumes to be parsed according to the calculated similarity.

In at least one embodiment of the present invention, calculating the similarity between each label in the resume label sequence and the label of each post, and determining the resume matching each post from the resumes to be parsed according to the calculated similarity, includes:

the determining unit 112 calculates the cosine distance between each label and the label of each post; when the cosine distance between a target label and the label of a target post is less than or equal to a preset distance, the determining unit 112 retrieves the target resume corresponding to the target label from the resumes to be parsed and determines that the target resume matches the target post.
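The matching rule above can be sketched as follows. Assumptions: each resume label and each post label has already been mapped to a non-zero numeric vector (for example, an embedding; the text does not say how labels are vectorized), a resume matches a post if any of its label vectors lies within the preset distance of any of the post's label vectors, and the default threshold of 0.2 is a hypothetical value.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; both vectors are assumed to be non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def match_resumes_to_posts(resume_tags, post_tags, max_distance=0.2):
    """Map each post id to the set of resume ids with a label within max_distance of a post label.

    resume_tags: {resume_id: [label_vector, ...]}, post_tags: {post_id: [label_vector, ...]}
    """
    matches = {post_id: set() for post_id in post_tags}
    for post_id, post_vectors in post_tags.items():
        for resume_id, label_vectors in resume_tags.items():
            if any(
                cosine_distance(t, p) <= max_distance
                for t in label_vectors
                for p in post_vectors
            ):
                matches[post_id].add(resume_id)
    return matches
```

Cosine distance ignores vector magnitude, so two labels pointing in the same direction match regardless of how strongly each is weighted; a smaller preset distance makes the matching stricter.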

In at least one embodiment of the present invention, the determining unit 112 may further express the obtained resume label sequence as a score using the weight configured for each label (for example, one label, such as the student label, may account for a weight of 0.2 in the resume score and another label for 0.1), and then quickly screen out the required personnel according to the score.
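The weighted scoring can be sketched as below. Assumptions: a resume's score is the sum of the configured weights of the distinct labels in its label sequence, unconfigured labels contribute 0, and screening keeps resumes at or above a cutoff, ranked by score. The label names, weights, and cutoff in any usage are hypothetical.

```python
def resume_score(labels, weights):
    """Sum the configured weight of each distinct label; unknown labels contribute 0."""
    return sum(weights.get(label, 0.0) for label in sorted(set(labels)))


def screen(resume_labels, weights, min_score):
    """Return resume ids whose score is at least min_score, highest score first."""
    scored = {rid: resume_score(labels, weights) for rid, labels in resume_labels.items()}
    kept = [rid for rid, score in scored.items() if score >= min_score]
    return sorted(kept, key=lambda rid: -scored[rid])
```

Counting each label once (via `set`) keeps a resume from inflating its score by repeating the same label; whether repeated labels should count more is a design choice the text leaves open.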

It can be seen from the above technical solutions that the present invention retrieves resumes from a database and preprocesses them to obtain the resumes to be parsed; constructs a word segmentation directed acyclic graph from a pre-constructed word segmentation dictionary and segments the resumes to be parsed with it, so that the segmented resume texts are obtained quickly; constructs a co-occurrence matrix from the resume texts and determines their keywords based on that matrix; obtains the word sequences in the keywords and processes them with a word representation model to obtain their word representations, improving the parsing effect; inputs the word representations into the constructed resume label parsing model to obtain predicted resume label sequences; and calculates the similarity between each label in the resume label sequences and each post label, determining from the resumes to be parsed the resumes matching each post according to the calculated similarity. Fast and accurate intelligent matching of posts and resumes is thereby achieved.

Fig. 3 is a schematic structural diagram of an electronic device implementing a resume data information parsing and matching method according to a preferred embodiment of the present invention.

The electronic device 1 may include a memory 12, a processor 13 and a bus, and may further include a computer program, such as a resume data information parsing and matching program, stored in the memory 12 and executable on the processor 13.

It will be understood by those skilled in the art that the schematic diagram is merely an example of the electronic device 1 and does not limit it. The electronic device 1 may have a bus-type or star-type structure, may include more or fewer hardware or software components than shown, or may arrange its components differently; for example, the electronic device 1 may further include input and output devices, network access devices, and the like.

It should be noted that the electronic device 1 is only an example, and other existing or future electronic products, such as those that can be adapted to the present invention, should also be included in the scope of the present invention, and are included herein by reference.

The memory 12 includes at least one type of readable storage medium, including flash memory, removable hard disks, multimedia cards, card-type memory (e.g., SD or DX memory), magnetic memory, magnetic disks, optical disks, and the like. In some embodiments, the memory 12 may be an internal storage unit of the electronic device 1, for example a removable hard disk of the electronic device 1. In other embodiments, the memory 12 may be an external storage device of the electronic device 1, such as a plug-in removable hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the electronic device 1. Further, the memory 12 may include both an internal storage unit and an external storage device of the electronic device 1. The memory 12 may be used not only to store application software installed in the electronic device 1 and various types of data, such as the code of the resume data information parsing and matching program, but also to temporarily store data that has been output or is to be output.

In some embodiments, the processor 13 may be composed of integrated circuits, for example a single packaged integrated circuit, or several packaged integrated circuits with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips. The processor 13 is the control unit of the electronic device 1: it connects the components of the electronic device 1 through various interfaces and lines, and performs the functions of the electronic device 1 and processes its data by running or executing the programs or modules stored in the memory 12 (for example, the resume data information parsing and matching program) and calling the data stored in the memory 12.

The processor 13 executes an operating system of the electronic device 1 and various installed application programs. The processor 13 executes the application program to implement the steps in the above embodiments of resume data information parsing and matching method, such as steps S10, S11, S12, S13, S14, and S15 shown in fig. 1.

Alternatively, the processor 13, when executing the computer program, implements the functions of the modules/units in the above device embodiments, for example:

constructing a co-occurrence matrix according to the resume text after word segmentation processing, and determining keywords of the resume text based on the co-occurrence matrix;

Illustratively, the computer program may be divided into one or more modules/units, which are stored in the memory 12 and executed by the processor 13 to accomplish the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program in the electronic device 1. For example, the computer program may be partitioned into a preprocessing unit 110, a construction unit 111, a determination unit 112, a processing unit 113, a prediction unit 114, a merging unit 115, a training unit 116, an acquisition unit 117, a splitting unit 118, a verification unit 119.

An integrated unit implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions that enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute part of the methods according to the embodiments of the present invention.

The integrated modules/units of the electronic device 1 may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented.

The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or device capable of carrying the computer program code, such as a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a read-only memory (ROM).

The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one arrow is shown in FIG. 3, but this does not indicate only one bus or one type of bus. The bus is arranged to enable connection communication between the memory 12 and at least one processor 13 or the like.

Although not shown, the electronic device 1 may further include a power supply (such as a battery) for supplying power to each component. Preferably, the power supply may be logically connected to the at least one processor 13 through a power management device, so that charge management, discharge management, power consumption management, and similar functions are implemented through the power management device. The power supply may also include any components such as one or more DC or AC power sources, recharging devices, power failure detection circuits, power converters or inverters, and power status indicators. The electronic device 1 may further include various sensors, a Bluetooth module, a Wi-Fi module, and the like, which are not described herein again.

Further, the electronic device 1 may include a network interface. Optionally, the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface), which is generally used to establish a communication connection between the electronic device 1 and other electronic devices.

Optionally, the electronic device 1 may further include a user interface, which may include a display and an input unit (such as a keyboard); optionally, the user interface may also include a standard wired interface and a wireless interface. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (organic light-emitting diode) touch device, or the like.

It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.

Fig. 3 only shows the electronic device 1 with components 12-13, and it will be understood by a person skilled in the art that the structure shown in fig. 3 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than shown, or a combination of certain components, or a different arrangement of components.

Referring to fig. 1, the memory 12 in the electronic device 1 stores a plurality of instructions to implement a resume data information parsing and matching method, and the processor 13 can execute the plurality of instructions to implement:

Specifically, the processor 13 may refer to the description of the relevant steps in the embodiment corresponding to fig. 1 for a specific implementation method of the instruction, which is not described herein again.

In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.

The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.

In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.

The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.

Furthermore, the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, not any particular order.

Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims

1. A resume data information analyzing and matching method is characterized by comprising the following steps:

acquiring a word sequence in the keyword, and performing word representation processing on the word sequence by using a word representation model to obtain a word representation of the word sequence;

2. The resume data information parsing and matching method of claim 1, wherein the preprocessing the retrieved resume comprises:

3. The resume data information parsing and matching method of claim 1, wherein the constructing a co-occurrence matrix according to the resume text and determining keywords of the resume text based on the co-occurrence matrix comprises:

4. The resume data information parsing and matching method of claim 3, wherein after obtaining the keywords of the resume text, the method further comprises:

5. The resume data information parsing and matching method of claim 1, wherein the performing word representation processing on the word sequence by using a word representation model to obtain word representations of the word sequence comprises:

6. The resume data information parsing and matching method of claim 1, wherein the method further comprises:

acquiring resume data;

splitting the resume data to obtain a training set and a verification set;

validating the target tag sequence with the validation set;

7. The resume data information parsing and matching method of claim 1, wherein the calculating the similarity between each tag in the resume tag sequence and each post tag, and determining the resume matching each post from the resume to be parsed according to the calculated similarity comprises:

calculating the cosine distance between each label and the label of each post;

determining that the target resume matches the target post.

8. A resume data information parsing and matching device is characterized in that the device comprises:

the processing unit is used for acquiring a word sequence in the keyword, and performing word representation processing on the word sequence by using a word representation model to obtain a word representation of the word sequence;

9. An electronic device, characterized in that the electronic device comprises:

a memory storing at least one instruction; and

a processor executing instructions stored in the memory to implement the resume data information parsing and matching method of any of claims 1 to 7.

10. A computer-readable storage medium characterized by: the computer-readable storage medium stores at least one instruction, and the at least one instruction is executed by a processor in an electronic device to implement the resume data information parsing and matching method according to any one of claims 1 to 7.