CN113111187B - Method and system for mining employment platform comments

Info

Publication number: CN113111187B
Application number: CN202110369952.0A
Authority: CN
Inventors: 吴方同; 吴晓军
Original assignee: Hebei Jilian Human Resources Service Group Co ltd
Current assignee: Hebei Jilian Human Resources Service Group Co ltd
Priority date: 2021-04-07
Filing date: 2021-04-07
Publication date: 2023-03-10
Anticipated expiration: 2041-04-07
Also published as: CN113111187A

Abstract

The invention provides a method and a system for mining employment platform comments based on natural language processing. The method acquires worker comment data from an employment platform, stores the comment data in a comment data table and marks it as new data, constructs an employment unit thesaurus and a feature vector matrix from the comment data, analyzes the co-occurrence frequency of employment units, and outputs and displays the co-occurrence frequency according to the word frequency data. By analyzing the co-occurrence frequency of employment units, enterprises with similar work characteristics can be identified and resumes can be pushed to recruiting enterprises as required, which improves both the computation speed of the algorithm and the efficiency of matching resumes to employment units.

Description

Method and system for mining employment platform comments

Technical Field

The invention relates to the technical field of natural language processing, in particular to a method and a system for mining employment platform comments.

Background

Today, online recruitment takes increasingly diverse forms, and accurate resume pushing is a new technical means that lets applicants quickly obtain job opportunities and lets recruiters quickly find suitable workers.

In the prior art, resume matching and pushing is commonly implemented with neural network algorithms, using deep learning to match worker data suited to a given industry or enterprise. These techniques, however, generally require a complex mathematical model and a large volume of big-data computation, so they are time-consuming and inefficient. A solution that obtains and pushes resumes quickly and with high accuracy is therefore needed in the art.

Disclosure of Invention

In view of the above problems, the invention provides a method and a system for mining employment platform comments based on natural language processing. Employment platform comment data is acquired, an employment unit thesaurus and a feature vector matrix are constructed from the comment data, and the co-occurrence frequency of employment units is analyzed so that enterprises with similar work characteristics can be identified and resumes can be pushed to recruiting enterprises as required.

To achieve this purpose, the invention provides a method for mining employment platform comments, comprising:

step 101, obtaining worker comment data from an employment platform, storing the comment data in a comment data table, and marking the comment data as new data;

step 102, constructing an employment unit thesaurus;

step 103, constructing a post work type thesaurus;

step 104, constructing a feature vector matrix;

step 105, analyzing co-occurrence frequency;

and step 106, outputting and displaying the co-occurrence frequency according to the word frequency data.

Further, the method comprises periodically traversing the newly acquired comment data.

Further, constructing the employment unit thesaurus specifically comprises:

(1) Loading the new comments into a text set XTextⁱ(j), wherein i represents the number of new comments and j denotes the j-th comment;

(2) Using the indexOf() function to judge whether the new comment data contains employment unit information: when XTextⁱ(j).indexOf("company") != -1, or XTextⁱ(j).indexOf("unit") != -1, or XTextⁱ(j).indexOf("plant") != -1, or XTextⁱ(j).indexOf("factory") != -1, the comment data is considered to contain employment unit information;

(3) For data XTextⁱ(j) containing employment unit information, introducing the jieba word segmentation function, segmenting the comment data, and defining a word segmentation chain Dwordⁿ(w), wherein n = 1 denotes a noun, n = 2 a verb, n = 3 an adjective, n = 4 a quantifier, n = 5 a pronoun, n = 6 an adverb, n = 7 a preposition, n = 8 a conjunction, n = 9 an auxiliary word, n = 10 an interjection, and n = 11 an onomatopoeia; w denotes the position of the word in the sequence; and the value of Dwordⁿ(w) is the specific word;

(4) Processing the segmented text Dwordⁿ(w): if n = 1, the segmented word is a noun, and the standard noun dictionary Mdic is consulted to judge whether the word is in the common noun dictionary; if it is not, the function returns 0, and if it is, processing jumps to the next word;

(5) When the function return value is 0, checking whether the noun already exists in the employment unit library Bdic, and if it does, skipping it and continuing;

(6) If the noun's position number is less than the position number p at which the employment unit appears, using the AddDIC(Dwordⁿ(w)) function to add the word to the employment unit thesaurus.
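Steps (1) to (3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the English keywords stand in for the terms tested in step (2), Python's `str.find` plays the role of `indexOf()` (both return -1 when the substring is absent), and the part-of-speech names in `POS_CODES` and the helper names `mentions_unit` and `to_chain` are hypothetical. The tagged `(word, part_of_speech)` pairs are assumed to come from a segmenter such as jieba; the segmentation call itself is left out so that the sketch stays self-contained.

```python
from typing import List, Tuple

# Keywords that signal an employment unit (English stand-ins, assumed).
UNIT_KEYWORDS = ("company", "unit", "plant", "factory")

# Part-of-speech codes n as listed in step (3); the key names are assumed.
POS_CODES = {
    "noun": 1, "verb": 2, "adjective": 3, "quantifier": 4,
    "pronoun": 5, "adverb": 6, "preposition": 7, "conjunction": 8,
    "auxiliary": 9, "interjection": 10, "onomatopoeia": 11,
}


def mentions_unit(comment: str) -> bool:
    """Step (2): a keyword is present when find() does not return -1."""
    return any(comment.find(keyword) != -1 for keyword in UNIT_KEYWORDS)


def to_chain(tagged: List[Tuple[str, str]]) -> List[Tuple[int, int, str]]:
    """Step (3): map (word, pos_name) pairs to (n, w, word) chain entries.

    w is the 1-based position of the word in the comment; words whose
    part of speech is not in POS_CODES are dropped.
    """
    chain = []
    for w, (word, pos) in enumerate(tagged, start=1):
        if pos in POS_CODES:
            chain.append((POS_CODES[pos], w, word))
    return chain
```

For example, `to_chain([("Acme", "noun"), ("pays", "verb")])` yields one entry per word, each carrying its part-of-speech code and its position.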

Further, constructing the feature vector matrix comprises: traversing the newly generated employment units in the employment unit thesaurus, and constructing, for each new employment unit, a feature vector matrix corresponding to the employment unit thesaurus, wherein Pp represents the position index in the employment unit library, cp represents the position index in the post work type library, and e is the number of co-occurrences.
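As an illustration only, the triples (Pp, cp, e) could be assembled as below. The text does not state how e is counted; this sketch assumes e is the number of comments in which the employment unit name and the post work type name both appear. The function name `build_feature_matrix` and the substring matching are likewise assumptions.

```python
from typing import List, Tuple


def build_feature_matrix(
    comments: List[str],
    units: List[str],
    work_types: List[str],
) -> List[Tuple[int, int, int]]:
    """Return rows (Pp, cp, e) for every pair that co-occurs at least once.

    Pp indexes the employment unit library, cp indexes the post work type
    library, and e counts the comments mentioning both (assumed definition).
    """
    rows = []
    for pp, unit in enumerate(units):
        for cp, work_type in enumerate(work_types):
            e = sum(1 for c in comments if unit in c and work_type in c)
            if e > 0:
                rows.append((pp, cp, e))
    return rows
```

Storing only the non-zero pairs keeps the matrix sparse, which suits a setting where most employment units are never mentioned together with most work types.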

Further, the co-occurrence frequency analysis includes:

(1) Loading all comments into a text set Atext;

(2) Introducing the jieba word segmentation function, segmenting the comment data Atext, and defining a thesaurus chain Awordⁿ(w), wherein n = 1 denotes a noun, n = 2 a verb, n = 3 an adjective, n = 4 a quantifier, n = 5 a pronoun, n = 6 an adverb, n = 7 a preposition, n = 8 a conjunction, n = 9 an auxiliary word, n = 10 an interjection, and n = 11 an onomatopoeia; w denotes the position of the word in the sequence; and the value of Awordⁿ(w) is the specific word;

(3) Performing word frequency analysis on all words in the thesaurus chain Awordⁿ(w), selecting the words whose occurrence frequency exceeds a threshold, and constructing the word frequency matrix Awordⁿ(w, c) of the thesaurus chain, wherein n represents part of speech, w represents word position, and c represents word frequency;

(4) Constructing a complete binary Huffman tree according to c in the word frequency matrix Awordⁿ(w, c), generating a binary code k for each word according to its position in the tree, and constructing the Huffman vector matrix Hwordⁿ(w, c, k), wherein k stores the binary code;

(5) For the cp post work type of the pp employment unit in the feature vector matrix, consulting the vector matrix Hwordⁿ(w, c, k) to obtain the binary code value K1 corresponding to that post work type, and judging whether each vector in Hwordⁿ(w, c, k) belongs to the post work type library vocabulary of an employment unit; if a vector belongs to the cp post work type library of some employment unit, extracting the corresponding Ki and calculating the cosine distance between K1 and Ki using the cosine similarity formula, wherein j indexes the components of the binary code K. The 10 employment units closest in cosine distance are selected as co-occurrence words of the cp post work type of the pp employment unit and added to the co-occurrence word matrix, wherein n denotes part of speech, w denotes position, c denotes word frequency, and k denotes the binary code value. The co-occurrence word matrix is then saved.

(6) Updating the employment unit thesaurus and the post work type thesaurus.
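Steps (4) and (5) can be sketched as follows. The sketch assumes the standard cosine similarity, cos(K1, Ki) = Σ_j K1_j·Ki_j / (‖K1‖·‖Ki‖), with j running over the bits of the code. Two further assumptions are not taken from the text: Huffman codes of different lengths are right-padded with zeros so that the bit vectors are comparable, and the helper names `huffman_codes`, `code_to_vector`, `cosine_similarity` and `closest` are hypothetical.

```python
import heapq
import math
from typing import Dict, List


def huffman_codes(freq: Dict[str, int]) -> Dict[str, str]:
    """Assign a binary code to each word; frequent words get shorter codes."""
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, tie_breaker, {word: code_so_far}).
    heap = [(c, i, {w: ""}) for i, (w, c) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, left = heapq.heappop(heap)
        c2, _, right = heapq.heappop(heap)
        merged = {w: "0" + k for w, k in left.items()}
        merged.update({w: "1" + k for w, k in right.items()})
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]


def code_to_vector(code: str, length: int) -> List[int]:
    """Turn a bit string into a 0/1 vector, zero-padded on the right (assumed)."""
    return [int(bit) for bit in code.ljust(length, "0")]


def cosine_similarity(a: List[int], b: List[int]) -> float:
    """Standard cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest(target: str, codes: Dict[str, str], k: int = 10) -> List[str]:
    """Return up to k words whose codes are most similar to the target's."""
    length = max(len(code) for code in codes.values())
    target_vec = code_to_vector(codes[target], length)
    scored = [
        (cosine_similarity(target_vec, code_to_vector(code, length)), word)
        for word, code in codes.items()
        if word != target
    ]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [word for _, word in scored[:k]]
```

The default `k=10` mirrors the selection of the 10 closest entries in step (5); ties are broken alphabetically, which is a choice made for this sketch only.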

In addition, the invention provides a system for mining employment platform comments, comprising:

a data acquisition module 201, configured to acquire worker comment data from an employment platform, store the comment data in a comment data table, and mark the comment data as new data;

a unit thesaurus construction module 202, configured to construct an employment unit thesaurus;

a post work type construction module 203, configured to construct a post work type thesaurus;

a vector matrix module 204, configured to construct a feature vector matrix;

a co-occurrence analysis module 205, configured to perform co-occurrence frequency analysis;

and a display module 206, configured to output and display the co-occurrence frequency according to the word frequency data.

Further, the data acquisition module 201 periodically traverses the newly acquired comment data.

Further, the construction of the employment unit thesaurus by the unit thesaurus construction module 202 specifically comprises:

(2) Using the indexOf() function to judge whether the new comment data contains employment unit information: when XTextⁱ(j).indexOf("company") != -1, or XTextⁱ(j).indexOf("unit") != -1, or XTextⁱ(j).indexOf("plant") != -1, or XTextⁱ(j).indexOf("factory") != -1, the comment data is considered to contain employment unit information;

(4) Processing the segmented text Dwordⁿ(w): if n = 1, the segmented word is a noun, and the standard noun dictionary Mdic is consulted to judge whether the word is in the common noun dictionary; if it is not, the function returns 0, and if it is, processing jumps to the next word;

(6) If the noun does not exist in the employment unit thesaurus and its position number is less than the position number p at which the employment unit appears, using the AddDIC(Dwordⁿ(w)) function to add the word to the employment unit thesaurus.

Further, the construction of the feature vector matrix by the vector matrix module 204 comprises: traversing the newly generated employment units in the employment unit thesaurus, and constructing, for each new employment unit, a feature vector matrix corresponding to the employment unit thesaurus, wherein Pp represents the position index in the employment unit library, cp represents the position index in the post work type library, and e is the number of co-occurrences.

Further, the co-occurrence frequency analysis performed by the co-occurrence analysis module 205 comprises:

(1) Loading all comments into a text set Atext;

(2) Introducing the jieba word segmentation function, segmenting the comment data Atext, and defining a thesaurus chain Awordⁿ(w), wherein n = 1 denotes a noun, n = 2 a verb, n = 3 an adjective, n = 4 a quantifier, n = 5 a pronoun, n = 6 an adverb, n = 7 a preposition, n = 8 a conjunction, n = 9 an auxiliary word, n = 10 an interjection, and n = 11 an onomatopoeia; w denotes the position of the word in the sequence; and the value of Awordⁿ(w) is the specific word;

(4) Constructing a complete binary Huffman tree according to c in the word frequency matrix Awordⁿ(w, c), generating a binary code k for each word according to its position in the tree, and constructing the Huffman vector matrix Hwordⁿ(w, c, k), wherein k stores the binary code;

(5) For the cp post work type of the pp employment unit in the feature vector matrix, consulting the vector matrix Hwordⁿ(w, c, k) to obtain the binary code value K1 corresponding to that post work type, and judging whether each vector in Hwordⁿ(w, c, k) belongs to the post work type library vocabulary of an employment unit; if a vector belongs to the cp post work type library of some employment unit, extracting the corresponding Ki and calculating the cosine distance between K1 and Ki using the cosine similarity formula. The result is then saved.

Furthermore, the invention provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program carries out the method for mining employment platform comments described above.

In summary, the invention provides a method and a system for mining employment platform comments based on natural language processing. Worker comment data is acquired from an employment platform, stored in a comment data table and marked as new data; an employment unit thesaurus and a feature vector matrix are constructed from the comment data; the co-occurrence frequency of employment units is analyzed; and the co-occurrence frequency is output and displayed according to the word frequency data. Analyzing the co-occurrence frequency of employment units makes it possible to identify enterprises with similar work characteristics and to push resumes to recruiting enterprises as required, which improves both the computation speed of the algorithm and the efficiency of matching resumes to employment units.

Drawings

To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flow chart of a method of mining employment platform reviews in accordance with the present invention;

FIG. 2 is a structural block diagram of a system for mining employment platform comments in accordance with the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The invention provides a method and a system for mining employment platform comments based on natural language processing. Employment platform comment data is acquired, an employment unit thesaurus and a feature vector matrix are constructed from the comment data, and the co-occurrence frequency of employment units is analyzed so that enterprises with similar work characteristics can be identified and resumes can be pushed to recruiting enterprises as required.

First, the invention provides a method for mining employment platform comments, whose flow chart is shown in FIG. 1:

Step 101, obtaining worker comment data from an employment platform, storing the comment data in a comment data table, and marking the comment data as new data;

The data acquisition module H acquires worker comment data from the platform using a data acquisition algorithm known in the prior art, stores the data in the worker platform comment data table in database D, and marks the acquired data as new data.
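A minimal sketch of such a comment data table, using SQLite from the Python standard library, is given below. The table name `comment_data`, the column names, the `is_new` flag and the helper functions are all assumptions made for illustration; the text only requires that stored comments be marked as new so that later processing can pick them up.

```python
import sqlite3
from typing import Iterable, List, Tuple


def create_store() -> sqlite3.Connection:
    """Create an in-memory database holding the comment data table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE comment_data ("
        "id INTEGER PRIMARY KEY, "
        "body TEXT NOT NULL, "
        "is_new INTEGER NOT NULL DEFAULT 1)"  # 1 marks a comment as new data
    )
    return conn


def save_comments(conn: sqlite3.Connection, comments: Iterable[str]) -> None:
    """Store acquired comments; each row is flagged as new by default."""
    conn.executemany(
        "INSERT INTO comment_data (body) VALUES (?)",
        [(c,) for c in comments],
    )
    conn.commit()


def take_new(conn: sqlite3.Connection) -> List[Tuple[int, str]]:
    """Return all comments still flagged as new and clear the flag."""
    rows = conn.execute(
        "SELECT id, body FROM comment_data WHERE is_new = 1 ORDER BY id"
    ).fetchall()
    conn.execute("UPDATE comment_data SET is_new = 0 WHERE is_new = 1")
    conn.commit()
    return rows
```

A periodic job can call `take_new` on a timer: each call returns only the comments stored since the previous call.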

Step 102, constructing an employment unit thesaurus;

The employment unit thesaurus is constructed as follows: the employment unit thesaurus construction module M1 periodically traverses the newly acquired comment data.

(1) Loading the new comments into a text set XTextⁱ(j), wherein i represents the number of new comments and j denotes the j-th comment.

(2) Using the indexOf() function to judge whether the new comment data contains employment unit information: when XTextⁱ(j).indexOf("company") != -1, or XTextⁱ(j).indexOf("unit") != -1, or XTextⁱ(j).indexOf("plant") != -1, or XTextⁱ(j).indexOf("factory") != -1, the comment data is considered to contain employment unit information.

(3) For data XTextⁱ(j) containing employment unit information, introducing the jieba word segmentation function, segmenting the comment data, and defining a word segmentation chain Dwordⁿ(w), wherein n = 1 denotes a noun, n = 2 a verb, n = 3 an adjective, n = 4 a quantifier, n = 5 a pronoun, n = 6 an adverb, n = 7 a preposition, n = 8 a conjunction, n = 9 an auxiliary word, n = 10 an interjection, and n = 11 an onomatopoeia; w denotes the position of the word in the sequence; and the value of Dwordⁿ(w) is the specific word.

(4) Processing the segmented text Dwordⁿ(w): if n = 1, the segmented word is a noun; the standard noun dictionary Mdic is consulted, and the have_dic() function is used to judge whether the word is in the common noun dictionary. If it is not, the function returns 0; if it is, processing jumps to the next word. The logic of the have_dic() function is constructed as follows:

(5) If the return value of the have_dic() function is 0, the have_brand() function is used to check whether the noun already exists in the employment unit library Bdic; if it does, the noun is skipped and execution continues. The logic of the have_brand() function is constructed as follows:

(6) If the noun's position number is less than the position number p at which the employment unit appears, the AddDIC(Dwordⁿ(w)) function is used to add the word to the employment unit thesaurus. The logic of the AddDIC() function is constructed as follows:

The specific logic of the employment unit thesaurus construction module M1 is as follows:
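One possible reading of steps (4) to (6) is sketched in Python below. It is an assumption-laden illustration rather than the original listing: the Python names `have_dic`, `have_brand` and `add_dic` mirror the functions named in the text, but their signatures are guesses; Mdic and Bdic are modelled as plain sets; and p is taken to be the position of the first keyword such as "company" in the segmented comment.

```python
from typing import List, Optional, Set, Tuple

# English stand-ins for the keywords that signal an employment unit (assumed).
UNIT_KEYWORDS = ("company", "unit", "plant", "factory")


def have_dic(word: str, mdic: Set[str]) -> int:
    """Return 1 if the word is in the common noun dictionary Mdic, else 0."""
    return 1 if word in mdic else 0


def have_brand(word: str, bdic: Set[str]) -> int:
    """Return 1 if the word is already in the employment unit library Bdic."""
    return 1 if word in bdic else 0


def add_dic(word: str, bdic: Set[str]) -> None:
    """Counterpart of AddDIC(): add the word to the employment unit library."""
    bdic.add(word)


def build_unit_thesaurus(
    chain: List[Tuple[int, int, str]],
    mdic: Set[str],
    bdic: Set[str],
) -> Set[str]:
    """Scan one segmented comment, given as (n, w, word) entries.

    A noun (n == 1) is added to Bdic when it is not a common noun, is not
    already known, and appears before position p of the unit keyword.
    """
    p: Optional[int] = next(
        (w for _, w, word in chain if word in UNIT_KEYWORDS), None
    )
    if p is None:
        return bdic
    for n, w, word in chain:
        if n != 1:
            continue  # only nouns are candidates
        if have_dic(word, mdic):
            continue  # common noun: jump to the next word
        if have_brand(word, bdic):
            continue  # already recorded: skip
        if w < p:
            add_dic(word, bdic)
    return bdic
```

With the chain for "Acme company pays wages" and a common noun dictionary containing "company" and "wages", only "Acme" ends up in the library, because it is an unknown noun that precedes the keyword.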

Step 103, constructing a post work type thesaurus;

The post work type of each worker comment is obtained using an acquisition algorithm known in the prior art, and a post work type thesaurus is constructed. A post work type is a specific job, such as Java development engineer or Python algorithm engineer.

Step 104, constructing a feature vector matrix;

The feature vector matrix construction module traverses the newly generated employment units in the employment unit thesaurus and constructs, for each new employment unit, a feature vector matrix corresponding to the employment unit thesaurus, wherein Pp represents the position index in the employment unit library, cp represents the position index in the post work type library, and e is the number of co-occurrences.

Step 105, analyzing co-occurrence frequency;

The co-occurrence analysis module analyzes word co-occurrence using a given employment unit and its potential competitor employment units as keywords, and uses the data to analyze the key characteristics of the post work types of that employment unit.

(1) Loading all comments into a text set Atext.

(2) Introducing the jieba word segmentation function, segmenting the comment data Atext, and defining a thesaurus chain Awordⁿ(w), wherein n = 1 denotes a noun, n = 2 a verb, n = 3 an adjective, n = 4 a quantifier, n = 5 a pronoun, n = 6 an adverb, n = 7 a preposition, n = 8 a conjunction, n = 9 an auxiliary word, n = 10 an interjection, and n = 11 an onomatopoeia; w denotes the position of the word in the sequence; and the value of Awordⁿ(w) is the specific word.

(3) Performing word frequency analysis on all words in the thesaurus chain Awordⁿ(w), selecting the words that occur more than 50 times, and constructing the word frequency matrix Awordⁿ(w, c) of the thesaurus chain, wherein n represents part of speech, w represents word position, and c represents word frequency.

(4) Constructing a complete binary Huffman tree according to c in the word frequency matrix Awordⁿ(w, c), generating a binary code k for each word according to its position in the tree, and constructing the Huffman vector matrix Hwordⁿ(w, c, k), wherein k stores the binary code.

(5) For the cp post work type of the pp employment unit in the feature vector matrix, consulting the vector matrix Hwordⁿ(w, c, k) to obtain the binary code value K1 corresponding to that post work type, and judging whether each vector in Hwordⁿ(w, c, k) belongs to the post work type library vocabulary of an employment unit; if a vector belongs to the cp post work type library of some employment unit, extracting the corresponding Ki and calculating the cosine distance between K1 and Ki using the cosine similarity formula, wherein j indexes the components of the binary code K. The 10 employment units closest in cosine distance are selected as co-occurrence words of the cp post work type of the pp employment unit and added to the co-occurrence word matrix, which is then saved.
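The word frequency filtering of step (3) can be sketched as follows. The input is assumed to be the thesaurus chain flattened into `(n, word)` tokens over all comments, and each output row `(n, w, c, word)` is an assumed layout in which w numbers the retained words in order of first appearance. The function name `word_frequency_matrix` is hypothetical; the default threshold of 50 is the value given in step (3).

```python
from collections import Counter
from typing import List, Tuple


def word_frequency_matrix(
    tokens: List[Tuple[int, str]],
    threshold: int = 50,
) -> List[Tuple[int, int, int, str]]:
    """Keep the words occurring more than `threshold` times.

    Each row is (n, w, c, word): part-of-speech code, position among the
    retained words (1-based), word frequency, and the word itself.
    """
    counts = Counter(tokens)  # keys keep their first-seen order
    rows: List[Tuple[int, int, int, str]] = []
    for (n, word), c in counts.items():
        if c > threshold:
            rows.append((n, len(rows) + 1, c, word))
    return rows
```

Dropping rare words before building the Huffman tree keeps the tree small, which is one way to limit the cost of the later similarity comparisons.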

The co-occurrence frequency of each employment unit is output according to the user's selection on the visual interface and the word frequency data, and is presented as a chart. For example, when the user selects the co-occurrence frequency between work type B of company A and the posts of other companies, the co-occurrence frequency data with "work type B of company A" as the comparison target is output and displayed.

Next, the invention provides a system for mining employment platform comments, whose structural block diagram is shown in FIG. 2:

a data acquisition module 201, configured to acquire worker comment data from an employment platform, store the comment data in a comment data table, and mark the comment data as new data;

a unit thesaurus construction module 202, configured to construct an employment unit thesaurus;

The employment unit thesaurus is constructed as follows: the employment unit thesaurus construction module M1 periodically traverses the newly acquired comment data.

(1) Loading the new comments into a text set XTextⁱ(j), wherein i represents the number of new comments and j denotes the j-th comment.

(2) Using the indexOf() function to judge whether the new comment data contains employment unit information: when XTextⁱ(j).indexOf("company") != -1, or XTextⁱ(j).indexOf("unit") != -1, or XTextⁱ(j).indexOf("plant") != -1, or XTextⁱ(j).indexOf("factory") != -1, the comment data is considered to contain employment unit information.

(3) For data XTextⁱ(j) containing employment unit information, introducing the jieba word segmentation function, segmenting the comment data, and defining a word segmentation chain Dwordⁿ(w), wherein n = 1 denotes a noun, n = 2 a verb, n = 3 an adjective, n = 4 a quantifier, n = 5 a pronoun, n = 6 an adverb, n = 7 a preposition, n = 8 a conjunction, n = 9 an auxiliary word, n = 10 an interjection, and n = 11 an onomatopoeia; w denotes the position of the word in the sequence; and the value of Dwordⁿ(w) is the specific word.

(6) If the noun's position number is less than the position number p at which the employment unit appears, the AddDIC(Dwordⁿ(w)) function is used to add the word to the employment unit thesaurus. The logic of the AddDIC() function is constructed as follows:

The specific logic of the employment unit thesaurus construction module M1 is as follows:

The post work type of each worker comment is obtained using an acquisition algorithm known in the prior art, and a post work type thesaurus is constructed. A post work type is a specific job, such as Java development engineer or Python algorithm engineer.

a vector matrix module 204, configured to construct a feature vector matrix;

wherein Pp represents the position index in the employment unit library, cp represents the position index in the post work type library, and e is the number of co-occurrences.

a co-occurrence analysis module 205, configured to perform co-occurrence frequency analysis;

(1) Loading all comments into a text set Atext.

(2) Introducing the jieba word segmentation function, segmenting the comment data Atext, and defining a thesaurus chain Awordⁿ(w), wherein n = 1 denotes a noun, n = 2 a verb, n = 3 an adjective, n = 4 a quantifier, n = 5 a pronoun, n = 6 an adverb, n = 7 a preposition, n = 8 a conjunction, n = 9 an auxiliary word, n = 10 an interjection, and n = 11 an onomatopoeia; w denotes the position of the word in the sequence; and the value of Awordⁿ(w) is the specific word.

(3) Performing word frequency analysis on all words in the thesaurus chain Awordⁿ(w), selecting the words that occur more than 50 times, and constructing the word frequency matrix Awordⁿ(w, c) of the thesaurus chain, wherein n represents part of speech, w represents word position, and c represents word frequency.

(5) For the cp post work type of the pp employment unit in the feature vector matrix, consulting the vector matrix Hwordⁿ(w, c, k) to obtain the binary code value K1 corresponding to that post work type, and judging whether each vector in Hwordⁿ(w, c, k) belongs to the post work type library vocabulary of an employment unit; if a vector belongs to the cp post work type library of some employment unit, extracting the corresponding Ki and calculating the cosine distance between K1 and Ki using the cosine similarity formula. The result is then saved.

The co-occurrence frequency of each employment unit is output according to the user's selection on the visual interface and the word frequency data, and is presented as a chart. For example, when the user selects the co-occurrence frequency between work type B of company A and the posts of other companies, the co-occurrence frequency data with "work type B of company A" as the comparison target is output and displayed.

In addition, the invention provides a computer-readable storage medium storing a computer program; when executed by a processor, the computer program carries out the method for mining employment platform comments described above.

The principles and embodiments of the present invention have been described herein using specific examples, which are presented only to assist in understanding the method of the present invention and its core concepts. Those skilled in the art may make various improvements and modifications to the present invention without departing from its principle, and such improvements and modifications also fall within the scope of the claims of the present invention.

Claims

1. A method for mining employment platform comments, characterized by comprising:

step 102, constructing an employment unit thesaurus;

wherein constructing the employment unit thesaurus specifically comprises:

(2) Using the indexOf() function to judge whether the new comment data contains employment unit information: when XTextⁱ(j).indexOf("company") != -1, or XTextⁱ(j).indexOf("unit") != -1, or XTextⁱ(j).indexOf("plant") != -1, or XTextⁱ(j).indexOf("factory") != -1, the comment data is considered to contain employment unit information;

(3) For data XTextⁱ(j) containing employment unit information, introducing the jieba word segmentation function, segmenting the comment data, and defining a word segmentation chain Dwordⁿ(w), wherein n = 1 denotes a noun, n = 2 a verb, n = 3 an adjective, n = 4 a quantifier, n = 5 a pronoun, n = 6 an adverb, n = 7 a preposition, n = 8 a conjunction, n = 9 an auxiliary word, n = 10 an interjection, and n = 11 an onomatopoeia; w denotes the position of the word in the sequence; and the value of Dwordⁿ(w) is the specific word;

(6) If the noun does not exist in the employment unit thesaurus and its position number is less than the position number p at which the employment unit appears, using the AddDIC(Dwordⁿ(w)) function to add the word to the employment unit thesaurus;

step 103, acquiring the post work type of each worker comment, and constructing a post work type thesaurus;

step 104, constructing a feature vector matrix; wherein constructing the feature vector matrix comprises: traversing the newly generated employment units in the employment unit thesaurus, and constructing, for each new employment unit, a feature vector matrix corresponding to the employment unit thesaurus, wherein Pp represents the position index in the employment unit library, cp represents the position index in the post work type library, and e is the number of co-occurrences;

step 105, analyzing co-occurrence frequency;

wherein the co-occurrence frequency analysis comprises:

(1) Loading all comments into a text set Atext;

(2) Introducing the jieba word segmentation function, segmenting the comment data Atext, and defining a thesaurus chain Awordⁿ(w), wherein n = 1 denotes a noun, n = 2 a verb, n = 3 an adjective, n = 4 a quantifier, n = 5 a pronoun, n = 6 an adverb, n = 7 a preposition, n = 8 a conjunction, n = 9 an auxiliary word, n = 10 an interjection, and n = 11 an onomatopoeia; w denotes the position of the word in the sequence; and the value of Awordⁿ(w) is the specific word;

(3) Performing word frequency analysis on all words in the thesaurus chain Awordⁿ(w), selecting the words whose frequency exceeds a threshold, and constructing the word frequency matrix Awordⁿ(w, c) of the thesaurus chain, wherein n represents part of speech, w represents word position, and c represents word frequency;

(4) Constructing a complete binary Huffman tree according to c in the word frequency matrix Awordⁿ(w, c), generating a binary code k for each word according to its position in the tree, and constructing the Huffman vector matrix Hwordⁿ(w, c, k), wherein k stores the binary code;

(5) For the cp post work type of the pp employment unit in the feature vector matrix, consulting the vector matrix Hwordⁿ(w, c, k) to obtain the binary code value K1 corresponding to that post work type, and judging whether each vector in Hwordⁿ(w, c, k) belongs to the post work type library vocabulary of an employment unit; if a vector belongs to the cp post work type library of some employment unit, extracting the corresponding Ki and calculating the cosine distance between K1 and Ki using the cosine similarity formula, wherein n represents part of speech, w represents position, c represents word frequency, and k represents the binary code value; the resulting matrix is then saved;

(6) Updating the employment unit thesaurus and the post work type thesaurus;

2. A system for mining employment platform comments, the system comprising:

wherein constructing the employment unit thesaurus specifically comprises:

(2) Using the indexOf() function to judge whether the new comment data contains employment unit information: when XTextⁱ(j).indexOf("company") != -1, or XTextⁱ(j).indexOf("unit") != -1, or XTextⁱ(j).indexOf("plant") != -1, or XTextⁱ(j).indexOf("factory") != -1, the comment data is considered to contain employment unit information;

(3) For data XTextⁱ(j) containing employment unit information, introducing the jieba word segmentation function, segmenting the comment data, and defining a word segmentation chain Dwordⁿ(w), wherein n = 1 denotes a noun, n = 2 a verb, n = 3 an adjective, n = 4 a quantifier, n = 5 a pronoun, n = 6 an adverb, n = 7 a preposition, n = 8 a conjunction, n = 9 an auxiliary word, n = 10 an interjection, and n = 11 an onomatopoeia; w denotes the position of the word in the sequence; and the value of Dwordⁿ(w) is the specific word;

a post work type construction module 203, configured to acquire the post work type of each worker comment and construct a post work type thesaurus;

a vector matrix module 204, configured to construct a feature vector matrix;

wherein constructing the feature vector matrix comprises: traversing the newly generated employment units in the employment unit thesaurus, and constructing, for each new employment unit, a feature vector matrix corresponding to the employment unit thesaurus;

a co-occurrence analysis module 205, configured to perform co-occurrence frequency analysis;

wherein the co-occurrence frequency analysis comprises:

(1) Loading all comments into a text set Atext;

(5) For the feature vectors, saving the result;

(6) Updating the employment unit thesaurus and the post work type thesaurus;

and a display module 206, configured to output and display the co-occurrence frequency according to the word frequency data.

3. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, performs the method of claim 1.