CN113886588A

CN113886588A - Major professional employment direction identification method based on recruitment text mining

Info

Publication number: CN113886588A
Application number: CN202111220573.1A
Authority: CN
Inventors: 张建桃; 曾莉; 刘洁荧; 韦婷婷; 黄文玲; 宋世领
Original assignee: South China Agricultural University
Current assignee: South China Agricultural University
Priority date: 2021-10-20
Filing date: 2021-10-20
Publication date: 2022-01-04

Abstract

The invention discloses a method for identifying the main employment directions of an academic major based on recruitment text mining. Taking job postings from a recruitment website as the data source, the method analyzes the names of positions recruiting for the major in four main steps (data acquisition, data preprocessing, word vectorization, and K-means clustering) to obtain the main employment directions of the major. By mining online recruitment text, the method can quickly, efficiently, and accurately identify the employment directions that the job market demands of the major's graduates, helping colleges and universities optimize and improve their talent training programs and providing decision support for training graduates who meet market demand.

Description

Major professional employment direction identification method based on recruitment text mining

Technical Field

The invention relates to the field of identifying the employment directions of academic majors, and in particular to a method for identifying the main employment directions of a major based on recruitment text mining.

Background

As professional education in colleges and universities diversifies, the employment directions open to graduates of a given major become broader, and enterprises place different requirements on talent in the different directions of that major. With the mismatch between talent supply and demand growing increasingly prominent, accurate insight into which employment directions the market demands of a major's graduates is key for colleges and universities to train graduates who meet market demand, to promote their employment, and to ease the supply-demand mismatch.

According to the '2020 China Online Recruitment Industry Market Development Research Report' published by iResearch (艾瑞网), the number of enterprise employers using online recruitment in 2019 reached 486.6 thousand, making online recruitment a primary channel for enterprise hiring. Extracting enterprises' recruitment requirements from online job postings is therefore an effective way to learn what the employment market demands. Text mining is a technique for extracting meaningful information from unstructured text data.

Disclosure of Invention

The invention aims to provide a method for identifying the main employment directions of a major based on recruitment text mining. The method uses text mining to collect, from a recruitment website, the names of positions recruiting for the major and analyzes them to determine the main employment directions of the major, thereby helping colleges and universities optimize and improve their talent training programs and providing decision support for training graduates who meet market demand.

To achieve this purpose, the invention adopts the following technical solution:

A method for identifying the main employment directions of a major based on recruitment text mining comprises the following steps:

Step 1: data acquisition, namely using the name of the major as the search keyword and crawling the names of positions recruiting for the major from a selected recruitment website with a Web crawler;

Step 2: data preprocessing, namely preprocessing the collected position-name data;

Step 3: word vectorization, namely vectorizing the position names with the Word2vec algorithm to obtain a vector representation of each position name;

Step 4: K-means clustering, namely performing cluster analysis on the position names with the K-means clustering algorithm to obtain the main employment directions of the major.

Preferably, the data acquisition comprises the following sub-steps:

Step 1.1: formulating the crawler rules, namely determining the page URL, the page-number range, and the position screening conditions used to collect the position-name data;

Step 1.2: Web crawling, namely collecting the names of positions posted online for the major with a Web crawler according to the formulated crawler rules.
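Steps 1.1 and 1.2 can be pictured as a small rule object that expands into the list of result pages a crawler would visit. The sketch below is only an illustration under stated assumptions: the `CrawlRule` class, its field names, the `https://example.com/search` endpoint, and the `region` filter are hypothetical and are not taken from the patent or from any real recruitment site. Fetching and parsing the pages is deliberately left out.

```python
from dataclasses import dataclass, field
from urllib.parse import urlencode


@dataclass
class CrawlRule:
    """Hypothetical crawler rule: what to search for and which pages to visit."""
    base_url: str                 # search endpoint of the chosen recruitment site
    keyword: str                  # name of the major, used as the search keyword
    first_page: int = 1           # page-number range, inclusive on both ends
    last_page: int = 1
    filters: dict = field(default_factory=dict)  # position screening conditions

    def page_urls(self):
        """Expand the rule into one URL per result page."""
        urls = []
        for page in range(self.first_page, self.last_page + 1):
            query = {"keyword": self.keyword, "page": page, **self.filters}
            urls.append(f"{self.base_url}?{urlencode(query)}")
        return urls


rule = CrawlRule(
    base_url="https://example.com/search",  # placeholder, not a real endpoint
    keyword="工业工程",                      # "industrial engineering"
    first_page=1,
    last_page=3,
    filters={"region": "all"},              # illustrative screening condition
)
urls = rule.page_urls()
for url in urls:
    print(url)
```

Keeping the rule separate from the fetching code means the keyword, page range, or screening conditions can be changed for another major without touching the crawler itself.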

Preferably, the data preprocessing comprises the following sub-steps:

Step 2.1: data cleaning, namely cleaning the collected position-name data to remove noise, where the noise includes null values, duplicate values, abnormal values, HTML (HyperText Markup Language) tags, and the like;

Step 2.2: constructing the custom dictionaries, namely selecting, from the data that has undergone word segmentation and stop-word removal, compound terms specific to position names and adding them to the custom segmentation dictionary, and selecting words that carry no research value and adding them to the custom stop-word library;

Step 2.3: word segmentation and stop-word removal, namely segmenting the data with the Jieba word segmentation package in Python together with the constructed custom segmentation dictionary, and removing stop words with the HIT (Harbin Institute of Technology) stop-word list combined with the constructed custom stop-word library.
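The preprocessing sub-steps above can be sketched in plain Python. This is a minimal illustration, not the patented implementation: to stay dependency-free it replaces Jieba with a simple forward-maximum-matching segmenter, and the dictionary entries, stop words, and sample position names are all made up for the example. The custom dictionary and stop-word library of Step 2.2 are assumed to have been built already and are passed in as sets.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")


def clean_names(raw_names):
    """Step 2.1: drop nulls, strip HTML tags, remove duplicates (order kept)."""
    seen, cleaned = set(), []
    for name in raw_names:
        if name is None:
            continue
        text = html.unescape(TAG_RE.sub("", name)).strip()
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned


def segment(text, dictionary, max_len=6):
    """Forward maximum matching: a simple stand-in for Jieba segmentation."""
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def preprocess(raw_names, dictionary, stop_words):
    """Steps 2.1 and 2.3: clean, segment, then filter out stop words."""
    result = []
    for name in clean_names(raw_names):
        tokens = [t for t in segment(name, dictionary)
                  if t not in stop_words and not t.isspace()]
        result.append(tokens)
    return result


# Illustrative inputs only; real entries would come from Step 2.2.
custom_dict = {"工业工程师", "供应链", "专员", "急聘"}
stop_words = {"急聘"}
raw = ["<b>工业工程师</b>", None, "工业工程师", "急聘供应链专员", "  "]
tokens = preprocess(raw, custom_dict, stop_words)
print(tokens)
```

With Jieba itself, the custom dictionary would typically be loaded with `jieba.load_userdict` and each name segmented with `jieba.lcut`; the cleaning and stop-word filtering around it would stay the same.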

Preferably, said word vectorization comprises the sub-steps of:

Step 3.1: initializing the word vectors, namely initializing the vector representation of each word in the dictionary with a fixed-length random sequence drawn from a uniform distribution;

Step 3.2: training the word vectors, namely modeling the task with a conditional probability model as a language model that predicts the target word given its context, and obtaining the vector representation of each word by maximizing the following log-likelihood objective function with gradient descent and back-propagation:

$$L = \sum_{t=1}^{T} \log P(\omega_t \mid \omega_{t-c} : \omega_{t+c})$$

where $P(\omega_t \mid \omega_{t-c} : \omega_{t+c})$ is the conditional probability; $T$ is the length of the sentence; $\omega_t$ is the target word to be predicted; $c$ is the context size; and $\omega_{t-c} : \omega_{t+c}$ denotes the words from the $c$-th word before to the $c$-th word after the target word, excluding the target word itself;

the conditional probability $P$ is obtained by softmax:

$$P(\omega_t \mid \omega_{t-c} : \omega_{t+c}) = \frac{\exp\left(\bar{v}^{\mathsf{T}} v_{\omega_t}\right)}{\sum_{n=1}^{N} \exp\left(\bar{v}^{\mathsf{T}} v_n\right)}, \qquad \bar{v} = \frac{1}{2c} \sum_{j=1}^{2c} v_j$$

where $N$ is the size of the vocabulary; $\bar{v}^{\mathsf{T}}$ is the transpose of $\bar{v}$, the mean of the context word vectors; $v_{\omega_t}$ is the vector representation of the target word; $\exp(\cdot)$ is the exponential function with the natural constant $e$ as its base; $v_n$ is the vector representation of the $n$-th word in the vocabulary; and $v_j$ is the vector representation of the $j$-th word in the context of the target word.
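To make the word-vectorization sub-steps concrete, the following sketch initializes uniform random vectors, averages the context vectors, applies a softmax over the vocabulary, and sums the log-probabilities over a sentence. It rests on several assumptions that are not in the patent: a six-word toy vocabulary, 8-dimensional vectors, and a single vector table shared by context and target words (Word2vec implementations normally keep separate input and output tables). The gradient-descent update itself is omitted, so this evaluates the objective rather than training it.

```python
import math
import random


def init_vectors(vocab, dim, seed=0):
    """Step 3.1: one fixed-length vector per word, drawn uniformly at random."""
    rng = random.Random(seed)
    return {w: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for w in vocab}


def context_mean(context, vectors):
    """Average the vectors of the context words."""
    dim = len(next(iter(vectors.values())))
    return [sum(vectors[w][d] for w in context) / len(context)
            for d in range(dim)]


def conditional_probs(context, vectors):
    """Softmax over the vocabulary of the dot product with the context mean."""
    h = context_mean(context, vectors)
    scores = {w: sum(a * b for a, b in zip(h, v)) for w, v in vectors.items()}
    top = max(scores.values())           # subtract the max for numerical stability
    exps = {w: math.exp(s - top) for w, s in scores.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}


def log_likelihood(sentence, c, vectors):
    """Sum of log P(target | context) over every position in the sentence."""
    total = 0.0
    for t, target in enumerate(sentence):
        # Up to c words on each side, excluding the target word itself.
        context = sentence[max(0, t - c):t] + sentence[t + 1:t + 1 + c]
        if context:
            total += math.log(conditional_probs(context, vectors)[target])
    return total


# Toy vocabulary, for illustration only.
vocab = ["工业", "工程师", "供应链", "专员", "生产", "计划"]
vectors = init_vectors(vocab, dim=8, seed=1)
probs = conditional_probs(["工业"], vectors)
ll = log_likelihood(["工业", "工程师"], c=2, vectors=vectors)
print(sum(probs.values()), ll)
```

Training would repeatedly nudge the vectors in the direction that increases `log_likelihood`; in practice a library implementation of Word2vec would be used instead of this full-softmax loop, which scales with the vocabulary size.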

Preferably, the K-means clustering comprises the following sub-steps:

Step 4.1: K-means clustering of the position names, namely performing cluster analysis on the position names with the K-means clustering algorithm, which takes minimizing the sum of squared errors (SSE) between the samples and their cluster centroids as its objective function, calculated as follows:

$$SSE = \sum_{i=1}^{K} \sum_{x \in E_i} \lVert x - e_i \rVert^2, \qquad e_i = \frac{1}{N_i} \sum_{x \in E_i} x$$

where $K$ is the number of clusters; $E_i$ is the $i$-th cluster; $e_i$ is the centroid of $E_i$; $x$ is a sample in $E_i$; and $N_i$ is the number of samples in $E_i$;

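the smallest $k$ satisfying the following constraint is selected as the optimal number of clusters, that is, as the value of $K$:

$$\mathrm{Gap}_k \ge \mathrm{Gap}_{k+1} - s_{k+1} \tag{6}$$

where, taking the Gap statistic and its dispersion term in their standard form,

$$\mathrm{Gap}_k = \frac{1}{B} \sum_{b=1}^{B} \log(SSE_{kb}) - \log(SSE_k), \qquad s_k = \sqrt{\frac{1+B}{B}} \cdot \sqrt{\frac{1}{B} \sum_{b=1}^{B} \left(\log(SSE_{kb}) - \bar{l}_k\right)^2}, \qquad \bar{l}_k = \frac{1}{B} \sum_{b=1}^{B} \log(SSE_{kb})$$

$B$ is the number of Monte Carlo simulations; $SSE_k$ is the SSE computed on the actual samples with $k$ clusters; and $SSE_{kb}$ is the SSE computed in the $b$-th Monte Carlo simulation with $k$ clusters;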

Step 4.2: summarizing the main employment directions, namely summarizing the positions in each cluster obtained by K-means clustering to obtain the main employment directions of the major.
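The clustering step and the choice of K can be sketched as follows. This is a dependency-free illustration under stated assumptions, not the patented implementation: it uses toy 2-D points in place of position-name vectors, draws each Monte Carlo reference sample uniformly from the bounding box of the data, and uses the standard form of the Gap statistic with dispersion term s_k. The function names and default parameters are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_sse(points, k, rng, iters=50, restarts=5):
    """Run Lloyd's K-means several times and return the smallest SSE found."""
    best = math.inf
    for _ in range(restarts):
        centers = rng.sample(points, k)
        for _ in range(iters):
            clusters = [[] for _ in range(k)]
            for p in points:
                nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
                clusters[nearest].append(p)
            # Centroid = mean of the cluster's samples (old center if empty).
            new_centers = [
                [sum(col) / len(c) for col in zip(*c)] if c else centers[i]
                for i, c in enumerate(clusters)
            ]
            if new_centers == centers:
                break
            centers = new_centers
        sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
        best = min(best, sse)
    return best


def gap_statistic(points, k_max, b=8, seed=0):
    """Return the selected k plus the Gap and s values for k = 1..k_max."""
    rng = random.Random(seed)
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    gaps, s = {}, {}
    for k in range(1, k_max + 1):
        log_sse = math.log(kmeans_sse(points, k, rng))
        ref_logs = []
        for _ in range(b):
            # Monte Carlo reference sample: uniform in the data's bounding box.
            ref = [[rng.uniform(lo, hi) for lo, hi in zip(lows, highs)]
                   for _ in points]
            ref_logs.append(math.log(kmeans_sse(ref, k, rng)))
        mean = sum(ref_logs) / b
        sd = math.sqrt(sum((v - mean) ** 2 for v in ref_logs) / b)
        gaps[k] = mean - log_sse
        s[k] = sd * math.sqrt(1 + 1 / b)
    # Smallest k with Gap_k >= Gap_{k+1} - s_{k+1}.
    for k in range(1, k_max):
        if gaps[k] >= gaps[k + 1] - s[k + 1]:
            return k, gaps, s
    return k_max, gaps, s


# Toy data: three loose groups of 2-D points standing in for name vectors.
data_rng = random.Random(42)
group_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [[data_rng.gauss(cx, 0.5), data_rng.gauss(cy, 0.5)]
          for cx, cy in group_centers for _ in range(20)]
best_k, gaps, s = gap_statistic(points, k_max=5)
print(best_k, gaps)
```

Because K-means depends on its random initialization, the sketch restarts each run several times and keeps the lowest SSE; the number of restarts and simulations trades runtime against the stability of the selected K.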

The beneficial effects of the invention are as follows: compared with traditional survey methods such as questionnaires, enterprise visits, and expert consultation, text mining makes it possible to identify quickly, efficiently, and accurately, from online recruitment text, the employment directions that the job market demands of a major's graduates. The method mines the names of positions posted online for the major and obtains the main employment directions of the major in four main steps (data acquisition, data preprocessing, word vectorization, and K-means clustering), providing decision support for colleges and universities to optimize and improve their talent training programs.

Drawings

FIG. 1 is a flowchart of the method for identifying the main employment directions of a major based on recruitment text mining according to the present invention.

FIG. 2 is a flowchart of the custom lexicon construction.

FIG. 3 is a graph of the Gap value as a function of k.

Detailed Description

To make the technical features, objects, and effects of the present invention more clearly understood, the invention is described in further detail below with reference to the accompanying drawings and examples. The embodiments described herein serve only to explain the technical solution of the invention and are not intended to limit it.

The invention provides a method for identifying the main employment directions of a major based on recruitment text mining, the flow of which is shown in FIG. 1. Taking the industrial engineering major as an example, the implementation comprises the following steps:

Step 1: data acquisition, namely selecting the currently popular recruitment website 51job (前程无忧, https://www.51job.com) and, using a Web crawler with "industrial engineering" as the search keyword, crawling the names of positions recruiting for the industrial engineering major nationwide;

Step 2: data preprocessing, namely performing data cleaning operations such as deduplication, null removal, and noise removal on the collected position-name data; constructing the custom segmentation dictionary and the custom stop-word library according to the custom lexicon construction flow shown in FIG. 2; and performing word segmentation and stop-word removal on the recruitment data with the Jieba word segmentation package in Python and the HIT (Harbin Institute of Technology) stop-word list. The position-name data before and after preprocessing are compared in Table 1; 8169 valid position-name records for the industrial engineering major were obtained.

TABLE 1 Comparison of position-name data before and after preprocessing

Step 3: word vectorization, namely vectorizing the position-name data with the Word2vec algorithm to obtain a vector representation of each position name.

Step 4: K-means clustering, namely clustering the position names with the K-means algorithm. The curve of Gap(k) against k is shown in FIG. 3, from which the optimal number of clusters is K = 4. The clustering results for the position names are shown in Table 2. Analyzing the position names in each category shows that industrial engineer, process engineer, IE engineer, and similar positions in category 1 can be grouped as engineering positions; production planning, lean production, production management, and similar positions in category 2 can be grouped as production positions; logistics specialist, supply chain director, supply chain specialist, and similar positions in category 3 can be grouped as logistics and supply chain positions; and ERP implementation consultant, SAP implementation consultant, and MES implementation consultant in category 4 can be grouped as consulting positions. Cluster analysis of the position names thus yields four main employment directions for the industrial engineering major: engineering management, production management, logistics and supply chain, and consulting.

TABLE 2 Position-name clustering results

The example study shows that the method provided by the invention can quickly, efficiently, and accurately identify, from online recruitment text, the employment directions that the job market demands of a major's graduates. The method can be applied and extended to other majors, and provides decision support for colleges and universities to optimize and improve their talent training programs and to train graduates who meet market demand.

Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

The above-described embodiments of the present invention do not limit the scope of the present invention. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present invention shall be included in the protection scope of the claims of the present invention.

Claims

1. A method for identifying the main employment directions of a major based on recruitment text mining, characterized by comprising the following steps:

2. The method for identifying the main employment directions of a major based on recruitment text mining as claimed in claim 1, wherein the data acquisition of step 1 comprises the following sub-steps:

3. The method for identifying the main employment directions of a major based on recruitment text mining as claimed in claim 1, wherein the data preprocessing of step 2 comprises the following sub-steps:

Step 2.1: data cleaning, namely cleaning the collected position-name data to remove noise, where the noise includes null values, duplicate values, abnormal values, and HTML (HyperText Markup Language) tags;

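4. The method for identifying the main employment directions of a major based on recruitment text mining as claimed in claim 1, wherein the word vectorization of step 3 comprises the following sub-steps:

the conditional probability $P$ is obtained by softmax:

$$P(\omega_t \mid \omega_{t-c} : \omega_{t+c}) = \frac{\exp\left(\bar{v}^{\mathsf{T}} v_{\omega_t}\right)}{\sum_{n=1}^{N} \exp\left(\bar{v}^{\mathsf{T}} v_n\right)}$$

where $N$ is the size of the vocabulary; $\bar{v}^{\mathsf{T}}$ is the transpose of $\bar{v}$, the mean of the context word vectors;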

5. The method for identifying the main employment directions of a major based on recruitment text mining as claimed in any one of claims 1 to 4, wherein the K-means clustering of step 4 comprises the following sub-steps:

$$\mathrm{Gap}_k \ge \mathrm{Gap}_{k+1} - s_{k+1} \tag{6}$$