CN112699235A - Method, equipment and system for analyzing and evaluating resume sample data - Google Patents

Info

Publication number
CN112699235A
Authority
CN
China
Prior art keywords
data
sample data
resume
resume sample
evaluating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011521270.9A
Other languages
Chinese (zh)
Inventor
付宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shengdoushi Shanghai Science and Technology Development Co Ltd
Original Assignee
Shengdoushi Shanghai Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shengdoushi Shanghai Technology Development Co Ltd
Priority to CN202011521270.9A
Publication of CN112699235A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A method for parsing and evaluating resume sample data is presented, the resume sample data comprising at least unstructured data, the method comprising: acquiring resume sample data; parsing the resume sample data into at least one data feature item; and evaluating the level of the resume sample data based on the at least one data feature item using a decision tree model. A device and a system for parsing and evaluating resume sample data are also presented. The scheme establishes an automated parsing, evaluation, and screening model for resume sample data with self-learning iteration capability, automatically parses the content of the resume data, allows the output to be flexibly adjusted to actual business requirements, and provides business personnel conducting recruitment interviews with a visual, better, and more interpretable insight report.

Description

Method, equipment and system for analyzing and evaluating resume sample data
Technical Field
The present application relates to data analysis, and more particularly, to methods, devices, and systems for parsing and evaluating resume sample data.
Background
Analyzing candidates' resume information to provide interview suggestions to business personnel conducting recruitment interviews can effectively improve the efficiency with which enterprises recruit qualified employees. Many internet recruitment platforms deliver candidates' resume data to enterprises through SaaS (Software-as-a-Service) platforms for resume data transmission.
Most resume evaluation models adopted by existing resume data evaluation schemes are suited to structured data content and lack the ability to mine and parse unstructured data. For model selection, classification and evaluation models such as logistic regression models, feedforward neural network models, and Bayesian classifiers are often used, and they do not fully satisfy the requirements for accuracy and interpretability of the evaluation results. In addition, the output of common evaluation schemes mostly uses a fixed threshold to judge whether a resume meets the recruitment requirements, and the evaluation scheme lacks interaction with the business scenario, so it cannot provide insight to the business.
These drawbacks leave existing resume data evaluation schemes deficient in terms of recruitment cost, targeted communication, and feedback to the business personnel responsible for recruitment interviews, and thus existing resume data evaluation schemes need to be improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the scope of the application.
Disclosure of Invention
In order to solve at least one of the above-mentioned drawbacks existing in the prior art, embodiments of the present application propose a method, an apparatus, and a system for parsing and evaluating resume sample data.
According to an aspect of the application, a method for parsing and evaluating resume sample data comprising at least unstructured data is presented, the method comprising: acquiring resume sample data; parsing the resume sample data into at least one data feature item, wherein the data feature item is a structured variable; and evaluating the level of the resume sample data based on the at least one data feature item using a decision tree model.
According to another aspect of the application, a device for parsing and evaluating resume sample data is proposed, comprising a processor and a memory for storing executable instructions of the processor, wherein the processor is configured to execute the executable instructions to implement the method as described above.
According to yet another aspect of the present application, a system for parsing and evaluating resume sample data is presented, comprising an apparatus and a database as described above.
According to yet another aspect of the application, a computer-readable storage medium is proposed, on which a computer program is stored, the computer program comprising executable instructions which, when executed by a processor, carry out a method according to the above.
According to the parsing and evaluation scheme for resume sample data of the present application, an automated parsing, evaluation, and screening model for resume sample data with self-learning iteration capability is established, which can automatically parse the content of resume data, including both structured and unstructured data; a corresponding conditional probability can be output according to how well the candidate matches the position and how likely the candidate is to pass the interview, and the output can be flexibly adjusted to actual business requirements; the feature importance found during model training can be summarized, and a visual and better insight report is provided to business personnel conducting recruitment interviews.
Drawings
The above and other features and advantages of the present application will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings.
FIG. 1 is a schematic logical architecture diagram of a system for parsing and evaluating resume sample data according to an embodiment of the present application;
FIG. 2 is a schematic diagram of results and visual output of information for parsing and evaluation of resume sample data according to an embodiment of the application;
FIG. 3 is a schematic flow diagram of a method for parsing and evaluating resume sample data according to an embodiment of the present application;
FIG. 4 is a schematic block diagram of a system for parsing and evaluating resume sample data according to an embodiment of the present application; and
FIG. 5 is a schematic block diagram of an electronic device for parsing and evaluating resume sample data according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application will now be described more fully with reference to the accompanying drawings. The exemplary embodiments, however, may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. In the drawings, the size of some of the elements may be exaggerated or distorted for clarity. The same reference numerals denote the same or similar structures in the drawings, and thus detailed descriptions thereof will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the subject matter of the present application can be practiced without one or more of the specific details, or with other methods, components, etc. In other instances, well-known structures, methods, or operations are not shown or described in detail to avoid obscuring aspects of the present application.
Those skilled in the art should understand that the solution for parsing and evaluating resume sample data proposed by the embodiments of the present application is not limited to the internet recruitment scenario; the method, device, and system proposed in the present application can be used whenever the resume sample data includes unstructured data or a combination of unstructured and structured data. In the present application, the industries involved in the resume sample data include not only store manager recruitment in the catering industry but also store manager/manager recruitment in other industries (such as retail, delivery, and express delivery), and the positions involved may include positions outside the management level.
Resume sample data refers herein to all data information included in a complete candidate resume. The plurality of resume sample data form a resume sample data set which can be used for analysis of the resume sample data and training and statistical analysis of an evaluation model. Resume sample data acquired before the resume sample data currently being parsed and evaluated is generally referred to as historical resume sample data. The resume sample data for the same candidate may exist in different versions at different times.
Resume sample data contains not only structured data such as gender, age, educational background, and years of work experience, but also unstructured data such as work experience, job duties, job description, and self-evaluation. Structured data can be represented by a predetermined number of feature values or parameters, whereas unstructured data generally has no fixed format, and the corresponding data features typically have to be extracted from a combination of several information items. Common resume data evaluation models are mainly applicable to structured data and lack the ability to mine and parse the information contained in unstructured data. Because unstructured data is not analyzed, the accuracy of the evaluation model is limited on the one hand, and on the other hand the amount of useful information that can be mined from the resume data is limited, so no further reference can be provided for communication between business personnel conducting recruitment interviews and candidates.
Beyond the parsing of resume sample data, for the choice of evaluation model existing schemes generally employ logistic regression models or feedforward neural network models; the former often falls short of expectations in accuracy, and the latter is unsatisfactory in model interpretability. For classification based on resume sample data, traditional Bayesian classifiers and feedforward neural network classifiers cannot balance accuracy and interpretability at the same time. The accuracy of the model determines the efficiency gains and cost savings the model brings when applied in an actual business scenario, while interpretability determines how far business personnel can understand and trust the model's evaluation results; in addition, the model results need to provide some degree of insight to business personnel conducting recruitment interviews.
Existing resume evaluation schemes are limited in their support for and performance on unstructured data, and have narrow applicability in terms of output form and feedback to the recruitment business. Most resume evaluation models set a fixed threshold, so that evaluation results above the threshold are considered to meet the recruitment requirements and evaluation results below the threshold are considered not to meet them. Such a fixed-threshold form of judgment is inflexible, slow to adjust, and unable to cope with rapid changes in the business. Meanwhile, the model only provides an evaluation output and cannot interact with the business scenario; the result of data modeling is used only by the prediction model and cannot provide insight to the business.
The above problems result in the following disadvantages of existing resume evaluation schemes:
1. High recruitment cost. A great deal of manpower is invested in interviewing, but the proportion of successful hires is low (for example, less than 2%). This means that much of the labor cost is wasted on candidates who do not match the job requirements, which also lengthens the interview procedure and reduces efficiency.
2. Lack of targeted communication. Lacking a suitable resume evaluation and screening model, interviewers adopt the same communication strategy for all candidates and cannot offer more personalized communication to high-potential candidates, which in practice harms the effectiveness of talent recruitment.
3. Difficulty in feeding results back to the business to form a closed loop. Because the model struggles to reach a satisfactory balance between accuracy and interpretability, it is hard to find rules that can be interpreted and understood, or the coverage and accuracy of the rules are unsatisfactory, so rules and insights that fit the recruitment business scenario are difficult to distill.
Based on the need to remedy these disadvantages, the embodiments of the present application provide a method, device, and system for parsing and evaluating resume sample data. By using a Gradient Boosting Decision Tree (GBDT) model, which is built on decision trees, the scheme achieves a better balance between accuracy and interpretability in parsing and evaluation, and at the same time offers business personnel conducting recruitment interviews more targeted ways of communicating, which helps feed results back to the business and yields better insight.
The following describes aspects of an embodiment of the present application with reference to the schematic logical architecture diagram of the resume sample data parsing and evaluation system shown in FIG. 1.
The system 100 for parsing and evaluating resume sample data generally includes a database 110 and portions related to a parsing and evaluating model.
The database 110 includes a big data platform 111, a data acquisition unit 112 and a data cleansing unit 113.
The data acquisition unit 112 acquires, through a common interface, submitted resume data from different internet recruitment platforms and channels via the SaaS platform 120 for external resume collection. These internet recruitment platforms and channels include, but are not limited to, 51job, Zhaopin, Ganji, 58.com, and BOSS Zhipin. The acquired resume data is aggregated and stored to the big data platform 111. The big data platform 111 may be, for example, a privately deployed Hadoop big data platform. To keep the resume data up to date, the data acquisition unit 112 may fetch multiple versions of the resume data from the SaaS platform 120 repeatedly at a certain period (e.g., every minute, quarter-hour, hour, several minutes, several hours, day, or several days). For example, the database may fetch the resume data, including changes in recruitment stage/state information, 6 times a day (once every 4 hours), and then extract, clean, convert, and merge the resume data of the same candidate. The processed resume data of the candidate is stored in the database as a combination of structured and unstructured data and serves as the original data of the resume sample data. The method and system of the embodiments of the present application are dedicated to processing resume sample data that includes unstructured data, so the original data of the resume sample data must include at least unstructured data.
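The merging of several fetched versions of the same candidate's resume can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the field names (`candidate_id`, `fetched_at`, `stage`, `self_evaluation`) and the rule that later non-empty values override earlier ones are assumptions introduced here.

```python
from datetime import datetime

def merge_resume_versions(versions):
    """Merge fetched resume versions into one record per candidate.

    Versions are applied in fetch order, so later values override earlier
    ones; empty values (None or "") never overwrite an existing value.
    All field names are hypothetical.
    """
    merged = {}
    for version in sorted(versions, key=lambda r: r["fetched_at"]):
        record = merged.setdefault(version["candidate_id"], {})
        for key, value in version.items():
            if value not in (None, ""):
                record[key] = value
    return merged

versions = [
    {"candidate_id": "c1", "fetched_at": datetime(2020, 12, 1, 8),
     "stage": "applied", "self_evaluation": "Team player"},
    {"candidate_id": "c1", "fetched_at": datetime(2020, 12, 1, 12),
     "stage": "interview scheduled", "self_evaluation": ""},
    {"candidate_id": "c2", "fetched_at": datetime(2020, 12, 1, 8),
     "stage": "applied", "self_evaluation": None},
]
latest = merge_resume_versions(versions)
```

In this sketch the second fetch updates the recruitment stage of candidate `c1` while the earlier self-evaluation text is kept, because the later version left that field empty.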
The data cleansing unit 113, an ETL (Extract, Transform, Load) unit, preprocesses the resume data from the SaaS platform 120 before the resume data is stored to the big data platform 111 and before the stored resume sample data is output to the parsing and evaluation model. In the embodiments of the present application, the preprocessing includes not only extracting, converting, and merging data but also cleaning and adjusting part of the content of the resume sample data. Business personnel conducting recruitment interviews may request, through the data cleansing unit 113, that a particular item of data about a candidate in the resume sample data be masked or deleted. For example, if the current recruitment interviews are open only to candidates with a bachelor's degree, the resume sample data of candidates whose data items do not meet the bachelor's degree requirement may be masked or deleted in the data cleansing unit 113, or that content may be withheld when the data is transmitted to the parsing and evaluation model. Generally, however, all the original information of the resume sample data is stored in the database 110 and no data cleaning is performed when the resume sample data is transmitted to the model, so that the resume sample data of candidates with such data items can still be parsed and evaluated in the model training and prediction stages, with the corresponding masking or deletion applied at the model output stage. Moreover, because of inherent error and bias, data cleansing in the database may wrongly screen out the resume data of candidates who meet the job requirements and could pass the interview and be hired, so the data cleansing unit 113 may be configured not to perform data cleansing prematurely.
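Deferring masking to the output stage, rather than cleansing the stored data, might look like the following sketch. The function name `mask_output`, the record fields, and the `"***"` placeholder are hypothetical; the point is only that the scored records stay intact while the report is filtered.

```python
def mask_output(results, keep, fields_to_mask=()):
    """Apply business masking rules to model outputs, not to stored raw data.

    `keep` decides which scored records appear in the report;
    `fields_to_mask` lists fields whose values are hidden in the report.
    """
    visible = []
    for row in results:
        if not keep(row):
            continue  # left out of the report, but still stored and scored
        visible.append({k: ("***" if k in fields_to_mask else v)
                        for k, v in row.items()})
    return visible

scored = [
    {"candidate_id": "c1", "education": "bachelor", "age": 29, "pass_probability": 0.62},
    {"candidate_id": "c2", "education": "college", "age": 35, "pass_probability": 0.71},
]
report = mask_output(scored, lambda r: r["education"] == "bachelor",
                     fields_to_mask=("age",))
```

Because `scored` is never modified, a later change in the recruitment requirement only needs a different `keep` rule, not a new data load.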
According to embodiments of the application, resume sample data is parsed and evaluated using a decision tree based model.
The decision tree model needs to be trained before it is used to parse and evaluate resume sample data. The training set is a data set formed from historical resume sample data. Historical resume sample data in a suitable time span can be selected according to the duration of the interview process. In a typical interview process, interviews are completed and the results synchronized within 4 days. Accordingly, the historical resume sample data from a window that starts on the 5th day counted back from the resume sample data currently being parsed and evaluated and rolls back week by week for 8 weeks (that is, 56 days) can be selected as the training data set. To support this choice, the applicant ran tests in which historical resume sample data from a given time span served as validation sets and the average AUC was computed, where AUC (Area Under the Curve) is used as the accuracy index and a higher value indicates a more accurate prediction. Table 1 shows the test results for training sets rolled back by 1 to 12 weeks. Each validation set covers 7 days, and the validation window is rolled forward one day at a time, giving 86 validation sets over a time span of 3 months. The 12 models, one per training set time span, are each applied to predict the interview outcomes of the validation sets, and the 86 AUC values are averaged for each model. The training set formed from historical resume sample data rolled back 8 weeks has the highest average AUC, indicating the highest accuracy.
Training set time span    Validation set mean AUC
Rolling for 1 week        68.6%
Rolling for 2 weeks       70.2%
Rolling for 3 weeks       70.5%
Rolling for 4 weeks       70.6%
Rolling for 5 weeks       70.6%
Rolling for 6 weeks       70.6%
Rolling for 7 weeks       70.8%
Rolling for 8 weeks       70.9%
Rolling for 9 weeks       70.6%
Rolling for 10 weeks      70.6%
Rolling for 11 weeks      70.5%
Rolling for 12 weeks      70.5%
Table 1. AUC results for different training set time spans
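The window selection and the accuracy index can be sketched in a few lines. The sketch assumes that the training window ends 5 days before the current day and covers 56 days, which is one reading of the text; the function names are hypothetical. The AUC is computed from its pairwise definition (the share of positive/negative pairs ranked correctly, ties counting one half), and the dictionary repeats the values of Table 1.

```python
from datetime import date, timedelta

def training_window(current_day, weeks=8, label_lag_days=5):
    """Return (start, end), inclusive, of the assumed training window.

    The window ends `label_lag_days` before `current_day`, so that interview
    results are already synchronized, and spans `weeks * 7` days.
    """
    end = current_day - timedelta(days=label_lag_days)
    start = end - timedelta(days=weeks * 7 - 1)
    return start, end

def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Mean validation AUC (%) per training span in weeks, copied from Table 1.
TABLE_1 = {1: 68.6, 2: 70.2, 3: 70.5, 4: 70.6, 5: 70.6, 6: 70.6,
           7: 70.8, 8: 70.9, 9: 70.6, 10: 70.6, 11: 70.5, 12: 70.5}

start, end = training_window(date(2020, 12, 21))
best_weeks = max(TABLE_1, key=TABLE_1.get)
example_auc = auc([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.5])
```

Selecting the span with the highest mean AUC in `TABLE_1` reproduces the 8-week choice described above.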
According to one embodiment, a gradient boosting decision tree (GBDT) model may specifically be employed. The GBDT model is a classification algorithm model built on decision trees. Compared with other classification algorithms, the decision tree algorithm needs less feature engineering, handles data with missing fields well, does not need to consider whether data features depend on one another, and can automatically combine multiple data features. The GBDT model integrates multiple decision trees by gradient boosting, which keeps the interpretability of the decision tree algorithm while avoiding the tendency of a single decision tree to overfit, so the GBDT model offers a relatively good balance between accuracy and interpretability. Those skilled in the art will also appreciate that other decision tree models, as well as other tree models that split or classify on features of structured data, may be used as the model for parsing and evaluating the resume sample data. According to an exemplary embodiment of the application, the following key model parameters are employed:
learning rate: 0.1;
number of trees: 130;
maximum depth of a single tree: 5;
L0 regularization term coefficient: 0;
L2 regularization term coefficient: 1;
leaf node minimum weight: 1;
leaf node minimum split gain: 0.1.
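To make the role of some of these parameters concrete, the following is a deliberately simplified gradient boosting sketch for binary classification. It is not the model of the embodiment: it uses depth-1 trees (stumps) instead of trees of depth 5, it uses only the learning rate, the number of trees, and the L2 coefficient from the list above, and it ignores the remaining parameters. Leaf values are computed with a Newton-style step under an L2 penalty, which is a common choice in gradient boosting and an assumption here; the toy data is invented.

```python
import math

# Values from the parameter list above; the key names are illustrative.
PARAMS = {"learning_rate": 0.1, "n_trees": 130, "l2": 1.0}

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, resid, hess, l2):
    """Return (feature, threshold, left_value, right_value) of the best split, or None."""
    total_r, total_h = sum(resid), sum(hess)
    best_gain, best = 0.0, None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X})[:-1]:
            rl = sum(r for row, r in zip(X, resid) if row[j] <= t)
            hl = sum(h for row, h in zip(X, hess) if row[j] <= t)
            rr, hr = total_r - rl, total_h - hl
            gain = (rl * rl / (hl + l2) + rr * rr / (hr + l2)
                    - total_r ** 2 / (total_h + l2))
            if gain > best_gain:
                best_gain, best = gain, (j, t, rl / (hl + l2), rr / (hr + l2))
    return best

def fit_gbdt(X, y, n_trees, learning_rate, l2):
    """Fit stumps one after another to the residuals of the log loss."""
    scores, stumps = [0.0] * len(y), []
    for _ in range(n_trees):
        probs = [sigmoid(s) for s in scores]
        resid = [yi - pi for yi, pi in zip(y, probs)]   # negative gradient
        hess = [pi * (1.0 - pi) for pi in probs]        # second derivative
        stump = fit_stump(X, resid, hess, l2)
        if stump is None:
            break
        j, t, left, right = stump
        stumps.append(stump)
        scores = [s + learning_rate * (left if row[j] <= t else right)
                  for s, row in zip(scores, X)]
    return stumps

def predict_proba(stumps, row, learning_rate):
    """Probability of the positive class for one feature row."""
    z = sum(learning_rate * (left if row[j] <= t else right)
            for j, t, left, right in stumps)
    return sigmoid(z)

# Toy data: column 0 = years of experience, column 1 = an uninformative flag.
X = [[0, 1], [1, 0], [2, 1], [3, 0], [4, 1], [5, 0]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gbdt(X, y, **PARAMS)
p_high = predict_proba(model, [5, 0], PARAMS["learning_rate"])
p_low = predict_proba(model, [0, 1], PARAMS["learning_rate"])
```

Each round adds a small correction scaled by the learning rate, which is why many shallow trees are combined instead of relying on one deep tree.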
The training period of the model can be chosen at different scales, such as weekly, half-monthly, quarterly, or yearly. The training period can be flexibly adjusted, or the model can be trained at irregular times, according to the requirements of business personnel conducting recruitment interviews, changes in the enterprise's recruitment scenarios and job requirements, and changes in the candidates' resume data on the SaaS platform 120.
After the training set and corresponding model for model training are determined, the training procedure shown in FIG. 1 begins.
Data access and verification 141 is performed first. A set of stored historical resume sample data is received from the database 110. A data verification script checks the quality of the historical resume sample data and performs basic error correction on problems such as garbled characters and null data. Next, the verified data is automatically parsed 142 to obtain the at least one data feature item. The data feature items obtained by parsing are structured variables, which can be input into the GBDT model to train the model parameters. For the structured data in the historical resume sample data, dummy structured variables can be generated as data feature items by high-dimensional discretization. For unstructured data, the key information can be extracted using regular expressions, and corresponding structured variables are generated as data feature items. The categories of the data feature items and the corresponding parameter value ranges can be preset, and the parsing of structured and unstructured data into data feature items is carried out based on these settings. The preset categories and parameters of the data feature items are fine-grained enough to cover all the valuable information in the resume sample data that business personnel conducting recruitment interviews care about and/or that can characterize the candidate.
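The two parsing routes can be illustrated with a short sketch: discretizing a structured field into 0/1 dummy variables, and extracting structured variables from free text with regular expressions. The category list, the patterns, and the feature names are hypothetical examples, not the settings of the embodiment.

```python
import re

EDUCATION_LEVELS = ["high school", "college", "bachelor", "master"]  # hypothetical

def one_hot(value, categories, prefix):
    """Discretize one structured field into 0/1 dummy variables."""
    return {f"{prefix}={c}": int(value == c) for c in categories}

def parse_unstructured(text):
    """Extract a few structured variables from free text with regular expressions."""
    years = re.search(r"(\d+)\s*(?:\+\s*)?years?", text, flags=re.IGNORECASE)
    team = re.search(r"(?:team of|managed)\s*(\d+)", text, flags=re.IGNORECASE)
    is_manager = re.search(r"store manager", text, flags=re.IGNORECASE)
    return {
        "years_experience": int(years.group(1)) if years else None,
        "team_size": int(team.group(1)) if team else None,
        "mentions_store_manager": int(bool(is_manager)),
    }

resume = {
    "education": "bachelor",
    "work_experience": "Store manager for 3 years; managed 12 staff and handled daily inventory.",
}
features = {
    **one_hot(resume["education"], EDUCATION_LEVELS, "education"),
    **parse_unstructured(resume["work_experience"]),
}
```

Fields that cannot be extracted are left as `None` rather than guessed, which fits the point made earlier that decision trees cope well with missing fields.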
Data feature items parsed from a candidate's resume sample data serve as input variables of the GBDT model and include, for example, more than 400 categories such as gender, height, date of birth, email address, highest education level, desired job, and description of work content. Data feature item settings for candidate resume sample data in the catering industry according to embodiments of the application are provided in the appendix.
The GBDT model is trained 143 using the above inputs, and the output of the model is obtained and compared to the correct evaluation results. The result of the evaluation of the GBDT model may be a conditional probability that the historical resume sample data can pass the interview. The evaluation result output by the model may also be referred to as a level of the historical resume sample data, which indicates a degree to which the resume sample data of the candidate can pass the interview.
Unlike a traditional decision tree model, the training process of the embodiments of the present application yields a machine learning model that predicts, from a candidate's resume sample data, the probability that the candidate can pass the interview, and that has better interpretability. An advantage of the GBDT model is that the combinations of data feature items and their importance can be output, for example visually. For example, the tree structure drawn with pie charts in FIG. 2 shows, starting from the root node, how each node splits each piece of historical resume sample data into the next branch (or leaf node) according to that node's split condition feature. A total of 63 pieces of resume sample data are shown, of which 56 are positive samples predicted to be able to pass the interview and 7 are negative samples predicted to be unable to pass the interview. Some resume sample data can be assigned to the positive or negative class only after passing through all 5 layers (the depth of the travel path reaches the maximum depth of 5 set by the model), while other resume sample data can be assigned to a class before reaching the 5th layer. For example, if a piece of resume sample data is split at a second-layer node into a leaf node with a 100% probability of passing the interview, the resume sample data is a positive sample, its positive prediction is determined 100% by the data feature item corresponding to that node, and this information is recorded.
The splitting operation applied to each piece of historical resume sample data at each node on its travel path through each tree of the GBDT model is recorded. The recorded splitting operation includes the parsed data feature item corresponding to the split condition feature and the probability that the resume sample data associated with that data feature item is predicted to pass the interview. These probabilities may be regarded as probability factors of the corresponding data feature items. Based on the data feature items and associated probability factors recorded at all nodes on the travel path, the probability, finally output by the model, that the candidate's resume sample data can pass the interview can be determined. In other words, the final level of the resume sample data output by the model is jointly determined by the data feature item corresponding to the split condition feature at each node and the level (probability factor) of the resume sample data associated with that data feature item.
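Recording the travel path can be sketched on a single tree as follows. The nested-dictionary tree layout, the feature names, and the per-node `pass_probability` values are hypothetical; in an actual boosted ensemble the leaves hold additive scores that are combined across trees, so this sketch only illustrates the bookkeeping, not the final probability calculation.

```python
def trace_path(tree, features):
    """Walk one tree and record the split applied at each node on the travel path.

    Internal nodes carry "feature", "threshold", "left" and "right"; every
    node carries "pass_probability", the share of samples at that node that
    passed the interview (the probability factor in the text above).
    """
    path, node = [], tree
    while "feature" in node:
        go_left = features[node["feature"]] <= node["threshold"]
        child = node["left"] if go_left else node["right"]
        path.append({
            "feature": node["feature"],
            "condition": ("<= " if go_left else "> ") + str(node["threshold"]),
            "pass_probability": child["pass_probability"],
        })
        node = child
    return path, node["pass_probability"]

tree = {
    "feature": "years_experience", "threshold": 2, "pass_probability": 0.5,
    "left": {"pass_probability": 0.1},
    "right": {
        "feature": "mentions_store_manager", "threshold": 0, "pass_probability": 0.7,
        "left": {"pass_probability": 0.4},
        "right": {"pass_probability": 1.0},
    },
}
path, leaf_probability = trace_path(tree, {"years_experience": 3, "mentions_store_manager": 1})
```

The returned `path` lists, node by node, which data feature item drove each split and the probability factor after it, which is the raw material for the kind of report described in the text.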
This way of training and predicting with the GBDT model according to embodiments of the present application makes all the information about how the resume sample data travels through the tree structure available, so that the user can visually output the combinations of data feature items and see their respective importance (contribution to the prediction result). A reporting tool can be used to periodically provide business personnel conducting recruitment interviews with insight reports on historical recruitment, to find the key variables that influence whether an interview is passed, and to correct any bias the business personnel may have toward particular data feature items during interviews. This advantage also applies to the real-time prediction process of the GBDT model.
The trained GBDT model is synchronized to the GBDT model used in the prediction process. For the resume sample data that currently needs to be parsed and evaluated, the prediction process may be started on a schedule (daily or at another period) or triggered by recruitment events; it acquires the candidates' resume sample data from the database 110 and performs data access and verification 131 and automatic parsing of resume content 132, similar to data access and verification 141 and automatic parsing of resume content 142 in the training process.
The trained GBDT model is used to predict 133 the resume sample data and obtain an evaluation level. As in the model training process, the output of the model prediction process is the conditional probability that the resume sample data can pass the interview. In a recruitment scenario, a threshold (for example, 0.5) can be set on the predicted conditional probability: resume sample data above the threshold is judged able to pass the interview, and resume sample data below the threshold is judged unable to pass the interview. According to the embodiments of the present application, the threshold is adjustable, so business personnel can flexibly adjust the criterion for judging whether resume sample data can pass the interview according to actual requirements. The criterion can be applied to binary evaluation results and also to other evaluation level results. For example, the resume sample data of the candidates whose conditional probability ranks in the top 30% may be taken, or the resume sample data whose conditional probability is greater than 0.3 may be taken. This flexibility lets the model output adapt better to seasonal changes in talent demand (for example, a sharp increase or decrease in the number of new stores opened in a particular month), and can also help business personnel conducting recruitment interviews devise more flexible staffing strategies (for example, assigning senior interviewers to candidates whose resumes rank in the top 30% by probability, and introducing the company's career development advantages while interviewing those candidates). Setting interview strategies for candidates' resume sample data based on conditional probability or level segments can effectively reduce labor cost and improve the effect of targeted communication.
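The adjustable criteria described above can be sketched as two small selection functions, one using a probability threshold and one taking the top share of candidates by rank. The function names and the sample probabilities are invented for illustration.

```python
def select_by_threshold(scored, threshold=0.5):
    """Candidates whose predicted pass probability is above the threshold."""
    return [cid for cid, p in scored.items() if p > threshold]

def select_top_percent(scored, top_percent=30):
    """Candidates ranked in the top `top_percent` percent by probability."""
    ranked = sorted(scored, key=scored.get, reverse=True)
    count = (len(ranked) * top_percent + 99) // 100  # integer ceiling
    return ranked[:count]

# Hypothetical predicted conditional probabilities of passing the interview.
scored = {"c1": 0.82, "c2": 0.45, "c3": 0.31, "c4": 0.12, "c5": 0.67,
          "c6": 0.55, "c7": 0.29, "c8": 0.08, "c9": 0.91, "c10": 0.38}

default_pass = select_by_threshold(scored)        # threshold 0.5
relaxed_pass = select_by_threshold(scored, 0.3)   # looser criterion
top_candidates = select_top_percent(scored, 30)   # top 30% by rank
```

Lowering the threshold from 0.5 to 0.3 widens the candidate pool without retraining, which is the flexibility the text attributes to the adjustable criterion.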
As described above, in both the model prediction and model training processes, the combinations of data feature items can be output in an intuitive form together with their corresponding importance (contribution to the prediction result). The rules and insights of the recruitment business scenario can be summarized and explained in report form, so that business personnel who recruit and interview can analyze the reasons behind positive factors and correct negative biases.
Since the candidate's resume sample data originates from the external resume collection SaaS platform 120, the results of the model prediction may be returned 134 through a port of the SaaS platform to be provided to the business personnel.
FIG. 3 shows a schematic flow of a method for parsing and evaluating resume sample data according to an embodiment of the application. Some details of the process are already described in the system logic architecture above, and are not described herein again.
The method mainly comprises the following steps:
S310: acquiring resume sample data, wherein the resume sample data comprises unstructured data;
S320: parsing the resume sample data into at least one data feature item, wherein the data feature item is a structured variable; and
S330: evaluating the level of the resume sample data based on the at least one data feature item using a decision tree model.
Step S310 of acquiring resume sample data further includes substeps S311 and S312. Substep S311 includes acquiring, from a database, resume sample data that originates from an external resume collection SaaS platform. An optional substep S312 is used to perform data cleansing on the resume sample data stored in the database.
Step S320 of parsing the resume sample data into data feature items further comprises substeps S321 and S322. In substep S321, information in the unstructured data is extracted into at least one data feature item through regularization; in substep S322, the structured data included in the resume sample data is extracted as data feature items by discretization.
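Substeps S321 and S322 can be sketched as follows, on the assumption that "regularization" here means pattern-based extraction with regular expressions; the application does not confirm this reading. The patterns, field names (`free_text`, `expected_salary`), feature names, and bucket boundaries are all hypothetical.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns; real resumes would need patterns tuned to their language and layout.
YEARS_PATTERN = re.compile(r"(\d+)\s*years?\s+of\s+(?:work\s+)?experience", re.IGNORECASE)
MANAGER_PATTERN = re.compile(r"\b(?:store|shift|restaurant)\s+manager\b", re.IGNORECASE)


def extract_unstructured(text: str) -> Dict[str, int]:
    """S321 (assumed reading): turn free text into structured variables via regex."""
    match = YEARS_PATTERN.search(text)
    return {
        "years_experience": int(match.group(1)) if match else 0,
        "has_manager_title": int(bool(MANAGER_PATTERN.search(text))),
    }


def discretize_salary(salary: Optional[int]) -> str:
    """S322: bucket a structured numeric field; the boundaries are made up."""
    if salary is None:
        return "unknown"
    if salary < 4000:
        return "low"
    if salary < 8000:
        return "mid"
    return "high"


def parse_resume(resume: dict) -> dict:
    """Combine both substeps into one flat dictionary of data feature items."""
    features = extract_unstructured(resume.get("free_text", ""))
    features["expected_salary_bucket"] = discretize_salary(resume.get("expected_salary"))
    return features


example = parse_resume({
    "free_text": "Worked as a shift manager with 4 years of experience in catering.",
    "expected_salary": 6500,
})
```

The result is a flat set of structured variables, which is the form a decision tree model consumes in step S330.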
According to an embodiment of the present application, before step S320, step S340 may be included, in which resume sample data from the server is checked.
Step S330 further includes substeps S331 and S332. Substep S331 is used to record, during prediction with the decision tree model, the data feature item corresponding to the segmentation condition feature, and the level of the resume sample data associated with that data feature item, each time the resume sample data undergoes a segmentation operation at a node on its traversal path through each tree of the decision tree model. Substep S332 is then used to determine the final level of the resume sample data based on all the recorded data feature items and the levels of the resume sample data associated with them. According to an embodiment of the present application, the evaluated result, i.e., the level of the resume sample data, may be the probability that the resume sample data can pass the interview.
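One possible reading of substeps S331 and S332 is sketched below. At each split on the traversal path, the change in node score is attributed to the feature tested at that node, and the leaf scores of all trees are summed and passed through a sigmoid to give a probability. This path-based attribution is a commonly used technique swapped in for illustration, not necessarily the applicant's method; the tree structure, feature names, and all numeric values are made up.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# A node is either a leaf {"value": v} or a split with
# "feature", "threshold", "value", "left" and "right" keys.
# "value" is the raw score at that node; every number here is hypothetical.
TREES: List[dict] = [
    {"feature": "years_experience", "threshold": 2.0, "value": 0.0,
     "left": {"value": -0.4},
     "right": {"feature": "has_manager_title", "threshold": 0.5, "value": 0.3,
               "left": {"value": 0.1}, "right": {"value": 0.8}}},
    {"feature": "has_manager_title", "threshold": 0.5, "value": 0.0,
     "left": {"value": -0.2}, "right": {"value": 0.4}},
]


def trace_tree(tree: dict, features: Dict[str, float]) -> Tuple[float, List[Tuple[str, float]]]:
    """S331: walk one tree, recording (split feature, score change) per node."""
    path = []
    node = tree
    while "feature" in node:
        go_left = features[node["feature"]] <= node["threshold"]
        child = node["left"] if go_left else node["right"]
        path.append((node["feature"], child["value"] - node["value"]))
        node = child
    return node["value"], path


def evaluate(features: Dict[str, float]) -> Tuple[float, Dict[str, float]]:
    """S332: combine all trees into a probability and per-feature contributions."""
    raw_score = 0.0
    contributions: Dict[str, float] = defaultdict(float)
    for tree in TREES:
        leaf_value, path = trace_tree(tree, features)
        raw_score += leaf_value
        for name, delta in path:
            contributions[name] += delta
    probability = 1.0 / (1.0 + math.exp(-raw_score))
    return probability, dict(contributions)


probability, contributions = evaluate({"years_experience": 4, "has_manager_title": 1})
```

In this sketch the contributions sum to the raw score minus the root scores, so each recorded data feature item carries a share of the final level.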
According to an embodiment of the present application, the method further includes a step S350 of providing the evaluated level to business personnel who recruit and interview after the output result of the model prediction is obtained. The prediction result can also be provided to the business personnel through the SaaS platform. To provide the business personnel with more explanatory and advisory information, an evaluation report may also be generated based on the evaluated level of the resume sample data, some or all of the data feature items, and the levels of the resume sample data associated with the data feature items. The report may include information on the key data feature items that determine the evaluated level of the resume sample data and/or information on prior biases regarding data feature items.
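A minimal sketch of how such an evaluation report might be assembled is given below. It assumes the evaluated level is a probability and that each data feature item comes with a signed contribution value; the function name `build_report`, the report wording, and the sample values are hypothetical. Information on prior biases is not modeled here.

```python
from typing import Dict, List


def build_report(candidate_id: str, probability: float,
                 contributions: Dict[str, float], top_n: int = 3) -> List[str]:
    """List the evaluated level, then the key feature items by absolute contribution."""
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    lines = [f"Candidate {candidate_id}: estimated pass probability {probability:.2f}"]
    for name, value in ranked[:top_n]:
        # A zero contribution is reported as raising the score by 0.00.
        direction = "raises" if value >= 0 else "lowers"
        lines.append(f"- {name} {direction} the score by {abs(value):.2f}")
    return lines


report = build_report(
    "c1",
    0.77,
    {"years_experience": 0.3, "has_manager_title": 0.9, "expected_salary_bucket": -0.1},
    top_n=2,
)
```

Sorting by absolute value keeps strongly negative feature items visible as well, which may help business personnel spot features that lower a candidate's level.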
The method may further comprise a step S360 of training the decision tree model, performed periodically, before the step S330 of evaluating using the decision tree model. The training of the model can adopt a training data set formed by historical resume sample data in a period of time before the resume sample data is analyzed and evaluated, and further, the historical resume sample data in the period of time from 5 days to 12 weeks before the resume sample data is analyzed and evaluated can be selected for training.
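The selection of the training data set can be sketched as a date filter. The sketch assumes that "5 days to 12 weeks before" means resumes submitted no later than 5 days and no earlier than 12 weeks before the evaluation date; the alternative reading, a window whose length is between 5 days and 12 weeks, is equally possible. The function name, the record layout, and the dates are hypothetical.

```python
from datetime import date, timedelta
from typing import List, Tuple

Record = Tuple[date, str]  # (submission date, resume id), a hypothetical layout


def select_training_window(records: List[Record], evaluation_date: date,
                           min_age: timedelta = timedelta(days=5),
                           max_age: timedelta = timedelta(weeks=12)) -> List[Record]:
    """Keep historical resumes aged between min_age and max_age, inclusive."""
    oldest = evaluation_date - max_age
    newest = evaluation_date - min_age
    return [r for r in records if oldest <= r[0] <= newest]


history = [
    (date(2020, 12, 19), "a"),  # 2 days old: too recent
    (date(2020, 12, 10), "b"),  # 11 days old: inside the window
    (date(2020, 9, 1), "c"),    # 111 days old: too old
    (date(2020, 9, 28), "d"),   # exactly 12 weeks old: kept (inclusive bound)
]
training_set = select_training_window(history, evaluation_date=date(2020, 12, 21))
```

Excluding the most recent days may leave time for interview outcomes to be recorded before a resume is used as a labeled sample, although the application does not state this rationale.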
Embodiments of the present application also provide a system 400 for parsing and evaluating resume sample data as shown in FIG. 4, which includes a device 410 and a database 420. The device 410 includes a processor 411 and a memory 412. The memory 412 stores executable instructions of the processor 411 such that the processor 411 performs the method steps as in fig. 3 when executing the executable instructions. The database 420 includes a data acquisition unit 421 for acquiring resume sample data from an external resume collection SaaS platform, a big data platform 422 for storing the acquired resume sample data, and an optional data cleansing unit 423 for extracting, converting and/or cleansing the stored resume sample data. The structure of database 420 is similar to database 110 in the system logic architecture shown in FIG. 1.
According to the scheme of the embodiment of the application, an automatic model for parsing, evaluating and screening resume sample data with self-learning iteration capability is established. It can automatically learn and update the recruitment requirements for managers in industries such as catering services, automatically adjust the evaluation criteria for resume content in combination with changes in candidate requirements, and adapt in a timely manner to market changes and brand requirements. In addition, owing to the strong interpretability of the GBDT model, the content of the resume data, including structured and unstructured data, can be automatically parsed; the corresponding conditional probability can be output according to the degree of match between the candidate and the position and the likelihood that the candidate passes the interview, and the output can be flexibly adjusted according to actual business requirements; the importance of the features found in the model training process can be summarized, and the decision process of the model can also be provided to decision makers in the form of an insight report for understanding the latest trends in talent supply and recruitment requirements.
It should be noted that although several modules or units of the system for parsing and evaluating resume sample data are mentioned in the above detailed description, such partitioning is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the application. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units. The components shown as modules or units may or may not be physical units, i.e., they may be located in one place or distributed over a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the application. One of ordinary skill in the art can understand and implement it without inventive effort.
In an exemplary embodiment of the present application, there is also provided a computer readable storage medium having stored thereon a computer program comprising executable instructions which, when executed by, for example, a processor, may implement the steps of the method for parsing and evaluating resume sample data described in any of the above embodiments. In some possible implementations, various aspects of the present application may also be implemented in the form of a program product comprising program code for causing a terminal device to perform the steps according to various exemplary embodiments of the present application described in the method for parsing and evaluating resume sample data of the present specification, when the program product is run on the terminal device.
A program product for implementing the above method according to an embodiment of the present application may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
In an exemplary embodiment of the present application, there is also provided an electronic device that may include a processor, and a memory for storing executable instructions of the processor. Wherein the processor is configured to perform the steps of the method for parsing and evaluating resume sample data in any of the above embodiments via execution of the executable instructions.
As will be appreciated by one skilled in the art, aspects of the present application may be embodied as a system, method or program product. Accordingly, various aspects of the present application may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module," or "system."
An electronic device 500 according to this embodiment of the present application is described below with reference to fig. 5. The electronic device 500 shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 5, the electronic device 500 is embodied in the form of a general purpose computing device. The components of the electronic device 500 may include, but are not limited to: at least one processing unit 510, at least one memory unit 520, a bus 530 that couples various system components including the memory unit 520 and the processing unit 510, a display unit 540, and the like.
Wherein the storage unit stores program code executable by the processing unit 510 to cause the processing unit 510 to perform steps according to various exemplary embodiments of the present application described in the present specification for a method for parsing and evaluating resume sample data. For example, the processing unit 510 may perform the steps as shown in fig. 3.
The memory unit 520 may include a readable medium in the form of a volatile memory unit, such as a random access memory unit (RAM) 5201 and/or a cache memory unit 5202, and may further include a read only memory unit (ROM) 5203.
The memory unit 520 may also include a program/utility 5204 having a set (at least one) of program modules 5205, such program modules 5205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 530 may be one or more of any of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 500 may also communicate with one or more external devices 600 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 500, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 500 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 550. Also, the electronic device 500 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 560. The network adapter 560 may communicate with other modules of the electronic device 500 via the bus 530. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 500, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, or a network device, etc.) to execute the method for parsing and evaluating resume sample data according to the embodiments of the present application.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
Appendix
Data feature item list
[Table images omitted: the data feature item list appears in the original publication as 15 figures, BDA0002848992850000171 through BDA0002848992850000311.]

Claims (21)

1. A method for parsing and evaluating resume sample data, the resume sample data including at least unstructured data, the method comprising:
acquiring resume sample data;
parsing the resume sample data into at least one data feature item, wherein the data feature item is a structured variable; and
evaluating a level of the resume sample data based on the at least one data feature item using a decision tree model.
2. The method of claim 1, wherein the decision tree model is a gradient boosting iterative decision tree (GBDT) model.
3. The method of claim 1 or 2, wherein parsing the resume sample data into at least one data feature item comprises:
extracting information of unstructured data in the resume sample data into the at least one data feature item through regularization.
4. The method of claim 3, wherein the resume sample data further comprises structured data, the structured data being extracted by discretization as the at least one data feature item.
5. A method according to claim 1 or 2, wherein the resume sample data is checked prior to parsing the resume sample data into the at least one data feature item.
6. The method of claim 1 or 2, wherein evaluating the level of the resume sample data based on the at least one data feature item using the decision tree model comprises:
recording the data characteristic item corresponding to the segmentation condition characteristic and the level of the resume sample data associated with the data characteristic item when the resume sample data performs the segmentation operation at each node on the travel path in the decision tree model;
determining a level of the resume sample data based on all of the data feature items and the level of the resume sample data associated with the data feature items.
7. The method of claim 1 or 2, wherein the level of the resume sample data is a probability that the resume sample data can pass interviewing.
8. The method of claim 1 or 2, wherein obtaining the resume sample data further comprises:
acquiring, from a database, resume sample data that originates from an external resume collection platform.
9. The method of claim 8, wherein obtaining the resume sample data further comprises data cleansing the resume sample data.
10. The method of claim 1 or 2, further comprising providing the assessed level of the resume sample data to a person recruiting interviews through an external resume collection platform.
11. The method of claim 6, further comprising generating an evaluation report based on the evaluated level of the resume sample data, at least some or all of the data feature items, and the level of the resume sample data associated with the data feature items.
12. The method according to claim 11, wherein the evaluation report comprises information relating to key feature data items for the level of the resume sample data being evaluated and/or information relating to prior prejudices to the feature data items.
13. The method according to claim 1 or 2, further comprising setting an adjustable threshold for a level of the resume sample data to generate a binarization evaluation result with respect to the resume sample data.
14. The method of claim 1 or 2, further comprising setting a respective interview policy based on the evaluated level of the resume sample data.
15. The method according to claim 1 or 2, wherein the decision tree model is trained using historical resume sample data prior to using the decision tree model.
16. The method of claim 15, wherein training is performed using historical resume sample data over a period of time prior to parsing and evaluating the resume sample data.
17. The method of claim 16, wherein training is performed using historical resume sample data within 5 days to 12 weeks before parsing and evaluating the resume sample data.
18. An apparatus for parsing and evaluating resume sample data, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to execute the executable instructions to implement the method of any of claims 1 to 17.
19. A system for parsing and evaluating resume sample data, comprising:
the apparatus of claim 18, and
a database.
20. The system of claim 19, wherein the database comprises:
a data acquisition unit configured to acquire the resume sample data from an external resume collection platform;
a big data platform configured to store the resume sample data; and
a data cleansing unit configured to extract, convert and/or cleanse the stored resume sample data.
21. A computer-readable storage medium, on which a computer program is stored, the computer program comprising executable instructions that, when executed by a processor, carry out the method according to any one of claims 1 to 17.
CN202011521270.9A 2020-12-21 2020-12-21 Method, equipment and system for analyzing and evaluating resume sample data Pending CN112699235A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011521270.9A CN112699235A (en) 2020-12-21 2020-12-21 Method, equipment and system for analyzing and evaluating resume sample data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011521270.9A CN112699235A (en) 2020-12-21 2020-12-21 Method, equipment and system for analyzing and evaluating resume sample data

Publications (1)

Publication Number Publication Date
CN112699235A true CN112699235A (en) 2021-04-23

Family

ID=75509758

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011521270.9A Pending CN112699235A (en) 2020-12-21 2020-12-21 Method, equipment and system for analyzing and evaluating resume sample data

Country Status (1)

Country Link
CN (1) CN112699235A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114186983A (en) * 2022-02-16 2022-03-15 北森云计算有限公司 Video interview multidimensional scoring method, system, computer equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130246290A1 (en) * 2012-03-16 2013-09-19 Precision Litigation, LLC Machine-Assisted Legal Assessments
US8818910B1 (en) * 2013-11-26 2014-08-26 Comrise, Inc. Systems and methods for prioritizing job candidates using a decision-tree forest algorithm
CN107291715A (en) * 2016-03-30 2017-10-24 阿里巴巴集团控股有限公司 Resume appraisal procedure and device
US20190102462A1 (en) * 2017-09-29 2019-04-04 International Business Machines Corporation Identification and evaluation white space target entity for transaction operations
CN110377560A (en) * 2019-07-18 2019-10-25 中科鼎富(北京)科技发展有限公司 A kind of structural method and device of biographic information
CN111198970A (en) * 2020-01-02 2020-05-26 中科鼎富(北京)科技发展有限公司 Resume matching method and device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
杨真;陈建安;: "招聘面试人工智能***的框架与模块研究", 江苏大学学报(社会科学版), no. 06, 30 November 2017 (2017-11-30) *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20210423

Assignee: Baisheng Consultation (Shanghai) Co.,Ltd.

Assignor: Shengdoushi (Shanghai) Technology Development Co.,Ltd.

Contract record no.: X2023310000138

Denomination of invention: Method, equipment, and system for analyzing and evaluating resume sample data

License type: Common License

Record date: 20230714
