CN108932294B - Resume data processing method, device, equipment and storage medium based on index - Google Patents

Publication number
CN108932294B
CN108932294B CN201810548843.3A CN201810548843A CN108932294B CN 108932294 B CN108932294 B CN 108932294B CN 201810548843 A CN201810548843 A CN 201810548843A CN 108932294 B CN108932294 B CN 108932294B
Authority
CN
China
Prior art keywords
resume
index
effective
data
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810548843.3A
Other languages
Chinese (zh)
Other versions
CN108932294A (en
Inventor
张师琲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201810548843.3A priority Critical patent/CN108932294B/en
Priority to PCT/CN2018/094393 priority patent/WO2019227585A1/en
Publication of CN108932294A publication Critical patent/CN108932294A/en
Application granted granted Critical
Publication of CN108932294B publication Critical patent/CN108932294B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/151 Transformation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/105 Human resources
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of computers, and provides a resume data processing method, a resume data processing device, computer equipment and a storage medium based on indexes, wherein the resume data processing method comprises the following steps: acquiring an original resume file; preprocessing an original resume file according to a preset text format to obtain a resume text; analyzing the resume text according to preset keywords to obtain effective keywords in the resume text and data information corresponding to each effective keyword; for each effective keyword, packaging the effective keyword and corresponding data information into an index block; establishing an index item for each index block; and correspondingly storing the index block and the index item in a resume library. The invention realizes complete extraction of the resume, is beneficial to management of the resume text data information and establishment of the resume library, realizes quick search of the data information, and can improve the efficiency and accuracy of data information retrieval.

Description

Resume data processing method, device, equipment and storage medium based on index
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a resume data processing method, device, equipment and storage medium based on an index.
Background
At present, when managing personal resumes that arrive in differing formats, the traditional approach can only rely on a large amount of manual labor to classify the resumes and to enter the resume information into a database by hand. Meanwhile, because submitting a resume over the internet is convenient, a job seeker can submit multiple resumes to multiple enterprises in a short time. The specific format of personal resume information varies from person to person, owing to the different design styles of resumes and the different writing habits of individuals. Although recruitment websites provide resume templates, the templates differ from one website to another. As a result, enterprises that recruit over the internet must invest considerable labor cost every day to process the hundreds of electronic resumes they receive, which hinders the establishment and management of the database and the retrieval of resume information, and leaves the efficiency and accuracy of resume data retrieval low.
Disclosure of Invention
Based on the above, it is necessary to provide an index-based resume data processing method, device, equipment and storage medium, which can realize quick search of resume data information and improve efficiency and accuracy of resume data information retrieval.
An index-based resume data processing method, comprising:
acquiring an original resume file;
preprocessing the original resume file according to a preset text format to obtain a resume text;
analyzing the resume text according to preset keywords to obtain effective keywords in the resume text and data information corresponding to each effective keyword;
for each effective keyword, packaging the effective keyword and the corresponding data information into an index block;
establishing an index item for each index block;
and correspondingly storing the index block and the index item in a resume library.
An index-based resume data processing apparatus comprising:
the information acquisition module is used for acquiring an original resume file;
the text processing module is used for preprocessing the original resume file according to a preset text format to obtain resume text;
the information analysis module is used for analyzing the resume text according to preset keywords and obtaining effective keywords in the resume text and data information corresponding to each effective keyword;
the information packaging module is used for packaging the effective keywords and the corresponding data information into index blocks aiming at each effective keyword;
the index establishing module is used for establishing an index item for each index block;
and the index storage module is used for correspondingly storing the index block and the index item in a resume library.
A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the index-based resume data processing method described above when the computer program is executed.
A computer readable storage medium storing a computer program which when executed by a processor implements the steps of the index-based resume data processing method described above.
According to the index-based resume data processing method, device, equipment and storage medium, an original resume file is acquired and preprocessed according to a preset text format to obtain resume text, and the resume text is analyzed according to preset keywords to obtain the effective keywords in the resume text and the data information corresponding to each effective keyword, which ensures that the data information of the resume text is extracted completely. Meanwhile, each effective keyword and its corresponding data information are packaged into an index block, an index item is established for each index block, and the index blocks and index items are correspondingly stored in a resume library. This facilitates the management of resume text data information and the establishment of the resume library, allows the data information in an index block to be found quickly through its index items, and improves the efficiency and accuracy of data information retrieval.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments of the present invention will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an application environment of an index-based resume data processing method according to an embodiment of the invention;
FIG. 2 is a flow chart of a method for index-based resume data processing in accordance with an embodiment of the present invention;
FIG. 3 is a flowchart illustrating an implementation of step S3 in an index-based resume data processing method according to an embodiment of the present invention;
FIG. 4 is a flowchart showing an implementation of step S303 in the index-based resume data processing method according to an embodiment of the present invention;
FIG. 5 is a flowchart showing an implementation of step S5 in the index-based resume data processing method according to an embodiment of the present invention;
FIG. 6 is a flow chart of processing a resume download request in an index-based resume data processing method according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of an index-based resume data processing apparatus according to an embodiment of the invention;
FIG. 8 is a schematic diagram of a computer device in accordance with an embodiment of the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Fig. 1 shows an application environment provided by an embodiment of the present invention. The application environment includes a server and a client connected through a network. The client is configured to collect resume files and send them to the server; the client may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer or portable wearable device. The server is configured to process the resume files and may be implemented as an independent server or as a server cluster formed by a plurality of servers. The index-based resume data processing method provided by the embodiment of the invention is applied to the server.
Referring to fig. 2, fig. 2 shows an implementation flow of the index-based resume data processing method provided in this embodiment. The details are as follows:
S1: Acquiring an original resume file.
In this embodiment, the file types of the original resume file may include, but are not limited to, doc, pdf and html, and the language of the original resume file may include, but is not limited to, Chinese, English and Japanese. It should be understood that the file types and languages listed here are examples only; other file types or languages are possible, which is not limited herein.
The original resume file may be acquired by receiving a resume file uploaded by a user, by automatically fetching resume files from a third-party resume library at scheduled times, or by other means, which is not limited herein. The third-party resume library may be the resume library of an online recruitment platform.
S2: Preprocessing the original resume file according to a preset text format to obtain a resume text.
It should be noted that, the preset text format may be xml, pdf, doc, or the like, but is not limited thereto, and may be specifically set according to the needs of practical applications, which is not limited herein.
Preprocessing consists of reading the file content of the original resume file and converting its text format. It may be performed with Tika, a content-analysis library, or with other tools, which is not limited herein.
Preferably, the original resume file is preprocessed using Tika. Tika is a library for detecting file types and extracting content from files in various formats; it relies on a variety of file parsers and file-type detection techniques. Tika provides a universal application programming interface (API) for parsing different file formats, through which data in various file formats is detected and extracted, so that the integrity of the read resume text can be ensured. Meanwhile, the extracted data is format-converted by an internal plug-in, so that irregular resume text formats are converted into the unified preset text format.
An API (Application Programming Interface) is a set of predefined routines that gives applications and developers access to the capabilities of some software or hardware without having to access its source code or understand the details of its internal operating mechanism.
Specifically, the original resume files in various text formats obtained in step S1 are read using Tika, which ensures the integrity of the obtained resume text. The file content that has been read is then converted according to the preset text format to obtain resume text in a uniform format. This standardizes irregular resume text and facilitates the subsequent analysis and extraction of its data information.
As an example, according to a preset text format, format conversion may be performed on various types of original resume files without a fixed text format, where a space may be indicated by "×", for example, the doc resume text format is "name Li Lei" × and the pdf resume text format is "× Li Lei" × and the resume text with the two different formats is converted according to the preset text format, for example, converted into a text format of "× Li Lei" × ".
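The normalization idea can be sketched as follows. This is only an illustration under stated assumptions: the preset format `label: value`, the single-token label, and the function name `normalize_line` are hypothetical and do not come from the embodiment, which leaves the preset text format open.

```python
import re

# Assumed preset text format (hypothetical): "<label>: <value>".
# A label is a single token; it is separated from its value either by a
# colon (ASCII or full-width) or by one or more whitespace characters.
_LABEL_VALUE = re.compile(
    r"^\s*(?P<label>[^\s:：]+)(?:\s*[:：]\s*|\s+)(?P<value>.+?)\s*$"
)


def normalize_line(line: str) -> str:
    """Rewrite a 'label value' line into the assumed preset format."""
    match = _LABEL_VALUE.match(line)
    if match is None:
        # Lines without a label/value shape are only trimmed.
        return line.strip()
    return f"{match.group('label')}: {match.group('value')}"


if __name__ == "__main__":
    # Two differently spaced inputs end up in the same format.
    print(normalize_line("name    Li Lei"))
    print(normalize_line("  name:Li Lei  "))
```

Whatever the source layout, both variants are reduced to one spelling, so the later keyword analysis only has to handle a single format.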
S3: Analyzing the resume text according to preset keywords to obtain effective keywords in the resume text and data information corresponding to each effective keyword.
In this embodiment, the preset keywords may be, but are not limited to, names, schools, professions, educational experiences, work experiences, etc., and may be specifically set according to actual application requirements, which is not limited herein. The effective keywords are texts matched with preset keywords in the resume text. The data information refers to specific resume text content corresponding to the effective keywords.
Specifically, the resume text obtained in step S2 may be parsed according to the preset keywords as follows: the resume text is traversed and searched for the keywords; each keyword found is taken as an effective keyword; and, following the order in which the effective keywords appear, the content between two adjacent effective keywords is taken as the data information of the earlier one.
Further, each preset keyword may correspond to a keyword set containing keywords with the same meaning. For example, the keyword set corresponding to the preset keyword "address" may include keywords that indicate an address, such as "home address" and "address". When the resume text is parsed, for each preset keyword, if any keyword in the corresponding keyword set is retrieved in the resume text, the retrieved keyword is taken as an effective keyword. For example, for the preset keyword "address", if "address" is retrieved in the resume text, "address" is taken as an effective keyword.
Further, the user can flexibly configure the keywords in the keyword set according to the requirement, so that the effective keywords can be more accurately matched in the analysis process of the resume text.
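A minimal sketch of keyword-set lookup is given below. The dictionary `KEYWORD_SETS`, its contents (in particular "residential address" and "full name") and the function name `find_effective_keywords` are illustrative assumptions; simple case-insensitive substring search stands in for whatever retrieval the embodiment actually uses.

```python
from typing import Dict, List, Set

# Hypothetical, user-configurable keyword sets: each preset keyword maps to
# the keywords that are treated as having the same meaning.
KEYWORD_SETS: Dict[str, Set[str]] = {
    "address": {"address", "home address", "residential address"},
    "name": {"name", "full name"},
}


def find_effective_keywords(
    resume_text: str, keyword_sets: Dict[str, Set[str]] = KEYWORD_SETS
) -> Dict[str, List[str]]:
    """Return, per preset keyword, the set members found in the text."""
    text = resume_text.lower()
    found: Dict[str, List[str]] = {}
    for preset, members in keyword_sets.items():
        hits = sorted(member for member in members if member in text)
        if hits:
            found[preset] = hits
    return found


if __name__ == "__main__":
    print(find_effective_keywords("Name: Li Lei\nHome address: Shenzhen"))
```

Because the sets are plain data, a user can extend them without touching the parsing logic, which is the flexibility the paragraph above describes.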
S4: For each effective keyword, packaging the effective keyword and its corresponding data information into an index block.
In this embodiment, the index block is a data packet obtained by combining the effective keyword, the data information corresponding to the effective keyword, and the mapping relationship between the effective keyword and the data information.
Packaging is the process of combining the effective keyword, the data information corresponding to the effective keyword, and the mapping relationship between the two. Packaging may be implemented by, but is not limited to, constructing classes, components or functional modules; other ways are also possible, which is not limited herein.
Preferably, the embodiment of the invention packages by constructing a functional module: the effective keyword, its corresponding data information, and the mapping relationship between them are modularized, mainly through public methods, so that the effective keyword and the corresponding data information are packaged into an index block.
Specifically, the effective keywords obtained in the step S3, the data information corresponding to the effective keywords, and the mapping relation between the effective keywords and the data information are combined into an independent whole, namely the packaged index block, so that maintenance and management of the effective keywords and the data information corresponding to each effective keyword can be realized.
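As a rough illustration of packaging, the sketch below uses a small class, one of the packaging options listed above rather than the preferred functional module. The names `IndexBlock`, `mapping` and `package` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class IndexBlock:
    """One effective keyword, its data information, and their mapping."""

    keyword: str
    data: str

    def mapping(self) -> Dict[str, str]:
        # The keyword -> data mapping is exposed through a public method.
        return {self.keyword: self.data}


def package(parsed: Dict[str, str]) -> List[IndexBlock]:
    """Wrap each (effective keyword, data information) pair in a block."""
    return [IndexBlock(keyword, data) for keyword, data in parsed.items()]


if __name__ == "__main__":
    for block in package({"name": "Li Lei", "school": "Example University"}):
        print(block.mapping())
```

Each block is an independent unit, so a single keyword and its data can be maintained or replaced without touching the others.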
S5: an index entry is established for each index block.
In this embodiment, the index entries have a mapping relationship with each index block, and are used for fast matching and searching for the tag entry of the corresponding index block, where one index block at least corresponds to one index entry.
The index item is a label item with a mapping relation for each index block, and can be established by adopting a full-text search server engine or a search server, or by other modes, without limitation.
Preferably, the present embodiment uses a full text search server solr engine to build the index item.
It should be noted that, the solr is a high-performance full text search server with fast search speed. The software engine provides rich query language, realizes configurable and extensible configuration performance, and provides a perfect function management interface.
lucene is a full text search engine toolkit of open source code that provides a complete query engine, indexing engine, and partial text analysis engine.
S6: Correspondingly storing the index block and the index item in a resume library.
In this embodiment, the correspondence between the index blocks and the index entries may be one-to-one correspondence, or one-to-many correspondence.
For example, the "index block a" only establishes one "index item a", that is, the "index block a and the index item a" are in one-to-one correspondence; the "index block B" establishes a plurality of index items, such as "index item B", "index item c", and "index item d", etc., that is, "index block B is in one-to-many correspondence with index item B, index item c, and index item d".
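The one-to-one and one-to-many correspondence can be sketched with an in-memory structure. This is an assumption-laden stand-in for a real resume library (the embodiment prefers Solr); the class `ResumeLibrary` and its methods `store` and `lookup` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


class ResumeLibrary:
    """Toy resume library: index items point at the blocks they tag."""

    def __init__(self) -> None:
        self._blocks: Dict[str, dict] = {}
        # index item -> ids of the index blocks it maps to
        self._items: Dict[str, Set[str]] = defaultdict(set)

    def store(self, block_id: str, block: dict, index_items: Iterable[str]) -> None:
        """Store a block together with one or more index items."""
        self._blocks[block_id] = block
        for item in index_items:
            self._items[item].add(block_id)

    def lookup(self, index_item: str) -> List[dict]:
        """Return every block reachable from the given index item."""
        block_ids = self._items.get(index_item, set())
        return [self._blocks[block_id] for block_id in sorted(block_ids)]


if __name__ == "__main__":
    library = ResumeLibrary()
    library.store("A", {"name": "Li Lei"}, ["a"])                  # one-to-one
    library.store("B", {"school": "Example University"}, ["b", "c", "d"])  # one-to-many
    print(library.lookup("c"))
```

Looking up any of the items "b", "c" or "d" leads to the same block B, while item "a" leads only to block A.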
In this embodiment, an original resume file is acquired and preprocessed according to a preset text format to obtain resume text, which standardizes irregular resume text and facilitates the subsequent analysis and extraction of its data information. The resume text is then analyzed according to preset keywords to obtain the effective keywords in the resume text and the data information corresponding to each effective keyword, so that the data information can be extracted quickly and completely. Meanwhile, each effective keyword and its corresponding data information are packaged into an index block, an index item is established for each index block, and the index blocks and index items are correspondingly stored in a resume library. This facilitates the management and maintenance of resume text data information, allows the data information in an index block to be found quickly through its index items, and improves the efficiency and accuracy of data information retrieval.
In an embodiment, as shown in fig. 3, in step S3, the parsing of the resume text according to the preset keywords, to obtain the valid keywords in the resume text and the data information corresponding to each valid keyword specifically includes the following steps:
S301: Performing tag extraction on the resume text to obtain title tags.
In this embodiment, title tags are a series of tags, extracted and identified from the content of the resume text, that represent personal characteristics and describe items in the resume such as profession, academic qualifications and work experience. A title tag may specifically be "name", "academic qualifications", "educational experience" or "work experience", among others.
The label extraction mode may be feature extraction, but may also be other extraction modes, and is not limited herein.
Preferably, in this embodiment, a feature extraction manner is adopted to perform tag extraction, for example, a text line and features of the text line in the resume text are obtained, and according to a preset feature index, the features of the text line are compared with the feature index, and the text line meeting the feature index requirement is marked as a title tag.
It can be understood that the text line is a word or a sentence in the resume text, and the feature of the text line is an attribute feature for describing the text line, and the preset feature index is set according to the actual application requirement and is used as a standard for extracting the title label.
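Feature-based tag extraction might look like the sketch below. The embodiment does not say which features or feature index are used, so the three features here (line length, presence of digits, sentence-ending punctuation) and the threshold `MAX_TITLE_LENGTH` are purely illustrative assumptions.

```python
from typing import Dict, List

# Hypothetical feature index: a title tag is assumed to be a short line
# without digits that does not end like a sentence.
MAX_TITLE_LENGTH = 20


def line_features(line: str) -> Dict[str, object]:
    """Compute simple attribute features describing one text line."""
    stripped = line.strip()
    return {
        "length": len(stripped),
        "has_digit": any(ch.isdigit() for ch in stripped),
        "ends_like_sentence": stripped.endswith((".", ",", ";")),
    }


def extract_title_tags(resume_text: str) -> List[str]:
    """Mark the lines whose features satisfy the feature index."""
    tags = []
    for line in resume_text.splitlines():
        features = line_features(line)
        if (
            0 < features["length"] <= MAX_TITLE_LENGTH
            and not features["has_digit"]
            and not features["ends_like_sentence"]
        ):
            tags.append(line.strip().rstrip(":"))
    return tags


if __name__ == "__main__":
    sample = (
        "Name\nLi Lei\nWork experience\n"
        "Software engineer at Example Co. from 2015 to 2018.\n"
    )
    print(extract_title_tags(sample))
```

Such loose features also let short values like "Li Lei" through as candidates; these are discarded in step S302, where only title tags that match a preset keyword become effective keywords.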
S302: Matching the title tags against the preset keywords, and determining each successfully matched title tag as an effective keyword.
In this embodiment, the matching of the title tag and the keyword may be performed by condition matching, or may be performed by other ways, which is not limited herein, where the condition in condition matching may be set according to the actual application requirement, which is not limited herein.
Preferably, the matching method adopted in this embodiment is condition matching. One form of condition matching judges, according to a preset word bank, whether the title tag and the keyword have the same meaning. The preset word bank defines a set of synonyms for each keyword; for example, the synonym set of "education experience" includes "education experience", "education degree", etc., and the synonym set of "work experience" includes "work experience", "history", etc. If the title tag belongs to the synonym set of a keyword, the title tag is confirmed to have the same meaning as the keyword, that is, the matching is successful. Another form of condition matching calculates the text similarity between the title tag and the keyword; if the text similarity is greater than or equal to a preset similarity threshold, the matching is successful. The preset similarity threshold may specifically be 80% or another value set according to the actual application requirement, which is not limited herein.
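Both forms of condition matching can be sketched together. The word bank `WORD_BANK` reuses the examples given above, while the use of `difflib.SequenceMatcher` as the similarity measure and the function name `match_keyword` are assumptions; the embodiment does not specify how text similarity is computed.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Set

# Hypothetical preset word bank: keyword -> set of same-meaning terms.
WORD_BANK: Dict[str, Set[str]] = {
    "education experience": {"education experience", "education degree"},
    "work experience": {"work experience", "history"},
}
SIMILARITY_THRESHOLD = 0.8  # the 80% example threshold


def match_keyword(
    title_tag: str,
    word_bank: Dict[str, Set[str]] = WORD_BANK,
    threshold: float = SIMILARITY_THRESHOLD,
) -> Optional[str]:
    """Return the keyword the title tag matches, or None if no match."""
    tag = title_tag.strip().lower()
    # Form 1: membership in the keyword's set of same-meaning terms.
    for keyword, terms in word_bank.items():
        if tag in terms:
            return keyword
    # Form 2: text similarity against each keyword, compared to a threshold.
    best_keyword, best_score = None, 0.0
    for keyword in word_bank:
        score = SequenceMatcher(None, tag, keyword).ratio()
        if score > best_score:
            best_keyword, best_score = keyword, score
    return best_keyword if best_score >= threshold else None


if __name__ == "__main__":
    print(match_keyword("History"))
    print(match_keyword("Work experiences"))
    print(match_keyword("Li Lei"))
```

Checking set membership first keeps exact matches cheap; the similarity fallback only catches near-misses such as plural forms or minor spelling variants.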
S303: Analyzing the resume text according to the effective keywords to obtain the data information corresponding to each effective keyword in the resume text.
The manner in which the resume text is parsed may include, but is not limited to: the data partitioning method, regular expression, score algorithm and the like can be specifically set according to actual application requirements, and are not particularly limited.
Preferably, this embodiment parses the text with a data partitioning method. The data partitioning method selects boundary marks in a text and, using the boundary marks as intervals, divides the text into the individual boundary marks and the text block corresponding to each boundary mark. Here the boundary marks are the effective keywords in the resume text, and the text block corresponding to each boundary mark is the data information corresponding to that effective keyword.
Specifically, according to the effective keywords, the content in the resume text obtained in the step S2 is divided and extracted, so that data information corresponding to each effective keyword in the resume text is obtained, and the integrity of content extraction of the resume text can be ensured.
In this embodiment, tag extraction is performed on the resume text obtained in step S2 to obtain title tags, which helps to lock onto the data information to be extracted in the subsequent steps. The title tags are matched against the keywords, and each successfully matched title tag is determined to be an effective keyword, which makes it easy to locate the data information to be extracted. Meanwhile, the resume text is analyzed according to the effective keywords to obtain the data information corresponding to each effective keyword, which ensures that the content of the resume text is extracted completely.
In an embodiment, as shown in fig. 4, in step S303, the parsing the resume text according to the valid keywords, and obtaining the data information corresponding to each valid keyword in the resume text specifically includes the following steps:
S3031: Cutting the resume text into a plurality of data blocks, using the effective keywords as intervals.
In this embodiment, an interval represents the boundary between data blocks, and the effective keywords serve as the boundary marks.
A data block is the resume text between one effective keyword and the next effective keyword.
Specifically, the effective keywords obtained in step S3 are used as boundary marks for dividing the resume text into the corresponding data blocks. This dividing operation is simple, clear and efficient, so the data information of the resume text can be extracted quickly.
S3032: For each effective keyword, taking the data block between the effective keyword and the next effective keyword as the data information corresponding to the effective keyword.
For example, it is assumed that two adjacent effective keywords in the resume text are respectively an "effective keyword a" and an "effective keyword B", and the resume text between the "effective keyword a" and the "effective keyword B" is used as the data information corresponding to the "effective keyword a".
It should be noted that, if the "effective keyword C" is the last effective keyword in the resume text, the resume text between the "effective keyword C" and the text ending symbol of the resume text is used as the data information corresponding to the "effective keyword C".
In this embodiment, the resume text is cut into data blocks using the effective keywords obtained in step S3 as intervals, and the data block between each effective keyword and the next is taken as the data information corresponding to that effective keyword. Extracting data information by dividing data blocks in this way is simple and clear, allows the data information to be extracted quickly, and ensures that the extraction is complete and efficient.
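Steps S3031 and S3032 can be sketched as follows. The function name `split_by_keywords` is hypothetical, and the sketch makes simplifying assumptions: each effective keyword is located by its first occurrence, keywords do not overlap one another, and separators such as colons and line breaks are trimmed from the extracted data.

```python
from typing import Dict, List


def split_by_keywords(resume_text: str, effective_keywords: List[str]) -> Dict[str, str]:
    """Cut the text at each effective keyword and map keyword -> data block."""
    # Locate every keyword (first occurrence) and order them by position.
    positions = sorted(
        (resume_text.find(keyword), keyword)
        for keyword in effective_keywords
        if keyword in resume_text
    )
    data_info: Dict[str, str] = {}
    for i, (start, keyword) in enumerate(positions):
        block_start = start + len(keyword)
        # The block runs to the next keyword, or to the end of the text
        # when this is the last effective keyword.
        if i + 1 < len(positions):
            block_end = positions[i + 1][0]
        else:
            block_end = len(resume_text)
        data_info[keyword] = resume_text[block_start:block_end].strip(" :\n\t")
    return data_info


if __name__ == "__main__":
    text = (
        "Name: Li Lei\n"
        "School: Example University\n"
        "Work experience: Engineer at Example Co"
    )
    print(split_by_keywords(text, ["Name", "School", "Work experience"]))
```

The last keyword has no successor, so its data block extends to the end of the resume text, matching the handling of "effective keyword C" described above.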
In one embodiment, as shown in fig. 5, in step S5, establishing an index entry for each index block specifically includes the following steps:
s501: according to a preset word segmentation method, word segmentation processing is carried out on the effective keywords in the index block and the data information corresponding to the effective keywords, so as to obtain vocabulary units.
In this embodiment, the preset word segmentation method may specifically be an IK word segmentation algorithm in a plug-in a solr engine, that is, a forward iteration finest granularity segmentation algorithm, or may also be other word segmentation methods, which may specifically be selected according to actual application requirements, and is not limited herein.
The vocabulary unit is used for carrying out word segmentation on the effective keywords in the index block and the data information corresponding to the effective keywords to obtain words.
Specifically, the effective keywords and the data information corresponding to the effective keywords obtained in the step S3 are removed from the data information punctuation marks, stop words are removed, and then the data information corresponding to the effective keywords and the effective keywords is split into independent words, wherein the stop words are commonly used stop words in chinese, for example, "one-to-one", "one-to-one", "one-and-one", and "one-to-one", and the like, which are beneficial to ensuring that the words obtained by word segmentation processing are single, meaningful and complete words.
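The sketch below illustrates the order of operations in S501 on English text. It is not the IK algorithm: a regular expression that keeps runs of word characters stands in for real word segmentation, and the small English stop-word list `STOP_WORDS` is an illustrative assumption in place of a Chinese stop-word dictionary.

```python
import re
from typing import List

# Hypothetical stop-word list used only for this English illustration.
STOP_WORDS = {"a", "an", "the", "of", "and", "at", "in"}


def segment(keyword: str, data: str) -> List[str]:
    """Split a keyword and its data information into vocabulary units."""
    text = f"{keyword} {data}"
    # Keeping only runs of word characters drops the punctuation marks.
    words = re.findall(r"\w+", text)
    # Stop words are removed so that only meaningful words remain.
    return [word for word in words if word.lower() not in STOP_WORDS]


if __name__ == "__main__":
    print(segment("Work experience", "Engineer at the Example Co., and Engineer in Shenzhen."))
```

The output still contains the repeated word "Engineer"; removing such repeats is the job of the de-duplication step that follows.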
S502: Perform de-duplication processing on the vocabulary units to obtain a single vocabulary unit.
In this embodiment, the de-duplication processing may be implemented by a tool with a de-duplication function, for example a de-duplication plug-in of the solr engine, or by another tool selected according to actual application requirements, which is not particularly limited herein.
Specifically, the vocabulary units obtained in step S501 are traversed, and repeated words are identified and deleted to obtain a single vocabulary unit in which every word is unique. This reduces the storage space used and facilitates the establishment and management of the resume library. It also avoids redundant repeated matches during subsequent searches, which reduces the workload of the machine and improves search efficiency.
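The de-duplication in step S502 can be illustrated in a few lines. This sketch keeps the first occurrence of each word and preserves the original order, which is an assumption; the text only requires that the resulting words be unique.

```python
from typing import List


def deduplicate(units: List[str]) -> List[str]:
    """Remove repeated words while keeping first-seen order."""
    # dict keys are unique and, in Python 3.7+, remember insertion order.
    return list(dict.fromkeys(units))


print(deduplicate(["Engineer", "Annual", "Engineer", "YY"]))
```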
S503: Perform format conversion on the single vocabulary unit according to a preset word format to obtain a standard vocabulary unit.
In this embodiment, the preset word format may specifically be a root lowercase format of an english word, but is not limited thereto, and may specifically be set according to actual application requirements, which is not limited herein.
The format conversion of the single vocabulary unit may be performed by a language processing plug-in of the solr engine, or by other plug-ins, which is not limited herein.
Specifically, the single vocabulary unit obtained in step S502 is traversed. If the preset word format is the lowercase root format of English words, the English words in the vocabulary unit are identified, any uppercase letters in them are converted to lowercase, and the lowercase English words are then converted to their root form. This yields words in a uniform format that can be identified quickly during subsequent searches, which helps improve query efficiency.
For example, if the vocabulary unit includes English words such as "Annual" and "Tom", the language processing plug-in of solr converts "Annual" into "annual" and then converts "annual" into the root "ann", and converts "Tom" into "tom".
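The format conversion in step S503 can be sketched as lowercasing followed by root extraction. The suffix list below is a toy assumption chosen only so that "annual" reduces to "ann" as in the example; it is not a real stemming algorithm and not the behaviour of any solr plug-in.

```python
from typing import List

# Toy suffix list, assumed for illustration only.
_SUFFIXES = ("ual", "ing", "ed", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in _SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(units: List[str]) -> List[str]:
    """Lowercase and stem English words; leave all other words unchanged."""
    return [toy_stem(u.lower()) if u.isascii() and u.isalpha() else u for u in units]


print(normalize(["Annual", "Tom", "李华"]))
```

In practice a maintained stemmer or lemmatizer would replace `toy_stem`, and its output for a given word may differ from the example in the text.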
S504: according to a preset ordering mode, ordering the standard vocabulary units, taking the ordered standard vocabulary units as index items of the index block, and establishing a mapping relation between the index items and the index block.
In this embodiment, the preset sorting manner may be ascending order by number of strokes, or another sorting manner set according to actual application requirements, which is not limited herein. The mapping relationship between the index items and the index block is established as a one-to-one or many-to-one correspondence between index items and index blocks.
Specifically, the standard vocabulary units obtained in step S503 are traversed, the standard vocabulary units are ordered according to the order of the number of strokes from small to large, and the ordered standard vocabulary units are used as index items of the index block, so that a mapping relationship between the index items and the index block is established.
For example, standard vocabulary units include words such as "Li Rui", "Li Lei", "Li Hua", and the like, which are ordered in order of the number of strokes from small to large, resulting in standard vocabulary units in the order of "Li Hua", "Li Rui", "Li Lei".
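Sorting by stroke count needs a per-character stroke table. The sketch below assumes a tiny hand-written table and assumes that the names in the example are written 李华, 李锐, and 李雷; the text gives only their transliterations, so these characters are an illustration rather than the source's data.

```python
from typing import Dict, List

# Stroke counts for the simplified characters used below (assumed lookup table).
STROKES: Dict[str, int] = {"李": 7, "华": 6, "锐": 12, "雷": 13}


def stroke_count(word: str) -> int:
    """Total strokes of a word; characters missing from the table count as 0 here."""
    return sum(STROKES.get(ch, 0) for ch in word)


def sort_by_strokes(units: List[str]) -> List[str]:
    """Sort words in ascending order of total stroke count."""
    return sorted(units, key=stroke_count)


print(sort_by_strokes(["李锐", "李雷", "李华"]))
```

Because `sorted` is stable, words with equal stroke counts keep their incoming order; the text does not say how ties should be broken.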
In this embodiment, word segmentation is performed, according to a preset word segmentation method, on the effective keywords in the index block and on the data information corresponding to the effective keywords to obtain vocabulary units, which ensures that the resulting words are independent, meaningful, and complete. De-duplication of the vocabulary units reduces the storage space used and facilitates the establishment and management of the resume library. The single vocabulary unit is then converted according to a preset word format to obtain a standard vocabulary unit, the standard vocabulary units are sorted according to a preset sorting manner, the sorted standard vocabulary units are used as the index items of the index block, and a mapping relationship between the index items and the index block is established. As a result, the machine can quickly identify the index items during subsequent searches and its workload is reduced, which improves the efficiency of querying the index blocks.
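One possible in-memory shape for the result of steps S501 to S504 is sketched below: each index block is stored together with its ordered index items. The class names `IndexBlock` and `ResumeLibrary` and the `save` method are hypothetical; the text does not prescribe a storage layout.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class IndexBlock:
    keyword: str  # effective keyword, e.g. "Education"
    data: str     # data information mapped to that keyword


@dataclass
class ResumeLibrary:
    # Each entry pairs the ordered index items with the index block they map to.
    entries: List[Tuple[Tuple[str, ...], IndexBlock]] = field(default_factory=list)

    def save(self, index_items: List[str], block: IndexBlock) -> None:
        """Store an index block together with its index items."""
        self.entries.append((tuple(index_items), block))


library = ResumeLibrary()
library.save(["education", "university"], IndexBlock("Education", "XX University"))
print(library.entries)
```

Storing several index items against one block gives the many-to-one correspondence mentioned in the text; a block with a single index item gives the one-to-one case.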
In an embodiment, as shown in fig. 6, after step S6, the index-based resume data processing method further includes the following steps:
S7: If a resume information downloading request sent by a user is received, acquire the query condition information in the resume information downloading request, where the query condition information includes at least one query condition item.
In this embodiment, the query condition information may include one or more query condition items. The query condition items are matched against the index items in order to query the resume information. For example, a query condition item may be a word such as junior middle school, high school, university, college, family, graduate student, state-owned enterprise, or foreign enterprise.
S8: Match the query condition items in the query condition information with the index items in the resume library.
Specifically, based on the query condition items obtained in step S7, the index items in the resume library are traversed to search for an index item identical to a query condition item. If such an index item is found, the index item is considered matched and step S9 is executed; otherwise, step S10 is executed.
S9: If an index item is matched, acquire the effective keywords and data information in the index block corresponding to the successfully matched index item, match the effective keywords with the template tags in a preset standard resume template, import the data information corresponding to each successfully matched effective keyword into the position corresponding to its template tag, and generate and display a standard resume report.
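The traversal in step S8 can be sketched as an exact-equality lookup over stored index items. The entry layout and the name `match_index_items` are assumptions, as is the choice to treat a block as matched when any one query condition item equals any one of its index items; the text does not say how multiple query condition items are combined.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

IndexBlock = Dict[str, str]
Entry = Tuple[Sequence[str], IndexBlock]


def match_index_items(query_items: Iterable[str], entries: List[Entry]) -> List[IndexBlock]:
    """Return the index blocks whose index items contain a query condition item."""
    wanted = set(query_items)
    matched = []
    for index_items, block in entries:
        # An entry matches when any index item equals any query condition item.
        if wanted.intersection(index_items):
            matched.append(block)
    return matched


entries = [
    (["education", "university"], {"keyword": "Education", "data": "XX University"}),
    (["engineer", "work"], {"keyword": "Work history", "data": "Engineer at YY"}),
]
print(match_index_items(["university"], entries))
```

An empty result corresponds to the "not matched" branch that leads to step S10.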
In this embodiment, the preset standard resume template is set according to the actual application requirement, which is not limited herein. The template label may be, but is not limited to, name, academic, educational history, work history, etc., and may be specifically set according to practical application requirements, which is not limited herein.
Further, the effective keywords may be matched with the template tags by condition matching, which may specifically be performed in the same way as the condition matching between the title tag and the keyword in step S302 and is not described again here.
It should be noted that, after the data information corresponding to the successfully matched effective keywords is imported into the positions corresponding to the template tags to generate the standard resume report, the system automatically stores the standard resume report in the resume library. This avoids repeatedly generating the same resume report for the same query condition items, which can improve resume query efficiency and reduce the disk space occupied in the resume library.
S10: If no index item is matched, display the preset standard resume template as a blank information resume report.
Specifically, if no index item is matched in step S8, the preset standard resume template is used directly as a blank information resume report and displayed to the user, so that the user can conveniently enter resume information into it.
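Storing a generated report so that the same query does not regenerate it can be sketched as a small cache. The class `ReportCache`, the use of an unordered set of query condition items as the key, and the in-memory dictionary are all assumptions; the text only states that the report is saved in the resume library.

```python
from typing import Callable, Dict, FrozenSet, Iterable

Report = Dict[str, str]


class ReportCache:
    """Generate a report once per distinct set of query condition items."""

    def __init__(self, generate: Callable[[FrozenSet[str]], Report]) -> None:
        self._generate = generate
        self._stored: Dict[FrozenSet[str], Report] = {}
        self.generated = 0  # number of reports actually generated

    def get(self, query_items: Iterable[str]) -> Report:
        key = frozenset(query_items)  # item order does not matter in this sketch
        if key not in self._stored:
            self._stored[key] = self._generate(key)
            self.generated += 1
        return self._stored[key]


cache = ReportCache(lambda items: {"query": ",".join(sorted(items))})
cache.get(["university", "engineer"])
cache.get(["engineer", "university"])
print(cache.generated)
```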
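Steps S9 and S10 can be sketched together: start from a blank template and fill in each tag for which a matched effective keyword exists. The tag names, the dictionary layout, and the use of exact string equality between keyword and tag are assumptions; the text allows other condition-matching rules.

```python
from typing import Dict, List, Sequence

# Assumed template tags for illustration.
TEMPLATE_TAGS = ("Name", "Education", "Work history")


def build_report(
    matched_blocks: List[Dict[str, str]],
    template_tags: Sequence[str] = TEMPLATE_TAGS,
) -> Dict[str, str]:
    """Fill a standard resume template from matched index blocks."""
    # Start from the blank template; this is also the S10 result when nothing matched.
    report = {tag: "" for tag in template_tags}
    for block in matched_blocks:
        # Exact equality between effective keyword and template tag (assumed rule).
        if block["keyword"] in report:
            report[block["keyword"]] = block["data"]
    return report


print(build_report([{"keyword": "Education", "data": "XX University"}]))
print(build_report([]))
```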
In this embodiment, the index items in the resume library are matched against the query condition items in the resume information downloading request sent by the user. If an index item is matched, the effective keywords and data information in the index block corresponding to the successfully matched index item are acquired, so that resume information can be found quickly. The effective keywords are then matched with the template tags in the preset standard resume template, and the data information corresponding to each successfully matched effective keyword is imported into the position corresponding to its template tag, so that a standard resume report is generated and displayed. This standardizes the resume reports and makes them convenient for the user to view and download.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and the sequence numbers should not limit the implementation process of the embodiments of the present invention in any way.
In an embodiment, an index-based resume data processing device is provided, which corresponds one-to-one to the index-based resume data processing method in the above embodiment. As shown in fig. 7, the index-based resume data processing device includes an information acquisition module 701, a text processing module 702, an information parsing module 703, an information packaging module 704, an index establishing module 705, and an index saving module 706. The functional modules are described in detail as follows:
the information acquisition module 701 is configured to obtain an original resume file;
the text processing module 702 is configured to preprocess the original resume file according to a preset text format to obtain a resume text;
the information parsing module 703 is configured to parse the resume text according to preset keywords, and obtain valid keywords in the resume text and data information corresponding to each valid keyword;
the information packaging module 704 is configured to package, for each valid keyword, the valid keyword and its corresponding data information into an index block;
the index establishing module 705 is configured to establish an index item for each index block;
the index saving module 706 is configured to save the index block and the index item in the resume library.
Further, the information parsing module 703 includes:
the tag extraction unit 7031 is used for extracting tags from the resume text to obtain a title tag;
a tag matching unit 7032, configured to match the title tag with the keyword, and determine the title tag that is successfully matched as a valid keyword;
the data parsing unit 7033 is configured to parse the resume text according to the valid keywords, and obtain data information corresponding to each valid keyword in the resume text.
Further, the data parsing unit 7033 includes:
the data segmentation subunit 70331 is used for segmenting the resume text into a plurality of data blocks with effective keywords as intervals;
the data determining subunit 70332 is configured to, for each valid keyword, use a data block between the valid keyword and a next valid keyword as data information corresponding to the valid keyword.
Further, the index establishment module 705 includes:
the data word segmentation unit 7051 is configured to perform word segmentation on the effective keywords in the index block and data information corresponding to the effective keywords according to a preset word segmentation method, so as to obtain a vocabulary unit;
the data deduplication unit 7052 is configured to perform deduplication processing on the vocabulary units to obtain a single vocabulary unit;
the data conversion unit 7053 is configured to perform format conversion on the single vocabulary unit according to a preset word format, so as to obtain a standard vocabulary unit;
the data sorting unit 7054 is configured to sort the standard vocabulary units according to a preset sorting manner, use the sorted standard vocabulary units as index items of the index block, and establish a mapping relationship between the index items and the index block.
Further, the index-based resume data processing device further includes:
a request receiving module 707, configured to obtain query condition information in a resume information downloading request if a resume information downloading request sent by a user is received, where the query condition information includes at least one query condition item;
a condition matching module 708, configured to match the query condition items in the query condition information with the index items in the resume library;
a data export module 709, configured to obtain, if an index item is matched, the valid keywords and data information in the index block corresponding to the successfully matched index item, match the valid keywords with the template tags in a preset standard resume template, import the data information corresponding to each successfully matched valid keyword into the position corresponding to its template tag, and generate and display a standard resume report;
and a template export module 710, configured to display the preset standard resume template as a blank information resume report if no index item is matched.
For specific limitations on the index-based resume data processing device, reference may be made to the limitations on the index-based resume data processing method above, which are not repeated here. Each module in the index-based resume data processing device may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in the form of hardware, or may be stored in a memory of the computer device in the form of software, so that the processor can call them and perform the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a server, and the internal structure of which may be as shown in fig. 8. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used for storing resume data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements an index-based resume data processing method.
In one embodiment, a computer device is provided, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the computer program to implement steps of the index-based resume data processing method of the above embodiment, such as steps S1 to S6 shown in fig. 2. Alternatively, the processor may implement the functions of each module/unit of the index-based resume data processing apparatus in the above embodiment when executing the computer program, for example, the functions of the modules 701 to 706 shown in fig. 7. In order to avoid repetition, a description thereof is omitted.
In one embodiment, a computer readable storage medium is provided, on which a computer program is stored, where the computer program when executed by a processor implements the index-based resume data processing method in the above method embodiment, or where the computer program when executed by a processor implements the functions of each module/unit in the index-based resume data processing device in the above device embodiment. In order to avoid repetition, a description thereof is omitted.
Those skilled in the art will appreciate that all or part of the processes of the above method embodiments may be implemented by a computer program instructing the relevant hardware. The computer program may be stored on a non-volatile computer readable storage medium and, when executed, may include the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above division of functional units and modules is used as an example. In practical applications, the above functions may be allocated to different functional units and modules as required; that is, the internal structure of the device may be divided into different functional units or modules to perform all or part of the functions described above.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; while the invention has been described in detail with reference to the foregoing embodiments, it will be appreciated by those skilled in the art that variations may be made in the techniques described in the foregoing embodiments, or equivalents may be substituted for elements thereof; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention.

Claims (9)

1. The resume data processing method based on the index is characterized by comprising the following steps of:
acquiring an original resume file;
preprocessing the original resume file according to a preset text format to obtain a resume text;
analyzing the resume text according to preset keywords to obtain effective keywords in the resume text and data information corresponding to each effective keyword;
for each effective keyword, packaging the effective keyword and the corresponding data information into an index block; the index block is a data packet obtained by combining the effective keywords, the data information corresponding to the effective keywords and the mapping relation between the effective keywords and the data information;
establishing an index item for each index block;
correspondingly storing the index block and the index item in a resume library;
wherein said establishing an index entry for each of said index blocks comprises:
according to a preset word segmentation method, word segmentation is carried out on the effective keywords in the index block and the data information corresponding to the effective keywords, so as to obtain vocabulary units;
performing de-duplication processing on the vocabulary units to obtain single vocabulary units;
performing format conversion on the single vocabulary unit according to a preset word format to obtain a standard vocabulary unit;
and sequencing the standard vocabulary units according to a preset sequencing mode, taking the sequenced standard vocabulary units as the index items of the index block, and establishing the mapping relation between the index items and the index block.
2. The method for processing resume data based on index as claimed in claim 1, wherein said parsing the resume text according to the preset keywords, obtaining the valid keywords in the resume text and the data information corresponding to each valid keyword comprises:
extracting the label from the resume text to obtain a title label;
matching the title label with the keyword, and determining the successfully matched title label as the effective keyword;
and analyzing the resume text according to the effective keywords to obtain the data information corresponding to each effective keyword in the resume text.
3. The method for processing resume data based on index as claimed in claim 2, wherein said parsing the resume text according to the effective keywords, and obtaining the data information corresponding to each effective keyword in the resume text comprises:
dividing the resume text into a plurality of data blocks by taking the effective keywords as intervals;
and regarding each effective keyword, taking the data block between the effective keyword and the next effective keyword as data information corresponding to the effective keyword.
4. An index-based resume data processing method according to any one of claims 1 to 3, wherein after said storing the index block and the index item in correspondence in a resume library, the resume data processing method further comprises:
if a resume information downloading request sent by a user is received, acquiring inquiry condition information in the resume information downloading request, wherein the inquiry condition information at least comprises an inquiry condition item;
matching the query condition items in the query condition information with the index items in the resume library;
if the index item is matched, acquiring the effective keyword and the data information in the index block corresponding to the index item which is successfully matched, matching the effective keyword with the template tag according to a template tag in a preset standard resume template, importing the data information corresponding to the effective keyword which is successfully matched into a position corresponding to the template tag, and generating a standard resume report and displaying;
and if no index item is matched, displaying the preset standard resume template as a blank information resume report.
5. An index-based resume data processing apparatus, the resume data processing apparatus comprising:
the information acquisition module is used for acquiring an original resume file;
the text processing module is used for preprocessing the original resume file according to a preset text format to obtain resume text;
the information analysis module is used for analyzing the resume text according to preset keywords and obtaining effective keywords in the resume text and data information corresponding to each effective keyword;
the information packaging module is used for packaging the effective keywords and the corresponding data information into index blocks aiming at each effective keyword; the index block is a data packet obtained by combining the effective keywords, the data information corresponding to the effective keywords and the mapping relation between the effective keywords and the data information;
the index establishing module is used for establishing an index item for each index block;
the index storage module is used for correspondingly storing the index block and the index item in a resume library;
wherein, the index establishment module comprises:
the data word segmentation unit is used for carrying out word segmentation on the effective keywords in the index block and the data information corresponding to the effective keywords according to a preset word segmentation method to obtain a vocabulary unit;
the data deduplication unit is used for performing deduplication processing on the vocabulary units to obtain a single vocabulary unit;
the data conversion unit is used for carrying out format conversion on the single vocabulary unit according to a preset word format to obtain a standard vocabulary unit;
the data ordering unit is used for ordering the standard vocabulary units according to a preset ordering mode, taking the ordered standard vocabulary units as index items of the index block, and establishing a mapping relation between the index items and the index block.
6. The index-based resume data processing device of claim 5, wherein the information parsing module comprises:
the label extracting unit is used for extracting labels from the resume text to obtain a title label;
the label matching unit is used for matching the title label with the keyword and determining the title label successfully matched as the effective keyword;
and the data analysis unit is used for analyzing the resume text according to the effective keywords and acquiring the data information corresponding to each effective keyword in the resume text.
7. The index-based resume data processing device of claim 6, wherein the data parsing unit comprises:
the data segmentation subunit is used for segmenting the resume text into a plurality of data blocks by taking the effective keywords as intervals;
and the data determination subunit is used for regarding the data blocks between the effective keywords and the next effective keywords as data information corresponding to the effective keywords for each effective keyword.
8. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the index-based resume data processing method of any of claims 1 to 4 when the computer program is executed.
9. A computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the steps of the index-based resume data processing method of any of claims 1 to 4.
CN201810548843.3A 2018-05-31 2018-05-31 Resume data processing method, device, equipment and storage medium based on index Active CN108932294B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810548843.3A CN108932294B (en) 2018-05-31 2018-05-31 Resume data processing method, device, equipment and storage medium based on index
PCT/CN2018/094393 WO2019227585A1 (en) 2018-05-31 2018-07-04 Index-based resume data processing method, device, apparatus, and storage medium

Publications (2)

Publication Number Publication Date
CN108932294A CN108932294A (en) 2018-12-04
CN108932294B true CN108932294B (en) 2024-01-09

Family

ID=64449207

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810548843.3A Active CN108932294B (en) 2018-05-31 2018-05-31 Resume data processing method, device, equipment and storage medium based on index

Country Status (2)

Country Link
CN (1) CN108932294B (en)
WO (1) WO2019227585A1 (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109948120B (en) * 2019-04-02 2023-03-14 深圳市前海欢雀科技有限公司 Binary resume parsing method
CN110265032A (en) * 2019-06-05 2019-09-20 平安科技(深圳)有限公司 Conferencing data analysis and processing method, device, computer equipment and storage medium
CN110413771A (en) * 2019-06-18 2019-11-05 平安科技(深圳)有限公司 Classified index method, apparatus, equipment and storage medium based on solr
CN110399339A (en) * 2019-06-18 2019-11-01 平安科技(深圳)有限公司 File classifying method, device, equipment and the storage medium of knowledge base management system
CN110990397A (en) * 2019-11-01 2020-04-10 东方微银科技(北京)有限公司 Credit investigation data extraction method and device
CN111143517B (en) * 2019-12-30 2023-09-05 浙江阿尔法人力资源有限公司 Human selection label prediction method, device, equipment and storage medium
CN111339244A (en) * 2020-02-29 2020-06-26 山东浪潮通软信息科技有限公司 Tax policy and regulation inquiry method, computer equipment and storage medium
CN111913910B (en) * 2020-06-23 2022-10-11 复旦大学附属中山医院厦门医院 Follow-up file data extraction method and system
CN112100313B (en) * 2020-08-05 2024-04-12 山东鲁软数字科技有限公司 Data indexing method and system based on finest granularity segmentation
CN111930805A (en) * 2020-08-10 2020-11-13 中国平安人寿保险股份有限公司 Information mining method and computer equipment
CN112199461B (en) * 2020-09-17 2022-05-31 暨南大学 Document retrieval method, device, medium and equipment based on block index structure
CN112149389A (en) * 2020-09-27 2020-12-29 南方电网数字电网研究院有限公司 Resume information structured processing method and device, computer equipment and storage medium
CN113268306B (en) * 2021-06-08 2024-03-19 金蝶软件(中国)有限公司 Resume analysis interface calling method and device and computer storage medium
CN113807807A (en) * 2021-08-16 2021-12-17 深圳市云采网络科技有限公司 Component parameter identification method and device, electronic equipment and readable medium
CN113485282B (en) * 2021-09-07 2021-12-07 西安热工研究院有限公司 Message tracking display method, system, equipment and storage medium for distributed control system
CN114168715A (en) * 2022-02-10 2022-03-11 深圳希施玛数据科技有限公司 Method, device and equipment for generating target data set and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101826084A (en) * 2009-03-05 2010-09-08 深圳市万泉河科技有限公司 Fast searching method for files, fast searching method for mass talent hiring on Internet and system
CN102023989A (en) * 2009-09-23 2011-04-20 阿里巴巴集团控股有限公司 Information retrieval method and system thereof
CN102231168A (en) * 2011-07-29 2011-11-02 前锦网络信息技术(上海)有限公司 Method for quickly retrieving resume from resume database
CN107145584A (en) * 2017-05-10 2017-09-08 西南科技大学 A kind of resume analytic method based on n gram models
CN107563725A (en) * 2017-08-25 2018-01-09 浙江网新恒天软件有限公司 A kind of recruitment system for optimizing cumbersome personnel recruitment process

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103020119A (en) * 2012-11-16 2013-04-03 北京北森测评技术有限公司 Conversion method, device and system for converting paper edition resume into electronic edition resume

Also Published As

Publication number Publication date
WO2019227585A1 (en) 2019-12-05
CN108932294A (en) 2018-12-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant