CN105183742A - Resume identification method - Google Patents

Info

Publication number
CN105183742A
Authority
CN
China
Prior art keywords
resume
key word
recognition methods
information
methods according
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510321901.5A
Other languages
Chinese (zh)
Inventor
蔡志旻
沈峰
王峰
邹阳
张海涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Fujitsu Nanda Software Technology Co Ltd
Original Assignee
Nanjing Fujitsu Nanda Software Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Fujitsu Nanda Software Technology Co Ltd
Priority to CN201510321901.5A
Publication of CN105183742A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/83 Querying
    • G06F16/835 Query processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a resume identification method comprising the following steps: a first step of setting all potential keywords in a resume; a second step of selecting a resume to be analyzed; a third step of preprocessing the resume according to the preset keywords and analyzing which keywords the resume contains; a fourth step of distinguishing simple information fields from complex information fields according to the keyword information contained in the resume; a fifth step of performing secondary analysis on the complex information fields to extract sub-item information; and a sixth step of outputting the simple information fields and the complex information fields. The method extracts resume information efficiently and with high accuracy.

Description

A resume recognition method
Technical field
The present invention relates to a text recognition method, and in particular to a resume recognition method. The invention belongs to the field of text recognition.
Background
A resume is a common type of text. Functionally, a resume is an important means by which its author introduces and promotes himself or herself and ultimately achieves effective communication; structurally, it is a semi-structured text. Such texts are widely used and numerous, so extracting their information efficiently and accurately has become an urgent need. On the one hand, in terms of extraction efficiency, manual reading clearly cannot meet current demand, and computer techniques must be used. On the other hand, in terms of the feasibility of accurate extraction, the characteristics of semi-structured text combined with text information extraction techniques such as regular-expression matching, correlation analysis, and statistics can produce extraction results that meet practical needs; that is, machine recognition is feasible. However, the prior art does not yet provide a technique for effectively recognizing resumes.
Summary of the invention
To remedy the deficiencies of the prior art, the object of the present invention is to provide a resume recognition method that solves the technical problem that the prior art has difficulty recognizing resumes effectively.
To achieve this object, the present invention adopts the following technical solution:
A resume recognition method, characterized by comprising the following steps:
Step 1: set all potential keywords in a resume;
Step 2: select the resume to be analyzed;
Step 3: preprocess the resume according to the set keywords;
Step 4: distinguish simple information fields from complex information fields according to the keyword information contained in the resume;
Step 5: perform secondary analysis on the complex information fields to extract sub-item information;
Step 6: output the simple information fields and the complex information fields.
In the aforesaid resume recognition method, Step 1 further comprises setting a keyword conflict analysis, which is used, when a keyword appears at multiple positions in the resume, to determine the keyword's true position in the resume.
In the aforesaid resume recognition method, in Step 3, regular-expression matching is used to analyze which keywords the resume contains.
In the aforesaid resume recognition method, in Step 3, if a keyword appears at multiple positions in the resume, the keyword's true position in the resume is determined according to the keyword conflict analysis.
In the aforesaid resume recognition method, the keyword conflict analysis comprises: if a keyword appears at multiple positions in the resume, then for each position at which the keyword appears, analyzing the text before and after that position to search for verification information corresponding to the keyword; if the verification information is present, that position is judged to be the keyword's true position; if the verification information is absent, that position is judged not to be the keyword's true position.
In the aforesaid resume recognition method, in Step 3, if keywords are found in the resume, the method proceeds to the next step; if no keyword is found in the resume, the analysis ends.
In the aforesaid resume recognition method, the simple information fields include name, age, and date of birth; the complex information fields contain sub-items, for example work experience and project experience.
In the aforesaid resume recognition method, performing secondary analysis on a complex information field comprises: analyzing which keywords the complex information field contains, defining the keywords so obtained as secondary keywords, and extracting the secondary keywords and their corresponding specific information.
In the aforesaid resume recognition method, the resume is in any one of Word, HTML, PDF, or TXT format.
In the aforesaid resume recognition method, the simple information fields and complex information fields are output as standard XML data or JSON data.
The benefit of the present invention is that it extracts resume information efficiently and accurately, with a high extraction accuracy rate.
Brief description of the drawings
Fig. 1 is a flowchart of a preferred implementation of the present invention;
Fig. 2 is a schematic diagram of a resume in the present invention;
Fig. 3 is a schematic diagram of the keyword dictionary in the present invention;
Fig. 4 shows the resume recognition result actually output by the present invention.
Embodiment
The present invention is described in detail below with reference to the accompanying drawings and a specific embodiment.
Referring to Fig. 1, the present invention comprises the following steps:
Step 1: set all potential keywords in a resume. The keywords are stored in the form of a dictionary. This embodiment uses a recruitment resume as an example, as shown in Fig. 2; the corresponding keyword dictionary is shown in Fig. 3. Its keywords include name, sex, date of birth, residence, and so on. In this step, a keyword conflict analysis can also be set; it is used, when a keyword appears at multiple positions in the resume, to determine the keyword's true position in the resume.
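As an illustration of storing keywords in dictionary form, the following is a minimal Python sketch. The patent does not specify the dictionary layout (Fig. 3 is not reproduced here), so the structure, the alias lists, the `needs_conflict_check` flag, and the helper `lookup_keyword` are all assumptions made for this example.

```python
# Hypothetical keyword dictionary: canonical keyword -> assumed metadata.
# The entries and field names are illustrative, not taken from the patent.
KEYWORD_DICT = {
    "Name": {"aliases": ["Name", "Full name"], "needs_conflict_check": False},
    "Sex": {"aliases": ["Sex", "Gender"], "needs_conflict_check": False},
    "Date of birth": {"aliases": ["Date of birth", "Birthday"], "needs_conflict_check": False},
    "Residence": {"aliases": ["Residence", "Address"], "needs_conflict_check": False},
    "Work experience": {
        "aliases": ["Work experience", "Working experience"],
        "needs_conflict_check": True,  # may also occur in running text
    },
}


def lookup_keyword(label):
    """Return the canonical keyword for a label found in a resume, or None."""
    normalized = label.strip().lower()
    for canonical, entry in KEYWORD_DICT.items():
        if any(normalized == alias.lower() for alias in entry["aliases"]):
            return canonical
    return None
```

Marking which keywords need conflict analysis in the dictionary itself is one possible design; it keeps the later matching step free of keyword-specific logic.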
Step 2: select the resume to be analyzed. The resume is preferably in Word or HTML format. Besides the conventional Word format, XML-based Web text is a kind of semi-structured text, and XML is a semi-structured data description language. XML overcomes a shortcoming of the traditional Web description language HTML, which can express only the content of data and not the structural features of Web data, and which is therefore inconvenient for querying semi-structured data. XML-based Web text has gradually been replacing HTML as the new-generation standard for Web data description and data exchange. Among semi-structured texts, XML-based Web text has relatively rich format information and a fixed standard, so information extraction from this kind of text is easier than from other semi-structured texts.
Step 3: preprocess the resume according to the set keywords and analyze which keywords the resume contains. In this step, the resume text is first segmented. The goal of segmentation is to break a resume text into many units. Because the unit is the basic building block of semi-structured text, decomposing a text into a sequence of units is the key to machine extraction of text information. Segmentation uses regular-expression-based text segmentation, which can be implemented with reference to existing techniques.
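The patent leaves the segmentation itself to existing techniques. As one possible sketch in Python, the example below assumes a resume whose units are "label: value" lines separated by an ASCII or full-width colon; the pattern `UNIT_PATTERN` and the function `split_into_units` are hypothetical names introduced for this illustration.

```python
import re

# Assumed layout: one "label: value" unit per line, with an ASCII or
# full-width colon. Lines without a colon are not treated as units here.
UNIT_PATTERN = re.compile(
    r"^[ \t]*([^:：\n]+?)[ \t]*[:：][ \t]*(.*?)[ \t]*$",
    re.MULTILINE,
)


def split_into_units(resume_text):
    """Split resume text into a sequence of (label, value) units."""
    return [(m.group(1), m.group(2)) for m in UNIT_PATTERN.finditer(resume_text)]
```

Each label in the resulting unit sequence can then be compared against the keyword dictionary set in Step 1.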
After text segmentation is complete, text recognition is performed on the resume: the keywords are compared against the resume text to determine which keywords the resume contains. If a keyword appears at multiple positions in the resume, only one of those positions is the real keyword; the remaining occurrences are plain text. Fuzzy analysis is applied to these repeated keywords to judge which occurrence is the real keyword and which are plain text. Specifically, if a keyword appears at multiple positions in the resume, its true position is determined according to the keyword conflict analysis.
One keyword conflict analysis is given below. If a keyword appears at multiple positions in the resume, then for each position at which it appears, the text before and after that position is analyzed to search for verification information corresponding to the keyword. If the verification information is present, that position is judged to be the keyword's true position; if it is absent, that position is judged not to be the keyword's true position. The verification information is text that appears before or after a real keyword and stands in a verifying relation to it. For example, suppose "work experience" appears several times. If the "work experience" at a given position is the real keyword, it is generally followed by time information such as 2012-09 ~ 2013-02, so that position is selected as the keyword's position. If a "work experience" somewhere is not followed by a time, that occurrence is judged to be plain text rather than a real keyword.
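The conflict analysis described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's implementation: the verification information is assumed to be a date range of the form 2012-09 ~ 2013-02, the context window of 30 characters is arbitrary, and `DATE_RANGE` and `find_true_positions` are hypothetical names.

```python
import re

# Assumed verification pattern: a date range such as "2012-09 ~ 2013-02".
DATE_RANGE = re.compile(r"\d{4}-\d{2}\s*[~-]\s*\d{4}-\d{2}")


def find_true_positions(text, keyword, check_pattern=DATE_RANGE, window=30):
    """Return start offsets where `keyword` is backed by verification info.

    For every occurrence of the keyword, the text shortly before and after
    it is searched for the verification pattern; occurrences without it are
    treated as plain text and skipped.
    """
    positions = []
    for match in re.finditer(re.escape(keyword), text):
        before = text[max(0, match.start() - window):match.start()]
        after = text[match.end():match.end() + window]
        if check_pattern.search(before) or check_pattern.search(after):
            positions.append(match.start())
    return positions
```

A fixed character window is the simplest choice; a real resume parser might instead look at the neighboring units produced by the segmentation step.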
Step 4: distinguish simple information fields from complex information fields according to the keyword information contained in the resume. Simple information fields include name, age, date of birth, and the like; complex information fields contain sub-items, for example work experience and project experience. The two are distinguished because a complex information field contains sub-items that need further analysis. For example, the sub-items of work experience include reason for leaving, employer, and so on.
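A minimal Python sketch of this distinction follows. It assumes that the set of complex keywords is known in advance; the set `COMPLEX_KEYWORDS` and the function `partition_fields` are illustrative names, not part of the patent.

```python
# Assumed set of keywords whose fields contain sub-items.
COMPLEX_KEYWORDS = {"Work experience", "Project experience"}


def partition_fields(found):
    """Split {keyword: raw text} into (simple fields, complex fields)."""
    simple, complex_ = {}, {}
    for keyword, value in found.items():
        target = complex_ if keyword in COMPLEX_KEYWORDS else simple
        target[keyword] = value
    return simple, complex_
```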
Step 5: perform secondary analysis on the complex information fields to extract sub-item information, for example extracting the reason for leaving and the employer from the work experience above. The secondary analysis is in fact carried out in the same way as Step 3. The sub-item keywords extracted are called secondary keywords, and the specific information corresponding to each secondary keyword is obtained at the same time. That is, the keywords contained in the complex information field are analyzed, the keywords so obtained are defined as secondary keywords, and the secondary keywords and their corresponding specific information are extracted.
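The secondary analysis can be sketched in Python by applying regular-expression matching to the text of a complex field. The secondary keyword table `SECONDARY_KEYWORDS`, the "label: value" line layout, and the function `extract_sub_items` are assumptions made for this example.

```python
import re

# Assumed secondary keywords for each complex information field.
SECONDARY_KEYWORDS = {
    "Work experience": ["Employer", "Reason for leaving"],
}


def extract_sub_items(field_name, field_text):
    """Extract {secondary keyword: value} from a complex field's text."""
    sub_items = {}
    for sub_keyword in SECONDARY_KEYWORDS.get(field_name, []):
        # Same "label: value" matching idea as in the preprocessing step.
        pattern = re.compile(
            r"^[ \t]*" + re.escape(sub_keyword) + r"[ \t]*[:：][ \t]*(.*?)[ \t]*$",
            re.MULTILINE,
        )
        match = pattern.search(field_text)
        if match:
            sub_items[sub_keyword] = match.group(1)
    return sub_items
```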
Step 6: output the simple information fields and the complex information fields. The output format can be standard XML data or JSON data.
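As a sketch of the output step, the Python example below serializes the two kinds of fields with the standard-library `json` and `xml.etree.ElementTree` modules. The patent does not define an output schema (Fig. 4 is not reproduced here), so the element names `resume`, `field`, and `item` and the JSON keys `simple` and `complex` are assumptions.

```python
import json
import xml.etree.ElementTree as ET


def to_json(simple, complex_):
    """Serialize the fields as JSON under assumed top-level keys."""
    return json.dumps({"simple": simple, "complex": complex_},
                      ensure_ascii=False, indent=2)


def to_xml(simple, complex_):
    """Serialize the fields as XML; sub-items become nested <item> elements."""
    root = ET.Element("resume")
    for name, value in simple.items():
        field = ET.SubElement(root, "field", name=name)
        field.text = value
    for name, sub_items in complex_.items():
        field = ET.SubElement(root, "field", name=name)
        for sub_name, sub_value in sub_items.items():
            item = ET.SubElement(field, "item", name=sub_name)
            item.text = sub_value
    return ET.tostring(root, encoding="unicode")
```

Passing `ensure_ascii=False` keeps Chinese names and other non-ASCII values readable in the JSON output.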
In practice, the output simple information fields and complex information fields are as shown in Fig. 4.
The basic principles, main features, and advantages of the present invention have been shown and described above. Those skilled in the art should understand that the above embodiment does not limit the present invention in any form, and that all technical solutions obtained by equivalent replacement or equivalent transformation fall within the protection scope of the present invention.

Claims (10)

1. A resume recognition method, characterized by comprising the following steps:
Step 1: set all potential keywords in a resume;
Step 2: select the resume to be analyzed;
Step 3: preprocess the resume according to the set keywords;
Step 4: distinguish simple information fields from complex information fields according to the keyword information contained in the resume;
Step 5: perform secondary analysis on the complex information fields to extract sub-item information;
Step 6: output the simple information fields and the complex information fields.
2. The resume recognition method according to claim 1, characterized in that Step 1 further comprises setting a keyword conflict analysis, which is used, when a keyword appears at multiple positions in the resume, to determine the keyword's true position in the resume.
3. The resume recognition method according to claim 2, characterized in that, in Step 3, regular-expression matching is used to analyze which keywords the resume contains.
4. The resume recognition method according to claim 3, characterized in that, in Step 3, if a keyword appears at multiple positions in the resume, the keyword's true position in the resume is determined according to the keyword conflict analysis.
5. The resume recognition method according to claim 4, characterized in that the keyword conflict analysis comprises: if a keyword appears at multiple positions in the resume, then for each position at which the keyword appears, analyzing the text before and after that position to search for verification information corresponding to the keyword; if the verification information is present, that position is judged to be the keyword's true position; if the verification information is absent, that position is judged not to be the keyword's true position.
6. The resume recognition method according to claim 5, characterized in that, in Step 3, if keywords are found in the resume, the method proceeds to the next step; if no keyword is found in the resume, the analysis ends.
7. The resume recognition method according to claim 6, characterized in that the simple information fields include name, age, and date of birth; the complex information fields contain sub-items, for example work experience and project experience.
8. The resume recognition method according to claim 7, characterized in that performing secondary analysis on a complex information field comprises: analyzing which keywords the complex information field contains, defining the keywords so obtained as secondary keywords, and extracting the secondary keywords and their corresponding specific information.
9. The resume recognition method according to claim 8, characterized in that the resume is in any one of Word, HTML, PDF, or TXT format.
10. The resume recognition method according to claim 9, characterized in that the simple information fields and complex information fields are output as standard XML data or JSON data.
CN201510321901.5A 2015-06-12 2015-06-12 Resume identification method Pending CN105183742A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510321901.5A CN105183742A (en) 2015-06-12 2015-06-12 Resume identification method

Publications (1)

Publication Number Publication Date
CN105183742A true CN105183742A (en) 2015-12-23

Family

ID=54905826

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510321901.5A Pending CN105183742A (en) 2015-06-12 2015-06-12 Resume identification method

Country Status (1)

Country Link
CN (1) CN105183742A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102043796A (en) * 2009-10-14 2011-05-04 无锡华润上华半导体有限公司 Information collecting method and device based on Internet
CN102999523A (en) * 2011-09-16 2013-03-27 陆敏 Intelligence digitizing method
CN103634420A (en) * 2013-11-22 2014-03-12 北京极客优才科技有限公司 Resume e-mail screening system and method
CN104572849A (en) * 2014-12-17 2015-04-29 西安美林数据技术股份有限公司 Automatic standardized filing method based on text semantic mining

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106933783A (en) * 2015-12-31 2017-07-07 远光软件股份有限公司 A kind of method and device on the intelligent extraction date from text
CN107392143A (en) * 2017-07-20 2017-11-24 中国科学院软件研究所 A kind of resume accurate Analysis method based on SVM text classifications
CN107392143B (en) * 2017-07-20 2019-12-27 中国科学院软件研究所 Resume accurate analysis method based on SVM text classification
CN109165295A (en) * 2018-09-27 2019-01-08 天涯社区网络科技股份有限公司 A kind of intelligence resume appraisal procedure
CN109271479A (en) * 2018-09-29 2019-01-25 广东润弘科技有限公司 A kind of resume structuring processing method
CN109948120A (en) * 2019-04-02 2019-06-28 深圳市前海欢雀科技有限公司 A kind of resume analytic method based on dualization
CN109948120B (en) * 2019-04-02 2023-03-14 深圳市前海欢雀科技有限公司 Binary resume parsing method
CN110020327A (en) * 2019-04-16 2019-07-16 上海大易云计算股份有限公司 A kind of resume resolution system based on vertical search engine
CN110222292A (en) * 2019-04-29 2019-09-10 毕昀 Website resume automatically parses method, computer equipment and storage medium
CN112214572A (en) * 2020-10-20 2021-01-12 济南浪潮高新科技投资发展有限公司 Method for secondarily extracting entities in resume analysis
CN112214572B (en) * 2020-10-20 2022-11-01 山东浪潮科学研究院有限公司 Method for secondarily extracting entities in resume analysis

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20151223