CN110413742B - Resume information duplication checking method, device, equipment and storage medium - Google Patents


Info

Publication number
CN110413742B
Authority
CN
China
Prior art keywords
resume
repeated
checked
information
matching
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910729969.5A
Other languages
Chinese (zh)
Other versions
CN110413742A
Inventor
张航
李成铭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN201910729969.5A
Publication of CN110413742A
Application granted
Publication of CN110413742B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/105 Human resources
    • G06Q10/1053 Employment or hiring

Abstract

The embodiments of the disclosure disclose a resume information duplicate checking method, apparatus, device and storage medium. The method comprises: acquiring a resume to be checked that contains set information, the set information comprising education experience information and/or work experience information; matching resumes in a database with the resume to be checked to obtain initial repeated resumes that satisfy a set matching rule, the set matching rule comprising an education experience matching rule and/or a work experience matching rule; filtering out, from the initial repeated resumes, resumes whose set information conflicts with that of the resume to be checked, to obtain target repeated resumes; and acquiring the similarity between each target repeated resume and the resume to be checked, sorting the target repeated resumes according to the similarity, and displaying the sorting result. The resume information duplication checking method provided by the embodiments of the disclosure can check resumes for duplicates accurately, thereby improving the accuracy of resume information duplicate checking.

Description

Resume information duplication checking method, device, equipment and storage medium
Technical Field
The embodiment of the disclosure relates to the technical field of information duplication checking, and in particular relates to a method, a device, equipment and a storage medium for duplication checking of resume information.
Background
With recruitment channels becoming increasingly diverse in the Internet era, it is quite common for the same candidate's resume to be submitted repeatedly through different channels. The traditional duplicate checking method relies mainly on basic information in the resume (such as name, mobile phone number, and email address). If a newly submitted resume matches a person in the talent pool on name, mobile phone number, and email address, the two are judged to be the same candidate. This approach effectively filters out ordinary repeat submissions, but it cannot detect deliberate modifications made by headhunters or applicants.
Headhunters sometimes modify a candidate's contact information in order to bypass the recruiting company's duplicate checking mechanism, for example by using an English name or an alias, or by using an alternative contact method. With the traditional duplicate checking method, such resumes cannot be accurately deduplicated.
Disclosure of Invention
The embodiment of the disclosure provides a resume information duplication checking method, device and equipment and a storage medium, so as to improve the accuracy of the resume information duplication checking.
In a first aspect, an embodiment of the present disclosure provides a resume information duplication checking method, including:
acquiring a resume to be checked that contains set information, where the set information comprises education experience information and/or work experience information;
matching resumes in a database with the resume to be checked to obtain initial repeated resumes that satisfy a set matching rule, where the set matching rule comprises an education experience matching rule and/or a work experience matching rule;
filtering out, from the initial repeated resumes, resumes whose set information conflicts with that of the resume to be checked, to obtain target repeated resumes;
and acquiring the similarity between each target repeated resume and the resume to be checked, sorting the target repeated resumes according to the similarity, and displaying the sorting result.
In a second aspect, an embodiment of the present disclosure further provides a resume information duplication checking apparatus, including:
a to-be-checked resume acquisition module, configured to acquire a resume to be checked that contains set information, where the set information comprises education experience information and/or work experience information;
an initial repeated resume acquisition module, configured to match resumes in a database with the resume to be checked to obtain initial repeated resumes that satisfy a set matching rule, where the set matching rule comprises an education experience matching rule and/or a work experience matching rule;
a target repeated resume acquisition module, configured to filter out, from the initial repeated resumes, resumes whose set information conflicts with that of the resume to be checked, to obtain target repeated resumes;
and a sorting module, configured to acquire the similarity between each target repeated resume and the resume to be checked, sort the target repeated resumes according to the similarity, and display the sorting result.
In a third aspect, an embodiment of the present disclosure further provides an electronic device, where the electronic device includes:
one or more processing devices;
storage means for storing one or more programs;
where the one or more programs, when executed by the one or more processing devices, cause the one or more processing devices to implement the resume information duplication checking method according to the embodiments of the disclosure.
In a fourth aspect, an embodiment of the present disclosure further provides a computer-readable medium storing a computer program which, when executed by a processing device, implements the resume information duplication checking method according to the embodiments of the present disclosure.
In the embodiments of the disclosure, a resume to be checked that contains set information (education experience information and/or work experience information) is first acquired; resumes in a database are matched with the resume to be checked to obtain initial repeated resumes that satisfy a set matching rule (an education experience matching rule and/or a work experience matching rule); resumes whose set information conflicts with that of the resume to be checked are then filtered out of the initial repeated resumes to obtain target repeated resumes; finally, the similarity between each target repeated resume and the resume to be checked is acquired, the target repeated resumes are sorted according to the similarity, and the sorting result is displayed. Because resumes in the database are matched with the resume to be checked according to the education experience matching rule and/or the work experience matching rule, and the resulting initial repeated resumes are then filtered and sorted, duplicate resumes can be detected accurately and the accuracy of resume information duplicate checking is improved.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.
Fig. 1 is a flowchart of a resume information duplication checking method in a first embodiment of the present disclosure;
FIG. 2 is a diagram illustrating the details of the education experience matching rules and the work experience matching rules in the first embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of a resume information duplication checking apparatus in a second embodiment of the disclosure;
fig. 4 is a schematic structural diagram of an electronic device in a third embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that the modifiers "a", "an", and "the" in this disclosure are illustrative rather than limiting, and those skilled in the art will understand that they mean "one or more" unless the context clearly indicates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
Example one
Fig. 1 is a flowchart of a resume information duplication checking method according to an embodiment of the present disclosure. This embodiment is applicable to checking resumes for duplicates. The method may be executed by a resume information duplication checking apparatus, which may be implemented in hardware and/or software and may generally be integrated in a device with a resume information duplication checking function, such as a server, a mobile terminal, or a server cluster. As shown in fig. 1, the method specifically comprises the following steps:
Step 110: acquiring the resume to be checked that contains the set information.
The set information includes education experience information and/or work experience information. The education experience information may include the name of the school attended, the major studied, the educational attainment (degree), and the period of study (start time and graduation time); the work experience information may include the name of the employing company, the job title, a description of the work content, and the period of employment (start time and departure time). In the embodiment of the present disclosure, the resume to be checked may consist of structured data; Table 1 shows an example of a structured resume.
TABLE 1
[Table 1: example of a structured resume (table image not reproduced)]
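Table 1 is not reproduced here. As a rough illustration of what a structured resume could look like, the following Python sketch models the education and work fields listed above. The class and field names (`Education`, `Work`, `Resume`, `degree`, `start`, `end`) are hypothetical, and storing dates as `"YYYY-MM"` strings is an assumption chosen so that they compare chronologically as plain strings.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Education:
    school: str   # name of the school attended
    major: str    # major studied
    degree: str   # educational attainment, e.g. "Bachelor"
    start: str    # start month, "YYYY-MM" (assumed format)
    end: str      # graduation month, "YYYY-MM"


@dataclass
class Work:
    company: str      # name of the employing company
    title: str        # job title / position
    description: str  # free-text description of the work content
    start: str        # start month, "YYYY-MM"
    end: str          # departure month, "YYYY-MM"


@dataclass
class Resume:
    name: str
    phone: Optional[str] = None
    email: Optional[str] = None
    education: List[Education] = field(default_factory=list)
    work: List[Work] = field(default_factory=list)


# Hypothetical example data, for illustration only.
resume = Resume(
    name="Zhang San",
    phone="13800000000",
    education=[Education("Peking University", "Computer Science", "Bachelor", "2010-09", "2014-07")],
    work=[Work("Example Tech", "Backend Engineer", "Built the search service", "2014-07", "2018-03")],
)
```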
In the embodiment of the disclosure, the resume to be checked that contains the set information may be acquired either by actively pulling it from a recruitment website or by receiving a resume pushed by a recruitment website.
Step 120: matching the resumes in the database with the resume to be checked to obtain initial repeated resumes that satisfy the set matching rule.
The set matching rule comprises an education experience matching rule and/or a work experience matching rule. The education experience matching rule and the work experience matching rule may be set by the developer according to the duplicate checking requirements. The education experience matching rules may include a first education experience loose matching rule, a first education experience strict matching rule, a second education experience loose matching rule, and a second education experience strict matching rule; the work experience matching rules may include a first work experience loose matching rule, a first work experience strict matching rule, a second work experience loose matching rule, and a second work experience strict matching rule.
FIG. 2 shows the education experience matching rules and the work experience matching rules in detail according to an embodiment of the present disclosure. As shown in FIG. 2, the first education experience loose matching rule may be that the school name of one education experience matches; the first education experience strict matching rule may be that the school name and the period of study of one education experience both match; the second education experience loose matching rule may be that the school name and the educational attainment (degree) of one education experience both match; the second education experience strict matching rule may be that the school name, the period of study, and the educational attainment of one education experience all match. The first work experience loose matching rule may be that the company name of one work experience matches; the first work experience strict matching rule may be that the company name and the period of employment of one work experience both match; the second work experience loose matching rule may be that the company name and the position of one work experience both match; the second work experience strict matching rule may be that the company name, the period of employment, and the position of one work experience all match.
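The eight rules in FIG. 2 reduce to per-entry predicates. The sketch below is a minimal illustration assuming each education or work entry is a dictionary with hypothetical keys (`school`, `degree`, `company`, `title`, `start`, `end`) and that "matching" means exact string equality; a real system would likely normalise names first. `rule_hits` (a hypothetical helper) counts, for each rule, how many entries of a candidate resume match at least one entry of the resume to be checked.

```python
from typing import Callable, Dict, List

Entry = Dict[str, str]
Rule = Callable[[Entry, Entry], bool]


def same_period(a: Entry, b: Entry) -> bool:
    # Assumption: a "matching period" means identical start and end months.
    return a["start"] == b["start"] and a["end"] == b["end"]


EDU_RULES: Dict[str, Rule] = {
    "edu_loose_1": lambda a, b: a["school"] == b["school"],
    "edu_strict_1": lambda a, b: a["school"] == b["school"] and same_period(a, b),
    "edu_loose_2": lambda a, b: a["school"] == b["school"] and a["degree"] == b["degree"],
    "edu_strict_2": lambda a, b: (a["school"] == b["school"] and same_period(a, b)
                                  and a["degree"] == b["degree"]),
}

WORK_RULES: Dict[str, Rule] = {
    "work_loose_1": lambda a, b: a["company"] == b["company"],
    "work_strict_1": lambda a, b: a["company"] == b["company"] and same_period(a, b),
    "work_loose_2": lambda a, b: a["company"] == b["company"] and a["title"] == b["title"],
    "work_strict_2": lambda a, b: (a["company"] == b["company"] and same_period(a, b)
                                   and a["title"] == b["title"]),
}


def count_hits(rule: Rule, mine: List[Entry], theirs: List[Entry]) -> int:
    """How many entries in `mine` satisfy `rule` against at least one entry in `theirs`."""
    return sum(any(rule(a, b) for b in theirs) for a in mine)


def rule_hits(cand: Dict[str, List[Entry]], query: Dict[str, List[Entry]]) -> Dict[str, int]:
    """Per-rule hit counts between a candidate resume and the resume to be checked."""
    hits = {n: count_hits(r, cand["education"], query["education"]) for n, r in EDU_RULES.items()}
    hits.update({n: count_hits(r, cand["work"], query["work"]) for n, r in WORK_RULES.items()})
    return hits
```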
The set matching rule may be formed by combining education experience matching rules and work experience matching rules. In the embodiment of the present disclosure, an initial repeated resume may satisfy any one of the following combinations: (1) the second education experience strict matching rule and the second work experience strict matching rule; (2) the second work experience strict matching rule, the first education experience loose matching rule, and the first work experience loose matching rule, or the second work experience strict matching rule and the first education experience loose matching rule twice (the school names of two education experiences match); (3) the second work experience strict matching rule and the first work experience loose matching rule at least twice (at least two work experiences have the same company name, and the period of employment of one of them is aligned); (4) the first work experience loose matching rule three times (the company names of three work experiences are the same).
Specifically, the education experience information and the work experience information of each resume in the database are matched with those of the resume to be checked, and the resumes that satisfy the set matching rule are determined as initial repeated resumes. If the number of initial repeated resumes exceeds a set threshold, only the set threshold number of initial repeated resumes are selected for recall.
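One way to express the rule combinations and the recall cap above is as predicates over per-rule hit counts. This is a hedged sketch: the hit-count dictionary, the rule keys, and `max_recall` are hypothetical, and the sketch counts an entry toward every rule it satisfies (so a strict match also counts as a loose match), which the source does not state explicitly. The source also does not say which resumes are kept when the cap is exceeded; this sketch simply keeps the first ones found.

```python
from typing import Dict, Iterable, List, Tuple


def meets_set_rule(hits: Dict[str, int]) -> bool:
    """True if the per-rule hit counts satisfy any of the listed combinations."""
    def h(rule: str) -> int:
        return hits.get(rule, 0)

    return (
        # (1) second education strict + second work strict
        (h("edu_strict_2") >= 1 and h("work_strict_2") >= 1)
        # (2) second work strict + first education loose + first work loose
        or (h("work_strict_2") >= 1 and h("edu_loose_1") >= 1 and h("work_loose_1") >= 1)
        # (2, alternative) second work strict + two first education loose matches
        or (h("work_strict_2") >= 1 and h("edu_loose_1") >= 2)
        # (3) second work strict + at least two first work loose matches
        or (h("work_strict_2") >= 1 and h("work_loose_1") >= 2)
        # (4) three first work loose matches
        or h("work_loose_1") >= 3
    )


def recall(candidates: Iterable[Tuple[str, Dict[str, int]]], max_recall: int = 100) -> List[str]:
    """IDs of candidates that satisfy the set rule, capped at max_recall (hypothetical value)."""
    matched = [rid for rid, hits in candidates if meets_set_rule(hits)]
    return matched[:max_recall]
```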
Step 130: filtering out, from the initial repeated resumes, resumes whose set information conflicts with that of the resume to be checked, to obtain target repeated resumes.
A conflict of set information may be understood as a timeline conflict of the set information, i.e., the initial repeated resume and the resume to be checked contain different set information for the same time period. For example, the initial repeated resume and the resume to be checked list different company names for the same time period, or different school names for the same time period.
Specifically, filtering out the resumes whose set information conflicts with that of the resume to be checked to obtain the target repeated resumes may proceed as follows: traverse the initial repeated resumes; for the current repeated resume, determine whether the timelines of its set information and of the set information of the resume to be checked conflict; if there is a conflict, acquire the consistency probability between the set information of the current repeated resume and that of the resume to be checked, and if the consistency probability is smaller than a set threshold, filter out the current repeated resume.
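The timeline-conflict filter can be sketched as below. Assumptions not taken from the source: entries are dictionaries with hypothetical keys `name`, `start`, `end`; dates are `"YYYY-MM"` strings; periods that merely touch at a boundary month are not treated as overlapping; schools are compared only with schools and companies only with companies; and when several pairs conflict, the smallest consistency probability decides. The consistency function and the 0.5 threshold are hypothetical parameters.

```python
from typing import Callable, Dict, List, Tuple

Entry = Dict[str, str]           # {"name": school or company, "start": "YYYY-MM", "end": "YYYY-MM"}
Resume = Dict[str, List[Entry]]  # {"education": [...], "work": [...]}


def overlaps(a: Entry, b: Entry) -> bool:
    # "YYYY-MM" strings compare chronologically; strict "<" so that a job ending
    # in the month the next one starts is not an overlap (assumption).
    return a["start"] < b["end"] and b["start"] < a["end"]


def conflicting_pairs(mine: List[Entry], theirs: List[Entry]) -> List[Tuple[str, str]]:
    """Name pairs that cover overlapping periods but differ (a timeline conflict)."""
    return [(a["name"], b["name"]) for a in mine for b in theirs
            if overlaps(a, b) and a["name"] != b["name"]]


def filter_conflicts(candidates: List[Resume], query: Resume,
                     consistency: Callable[[str, str], float],
                     threshold: float = 0.5) -> List[Resume]:
    """Keep candidates with no conflict, or whose conflicting names are probably consistent."""
    kept = []
    for cand in candidates:
        pairs = (conflicting_pairs(cand["education"], query["education"])
                 + conflicting_pairs(cand["work"], query["work"]))
        if pairs and min(consistency(x, y) for x, y in pairs) < threshold:
            continue  # conflict with low consistency probability: filter out
        kept.append(cand)
    return kept
```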
The consistency probability of the set information comprises the consistency probability of the school names in the education experience information and the consistency probability of the company names in the work experience information. The consistency probability may be calculated as follows: if the school name or company name in the current repeated resume and that in the resume to be checked have an inclusion relationship (one contains the other), the consistency probability is 1. If they have no inclusion relationship, the consistency probability is calculated from the frequencies with which the two names occur in a text corpus. The calculation formula may be:
$$P = \frac{P(w_1, w_2)}{\min\big(P(w_1), P(w_2)\big)}$$
where P is the consistency probability, P(w_1, w_2) is the frequency with which the two school names or company names occur together in the text corpus, P(w_1) is the frequency with which the school name or company name of the current repeated resume occurs in the text corpus, and P(w_2) is the frequency with which the school name or company name of the resume to be checked occurs in the text corpus.
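A direct reading of the consistency probability, as a sketch: the corpus is modelled (hypothetically) as a list of documents, each reduced to the set of school or company names it mentions. Using raw document counts instead of frequencies gives the same ratio, because the corpus size cancels. Returning 0 when either name never occurs is an assumption the source does not address.

```python
from typing import List, Set


def consistency_probability(w1: str, w2: str, corpus: List[Set[str]]) -> float:
    """P(w1, w2) / min(P(w1), P(w2)), or 1.0 if one name contains the other."""
    # Inclusion relationship, e.g. "ByteDance" vs "ByteDance Ltd".
    if w1 in w2 or w2 in w1:
        return 1.0
    n1 = sum(w1 in doc for doc in corpus)                # documents mentioning w1
    n2 = sum(w2 in doc for doc in corpus)                # documents mentioning w2
    n12 = sum(w1 in doc and w2 in doc for doc in corpus)  # documents mentioning both
    if min(n1, n2) == 0:
        return 0.0  # assumption: an unseen name gives no evidence of consistency
    return n12 / min(n1, n2)
```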
Step 140: acquiring the similarity between each target repeated resume and the resume to be checked, sorting the target repeated resumes according to the similarity, and displaying the sorting result.
Optionally, the similarity between the target repeated resume and the resume to be checked may be obtained as follows: acquire the text information of the target repeated resume and of the resume to be checked; calculate the text similarity between the text information of the target repeated resume and that of the resume to be checked, and determine the text similarity as the similarity between the target repeated resume and the resume to be checked.
The text information is the description of the work experience, specifically the description of the work done at the same company. In the embodiment of the present disclosure, the similarity between the text information of the target repeated resume and that of the resume to be checked may be calculated as follows: for the current target repeated resume, perform word segmentation on the text information of the current target repeated resume and of the resume to be checked to obtain two word sequences; obtain the longest common subsequence of the two word sequences and determine its length; and calculate the similarity between the text information from the length of the longest common subsequence and the lengths of the two word sequences.
For example, word segmentation of the text information of the current target repeated resume and of the resume to be checked yields the word sequences [w_1, w_2, ..., w_n] and [v_1, v_2, ..., v_m]. The longest common subsequence of the two word sequences is then computed; if its length is L, the similarity is:
$$s = \frac{L}{\min(n, m)}$$
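The longest-common-subsequence similarity s = L / min(n, m) can be sketched with the standard dynamic-programming recurrence. Word segmentation of Chinese text would normally use a dedicated segmenter; here `str.split` on whitespace stands in as an assumption, and the `tokenize` parameter is hypothetical. Returning 0 for empty text is also an assumption.

```python
from typing import Callable, List, Sequence


def lcs_length(a: Sequence, b: Sequence) -> int:
    """Length of the longest common subsequence, using O(len(b)) memory."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            # Extend the diagonal on a match, else take the best of left/up.
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def text_similarity(t1: str, t2: str,
                    tokenize: Callable[[str], List[str]] = str.split) -> float:
    """s = L / min(n, m) over the two word sequences."""
    w, v = tokenize(t1), tokenize(t2)
    if not w or not v:
        return 0.0
    return lcs_length(w, v) / min(len(w), len(v))
```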
Optionally, the similarity between the target repeated resume and the resume to be checked may be obtained as follows: determine a matching score according to the set matching rule satisfied by the target repeated resume and the resume to be checked, and determine the matching score as the similarity between the target repeated resume and the resume to be checked.
In the embodiment of the disclosure, the education experience matching rules and the work experience matching rules are assigned scores in advance. The scores of the education experience matching rules are ordered as: second education experience strict matching rule > first education experience strict matching rule > second education experience loose matching rule > first education experience loose matching rule; the scores of the work experience matching rules are ordered as: second work experience strict matching rule > first work experience strict matching rule > second work experience loose matching rule > first work experience loose matching rule. For example, the preset scores may be: second work experience strict matching rule, 22; first work experience strict matching rule, 20; second work experience loose matching rule, 11; first work experience loose matching rule, 10; second education experience strict matching rule, 14; first education experience strict matching rule, 12; second education experience loose matching rule, 8; first education experience loose matching rule, 5.
Specifically, once the set matching rule satisfied by the target repeated resume and the resume to be checked is obtained, the matching score can be calculated. For example, if the target repeated resume and the resume to be checked satisfy the second education experience strict matching rule and the second work experience strict matching rule, the matching score is 22 + 14 = 36.
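Using the example scores above, the matching score is a sum over the rules in the satisfied combination. In this sketch the rule keys are hypothetical, a rule listed several times (for example, three first work experience loose matches) is counted once per occurrence, and taking the maximum when several combinations are satisfied is an assumption; the source only gives the single-combination example.

```python
from typing import Dict, List

# Example scores from the text; the keys are hypothetical names for the eight rules.
RULE_SCORES: Dict[str, int] = {
    "work_strict_2": 22, "work_strict_1": 20, "work_loose_2": 11, "work_loose_1": 10,
    "edu_strict_2": 14, "edu_strict_1": 12, "edu_loose_2": 8, "edu_loose_1": 5,
}


def matching_score(satisfied_rules: List[str]) -> int:
    """Sum of the preset scores of the rules in one satisfied combination."""
    return sum(RULE_SCORES[rule] for rule in satisfied_rules)


def best_matching_score(satisfied_combinations: List[List[str]]) -> int:
    """Assumption: if several combinations hold, use the highest-scoring one."""
    return max((matching_score(c) for c in satisfied_combinations), default=0)
```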
Optionally, the similarity between the target repeated resume and the resume to be checked may be calculated by performing a weighted calculation on the text similarity and the matching score.
Specifically, the weighting calculation is performed according to the respective weights corresponding to the text similarity and the matching score. In the embodiment of the disclosure, the weight of the text similarity is smaller than the weight of the matching score.
Specifically, after the similarity between each target repeated resume and the resume to be checked is obtained, the target repeated resumes are sorted in descending order of similarity, and the sorting result is displayed.
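The weighted combination and the final descending sort might look like the following. The weights 0.3 and 0.7 are hypothetical values chosen only to respect the stated constraint that the text-similarity weight is smaller than the matching-score weight. Dividing the matching score by `max_score` (here 36, simply the example score above) to bring it onto a scale comparable with the [0, 1] text similarity is an added assumption, not something the source specifies.

```python
from typing import List, Tuple

# Hypothetical weights: only the ordering W_TEXT < W_MATCH comes from the source.
W_TEXT = 0.3
W_MATCH = 0.7


def combined_similarity(text_sim: float, match_score: float, max_score: float = 36.0) -> float:
    """Weighted sum of the text similarity (0..1) and a rescaled matching score."""
    return W_TEXT * text_sim + W_MATCH * (match_score / max_score)


def rank(candidates: List[Tuple[str, float, float]]) -> List[Tuple[str, float]]:
    """candidates: (resume_id, text_similarity, matching_score); highest similarity first."""
    scored = [(rid, combined_similarity(t, m)) for rid, t, m in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With these weights the matching score dominates, so a resume with a strong rule match but weaker text overlap can still rank first.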
According to the technical solution of this embodiment of the disclosure, resumes in the database are matched with the resume to be checked according to the education experience matching rule and/or the work experience matching rule to obtain initial repeated resumes; resumes whose set information conflicts with that of the resume to be checked are filtered out to obtain target repeated resumes; and the target repeated resumes are sorted by their similarity to the resume to be checked and the sorting result is displayed. Duplicate resumes can thus be detected accurately, improving the accuracy of resume information duplicate checking.
Optionally, the set matching rule further includes a basic information matching rule, and matching the resumes in the database with the resume to be checked to obtain the initial repeated resumes that satisfy the set matching rule may also proceed as follows: traverse the candidate resumes in the database; for the current candidate resume, match it with the resume to be checked according to the basic information matching rule, and if it satisfies the basic information matching rule, determine it as an initial repeated resume; if it does not satisfy the basic information matching rule, match it with the resume to be checked according to the education experience matching rule and/or the work experience matching rule, and if it satisfies the education experience matching rule and/or the work experience matching rule, determine it as an initial repeated resume.
The basic information may include the name, phone number, email address, and the like. The basic information matching rule may be that at least two items of basic information are identical. If the current candidate resume does not satisfy the basic information matching rule but exactly one item of basic information is identical, it is determined whether the current candidate resume and the resume to be checked satisfy the first work experience loose matching rule or the first education experience loose matching rule; if so, the current candidate resume is determined as an initial repeated resume. If not, the current candidate resume is matched with the resume to be checked according to the education experience matching rule and/or the work experience matching rule.
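The basic-information cascade can be sketched as follows, assuming (hypothetically) that a resume is a dictionary with `name`, `phone`, `email`, and lists of `schools` and `companies`, and that empty fields never count as a match. `experience_rule` stands for the education/work set matching rule and is passed in so the sketch stays self-contained; all function names are illustrative.

```python
from typing import Any, Callable, Dict

Resume = Dict[str, Any]  # hypothetical keys: name, phone, email, schools, companies
BASIC_FIELDS = ("name", "phone", "email")


def basic_matches(cand: Resume, query: Resume) -> int:
    """Number of basic fields that are non-empty and identical in both resumes."""
    return sum(1 for f in BASIC_FIELDS if cand.get(f) and cand.get(f) == query.get(f))


def shares_any(cand: Resume, query: Resume, key: str) -> bool:
    """True if the two resumes list at least one identical school/company name."""
    return bool(set(cand.get(key, [])) & set(query.get(key, [])))


def is_initial_repeated(cand: Resume, query: Resume,
                        experience_rule: Callable[[Resume, Resume], bool]) -> bool:
    n = basic_matches(cand, query)
    if n >= 2:  # basic information matching rule satisfied
        return True
    if n == 1 and (shares_any(cand, query, "companies")     # first work loose rule
                   or shares_any(cand, query, "schools")):  # first education loose rule
        return True
    return experience_rule(cand, query)  # fall back to education/work experience rules
```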
Example two
Fig. 3 is a schematic structural diagram of a resume information duplication checking apparatus according to a second embodiment of the present disclosure. As shown in fig. 3, the apparatus includes: a to-be-checked resume acquisition module 310, an initial repeated resume acquisition module 320, a target repeated resume acquisition module 330, and a sorting module 340.
The to-be-checked resume acquisition module 310 is configured to acquire a resume to be checked that contains set information, where the set information comprises education experience information and/or work experience information;
the initial repeated resume obtaining module 320 is configured to match the resumes in the database with the resumes to be checked, and obtain an initial repeated resume meeting a set matching rule; wherein the set matching rules comprise education experience matching rules and/or work experience matching rules;
a target repeated resume obtaining module 330, configured to filter resumes in the initial repeated resume that conflict with the setting information of the resume to be checked, so as to obtain a target repeated resume;
the sorting module 340 is configured to obtain similarity between the target repeated resume and the resume to be checked, sort the target repeated resume according to the similarity, and display a sorting result.
Optionally, the set matching rule further includes a basic information matching rule, and the initial repeated resume obtaining module 320 is further configured to:
traversing the candidate resumes in the database;
when the current candidate resume is traversed, matching the current candidate resume with the resume to be checked according to the basic information matching rule, and if the current candidate resume meets the basic information matching rule, determining the current candidate resume as an initial repeated resume;
and if the current candidate resume does not meet the basic information matching rule, matching the current candidate resume with the resume to be checked according to the education experience matching rule and/or the work experience matching rule, and if the current candidate resume meets the education experience matching rule and/or the work experience matching rule, determining the current candidate resume as the initial repeated resume.
Optionally, the target repeated resume obtaining module 330 is further configured to:
traverse the initial repeated resumes, and for the current repeated resume, determine whether the timelines of its set information and of the set information of the resume to be checked conflict, where a timeline conflict means that the current repeated resume and the resume to be checked contain different set information for the same time period;
if there is a conflict, acquire the consistency probability between the set information of the current repeated resume and that of the resume to be checked, and if the consistency probability is smaller than a set threshold, filter out the current repeated resume.
Optionally, the sorting module 340 is further configured to:
acquiring text information of a target repeated resume and a resume to be checked; wherein the text information is description information of the work experience;
and calculating the similarity between the text information of the target repeated resume and the resume to be checked, and determining the text similarity as the similarity between the target repeated resume and the resume to be checked.
Optionally, the sorting module 340 is further configured to:
for the current target repeated resume, performing word segmentation on the text information of the current target repeated resume and of the resume to be checked, respectively, to obtain two word sequences;
acquiring the longest common subsequence of the two word sequences, and determining its length;
and calculating the similarity between the text information according to the length of the longest common subsequence and the lengths of the two word sequences.
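The text does not give the exact formula that turns the longest-common-subsequence length into a similarity. One common normalisation, assumed here, is the Dice-style score 2·L / (m + n), where L is the LCS length and m and n are the two sequence lengths. The sketch also replaces real word segmentation (which Chinese text would need) with a hypothetical lowercase whitespace split. The LCS itself is the standard dynamic-programming recurrence.

```python
from typing import List, Sequence


def segment(text: str) -> List[str]:
    # Hypothetical stand-in for word segmentation: lowercase whitespace split.
    return text.lower().split()


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    # Classic dynamic programming: O(len(a) * len(b)) time, O(len(b)) memory.
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                cur[j] = prev[j - 1] + 1
            else:
                cur[j] = max(prev[j], cur[j - 1])
        prev = cur
    return prev[-1]


def text_similarity(t1: str, t2: str) -> float:
    w1, w2 = segment(t1), segment(t2)
    if not w1 and not w2:
        return 0.0  # assumed convention for two empty descriptions
    # Assumed normalisation: 2 * LCS / (m + n), which lies in [0, 1].
    return 2 * lcs_length(w1, w2) / (len(w1) + len(w2))
```

For example, "built the search ranking service" and "built the ranking service" share a four-word subsequence, giving 2·4 / (5 + 4) = 8/9.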
Optionally, the sorting module 340 is further configured to:
determining a matching score according to the set matching rules met by the target repeated resume and the resume to be checked, and determining the matching score as the similarity between the target repeated resume and the resume to be checked.
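The disclosure says only that the matching score depends on which set matching rules a pair of resumes meets. A minimal sketch, assuming each rule carries a hypothetical weight and the score is the normalised sum of the weights of the rules met:

```python
from typing import Dict, Iterable

# Hypothetical weights for each set matching rule; the source gives no values.
RULE_WEIGHTS: Dict[str, float] = {"basic": 0.5, "education": 0.2, "work": 0.3}


def matching_score(rules_met: Iterable[str],
                   weights: Dict[str, float] = RULE_WEIGHTS) -> float:
    """Sum the weights of the rules met, normalised to [0, 1]."""
    met = set(rules_met)
    unknown = met.difference(weights)  # names that are not known rules
    if unknown:
        raise ValueError(f"unknown rules: {sorted(unknown)}")
    return sum(weights[rule] for rule in met) / sum(weights.values())
```

Normalising by the total weight keeps the score on the same [0, 1] scale as a text similarity. That is convenient if the two are later combined.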
Optionally, the sorting module 340 is further configured to:
performing a weighted calculation on the text similarity and the matching score to obtain the similarity between the target repeated resume and the resume to be checked.
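A weighted calculation followed by sorting can be sketched as a convex combination of the two scores. The weight `alpha` below is a hypothetical parameter, since the text specifies no weights. Each target is assumed to arrive as a `(resume_id, text_similarity, matching_score)` tuple with both scores in [0, 1].

```python
from typing import List, Tuple


def combined_similarity(text_sim: float, match_score: float, alpha: float = 0.6) -> float:
    # alpha is a hypothetical weight; the source only says "weighted calculation".
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * text_sim + (1.0 - alpha) * match_score


def rank(targets: List[Tuple[str, float, float]], alpha: float = 0.6) -> List[Tuple[str, float]]:
    # Score every target repeated resume, then sort by similarity, highest first.
    scored = [(rid, combined_similarity(t, m, alpha)) for rid, t, m in targets]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Setting `alpha` to 1.0 or 0.0 recovers the pure text-similarity and pure matching-score variants described above as separate options.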
EXAMPLE III
Referring now to fig. 4, a schematic diagram of an electronic device (e.g., the terminal device or the server of fig. 4) 400 suitable for implementing embodiments of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 4 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 4, electronic device 400 may include a processing device (e.g., central processing unit, graphics processor, etc.) 401 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 402 or a program loaded from a storage device 408 into a Random Access Memory (RAM) 403. In the RAM 403, various programs and data necessary for the operation of the electronic device 400 are also stored. The processing device 401, the ROM 402, and the RAM 403 are connected to each other via a bus 404. An input/output (I/O) interface 405 is also connected to bus 404.
Generally, the following devices may be connected to the I/O interface 405: input devices 406 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 407 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, etc.; storage devices 408 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 409. The communication device 409 may allow the electronic device 400 to communicate wirelessly or by wire with other devices to exchange data. While fig. 4 illustrates an electronic device 400 having various devices, it is to be understood that not all illustrated devices are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 409, or from the storage device 408, or from the ROM 402. The computer program performs the above-described functions defined in the methods of the embodiments of the present disclosure when executed by the processing device 401.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire a resume to be checked containing set information, the set information comprising education experience information and/or work experience information; match the resumes in the database with the resume to be checked to obtain initial repeated resumes meeting set matching rules, the set matching rules comprising education experience matching rules and/or work experience matching rules; filter out, from the initial repeated resumes, the resumes whose set information conflicts with that of the resume to be checked, to obtain target repeated resumes; and acquire the similarity between each target repeated resume and the resume to be checked, sort the target repeated resumes according to the similarity, and display the sorting result.
Computer program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including but not limited to object oriented programming languages such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or by hardware. The name of a unit does not, in some cases, constitute a limitation on the unit itself.
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, an embodiment of the present disclosure provides a resume information duplication checking method, including:
acquiring a resume to be checked containing set information, the set information comprising education experience information and/or work experience information;
matching the resumes in the database with the resume to be checked to obtain initial repeated resumes meeting set matching rules, the set matching rules comprising education experience matching rules and/or work experience matching rules;
filtering out, from the initial repeated resumes, the resumes whose set information conflicts with that of the resume to be checked, to obtain target repeated resumes;
and acquiring the similarity between each target repeated resume and the resume to be checked, sorting the target repeated resumes according to the similarity, and displaying the sorting result.
Further, the set matching rules further include a basic information matching rule, and matching the resumes in the database with the resume to be checked to obtain initial repeated resumes meeting the set matching rules includes:
traversing the candidate resumes in the database;
when the current candidate resume is reached, matching it with the resume to be checked according to the basic information matching rule, and, if the current candidate resume meets the basic information matching rule, determining it as an initial repeated resume;
and, if the current candidate resume does not meet the basic information matching rule, matching it with the resume to be checked according to the education experience matching rule and/or the work experience matching rule, and, if it meets the education experience matching rule and/or the work experience matching rule, determining it as an initial repeated resume.
Further, filtering out, from the initial repeated resumes, the resumes whose set information conflicts with that of the resume to be checked to obtain target repeated resumes includes:
traversing the initial repeated resumes, and, when the current repeated resume is reached, judging whether the timelines of the set information of the current repeated resume and of the resume to be checked conflict; a timeline conflict means that the current repeated resume and the resume to be checked contain different set information for the same time period;
and, if the current repeated resume conflicts with the resume to be checked, acquiring the consistency probability of the set information of the current repeated resume and the resume to be checked, and filtering out the current repeated resume if the consistency probability is smaller than a set threshold.
Further, obtaining the similarity between the target repeated resume and the resume to be checked includes:
acquiring text information of the target repeated resume and of the resume to be checked, the text information being the description of the work experience;
and calculating the similarity between the target repeated resume and the text information of the resume to be checked, and determining the text similarity as the similarity between the target repeated resume and the resume to be checked.
Further, calculating the similarity between the target repeated resume and the text information of the resume to be checked, including:
for the current target repeated resume, performing word segmentation on the text information of the current target repeated resume and of the resume to be checked, respectively, to obtain two word sequences;
acquiring the longest common subsequence of the two word sequences, and determining its length;
and calculating the similarity between the two pieces of text information according to the length of the longest common subsequence and the lengths of the two word sequences.
Further, obtaining the similarity between the target repeated resume and the resume to be checked includes:
determining a matching score according to the set matching rules met by the target repeated resume and the resume to be checked, and determining the matching score as the similarity between the target repeated resume and the resume to be checked.
Further, calculating the similarity between the target repeated resume and the text information of the resume to be checked, including:
performing a weighted calculation on the text similarity and the matching score to obtain the similarity between the target repeated resume and the resume to be checked.
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other embodiments formed by any combination of the features described above or their equivalents without departing from the spirit of the disclosure. For example, a technical solution may be formed by replacing the above features with features having similar functions disclosed in (but not limited to) this disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (9)

1. A resume information duplication checking method is characterized by comprising the following steps:
acquiring a resume to be checked containing set information, the set information comprising education experience information and/or work experience information;
matching the resumes in the database with the resume to be checked to obtain initial repeated resumes meeting set matching rules, wherein the set matching rules comprise education experience matching rules and/or work experience matching rules;
filtering out, from the initial repeated resumes, the resumes whose set information conflicts with that of the resume to be checked, to obtain target repeated resumes;
acquiring the similarity between each target repeated resume and the resume to be checked, sorting the target repeated resumes according to the similarity, and displaying the sorting result;
wherein the set matching rules further comprise a basic information matching rule, and matching the resumes in the database with the resume to be checked to obtain initial repeated resumes meeting the set matching rules comprises:
traversing the candidate resumes in the database;
when the current candidate resume is reached, matching it with the resume to be checked according to the basic information matching rule, and, if the current candidate resume meets the basic information matching rule, determining it as an initial repeated resume;
and, if the current candidate resume does not meet the basic information matching rule, matching it with the resume to be checked according to the education experience matching rule and/or the work experience matching rule, and, if it meets the education experience matching rule and/or the work experience matching rule, determining it as an initial repeated resume.
2. The method according to claim 1, wherein filtering out, from the initial repeated resumes, the resumes whose set information conflicts with that of the resume to be checked to obtain the target repeated resumes comprises:
traversing the initial repeated resumes, and, when the current repeated resume is reached, judging whether the timelines of the set information of the current repeated resume and of the resume to be checked conflict; a timeline conflict means that the current repeated resume and the resume to be checked contain different set information for the same time period;
and, if the current repeated resume conflicts with the resume to be checked, acquiring the consistency probability of the set information of the current repeated resume and the resume to be checked, and filtering out the current repeated resume if the consistency probability is smaller than a set threshold.
3. The method according to claim 1, wherein obtaining the similarity between the target repeated resume and the resume to be checked comprises:
acquiring text information of the target repeated resume and of the resume to be checked, the text information being the description of the work experience;
and calculating the similarity between the target repeated resume and the text information of the resume to be checked, and determining the text similarity as the similarity between the target repeated resume and the resume to be checked.
4. The method of claim 3, wherein calculating the similarity between the text information of the target repeated resume and that of the resume to be checked comprises:
for the current target repeated resume, performing word segmentation on the text information of the current target repeated resume and of the resume to be checked, respectively, to obtain two word sequences;
acquiring the longest common subsequence of the two word sequences, and determining its length;
and calculating the similarity between the two pieces of text information according to the length of the longest common subsequence and the lengths of the two word sequences.
5. The method according to claim 1, wherein obtaining the similarity between the target repeated resume and the resume to be checked comprises:
determining a matching score according to the set matching rules met by the target repeated resume and the resume to be checked, and determining the matching score as the similarity between the target repeated resume and the resume to be checked.
6. The method of claim 1, wherein calculating the similarity between the text information of the target repeated resume and that of the resume to be checked comprises:
performing weighted calculation on the text similarity and the matching score to obtain the similarity between the target repeated resume and the text information of the resume to be checked;
the text similarity is obtained by calculating the similarity between the obtained text information of the target repeated resume and the to-be-checked resume, and the matching score is calculated and obtained based on a set matching rule met by the target repeated resume and the to-be-checked resume.
7. A resume information duplication checking device is characterized by comprising:
the to-be-checked resume acquisition module is used for acquiring a resume to be checked containing set information; the set information comprises education experience information and/or work experience information;
the initial repeated resume acquisition module is used for matching resumes in the database with the resumes to be checked to obtain initial repeated resumes meeting set matching rules; wherein the set matching rules comprise education experience matching rules and/or work experience matching rules;
the target repeated resume acquisition module is used for filtering resumes which conflict with the set information of the resume to be checked in the initial repeated resumes to obtain target repeated resumes;
the sorting module is used for acquiring the similarity between each target repeated resume and the resume to be checked, sorting the target repeated resumes according to the similarity, and displaying the sorting result;
the set matching rules further comprise a basic information matching rule, and the initial repeated resume acquisition module is further configured to:
traversing the candidate resumes in the database;
when the current candidate resume is traversed, matching the current candidate resume with the resume to be checked according to the basic information matching rule, and if the current candidate resume meets the basic information matching rule, determining the current candidate resume as an initial repeated resume;
and if the current candidate resume does not meet the basic information matching rule, matching the current candidate resume with the resume to be checked according to the education experience matching rule and/or the work experience matching rule, and if the current candidate resume meets the education experience matching rule and/or the work experience matching rule, determining the current candidate resume as the initial repeated resume.
8. An electronic device, characterized in that the electronic device comprises:
one or more processing devices;
storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processing devices, cause the one or more processing devices to implement the resume information duplication checking method of any one of claims 1-6.
9. A computer-readable medium, on which a computer program is stored, which, when being executed by a processing device, implements a method for duplication checking resume information according to any one of claims 1 to 6.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910729969.5A CN110413742B (en) 2019-08-08 2019-08-08 Resume information duplication checking method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910729969.5A CN110413742B (en) 2019-08-08 2019-08-08 Resume information duplication checking method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110413742A CN110413742A (en) 2019-11-05
CN110413742B CN110413742B (en) 2022-03-29

Family

ID=68366574

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910729969.5A Active CN110413742B (en) 2019-08-08 2019-08-08 Resume information duplication checking method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110413742B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11829386B2 (en) 2020-01-30 2023-11-28 HG Insights, Inc. Identifying anonymized resume corpus data pertaining to the same individual
CN111723269A (en) * 2020-06-28 2020-09-29 上海沃锐企业发展有限公司 Resume duplicate checking method
CN111753516B (en) * 2020-06-29 2024-04-16 平安国际智慧城市科技股份有限公司 Text check and repeat processing method and device, computer equipment and computer storage medium
CN112084302B (en) * 2020-08-24 2024-04-30 江苏易达捷信数字科技有限公司 Method, system, device and storage medium for detecting inventory data of cost file
CN112131859A (en) * 2020-08-25 2020-12-25 中央民族大学 Tibetan composition plagiarism detection prototype system
CN112785282A (en) * 2021-02-10 2021-05-11 中国工商银行股份有限公司 Resume recommendation method, device, computer system and computer-readable storage medium
US11599856B1 (en) 2022-01-24 2023-03-07 My Job Matcher, Inc. Apparatuses and methods for parsing and comparing video resume duplications

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103177060A (en) * 2012-11-09 2013-06-26 国家外国专家局国外人才信息研究中心 Data searching and capturing method of information of a plurality of high-end talents
TW201709117A (en) * 2015-08-27 2017-03-01 All Chinese Internet Inc Method, system and software for evaluating job suitability, and method of lining up personnel relevance according to resumes using the same preventing enterprises from repeatedly editing their requirements for talent seeking
CN107993019A (en) * 2017-12-12 2018-05-04 北京字节跳动网络技术有限公司 A kind of resume appraisal procedure and device
CN109087071A (en) * 2018-08-10 2018-12-25 安徽网才信息技术股份有限公司 A method of recommending similar resume
CN109471924A (en) * 2018-10-18 2019-03-15 国云科技股份有限公司 A kind of identification Match Analysis of unisonance personnel resume of the same name
CN109472310A (en) * 2018-11-12 2019-03-15 深圳八爪网络科技有限公司 Determine the recognition methods and device that two parts of resumes are the identical talent
CN109740147A (en) * 2018-12-14 2019-05-10 国云科技股份有限公司 A kind of big quantity personnel resume duplicate removal Match Analysis
CN109902726A (en) * 2019-02-02 2019-06-18 天津字节跳动科技有限公司 Biographic information processing method and processing device

Also Published As

Publication number Publication date
CN110413742A (en) 2019-11-05

Similar Documents

Publication Publication Date Title
CN110413742B (en) Resume information duplication checking method, device, equipment and storage medium
CN110321958B (en) Training method of neural network model and video similarity determination method
CN110634047B (en) Method and device for recommending house resources, electronic equipment and storage medium
CN110598157A (en) Target information identification method, device, equipment and storage medium
CN114422267B (en) Flow detection method, device, equipment and medium
CN110634050B (en) Method, device, electronic equipment and storage medium for identifying house source type
CN110059172B (en) Method and device for recommending answers based on natural language understanding
CN110083677B (en) Contact person searching method, device, equipment and storage medium
CN110852720A (en) Document processing method, device, equipment and storage medium
CN112819512B (en) Text processing method, device, equipment and medium
CN112836128A (en) Information recommendation method, device, equipment and storage medium
CN111949837A (en) Information processing method, information processing apparatus, electronic device, and storage medium
CN113011169A (en) Conference summary processing method, device, equipment and medium
CN112148841A (en) Object classification and classification model construction method and device
CN110659208A (en) Test data set updating method and device
CN111382262A (en) Method and apparatus for outputting information
CN111090993A (en) Attribute alignment model training method and device
CN110633411A (en) Method and device for screening house resources, electronic equipment and storage medium
CN111339776B (en) Resume parsing method and device, electronic equipment and computer-readable storage medium
CN111475722B (en) Method and apparatus for transmitting information
CN111694985B (en) Search method, search device, electronic equipment and computer-readable storage medium
CN111382365B (en) Method and device for outputting information
CN110543491A (en) Search method, search device, electronic equipment and computer-readable storage medium
CN110598133A (en) Method, apparatus, electronic device, and computer-readable storage medium for determining an order of search items
CN111597441A (en) Information processing method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant