CN111669451A - Private mailbox judgment method and judgment device - Google Patents

Private mailbox judgment method and judgment device

Info

Publication number
CN111669451A
Authority
CN
China
Prior art keywords
character
inbox
prefix
sender
outbox
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910173126.1A
Other languages
Chinese (zh)
Other versions
CN111669451B (en)
Inventor
马敏
胡泽柱
黄丽诗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SF Technology Co Ltd
Original Assignee
SF Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SF Technology Co Ltd
Priority to CN201910173126.1A
Publication of CN111669451A
Application granted
Publication of CN111669451B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2101/00 Indexing scheme associated with group H04L61/00
    • H04L2101/30 Types of network names
    • H04L2101/37 E-mail addresses
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2101/00 Indexing scheme associated with group H04L61/00
    • H04L2101/30 Types of network names
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2101/00 Indexing scheme associated with group H04L61/00
    • H04L2101/30 Types of network names
    • H04L2101/35 Types of network names containing special prefixes

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The application discloses a private mailbox judgment method and a private mailbox judgment device. The method comprises: acquiring an inbox, an outbox, the name of the sender and the mobile phone number of the sender; preprocessing the inbox, the outbox and the sender name; calculating the class edit distance between the inbox prefix characters and each of the outbox prefix characters, the sender name extension characters and the sender mobile phone number, and calculating the number of digits in which the inbox prefix characters coincide with the sender mobile phone number; and calculating a similarity ratio, judging whether the ratio is not less than a set value, and if so, determining that the inbox is a private mailbox of the current sender. The technical scheme provided by the embodiments of the application identifies private mailboxes when mail is sent outward: the degree of association between the inbox and the sender is judged by comparing the inbox with the sender's mailbox, name information, mobile phone number and the like, and from this it is judged whether the inbox is a private mailbox of the sender.

Description

Private mailbox judgment method and judgment device
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for determining a private mailbox.
Background
In enterprise management, internal data is often leaked through outgoing mail. Because an employee's private mailbox is generally unknown to the enterprise, it is difficult to stop an employee in time from sending internal enterprise data to a private mailbox. It is therefore important to judge promptly whether the mailbox an employee sends to is a private mailbox.
Disclosure of Invention
In view of the above-mentioned drawbacks and deficiencies of the prior art, it is desirable to provide a private mailbox judging method and device.
In a first aspect, a method for determining a private mailbox is provided, which includes the steps of:
acquiring an inbox, an outbox, the name of the sender and the mobile phone number of the sender;
preprocessing the inbox, the outbox and the sender name to obtain inbox prefix characters, outbox prefix characters and sender name extension characters;
calculating the class edit distance between the inbox prefix characters and each of the outbox prefix characters, the sender name extension characters and the sender mobile phone number, and calculating the number of digits in which the inbox prefix characters coincide with the sender mobile phone number;
calculating similarity ratios according to the class edit distance and the sum of the length of the inbox prefix characters and the length of the outbox prefix characters, the sender name extension characters or the sender mobile phone number, and determining a similarity ratio according to the number of coincident digits between the inbox prefix characters and the sender mobile phone number;
and judging whether the similarity ratio is not less than a set value, and if so, determining that the inbox is a private mailbox of the current sender.
In a second aspect, there is provided a private mailbox judging apparatus comprising:
an acquisition unit, configured to acquire an inbox, an outbox, the name of the sender and the mobile phone number of the sender;
a preprocessing unit, configured to preprocess the inbox, the outbox and the sender name to obtain inbox prefix characters, outbox prefix characters and sender name extension characters;
a first calculating unit, configured to calculate the class edit distance between the inbox prefix characters and each of the outbox prefix characters, the sender name extension characters and the sender mobile phone number;
a second calculating unit, configured to calculate the number of digits in which the inbox prefix characters coincide with the sender mobile phone number;
a third calculating unit, configured to calculate a similarity ratio according to the class edit distance and the sum of the length of the inbox prefix characters and the length of the outbox prefix characters, the sender name extension characters or the sender mobile phone number;
a fourth calculating unit, configured to determine a similarity ratio according to the number of coincident digits between the inbox prefix characters and the sender mobile phone number;
and a private mailbox determining unit, configured to judge whether the similarity ratio is not less than a set value, and if so, determine that the inbox is a private mailbox of the current sender.
The technical scheme provided by the embodiments of the application identifies private mailboxes when mail is sent outward: the degree of association between the inbox and the sender is judged by comparing the inbox with the sender's mailbox, name information, mobile phone number and the like, and from this it is judged whether the inbox is a private mailbox of the sender.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a flowchart illustrating a method for determining a private mailbox in the present embodiment;
FIG. 2 is a schematic structural diagram of a private mailbox judging apparatus in the present embodiment;
FIG. 3 is a schematic structural diagram of an apparatus according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Referring to fig. 1, the present embodiment provides a method for determining a private mailbox, including the steps of:
acquiring an inbox, an outbox, the name of the sender and the mobile phone number of the sender;
preprocessing the inbox, the outbox and the sender name to obtain inbox prefix characters, outbox prefix characters and sender name extension characters;
calculating the class edit distance between the inbox prefix characters and each of the outbox prefix characters, the sender name extension characters and the sender mobile phone number, and calculating the number of digits in which the inbox prefix characters coincide with the sender mobile phone number;
calculating similarity ratios according to the class edit distance and the sum of the length of the inbox prefix characters and the length of the outbox prefix characters, the sender name extension characters or the sender mobile phone number, and determining a similarity ratio according to the number of coincident digits between the inbox prefix characters and the sender mobile phone number;
and judging whether the similarity ratio is not less than a set value, and if so, determining that the inbox is a private mailbox of the current sender.
In the judging method of this embodiment, when a sender sends a mail, the outbox (the sender's mailbox address), the inbox (the recipient's mailbox address), the sender name and the sender mobile phone number are obtained. The obtained information is preprocessed to extract the information to be operated on: the outbox prefix, the inbox prefix and the extensions of the sender name. The inbox prefix is then compared with the outbox prefix, the sender name extensions and the sender mobile phone number respectively. Two comparison modes are used. In the first, a class edit distance is calculated, that is, the sum of the specified values of the edit operations required to convert one character string into the other. In the second, the number of digits in which the inbox prefix coincides with the sender mobile phone number is determined. Both measure the degree of association between the inbox prefix and the sender; if the inbox prefix is strongly associated with private information such as the sender's name or mobile phone number, the inbox is judged to be a private mailbox of the sender.
Further, obtaining the sender name extension characters comprises: expanding the pinyin of the sender's name, wherein the extension characters at least include: any ordering of the full spellings of the characters of the name, any combination of the full spellings of any two characters, any ordering of the initial letters, and any ordering of the initial letter of one character together with the full spellings of the others.
Observation shows that most people choose mailbox accounts based on their own names. In this embodiment the sender name is therefore preprocessed: the employee's name is converted into pinyin and expanded, and the class edit distance between each extension and the inbox prefix is calculated to determine whether the mailbox is a private mailbox. The extensions of the pinyin include at least the forms described above, such as full spellings in original, reverse and mixed order, and initials in original, reverse and mixed order. This embodiment generates as many name extension characters as possible in order to cover the cases associated with the sender's name. For example, if the sender's name is Liu Dehua, whose pinyin is liudehua, expanding the name gives the following name extension characters: dehualiu, huadeliu, liuhuade, ldh, dhl, hdl, lhd, liudh, dhliu, hdliu, dliuh, hliud, ldhua, lhuad, huadl, huald, dlhua, dhual, ldeh, lhde, dehl, delh, hlde, hdel, and so on. The pinyin is expanded as fully as possible to increase the accuracy of the subsequent comparison calculations.
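The expansion described above can be sketched in Python as follows. This is a minimal illustration and not the patented implementation: the function name `expand_name` is hypothetical, the name is assumed to be already split into pinyin syllables, and the sketch simply enumerates every ordering of two or more syllables with each syllable written either in full or as its initial, which is a superset of the forms listed above.

```python
from itertools import permutations, product


def expand_name(syllables):
    """Return a set of candidate mailbox prefixes derived from pinyin syllables.

    Assumption: `syllables` is a list of non-empty lowercase pinyin strings,
    e.g. ["liu", "de", "hua"]. Splitting a name into syllables is out of scope.
    """
    # Each syllable may appear in full or as its initial letter.
    forms = [(s, s[0]) for s in syllables]
    extensions = set()
    # Use every ordering of 2..n syllables (covers "any two characters").
    for size in range(2, len(forms) + 1):
        for ordering in permutations(forms, size):
            # Choose full spelling or initial independently per syllable.
            for choice in product(*ordering):
                extensions.add("".join(choice))
    return extensions


if __name__ == "__main__":
    candidates = expand_name(["liu", "de", "hua"])
    print(sorted(candidates)[:10])
```

For the syllables liu, de and hua the resulting set contains, among others, dehualiu, ldh, liudh, lhuad and hdel. A real deployment would probably prune very short candidates, which are likely to match unrelated prefixes.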
Further, the class edit distance is calculated as follows: the edit operations needed to edit the inbox prefix characters into the outbox prefix characters, the sender name extension characters or the sender mobile phone number are determined, where an edit operation is an insertion, a deletion or a replacement; the class edit distance equals the sum of the specified values of all the edit operations, where the specified value of a deletion or an insertion is 1 and the specified value of a replacement is 2.
In this embodiment, the class edit distance is calculated on the preprocessed characters. For example, there are two ways of converting 'victoria' into 'victory':
a) delete 'i' and 'a' from 'victoria' and then insert 'y' to obtain 'victory'; with 2 deletions and 1 insertion, the class edit distance is 3;
b) delete 'i' or 'a' from 'victoria' and then replace the remaining one with 'y' to obtain 'victory'; with 1 deletion and 1 replacement, the class edit distance is 1 + 2 = 3.
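The weighted distance defined above (insertion 1, deletion 1, replacement 2) can be computed with the usual dynamic-programming table. The sketch below is one possible way to do it; the function name `class_edit_distance` is hypothetical, and the example takes the target to be the 7-character string 'victory', which matches the lengths 8 and 7 used in the ratio computation.

```python
def class_edit_distance(source: str, target: str) -> int:
    """Minimum total cost of turning `source` into `target`.

    Costs follow the description: insertion = 1, deletion = 1, replacement = 2.
    """
    # previous[j] is the cost of turning the processed part of `source`
    # into the first j characters of `target`.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # deleting the first i characters of `source`
        for j, t_char in enumerate(target, start=1):
            delete_cost = previous[j] + 1
            insert_cost = current[j - 1] + 1
            replace_cost = previous[j - 1] + (0 if s_char == t_char else 2)
            current.append(min(delete_cost, insert_cost, replace_cost))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(class_edit_distance("victoria", "victory"))
```

Because a replacement costs the same as one deletion plus one insertion, both solutions a) and b) reach the same total, and the table always returns the cheapest of all such edit sequences.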
Further, if the inbox prefix characters or the outbox prefix characters include a plurality of digits, the method further includes the steps of: preprocessing the inbox prefix characters and/or the outbox prefix characters, namely removing the digits and removing the letters from the inbox prefix characters to obtain an inbox prefix first character and an inbox prefix second character respectively, and/or removing the digits from the outbox prefix characters to obtain an outbox prefix first character;
calculating the class edit distance between the inbox prefix characters and the outbox prefix characters further comprises: calculating the class edit distances between the inbox prefix characters and the outbox prefix characters, between the inbox prefix first character and the outbox prefix characters, between the inbox prefix characters and the outbox prefix first character, and between the inbox prefix first character and the outbox prefix first character, and taking the minimum value as the class edit distance between the inbox prefix characters and the outbox prefix characters;
calculating the class edit distance between the inbox prefix characters and the sender mobile phone number is: calculating the class edit distance between the inbox prefix second character and the sender mobile phone number.
In actual operation, when the inbox and the outbox are compared, the inbox or outbox prefix characters may contain many digits, which makes the character string long and the class edit distance large and thereby affects the judgment of the private mailbox. This embodiment provides a way of handling that case: a mailbox prefix with many digits is preprocessed into two different character strings. For example, if the outbox prefix is liudehua12345, preprocessing yields the characters liudehua12345 and the first character liudehua; the class edit distances between each of the two and the inbox prefix are calculated, and the minimum is taken to reduce the influence of the digits.
In this embodiment, the inbox prefix characters are compared with the sender mobile phone number either to calculate a class edit distance or to determine the similarity ratio directly. When the inbox prefix contains both letters and digits, in addition to determining the number of digits that coincide with the sender mobile phone number, the inbox prefix with its letters removed is taken as the inbox prefix second character, and the class edit distance between this second character and the sender mobile phone number is calculated. In this way the inbox is compared with the sender mobile phone number comprehensively and no comparison is omitted.
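A minimal sketch of this preprocessing, under stated assumptions: the helper names `strip_digits`, `digits_only` and `min_prefix_distance` are hypothetical, only ASCII digits are treated as digits, and "removing the letters" is implemented as keeping digits only, which also drops punctuation such as dots or underscores.

```python
import re


def class_edit_distance(source: str, target: str) -> int:
    """Weighted edit distance: insertion 1, deletion 1, replacement 2."""
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + (0 if s_char == t_char else 2)))
        previous = current
    return previous[-1]


def strip_digits(prefix: str) -> str:
    """'First character' form: the prefix with its digits removed."""
    return re.sub(r"[0-9]", "", prefix)


def digits_only(prefix: str) -> str:
    """'Second character' form: the prefix reduced to its digits."""
    return re.sub(r"[^0-9]", "", prefix)


def min_prefix_distance(inbox_prefix: str, outbox_prefix: str) -> int:
    """Smallest distance over the four raw/digit-free pairings."""
    inbox_forms = {inbox_prefix, strip_digits(inbox_prefix)}
    outbox_forms = {outbox_prefix, strip_digits(outbox_prefix)}
    return min(class_edit_distance(a, b)
               for a in inbox_forms for b in outbox_forms)


if __name__ == "__main__":
    print(min_prefix_distance("liudehua", "liudehua12345"))
    print(class_edit_distance(digits_only("ldh13800138000"), "13800138000"))
```

Without the digit-free form, liudehua and liudehua12345 would be five insertions apart; with it, the minimum distance drops to zero.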
Further, the similarity ratios are calculated according to the class edit distance and the sum of the length of the inbox prefix characters and the length of the outbox prefix characters, the sender name extension characters or the sender mobile phone number, as follows: similarity ratio = (inbox prefix character length + outbox prefix character length - class edit distance) / (inbox prefix character length + outbox prefix character length); or similarity ratio = (inbox prefix character length + sender name extension character length - class edit distance) / (inbox prefix character length + sender name extension character length); or similarity ratio = (inbox prefix character length + sender mobile phone number length - class edit distance) / (inbox prefix character length + sender mobile phone number length).
The class edit distances between different mailboxes and different names differ, and a small class edit distance alone does not show that a mailbox is a private mailbox. A similarity ratio is therefore determined from the calculated class edit distance and the sum of the lengths of the characters used in the calculation, and the private mailbox is judged according to this ratio. This embodiment mainly compares the inbox prefix characters in three ways, and the maximum of the three ratios may be returned. The returned similarity ratio ranges from 0 to 1 and represents the likelihood that the inbox is a private mailbox of the sender. The set value is chosen according to the actual situation, and an inbox whose similarity ratio is greater than or equal to the set value is determined to be a private mailbox. In this embodiment the set value is preferably 0.8, with which private mailboxes of employees can be identified accurately and leakage of internal enterprise data can be prevented in time.
This embodiment gives a formula for calculating the similarity ratio, also called the Levenshtein ratio: ratio(str1, str2) = (sum - ldist) / sum, where sum is the sum of the lengths of the strings str1 and str2 and ldist is the class edit distance. For the conversion of 'victoria' into 'victory' in the example above, the class edit distance is 3 in both cases and the sum of the two string lengths is 8 + 7 = 15, so the Levenshtein ratio formula gives a similarity ratio of (15 - 3) / 15 = 0.8.
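The ratio formula and the threshold decision can be sketched together as below. The function names are hypothetical, the default threshold of 0.8 is the preferred set value from the description, and returning 0.0 for two empty strings is an assumption made here so that empty input never produces a match.

```python
def class_edit_distance(source: str, target: str) -> int:
    """Weighted edit distance: insertion 1, deletion 1, replacement 2."""
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + (0 if s_char == t_char else 2)))
        previous = current
    return previous[-1]


def similarity_ratio(str1: str, str2: str) -> float:
    """(sum - ldist) / sum, where sum = len(str1) + len(str2)."""
    total = len(str1) + len(str2)
    if total == 0:
        return 0.0  # assumption: treat two empty strings as "no evidence"
    return (total - class_edit_distance(str1, str2)) / total


def best_similarity(inbox_prefix: str, candidates) -> float:
    """Largest ratio between the inbox prefix and any candidate string."""
    return max((similarity_ratio(inbox_prefix, c) for c in candidates),
               default=0.0)


def is_private_mailbox(inbox_prefix: str, candidates,
                       threshold: float = 0.8) -> bool:
    """True when the best ratio is not less than the set value."""
    return best_similarity(inbox_prefix, candidates) >= threshold


if __name__ == "__main__":
    print(similarity_ratio("victoria", "victory"))
    print(is_private_mailbox("ldh", ["liudehua", "ldh", "dhl"]))
```

Here `candidates` would hold the outbox prefix, the name extensions and the sender mobile phone number, so that taking the maximum corresponds to returning the largest of the three ratios.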
Further, determining the similarity ratio according to the number of coincident digits between the inbox prefix characters and the sender mobile phone number comprises: when the number of coincident digits between the inbox prefix characters and the sender mobile phone number is greater than or equal to 8, setting the similarity ratio to 1.
In this embodiment, when the private mailbox is judged by the sender mobile phone number, the number of coincident digits between the inbox prefix characters and the mobile phone number is determined, the coincident digits being any N consecutive digits of the mobile phone number. The more digits the inbox shares with the sender mobile phone number, the higher the similarity ratio can be taken to be, and different similarity ratios are set for different numbers of coincident digits. In this embodiment, the similarity ratio is preferably set to 1 when the number of coincident digits is greater than or equal to eight.
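One way to count the coincident digits is to look for the longest run of consecutive phone-number digits that also appears in the inbox prefix. The sketch below makes that assumption; the function names are hypothetical, and because the description only fixes the ratio for eight or more coincident digits, the sketch returns `None` for shorter runs instead of inventing values.

```python
def longest_shared_digit_run(inbox_prefix: str, phone_number: str) -> int:
    """Length of the longest run of consecutive phone digits found in the prefix."""
    # Keep digits only, so formatting such as spaces or '+' is ignored.
    digits = "".join(ch for ch in phone_number if ch.isdigit())
    best = 0
    for start in range(len(digits)):
        # Only try runs longer than the best one found so far.
        for end in range(start + best + 1, len(digits) + 1):
            if digits[start:end] in inbox_prefix:
                best = end - start
            else:
                break  # a longer run from this start cannot match either
    return best


def phone_similarity(inbox_prefix: str, phone_number: str, min_run: int = 8):
    """Return 1.0 when at least `min_run` digits coincide, otherwise None."""
    if longest_shared_digit_run(inbox_prefix, phone_number) >= min_run:
        return 1.0
    return None


if __name__ == "__main__":
    print(longest_shared_digit_run("ldh13800138", "13800138000"))
    print(phone_similarity("ldh13800138", "13800138000"))
```

The phone number 13800138000 used here is a placeholder chosen for illustration only.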
The private mailbox judgment method of this embodiment further includes a precondition: before the private mailbox judgment, mailboxes such as those of normal customers are first excluded by examining the suffix characters of the inbox, which effectively narrows the judgment range and improves the efficiency and accuracy of the judgment.
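This precondition could be sketched as a simple suffix filter. Everything specific here is an assumption for illustration: the prefix and suffix are taken to be the parts of the address before and after the last "@", and the excluded domains are made-up placeholders standing in for whatever list of known customer or partner domains an enterprise maintains.

```python
# Hypothetical list of suffixes belonging to known customers or partners.
KNOWN_BUSINESS_SUFFIXES = frozenset({"customer.example", "partner.example"})


def split_address(address: str):
    """Split a mailbox address into (prefix, suffix) at the last '@'."""
    prefix, _, suffix = address.rpartition("@")
    return prefix, suffix.lower()


def needs_private_check(inbox_address: str,
                        excluded_suffixes=KNOWN_BUSINESS_SUFFIXES) -> bool:
    """False when the inbox suffix is on the exclusion list."""
    _, suffix = split_address(inbox_address)
    return suffix not in excluded_suffixes


if __name__ == "__main__":
    print(split_address("liudehua@mail.example"))
    print(needs_private_check("buyer@customer.example"))
```

Only addresses for which `needs_private_check` returns True would go on to the distance and ratio calculations.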
As shown in FIG. 2, the present embodiment further provides a private mailbox judging apparatus 200, including: an obtaining unit 201, configured to obtain an inbox, an outbox, the name of the sender and the mobile phone number of the sender;
a preprocessing unit 202, configured to preprocess the inbox, the outbox and the sender name to obtain inbox prefix characters, outbox prefix characters and sender name extension characters;
a first calculating unit 205, configured to calculate the class edit distance between the inbox prefix characters and each of the outbox prefix characters, the sender name extension characters and the sender mobile phone number;
a second calculating unit 206, configured to calculate the number of digits in which the inbox prefix characters coincide with the sender mobile phone number;
a third calculating unit 207, configured to calculate a similarity ratio according to the class edit distance and the sum of the length of the inbox prefix characters and the length of the outbox prefix characters, the sender name extension characters or the sender mobile phone number;
a fourth calculating unit 208, configured to determine a similarity ratio according to the number of coincident digits between the inbox prefix characters and the sender mobile phone number;
and a private mailbox determining unit 209, configured to judge whether the similarity ratio is not less than a set value, and if so, determine that the inbox is a private mailbox of the current sender.
For the working principle of the private mailbox judging apparatus 200 in this embodiment, refer to the judging method shown in FIG. 1; it is not described again here.
Further, the preprocessing unit 202 further includes a character extension module 203, configured to expand the pinyin of the sender's name, wherein the extension characters at least include: any ordering of the full spellings of the characters of the name, any combination of the full spellings of any two characters, any ordering of the initial letters, and any ordering of the initial letter of one character together with the full spellings of the others.
For the working principle of the character extension module in this embodiment, refer to the method for obtaining the sender name extension characters described above; details are not repeated here.
Further, the first calculating unit calculates as follows: it calculates the class edit distance for editing the inbox prefix characters into the outbox prefix characters, the sender name extension characters or the sender mobile phone number, where an edit operation is an insertion, a deletion or a replacement; the class edit distance equals the sum of the specified values of all the edit operations, where the specified value of a deletion or an insertion is 1 and the specified value of a replacement is 2.
The first calculating unit is configured to calculate the class edit distance, where the edit operations are insertion, deletion and replacement; for details, refer to the processing method described above.
Further, the preprocessing unit 202 further includes a character adjusting module 204, configured to process the inbox prefix characters or the outbox prefix characters when they include a plurality of digits, specifically: removing the digits and removing the letters from the inbox prefix characters to obtain an inbox prefix first character and an inbox prefix second character respectively; and/or removing the digits from the outbox prefix characters to obtain an outbox prefix first character.
In this embodiment, the character adjusting module handles the case in which the prefix characters of the inbox or the outbox contain many digits and the character string is therefore long; for details, refer to the adjusting method described above. After the adjustment, the first calculating unit calculates the class edit distances between the inbox prefix characters and the outbox prefix characters, between the inbox prefix first character and the outbox prefix characters, between the inbox prefix characters and the outbox prefix first character, and between the inbox prefix first character and the outbox prefix first character, and takes the minimum value as the class edit distance between the inbox prefix characters and the outbox prefix characters;
calculating the class edit distance between the inbox prefix characters and the sender mobile phone number is: calculating the class edit distance between the inbox prefix second character and the sender mobile phone number.
Further, the third calculating unit uses the formula: similarity ratio = (inbox prefix character length + outbox prefix character length - class edit distance) / (inbox prefix character length + outbox prefix character length), or similarity ratio = (inbox prefix character length + sender name extension character length - class edit distance) / (inbox prefix character length + sender name extension character length).
In this embodiment, the third calculating unit calculates the similarity ratio with reference to the calculating method described above; details are not repeated.
Further, determining the similarity ratio according to the number of coincident digits between the inbox prefix characters and the sender mobile phone number specifically includes: when the number of coincident digits between the inbox prefix characters and the sender mobile phone number is greater than or equal to 8, setting the similarity ratio to 1.
In this embodiment, for the specific operation by which the fourth calculating unit determines the similarity ratio from the number of digits coinciding with the mobile phone number, refer to the judging method described above.
With further reference to FIG. 3, as another aspect, the present application also provides an apparatus 300 including a Central Processing Unit (CPU) 301 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 302 or a program loaded from a storage section into a Random Access Memory (RAM) 303. The RAM 303 also stores various programs and data necessary for the operation of the apparatus. The CPU 301, ROM 302, and RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to the bus 304.
The following components are connected to the I/O interface 305: an input section 306 including a keyboard, a mouse, and the like; an output section including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 308 including a hard disk and the like; and a communication section 309 including a network interface card such as a LAN card, a modem, or the like. The communication section 309 performs communication processing via a network such as the internet. A drive 310 is also connected to the I/O interface 305 as necessary. A removable medium 311 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 310 as necessary, so that a computer program read out therefrom is installed into the storage section 308 as necessary.
In particular, according to an embodiment of the invention, the process described above with reference to the flowchart of fig. 1 may be implemented as a computer software program. For example, embodiments of the invention include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section, and/or installed from a removable medium. The above-described functions defined in the apparatus of the present application are executed when the computer program is executed by the Central Processing Unit (CPU) 301.
It should be noted that the computer readable medium of the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable signal medium, by contrast, may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based apparatus that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the private mailbox judgment method as described in the above embodiments.
For example, the electronic device may implement the steps shown in Fig. 1: acquiring an inbox, an outbox, the name of the sender, and the mobile phone number of the sender; preprocessing the inbox, the outbox, and the sender's name to obtain an inbox prefix character, an outbox prefix character, and a sender name extension character; calculating the class edit distance between the inbox prefix character and each of the outbox prefix character, the sender name extension character, and the sender's mobile phone number, and calculating the number of digits in which the inbox prefix character coincides with the sender's mobile phone number; calculating similarity ratios from the class edit distance and the sum of the length of the inbox prefix character and the length of the outbox prefix character, the sender name extension character, or the sender's mobile phone number, determining a further similarity ratio from the number of coincident digits between the inbox prefix character and the sender's mobile phone number, and selecting the maximum value as the ratio at which the inbox is a private mailbox; and judging whether the ratio is not less than a set value and, if so, determining that the inbox is a private mailbox of the current sender.
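As a non-limiting illustration, the final selection and judgment step described above can be sketched in Python as follows. The function name `is_private_mailbox`, the parameter names, and the example set value of 0.8 are assumptions introduced for illustration only; the text does not fix a particular set value, and treating an empty list of ratios as "not private" is likewise an assumption.

```python
def is_private_mailbox(candidate_ratios: list[float], set_value: float) -> bool:
    """Select the maximum similarity ratio and compare it with the set value.

    candidate_ratios holds the similarity ratios already computed for the inbox
    (against the outbox prefix, the name extensions, and the phone number).
    """
    if not candidate_ratios:
        # Assumption: with nothing to compare, the inbox is not judged private.
        return False
    # "Not less than the set value" means the comparison is inclusive.
    return max(candidate_ratios) >= set_value


# Hypothetical ratios and a hypothetical set value of 0.8.
print(is_private_mailbox([0.50, 0.89, 0.20], 0.8))
```

Because only the maximum is compared, a single strong match (for example, against the sender's name) is enough for the inbox to be judged a private mailbox.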
It should be noted that although several modules or units of the device for performing actions are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.
Moreover, although the steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by a person skilled in the art that the scope of the invention as referred to in the present application is not limited to the embodiments with a specific combination of the above-mentioned features, but also covers other embodiments with any combination of the above-mentioned features or their equivalents without departing from the inventive concept. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (10)

1. A private mailbox judgment method, characterized by comprising the following steps:
acquiring an inbox, an outbox, the name of the sender, and the mobile phone number of the sender;
preprocessing the inbox, the outbox, and the sender's name to obtain an inbox prefix character, an outbox prefix character, and a sender name extension character;
calculating the class edit distance between the inbox prefix character and each of the outbox prefix character, the sender name extension character, and the sender's mobile phone number, and calculating the number of digits in which the inbox prefix character coincides with the sender's mobile phone number;
calculating similarity ratios from the class edit distance and the sum of the length of the inbox prefix character and the length of the outbox prefix character, the sender name extension character, or the sender's mobile phone number, and determining a similarity ratio from the number of coincident digits between the inbox prefix character and the sender's mobile phone number;
and judging whether the similarity ratio is not less than a set value and, if so, determining that the inbox is a private mailbox of the current sender.
2. The method of claim 1, wherein obtaining the sender name extension character comprises: expanding the pinyin of the sender's name, wherein the expanded characters at least comprise: any combination of the full pinyin of the name, any combination of the full pinyin of any two characters of the name, any combination of the initial letters, and any combination of one initial letter with the full pinyin of the other characters.
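By way of a hedged illustration only, the name expansion of claim 2 might be sketched in Python as below. The function `expand_name` is hypothetical, it assumes the name has already been transliterated into a list of non-empty pinyin syllables (for example `["zhang", "san"]`), and it reads "any combination" as "any ordering"; the claim itself does not prescribe this reading.

```python
from itertools import permutations


def expand_name(syllables: list[str]) -> set[str]:
    """Build candidate strings from the pinyin syllables of a sender's name."""
    initials = [s[0] for s in syllables]
    variants: set[str] = set()

    # Full pinyin of all characters, in any order.
    for order in permutations(syllables):
        variants.add("".join(order))

    # Full pinyin of any two characters, in any order.
    for pair in permutations(syllables, 2):
        variants.add("".join(pair))

    # Initial letters of all characters, in any order.
    for order in permutations(initials):
        variants.add("".join(order))

    # One character reduced to its initial, the others kept in full, in any order.
    for i in range(len(syllables)):
        mixed = syllables[:i] + [initials[i]] + syllables[i + 1:]
        for order in permutations(mixed):
            variants.add("".join(order))

    return variants


print(sorted(expand_name(["zhang", "san"])))
```

Each resulting string is then a candidate to be compared with the inbox prefix character.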
3. The method according to claim 1, wherein the class edit distance is calculated as follows: calculating the class edit distance for editing the inbox prefix character into the outbox prefix character, the sender name extension character, or the sender's mobile phone number, wherein the editing operations are insertion, deletion, and replacement, the class edit distance is equal to the sum of the specified values of all editing operations performed, the specified value of a deletion or an insertion is 1, and the specified value of a replacement is 2.
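The weighting recited in claim 3 can be illustrated with a standard dynamic-programming edit distance in which a replacement costs 2 and an insertion or deletion costs 1. The following Python sketch is one possible realization under that reading; the function name `class_edit_distance` and the choice of the minimum-cost edit sequence are assumptions, not language taken from the claims.

```python
def class_edit_distance(source: str, target: str) -> int:
    """Minimum cost of editing source into target.

    Costs: insertion = 1, deletion = 1, replacement = 2.
    """
    # prev[j] is the cost of editing the processed part of source
    # into the first j characters of target.
    prev = list(range(len(target) + 1))
    for i, s_ch in enumerate(source, start=1):
        curr = [i]  # deleting the first i characters of source
        for j, t_ch in enumerate(target, start=1):
            replace = prev[j - 1] + (0 if s_ch == t_ch else 2)
            delete = prev[j] + 1      # drop s_ch from source
            insert = curr[j - 1] + 1  # add t_ch to match target
            curr.append(min(replace, delete, insert))
        prev = curr
    return prev[-1]


print(class_edit_distance("zhangsan88", "zhangsan"))
```

With these weights a replacement is never cheaper than a deletion followed by an insertion, so the distance never exceeds the sum of the two string lengths.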
4. The method of claim 1, wherein, if the inbox prefix character or the outbox prefix character includes a plurality of digits, the method further comprises: preprocessing the inbox prefix character and/or the outbox prefix character, namely removing the digits and removing the letters from the inbox prefix character to obtain an inbox prefix first character and an inbox prefix second character, respectively; and/or removing the digits from the outbox prefix character to obtain an outbox prefix first character;
calculating the class edit distance between the inbox prefix character and the outbox prefix character further comprises: calculating the class edit distances between the inbox prefix character and the outbox prefix character, between the inbox prefix first character and the outbox prefix character, between the inbox prefix character and the outbox prefix first character, and between the inbox prefix first character and the outbox prefix first character, and taking the minimum value as the class edit distance between the inbox prefix character and the outbox prefix character;
and the class edit distance between the inbox prefix character and the sender's mobile phone number is calculated as the class edit distance between the inbox prefix second character and the sender's mobile phone number.
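The preprocessing and minimum selection of claim 4 might look as follows in Python; this is a sketch under stated assumptions rather than the claimed implementation. The helper names `split_prefix`, `min_prefix_distance`, and `phone_distance` are hypothetical, `weighted_distance` is a compact helper using the insertion/deletion = 1, replacement = 2 weighting, letters are assumed to be ASCII, and the check for "a plurality of digits" is omitted for brevity.

```python
import re


def weighted_distance(a: str, b: str) -> int:
    """Edit distance with insertion/deletion cost 1 and replacement cost 2."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j - 1] + (0 if x == y else 2),
                            prev[j] + 1, curr[j - 1] + 1))
        prev = curr
    return prev[-1]


def split_prefix(prefix: str) -> tuple[str, str]:
    """Return (prefix with digits removed, prefix with letters removed)."""
    without_digits = re.sub(r"\d", "", prefix)
    without_letters = re.sub(r"[A-Za-z]", "", prefix)
    return without_digits, without_letters


def min_prefix_distance(inbox_prefix: str, outbox_prefix: str) -> int:
    """Minimum distance over the original and digit-free forms of both prefixes."""
    inbox_first, _ = split_prefix(inbox_prefix)
    outbox_first, _ = split_prefix(outbox_prefix)
    return min(weighted_distance(i, o)
               for i in (inbox_prefix, inbox_first)
               for o in (outbox_prefix, outbox_first))


def phone_distance(inbox_prefix: str, phone: str) -> int:
    """Distance between the letter-free inbox prefix and the phone number."""
    _, inbox_second = split_prefix(inbox_prefix)
    return weighted_distance(inbox_second, phone)


print(min_prefix_distance("zhangsan88", "zhangsan2019"))
print(phone_distance("zs13800001111", "13800001111"))
```

Stripping digits before comparison keeps a numeric suffix from masking an otherwise identical prefix, while stripping letters isolates any phone-number-like part of the inbox prefix.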
5. The method as claimed in claim 2, wherein calculating the similarity ratios from the class edit distance and the sum of the length of the inbox prefix character and the length of the outbox prefix character, the sender name extension character, or the sender's mobile phone number comprises: similarity ratio = (inbox prefix character length + outbox prefix character length - class edit distance) / (inbox prefix character length + outbox prefix character length); or similarity ratio = (inbox prefix character length + sender name extension character length - class edit distance) / (inbox prefix character length + sender name extension character length); or similarity ratio = (inbox prefix character length + sender mobile phone number length - class edit distance) / (inbox prefix character length + sender mobile phone number length).
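As a worked illustration of the formula in claim 5, the Python sketch below computes the ratio from two lengths and a precomputed class edit distance. The function name `similarity_ratio` and the handling of the case where both strings are empty are assumptions; the claim does not address that case.

```python
def similarity_ratio(inbox_prefix_len: int, other_len: int, distance: int) -> float:
    """(len_inbox + len_other - distance) / (len_inbox + len_other)."""
    total = inbox_prefix_len + other_len
    if total == 0:
        # Assumption: two empty strings are treated as not similar.
        return 0.0
    return (total - distance) / total


# Inbox prefix "zhangsan88" (10 characters) against outbox prefix "zhangsan"
# (8 characters): deleting the two digits gives a class edit distance of 2,
# so the ratio is (10 + 8 - 2) / (10 + 8) = 16 / 18.
print(similarity_ratio(10, 8, 2))
```

The same function applies unchanged when the second length is that of a sender name extension character or of the sender's mobile phone number.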
6. A private mailbox judgment device, comprising:
an acquisition unit, configured to acquire the inbox, the outbox, the name of the sender, and the mobile phone number of the sender;
a preprocessing unit, configured to preprocess the inbox, the outbox, and the sender's name to obtain an inbox prefix character, an outbox prefix character, and a sender name extension character;
a first calculation unit, configured to calculate the class edit distance between the inbox prefix character and each of the outbox prefix character, the sender name extension character, and the sender's mobile phone number;
a second calculation unit, configured to calculate the number of digits in which the inbox prefix character coincides with the sender's mobile phone number;
a third calculation unit, configured to calculate similarity ratios from the class edit distance and the sum of the length of the inbox prefix character and the length of the outbox prefix character, the sender name extension character, or the sender's mobile phone number;
a fourth calculation unit, configured to determine a similarity ratio from the number of coincident digits between the inbox prefix character and the sender's mobile phone number;
and a private mailbox determining unit, configured to judge whether the similarity ratio is not less than a set value and, if so, to determine that the inbox is a private mailbox of the current sender.
7. The device of claim 6, wherein the preprocessing unit further comprises a character expansion module configured to expand the pinyin of the sender's name, the expanded characters at least comprising: any combination of the full pinyin of the name, any combination of the full pinyin of any two characters of the name, any combination of the initial letters, and any combination of one initial letter with the full pinyin of the other characters.
8. The private mailbox judgment device according to claim 6, wherein the first calculation unit performs the calculation as follows: calculating the class edit distance for editing the inbox prefix character into the outbox prefix character, the sender name extension character, or the sender's mobile phone number, wherein the editing operations are insertion, deletion, and replacement, the class edit distance is equal to the sum of the specified values of all editing operations performed, the specified value of a deletion or an insertion is 1, and the specified value of a replacement is 2.
9. The private mailbox judgment device as claimed in claim 6, wherein the preprocessing unit further comprises a character adjustment module configured to process the inbox prefix character or the outbox prefix character when the inbox prefix character or the outbox prefix character includes a plurality of digits, specifically by: removing the digits and removing the letters from the inbox prefix character to obtain an inbox prefix first character and an inbox prefix second character, respectively; and/or removing the digits from the outbox prefix character to obtain an outbox prefix first character.
10. The private mailbox judgment device according to claim 6, wherein the third calculation unit uses the following formula: similarity ratio = (inbox prefix character length + outbox prefix character length - class edit distance) / (inbox prefix character length + outbox prefix character length); or similarity ratio = (inbox prefix character length + sender name extension character length - class edit distance) / (inbox prefix character length + sender name extension character length); or similarity ratio = (inbox prefix character length + sender mobile phone number length - class edit distance) / (inbox prefix character length + sender mobile phone number length).
CN201910173126.1A 2019-03-07 2019-03-07 Private mailbox judgment method and judgment device Active CN111669451B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910173126.1A CN111669451B (en) 2019-03-07 2019-03-07 Private mailbox judgment method and judgment device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910173126.1A CN111669451B (en) 2019-03-07 2019-03-07 Private mailbox judgment method and judgment device

Publications (2)

Publication Number Publication Date
CN111669451A true CN111669451A (en) 2020-09-15
CN111669451B CN111669451B (en) 2022-10-21

Family

ID=72382278

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910173126.1A Active CN111669451B (en) 2019-03-07 2019-03-07 Private mailbox judgment method and judgment device

Country Status (1)

Country Link
CN (1) CN111669451B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112069374A (en) * 2020-09-18 2020-12-11 中国工商银行股份有限公司 Method and device for identifying serial numbers of multiple clients in bank
CN113255324A (en) * 2021-03-09 2021-08-13 西安循数信息科技有限公司 Method for disambiguating inventor names in patent data
CN115099832A (en) * 2022-06-29 2022-09-23 广州华多网络科技有限公司 Abnormal user detection method and device, equipment, medium and product thereof

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101068217A (en) * 2006-06-16 2007-11-07 腾讯科技(深圳)有限公司 Method and device for simplifying E-mail operation
CN101978669A (en) * 2008-03-19 2011-02-16 网圣公司 System and method for analysis of electronic information dissemination events
CN104899267A (en) * 2015-05-22 2015-09-09 中国电子科技集团公司第二十八研究所 Integrated data mining method for similarity of accounts on social network sites
JP2017054533A (en) * 2016-11-04 2017-03-16 エヌ・ティ・ティ・ソフトウェア株式会社 Illegal mail determination device and program
US20170251006A1 (en) * 2016-02-25 2017-08-31 Verrafid LLC System for detecting fraudulent electronic communications impersonation, insider threats and attacks
CN107707745A (en) * 2017-09-25 2018-02-16 百度在线网络技术(北京)有限公司 Method and apparatus for extracting information
CN108256587A (en) * 2018-02-05 2018-07-06 武汉斗鱼网络科技有限公司 Determining method, apparatus, computer and the storage medium of a kind of similarity of character string

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112069374A (en) * 2020-09-18 2020-12-11 中国工商银行股份有限公司 Method and device for identifying serial numbers of multiple clients in bank
CN112069374B (en) * 2020-09-18 2024-04-30 中国工商银行股份有限公司 Identification method and device for multiple customer numbers of bank
CN113255324A (en) * 2021-03-09 2021-08-13 西安循数信息科技有限公司 Method for disambiguating inventor names in patent data
CN115099832A (en) * 2022-06-29 2022-09-23 广州华多网络科技有限公司 Abnormal user detection method and device, equipment, medium and product thereof

Also Published As

Publication number Publication date
CN111669451B (en) 2022-10-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant