CN109063007A - A kind of exchange medium cleaning method and device - Google Patents


Publication number
CN109063007A
Authority
CN
China
Prior art keywords
exchange medium
data
cleaned
target
medium
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810751760.4A
Other languages
Chinese (zh)
Inventor
叶珩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd


Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

An embodiment of this specification discloses an exchange medium cleaning method, comprising: obtaining data to be cleaned from one or more classification entrances, the one or more classification entrances being divided according to the type of exchange medium; and cleaning the data to be cleaned based on a preset cleaning rule to obtain a target exchange medium of a preset type. Embodiments of this specification also disclose an exchange medium cleaning device, an electronic device, and a computer-readable storage medium.

Description

A kind of exchange medium cleaning method and device
Technical field
This application relates to the field of information technology, and in particular to an exchange medium cleaning method and device.
Background
Internet platforms generally provide a report entrance through which users can submit report information. The report information usually includes the exchange medium of the swindler, where an exchange medium is contact information of a network user, such as a mobile phone number, WeChat ID, or Weibo account. Exchange media are of great significance in handling swindle cases: they can be used for gang identification, case analysis, case characterization, and so on. However, because report information is filled in manually by the users themselves, it contains a lot of noise and is difficult to use directly. An exchange medium cleaning scheme is therefore needed to extract exchange media from report information.
Summary of the invention
Embodiments of this specification provide an exchange medium cleaning method and device to solve the problem that report information contains too much noise to be used directly as an exchange medium.
Embodiments of this specification adopt the following technical solutions:
In a first aspect, an exchange medium cleaning method is provided, comprising: obtaining data to be cleaned from one or more classification entrances, the one or more classification entrances being divided according to the type of exchange medium; and cleaning the data to be cleaned based on a preset cleaning rule to obtain a target exchange medium of a preset type.
In a second aspect, an exchange medium cleaning device is provided, comprising: a data acquisition module that obtains data to be cleaned from one or more classification entrances, the one or more classification entrances being divided according to the type of exchange medium; and a data cleaning module that cleans the data to be cleaned based on a preset cleaning rule to obtain a target exchange medium of a preset type.
In a third aspect, an electronic device is provided, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor. When executed by the processor, the computer program implements the following operations: obtaining data to be cleaned from one or more classification entrances, the one or more classification entrances being divided according to the type of exchange medium; and cleaning the data to be cleaned based on a preset cleaning rule to obtain a target exchange medium of a preset type.
In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the following operations: obtaining data to be cleaned from one or more classification entrances, the one or more classification entrances being divided according to the type of exchange medium; and cleaning the data to be cleaned based on a preset cleaning rule to obtain a target exchange medium of a preset type.
With at least one of the above technical solutions, data to be cleaned is obtained from one or more classification entrances and cleaned based on a preset cleaning rule, finally yielding a target exchange medium of a preset type. In addition, the type of the resulting target exchange medium, that is, the preset type, may be the same as or different from the exchange medium type corresponding to the one or more classification entrances. More exchange media can therefore be obtained, which increases the utilization of the data to be cleaned, such as report information.
Brief description of the drawings
The drawings described here provide a further understanding of the application and form part of it. The illustrative embodiments of the application and their descriptions serve to explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a schematic flowchart of an exchange medium cleaning method provided by one embodiment of this specification;
Fig. 2 is a schematic flowchart of an exchange medium cleaning method provided by another embodiment of this specification;
Fig. 3 is a schematic structural diagram of an exchange medium cleaning device provided by one embodiment of this specification;
Fig. 4 is a schematic diagram of the hardware structure of an electronic device that implements the embodiments of this specification.
Detailed description
To make the purposes, technical solutions, and advantages of the application clearer, the technical solutions of the application are described clearly and completely below with reference to specific embodiments and the corresponding drawings. The described embodiments are only some of the embodiments of the application, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the application.
As shown in Fig. 1, one embodiment of this specification provides an exchange medium cleaning method that includes the following steps:
S110: obtain data to be cleaned from one or more classification entrances, the one or more classification entrances being divided according to the type of exchange medium.
The exchange medium mentioned in this specification, also called an exchange medium number, may be contact information of a network user, such as a mobile phone number, WeChat ID, QQ number, Weibo account, or an account on some other social tool, and may even include a website address.
These exchange media can be divided into several types, for example exchange media of the mobile phone number type, of the WeChat ID type, of the website type, and so on.
To make it easier to distinguish different types of exchange medium, a network platform generally provides different classification entrances. The classification entrances may be divided by the network platform according to the type of exchange medium, for example a mobile phone number entrance, a WeChat ID entrance, and a QQ number entrance. Users enter report information through these classification entrances, for example filling in the swindler's mobile phone number in the mobile phone number entrance and the swindler's WeChat ID in the WeChat ID entrance.
In practice, the report information a user enters through a classification entrance contains the exchange medium together with other noise. For example, a user may enter the following in the mobile phone number classification entrance: "the cell-phone number of swindle people * * * is 135********". Here the text "the cell-phone number of swindle people * * * is" is data that needs to be washed away, while the mobile phone number "135********" is the exchange medium that this embodiment is intended to obtain. In other words, this embodiment extracts the exchange medium from the report information that was filled in.
In practice, some of what users fill in does not correspond to the classification entrance used. For example, a user may fill in the swindler's WeChat ID in the mobile phone number entrance, or the swindler's QQ number in the WeChat ID entrance, or the swindler's mobile phone number in a general-purpose classification entrance.
S120: clean the data to be cleaned based on a preset cleaning rule to obtain a target exchange medium of a preset type.
In this step, cleaning the data based on a preset cleaning rule washes away the noise in the data to be cleaned and yields the target exchange medium of the preset type.
The preset type here may be, for example, the mobile phone number type or the WeChat ID type.
In practice, the classification of exchange media that users fill in themselves is not very accurate. For example, a network platform may clearly label the mobile phone number classification entrance among the categories to fill in, yet statistics show that fewer than one third of mobile phone numbers come from that entrance; most mobile phone numbers are entered by users under other classification entrances.
Therefore, to obtain more exchange media from cleaning and to increase the utilization of report information, the preset type of the target exchange medium obtained in step S120 may be the same as or different from the exchange medium type corresponding to the one or more classification entrances mentioned in step S110. That is, it is not limited to the exchange medium type of the classification entrance. For example, step S110 may obtain report information from the WeChat ID classification entrance and the QQ number classification entrance, while the target exchange medium of the preset type obtained in step S120 may be a mobile phone number.
When step S120 cleans the data to be cleaned based on a preset cleaning rule (this embodiment uses report information as the example), the cleaning can be based on the features of exchange media. For example, most exchange media consist of letters, digits, hyphens, and underscores. Step S120 can therefore delete all data in the report information other than letters, digits, hyphens, and underscores, and take the remaining data as the exchange medium.
In addition, after obtaining an exchange medium, step S120 can identify its type, so as to finally obtain the (target) exchange medium of the preset type. For example, an exchange medium that consists entirely of digits, has a length of 11, whose first digit is 1, and whose second digit is not 0, 1, 2, 6, or 9 is identified as a mobile phone number. As another example, an exchange medium that consists entirely of digits, has a length of 10 to 12, and whose first three or first four digits match a preset area code is identified as a landline number.
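The two identification rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function name `classify_medium`, the returned labels, and the small `AREA_CODES` sample are hypothetical, and a real system would load a complete area-code table.

```python
from typing import Optional

# Hypothetical sample of preset area codes; a real table would be far larger.
AREA_CODES = {"010", "021", "0571"}


def classify_medium(medium: str) -> Optional[str]:
    """Return "mobile", "landline", or None for an already-cleaned medium."""
    # Both rules require the medium to consist entirely of (ASCII) digits.
    if not (medium.isascii() and medium.isdigit()):
        return None
    # Mobile rule: length 11, first digit 1, second digit not 0, 1, 2, 6, or 9.
    if len(medium) == 11 and medium[0] == "1" and medium[1] not in "01269":
        return "mobile"
    # Landline rule: length 10 to 12, first three or four digits match an area code.
    if 10 <= len(medium) <= 12 and (
        medium[:3] in AREA_CODES or medium[:4] in AREA_CODES
    ):
        return "landline"
    return None
```

Media that match neither rule return `None` here; other types, such as WeChat IDs, would need rules of their own, which the text does not spell out.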
The exchange medium cleaning method provided by this embodiment obtains data to be cleaned from one or more classification entrances and cleans it based on a preset cleaning rule, finally yielding a target exchange medium of a preset type. In addition, the type of the resulting target exchange medium, that is, the preset type, may be the same as or different from the exchange medium type corresponding to the one or more classification entrances, so that more exchange media can be obtained and the utilization of the data to be cleaned, such as report information, is increased.
The statement that "more" exchange media can be obtained because the preset type may be the same as or different from the type of the classification entrance can be illustrated as follows. If only exchange media of the same type as the classification entrance were obtained, any exchange medium that the user did not fill in under the matching entrance would be missed. For example, only the mobile phone numbers in the mobile phone number entrance would be obtained, and the other exchange media filled in under that entrance would be ignored. This embodiment does not ignore such misfiled exchange media, and can therefore obtain more of them.
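As a hedged illustration of this point, the sketch below collects mobile phone numbers from every classification entrance rather than from the mobile phone number entrance alone. The names `harvest_mobiles` and `_MOBILE`, the entrance keys, and the sample entries are all hypothetical; the pattern is one possible encoding of the mobile number rule given earlier (11 digits, first digit 1, second digit not 0, 1, 2, 6, or 9).

```python
import re
from typing import Dict, List, Set

# 11 digits, first digit 1, second digit in {3, 4, 5, 7, 8}, not embedded
# in a longer run of digits.
_MOBILE = re.compile(r"(?<!\d)1[34578]\d{9}(?!\d)")


def harvest_mobiles(entries: Dict[str, List[str]]) -> Set[str]:
    """Collect mobile numbers from the report text of every entrance."""
    found: Set[str] = set()
    for texts in entries.values():
        for text in texts:
            found.update(_MOBILE.findall(text))
    return found


# Made-up report entries keyed by classification entrance.
reports = {
    "wechat": ["加微信或打电话 13512345678"],
    "qq": ["QQ 987654321"],
    "mobile": ["手机号13812345678"],
}
all_entrances = harvest_mobiles(reports)
mobile_only = harvest_mobiles({"mobile": reports["mobile"]})
```

In this made-up sample, scanning all entrances finds two numbers, while scanning only the mobile entrance finds one.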
Optionally, cleaning the data to be cleaned based on a preset cleaning rule to obtain the target exchange medium of the preset type, as mentioned in step S120 above, may consist of performing text analysis on the data to be cleaned and identifying the target exchange medium of the preset type.
Preferably, before this embodiment is executed, an exchange medium identification model can be trained in advance by text analysis learning. Step S120 can then feed the data to be cleaned directly into the exchange medium identification model, which uses text analysis to identify the places in a natural language description where an exchange medium number is likely to appear, and so isolates the target exchange medium of the preset type. Obtaining the target exchange medium by text analysis of the data to be cleaned improves both the speed and the accuracy of exchange medium identification.
Optionally, cleaning the data to be cleaned based on a preset cleaning rule, as mentioned in step S120 above, may consist of identifying a first designated character in the data to be cleaned, and then deleting the data before the first designated character or deleting the data after it. Specifically, the first designated character may be a full-width colon, a half-width colon, the Chinese character "为" or "是" (both meaning "is"), the English word "is", and so on. This embodiment takes users' language input habits into account and so improves the accuracy of the resulting exchange medium.
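A minimal sketch of the "delete the data before the first designated character" variant follows. The name `drop_before_marker` and the exact marker set are assumptions drawn from the examples in the text, and keeping the text after the last marker is one design choice among several; the text does not say which occurrence should be used.

```python
import re

# Assumed marker set: half-width and full-width colons, the Chinese
# characters "为" and "是", and the standalone English word "is".
_MARKER = re.compile(r"[:：为是]|\bis\b", re.IGNORECASE)


def drop_before_marker(text: str) -> str:
    """Keep only the text after the last marker; unchanged if none is found."""
    return _MARKER.split(text)[-1].strip()
```

The word boundary around `is` prevents a match inside words such as "his". A report in which the medium precedes the marker would need the mirror-image variant that deletes the data after the marker.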
Preferably, when cleaning the data to be cleaned based on a preset cleaning rule as mentioned in step S120 above, after the data before (or after) the first designated character has been deleted, the data other than the second designated characters can also be deleted. That is, only the second designated characters are retained.
The second designated characters include letters and/or digits. That is: retain digits; or retain letters; or retain digits and letters.
Alternatively, the second designated characters include at least one of the hyphen and the underscore, together with letters and/or digits. That is: retain digits and hyphens; or letters and hyphens; or digits, letters, and hyphens; or digits and underscores; or letters and underscores; or digits, letters, and underscores; or digits, hyphens, and underscores; or letters, hyphens, and underscores; or digits, letters, hyphens, and underscores.
This embodiment takes the character features of exchange media into account, so that data other than the exchange medium is deleted from the report information, which further improves the accuracy of the resulting exchange medium.
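The combinations of second designated characters listed above can be expressed as a configurable filter. The sketch below is illustrative only: the function name `keep_only` and its keyword parameters are hypothetical, and it restricts itself to ASCII letters and digits.

```python
import re


def keep_only(text: str, letters: bool = True, digits: bool = True,
              hyphen: bool = False, underscore: bool = False) -> str:
    """Delete every character that is not in the chosen character set."""
    # The text requires letters and/or digits in every combination.
    if not (letters or digits):
        raise ValueError("at least one of letters or digits must be kept")
    allowed = ""
    if letters:
        allowed += "A-Za-z"
    if digits:
        allowed += "0-9"
    if underscore:
        allowed += "_"
    if hyphen:
        allowed += "-"  # placed last so it is a literal hyphen in the class
    return re.sub(f"[^{allowed}]", "", text)
```

For instance, `keep_only(text, hyphen=True, underscore=True)` corresponds to the last combination in the list (digits, letters, hyphens, and underscores).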
After the target exchange medium of the preset type is obtained in the embodiments above, the following steps may also be included:
If the length of the target exchange medium is outside a preset interval, delete the target exchange medium. The preset interval may be, for example, [5, 20]: if the length of the target exchange medium is less than 5 or greater than 20, the target exchange medium can be deleted. This embodiment takes the length features of exchange media into account, so that entries that are not exchange media are deleted, which further improves the accuracy of the resulting exchange media.
And/or
If the number of repeated characters in the target exchange medium exceeds a preset value, delete the target exchange medium. Preferably, the preset value is equal to the length of the target exchange medium; that is, if the target exchange medium consists entirely of one repeated character, it can be deleted. This embodiment takes users' input habits into account and deletes exchange media that users filled in arbitrarily, which further improves the accuracy of the resulting exchange media.
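The two filters above can be sketched as a single validity check. The name `is_plausible` is hypothetical; the default bounds follow the [5, 20] interval given as an example in the text, and the repetition test implements only the preferred case in which the whole medium is one repeated character.

```python
def is_plausible(medium: str, min_len: int = 5, max_len: int = 20) -> bool:
    """Return False for media that the length or repetition filter would delete."""
    # Length filter: the medium must fall inside the preset interval.
    if not (min_len <= len(medium) <= max_len):
        return False
    # Repetition filter: reject media made of a single repeated character.
    if len(set(medium)) == 1:
        return False
    return True
```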
After the target exchange medium of the preset type is obtained in the embodiments above, the following steps may also be included:
If the target exchange medium begins with preset characters, delete the preset characters; and/or
If the target exchange medium ends with preset characters, delete the preset characters.
For example, if the target exchange medium begins with QQ (case-insensitive, here and below), "QQ" can be deleted and the remainder taken as the exchange medium. Likewise, if the target exchange medium ends with QQ, "QQ" can be deleted and the remainder taken as the exchange medium. This embodiment takes the character features of exchange media into account, so that redundant data other than the exchange medium is deleted from the report information, which further improves the accuracy of the resulting exchange medium.
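A minimal sketch of this prefix and suffix removal, using "QQ" as the preset characters as in the example. The names `strip_qq` and `_QQ_EDGE` are hypothetical, and a single leading or trailing occurrence is removed, which is an assumption about the intended behaviour.

```python
import re

# Case-insensitive "QQ" at the very beginning or the very end of the medium.
_QQ_EDGE = re.compile(r"^qq|qq$", re.IGNORECASE)


def strip_qq(medium: str) -> str:
    """Remove a leading and/or trailing "QQ" in any letter case."""
    return _QQ_EDGE.sub("", medium)
```

An occurrence in the middle of the medium is left untouched, since the text only mentions the beginning and the end.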
Cleaning the data to be cleaned based on a preset cleaning rule to obtain the target exchange medium of the preset type, as mentioned in the embodiments above, may also consist of extracting a keyword from the data to be cleaned and using the data corresponding to the keyword as the target exchange medium of the preset type. The keywords may include www, com, http, and the like, in which case a website address can be extracted. The keywords may also include words such as "live streaming", "mailbox", and "Weibo", so as to identify live streaming entry addresses, Weibo addresses, email addresses, and so on. The data corresponding to these keywords has distinctive features; for example, an email address contains the @ symbol and a suffix, so the data corresponding to the keyword is easy to extract.
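Keyword-based extraction might look like the sketch below, which pulls website and email addresses out of free text using their distinctive features (`http`, `www`, `@`). The patterns are deliberately simplified and are assumptions, not patterns from the embodiment; the names `extract_by_keyword`, `_EMAIL`, and `_URL` are hypothetical.

```python
import re
from typing import List, Tuple

# Simplified patterns: an email contains "@" and a dotted suffix; a URL
# starts with "http://", "https://", or "www.".
_EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
_URL = re.compile(r"(?:https?://|www\.)[A-Za-z0-9./_%?=&#-]+", re.IGNORECASE)


def extract_by_keyword(text: str) -> List[Tuple[str, str]]:
    """Return (type, value) pairs for the emails and URLs found in the text."""
    found: List[Tuple[str, str]] = []
    for kind, pattern in (("email", _EMAIL), ("url", _URL)):
        found.extend((kind, match) for match in pattern.findall(text))
    return found
```

These patterns only cover ASCII addresses and may capture trailing punctuation; a production extractor would need stricter rules.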
Preferably, after the target exchange medium of the preset type is obtained in the embodiments above, the method further includes: determining the attribute of the target exchange medium; and, if the target exchange medium has the swindle attribute, adding a label to it. Swindle cases that match the same exchange medium can then be linked together, which adds an important dimension to the handling of swindle cases.
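The text does not say how the attribute is determined, so the sketch below simply assumes that a set of media already labelled with the swindle attribute is available. It then links the cases that share such a medium. The function name `link_cases`, the case identifiers, and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set


def link_cases(case_media: Dict[str, Set[str]],
               swindle_media: Set[str]) -> Dict[str, Set[str]]:
    """Map each labelled medium to the cases it links (two or more cases)."""
    cases_by_medium: Dict[str, Set[str]] = defaultdict(set)
    for case_id, media in case_media.items():
        for medium in media:
            # Only media carrying the swindle label are used for linking.
            if medium in swindle_media:
                cases_by_medium[medium].add(case_id)
    return {m: ids for m, ids in cases_by_medium.items() if len(ids) > 1}


# Made-up cases and their cleaned exchange media.
cases = {
    "c1": {"13512345678"},
    "c2": {"13512345678", "wx_abc"},
    "c3": {"wx_abc"},
}
links = link_cases(cases, swindle_media={"13512345678"})
```

Here `wx_abc` is shared by two cases but is not labelled, so under this assumption it does not produce a link.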
In practice, report information that users fill in themselves contains a lot of noise besides the exchange medium, for example explanations in Chinese, redundant prefixes and suffixes, and even arbitrarily entered digits, symbols, or Chinese characters. To improve the accuracy of the resulting exchange media, this specification also provides a specific embodiment of the exchange medium cleaning method. As shown in Fig. 2, the method includes the following steps:
S210: obtain report information from one or more classification entrances, the one or more classification entrances being divided according to the type of exchange medium.
For details of step S210, see step S110 in the embodiment shown in Fig. 1. Step S210 uses report information as the example of data to be cleaned.
S220: identify the first designated character in the report information.
Optionally, the first designated character in this step may include the English half-width colon, the English full-width colon, the Chinese full-width colon, the Chinese half-width colon, and so on.
Preferably, besides the colons above, the first designated character in this step may also include the Chinese character "为" or "是" (both meaning "is"), the English word "is", and so on.
S230: delete the data before the first designated character.
Preferably, if the report information contains both a half-width colon and a full-width colon, the data before the half-width colon can be deleted first, and then the data before the full-width colon. Step S240 is executed on the data that remains in the report information.
This step takes users' language input habits into account and so improves the accuracy of the resulting exchange medium.
S240: delete the data other than the second designated characters.
The second designated characters in this step include letters and/or digits; alternatively,
the second designated characters include at least one of the hyphen and the underscore, together with letters and/or digits.
For the specific combinations of second designated characters, see the description of the embodiments above.
Deleting the data other than the second designated characters means retaining only the second designated characters and deleting all other data in the report information. Step S250 is executed on the data that remains in the report information.
This step takes the character features of exchange media into account, so that data other than the exchange medium is deleted from the report information, which further improves the accuracy of the resulting exchange medium.
S250: delete the preset characters at the beginning and the end.
The preset characters here may be QQ (case-insensitive); QQ is noise that is commonly filled in when QQ numbers are reported. Step S250 may therefore remove any upper-case or lower-case combination of QQ at the beginning and the end.
This step takes the character features of exchange media into account, so that redundant data other than the exchange medium is deleted from the report information, which further improves the accuracy of the resulting exchange medium.
S260: retain exchange media whose length is within the preset interval, and delete exchange media that consist of one repeated character.
If the length of an exchange medium is outside the preset interval, the exchange medium is deleted. The preset interval may be, for example, [5, 20]: if the length of the exchange medium is less than 5 or greater than 20, it can be deleted.
The upper and lower bounds of the preset interval can be derived statistically from the naming length rules of the various kinds of exchange medium. This takes the length features of exchange media into account, so that entries that are not exchange media are deleted, which further improves the accuracy of the resulting exchange media.
If an exchange medium consists entirely of one repeated character, it can be deleted. For example, an exchange medium that is all zeros, or all the same letter, is deleted.
This takes users' input habits into account and deletes exchange media that users filled in arbitrarily, which further improves the accuracy of the resulting exchange media.
The steps above yield an exchange medium. After the exchange medium is obtained, its type can also be identified, so as to finally obtain an exchange medium of the preset type, such as a mobile phone number or WeChat ID.
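Steps S220 to S260 can be strung together as in the sketch below. It is a hedged illustration, not the implementation of the embodiment: the function name `clean_report` is hypothetical, only the two colons are used as first designated characters, the text after the last colon is kept, and the interval [5, 20] is the example value from the text.

```python
import re
from typing import Optional


def clean_report(text: str) -> Optional[str]:
    """Return the cleaned exchange medium, or None if it is filtered out."""
    # S220/S230: drop the data before the half-width colon, then before the
    # full-width colon. rpartition leaves the text unchanged when the colon
    # is absent.
    for colon in (":", "："):
        text = text.rpartition(colon)[2]
    # S240: keep only letters, digits, hyphens, and underscores.
    text = re.sub(r"[^A-Za-z0-9_-]", "", text)
    # S250: remove a leading or trailing "QQ" in any letter case.
    text = re.sub(r"^qq|qq$", "", text, flags=re.IGNORECASE)
    # S260: enforce the preset length interval and reject one repeated character.
    if not 5 <= len(text) <= 20 or len(set(text)) == 1:
        return None
    return text
```

Because the colon step would cut a website address such as `http://...` in two, this sketch presumes that websites have been separated out beforehand, which matches the preference stated below for handling them before step S220.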
The exchange medium cleaning method provided by this embodiment obtains data to be cleaned from one or more classification entrances and cleans it based on a preset cleaning rule, finally yielding a target exchange medium of a preset type. In addition, the type of the resulting target exchange medium, that is, the preset type, may be the same as or different from the exchange medium type corresponding to the one or more classification entrances, so that more exchange media can be obtained and the utilization of the data to be cleaned, such as report information, is increased.
This embodiment takes the exchange media under all of the one or more classification entrances into use: unreliable classification entrances become usable because the type of each exchange medium is identified. Experiments show that the utilization of report data increased by 25%, covering roughly 10% of cases.
This embodiment can link cases that previously could not be matched by exchange medium because added prefixes, suffixes, or other differences in wording made the entries inconsistent, thereby adding an important dimension to case handling.
Preferably, before step S220 of the embodiment shown in Fig. 2 is executed, exchange media that include websites and other media with distinctive descriptive features can be separated out first. That is, as in the embodiment mentioned above, the data corresponding to a keyword is used as the target exchange medium of the preset type.
Preferably, data that is deleted in steps S230 to S260 of this embodiment and whose length is greater than a preset value can also be set aside for manual review. For example, report information longer than 20 characters, websites aside, usually contains multiple exchange media; manual analysis can further improve the accuracy of the resulting exchange media.
Preferably, after the exchange medium is obtained in the embodiment shown in Fig. 2, the attribute of the exchange medium can also be determined. If the exchange medium has the swindle attribute, a label is added to it, so that swindle cases matching the same exchange medium can be linked together, which adds an important dimension to the handling of swindle cases.
The preceding part of the specification describes the method embodiments in detail. This specification also provides an exchange medium cleaning device. As shown in Fig. 3, the device includes:
a data acquisition module 310, which can be used to obtain data to be cleaned from one or more classification entrances, the one or more classification entrances being divided according to the type of exchange medium; and
a data cleaning module 320, which can be used to clean the data to be cleaned based on a preset cleaning rule to obtain a target exchange medium of a preset type.
The exchange medium cleaning device provided by this embodiment obtains data to be cleaned from one or more classification entrances and cleans it based on a preset cleaning rule, finally yielding a target exchange medium of a preset type. In addition, the type of the resulting target exchange medium, that is, the preset type, may be the same as or different from the exchange medium type corresponding to the one or more classification entrances, so that more exchange media can be obtained and the utilization of the data to be cleaned, such as report information, is increased.
Optionally, as one embodiment, the data cleaning module 320 cleaning the data to be cleaned based on a preset cleaning rule to obtain the target exchange medium of the preset type may consist of the data cleaning module 320 performing text analysis on the data to be cleaned and identifying the target exchange medium of the preset type.
Optionally, as one embodiment, the data cleaning module 320 cleaning the data to be cleaned based on a preset cleaning rule includes:
identifying the first designated character in the data to be cleaned; and
deleting the data before the first designated character, or deleting the data after the first designated character.
Optionally, as one embodiment, the data cleaning module 320 cleaning the data to be cleaned based on a preset cleaning rule further includes:
deleting the data in the data to be cleaned other than the second designated characters, where the second designated characters include letters and/or digits; alternatively,
the second designated characters include at least one of the hyphen and the underscore, together with letters and/or digits.
Optionally, as one embodiment, the device further includes a re-cleaning module (not shown), which can be used to:
delete the target exchange medium if its length is outside the preset interval; and/or
delete the target exchange medium if the number of repeated characters in it exceeds a preset value.
Optionally, as one embodiment, the re-cleaning module (not shown) can also be used to:
delete the preset characters if the target exchange medium begins with them; and/or
delete the preset characters if the target exchange medium ends with them.
Optionally, as one embodiment, the data cleaning module 320 cleaning the data to be cleaned based on a preset cleaning rule to obtain the target exchange medium of the preset type specifically includes:
extracting a keyword from the data to be cleaned; and
using the data corresponding to the keyword as the target exchange medium of the preset type.
Optionally, as one embodiment, the device further includes a label adding module (not shown), which can be used to determine the attribute of the target exchange medium;
and, if the target exchange medium has the swindle attribute, to add a label to it.
For the exchange medium cleaning device according to this embodiment, reference may be made to the flow of the exchange medium cleaning method in the corresponding embodiments above. Each unit or module in the device, together with the other operations and/or functions described above, serves to implement the corresponding flow in the exchange medium cleaning method. For brevity, the details are not repeated here.
The electronic device according to an embodiment of this specification is described in detail below with reference to Fig. 4. At the hardware level, the electronic device includes a processor and, optionally, an internal bus, a network interface, and a memory. As shown in Fig. 4, the memory may include internal memory, such as high-speed random-access memory (RAM), and may also include non-volatile memory, for example at least one disk storage. The electronic device may of course also include hardware required for other services.
The processor, the network interface, and the memory can be connected to one another through the internal bus, which may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one double-headed arrow is shown in Fig. 4, but this does not mean that there is only one bus or only one type of bus.
The memory is used to store a program. Specifically, the program may include program code, and the program code includes computer operation instructions. The memory may include internal memory and non-volatile memory, and provides instructions and data to the processor.
The processor reads the corresponding computer program from the non-volatile memory into internal memory and then runs it, forming, at the logical level, a device for forwarding chat messages. The processor executes the program stored in the memory, and is specifically configured to perform the following operations:
Obtaining data to be cleaned from one or more classification entrances, the one or more classification entrances being divided according to the type of exchange medium;
Cleaning the data to be cleaned based on a preset cleaning rule to obtain a target exchange medium of a preset type.
By obtaining the data to be cleaned from one or more classification entrances and cleaning it based on the preset cleaning rule, the data to be cleaned is cleaned and the target exchange medium of the preset type is finally obtained. In addition, the type of the obtained target exchange medium, namely the preset type, may be the same as or different from the exchange medium type corresponding to the one or more classification entrances, so that more exchange media can be obtained and the utilization efficiency of the data to be cleaned, for example reported information, is increased.
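The two operations can be sketched end to end as follows. This is only an illustration under stated assumptions: the classification entrances are modeled as a dictionary from entrance type to raw records, and the preset cleaning rules are modeled as one regular expression per target type. The names `PRESET_RULES` and `clean_entrances`, and both patterns, are hypothetical and not taken from the specification.

```python
import re

# Hypothetical preset cleaning rules: target type -> pattern with one capture group.
PRESET_RULES = {
    # An 11-digit number starting with 1, not embedded in a longer digit run.
    "phone_number": re.compile(r"(?<!\d)(1\d{10})(?!\d)"),
    # The keyword "qq", an optional colon, then 5 to 11 digits.
    "qq_number": re.compile(r"qq\s*[:：]?\s*(\d{5,11})", re.IGNORECASE),
}


def clean_entrances(entrances: dict[str, list[str]], target_type: str) -> list[str]:
    """Clean records from every entrance into media of one preset type.

    The target type need not match the entrance type, so a phone number
    reported through an "account" entrance is still recovered.
    """
    pattern = PRESET_RULES[target_type]
    results: list[str] = []
    for records in entrances.values():
        for record in records:
            for match in pattern.finditer(record):
                value = match.group(1)
                if value not in results:  # keep first occurrence only
                    results.append(value)
    return results
```

Because every entrance is scanned with the rule for the requested type, the sketch shows how the obtained type can differ from the type of the entrance that supplied the record.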
The method performed by the method and device disclosed in the embodiments shown in Fig. 1 and Fig. 2 above may be applied to, or implemented by, a processor. The processor may be an integrated circuit chip with signal processing capability. During implementation, each step of the above method may be completed by an integrated logic circuit of hardware in the processor or by instructions in software form. The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), and the like; it may also be a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in connection with the embodiments of this application may be executed and completed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium that is mature in the art, such as random-access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in the memory; the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
The electronic device shown in Fig. 4 can also perform the methods of Fig. 1 and Fig. 2 and implement the functions of the exchange medium cleaning method in the embodiments shown in Fig. 1 and Fig. 2, which are not repeated here.
Of course, in addition to a software implementation, the electronic device of this application does not exclude other implementations, such as a logic device or a combination of software and hardware. That is, the executing subject of the following processing flow is not limited to individual logic units; it may also be hardware or a logic device.
An embodiment of this specification also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the following operations are implemented:
Obtaining data to be cleaned from one or more classification entrances, the one or more classification entrances being divided according to the type of exchange medium;
Cleaning the data to be cleaned based on a preset cleaning rule to obtain a target exchange medium of a preset type.
By obtaining the data to be cleaned from one or more classification entrances and cleaning it based on the preset cleaning rule, the data to be cleaned is cleaned and the target exchange medium of the preset type is finally obtained. In addition, the type of the obtained target exchange medium, namely the preset type, may be the same as or different from the exchange medium type corresponding to the one or more classification entrances, so that more exchange media can be obtained and the utilization efficiency of the data to be cleaned, for example reported information, is increased.
The computer-readable storage medium is, for example, a read-only memory (Read-Only Memory, ROM), a random-access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc.
Those skilled in the art should understand that embodiments of this application may be provided as a method, a system, or a computer program product. Therefore, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, etc.) that contain computer-usable program code.
This application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to embodiments of this application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce a device for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, and the instruction device implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include non-permanent memory in computer-readable media, random-access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, commodity, or device that includes the element.
The above are only embodiments of this application and are not intended to limit it. For those skilled in the art, various modifications and changes to this application are possible. Any modification, equivalent replacement, or improvement made within the spirit and principles of this application shall fall within the scope of its claims.

Claims (11)

1. An exchange medium cleaning method, comprising:
obtaining data to be cleaned from one or more classification entrances, the one or more classification entrances being divided according to the type of exchange medium;
cleaning the data to be cleaned based on a preset cleaning rule to obtain a target exchange medium of a preset type.
2. The method according to claim 1, wherein cleaning the data to be cleaned based on the preset cleaning rule to obtain the target exchange medium of the preset type comprises:
performing text analysis on the data to be cleaned to identify the target exchange medium of the preset type.
3. The method according to claim 1, wherein cleaning the data to be cleaned based on the preset cleaning rule comprises:
identifying a first designated character in the data to be cleaned;
deleting the data before the first designated character, or deleting the data after the first designated character.
4. The method according to claim 3, wherein cleaning the data to be cleaned based on the preset cleaning rule further comprises:
deleting data other than second designated characters from the data to be cleaned, wherein the second designated characters include letters and/or digits; or
the second designated characters include at least one of a hyphen and an underscore, together with letters and/or digits.
5. The method according to any one of claims 1 to 4, wherein after the target exchange medium of the preset type is obtained, the method further comprises:
if the length of the target exchange medium is outside a preset interval, deleting the target exchange medium; and/or
if the number of repeated characters in the target exchange medium exceeds a preset value, deleting the target exchange medium.
6. The method according to claim 1, wherein after the target exchange medium of the preset type is obtained, the method further comprises:
if the target exchange medium begins with a preset character, deleting the preset character; and/or
if the target exchange medium ends with a preset character, deleting the preset character.
7. The method according to claim 1, wherein cleaning the data to be cleaned based on the preset cleaning rule to obtain the target exchange medium of the preset type comprises:
extracting a keyword from the data to be cleaned;
taking the data corresponding to the keyword as the target exchange medium of the preset type.
8. The method according to claim 1, wherein after the target exchange medium of the preset type is obtained, the method further comprises:
determining an attribute of the target exchange medium;
if the target exchange medium is an exchange medium having a fraud attribute, adding a label to the target exchange medium having the fraud attribute.
9. An exchange medium cleaning device, comprising:
a data acquisition module, which obtains data to be cleaned from one or more classification entrances, the one or more classification entrances being divided according to the type of exchange medium;
a data cleaning module, which cleans the data to be cleaned based on a preset cleaning rule to obtain a target exchange medium of a preset type.
10. An electronic device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein when the computer program is executed by the processor, the following operations are implemented:
obtaining data to be cleaned from one or more classification entrances, the one or more classification entrances being divided according to the type of exchange medium;
cleaning the data to be cleaned based on a preset cleaning rule to obtain a target exchange medium of a preset type.
11. A computer-readable storage medium on which a computer program is stored, wherein when the computer program is executed by a processor, the following operations are implemented:
obtaining data to be cleaned from one or more classification entrances, the one or more classification entrances being divided according to the type of exchange medium;
cleaning the data to be cleaned based on a preset cleaning rule to obtain a target exchange medium of a preset type.
CN201810751760.4A 2018-07-10 2018-07-10 A kind of exchange medium cleaning method and device Pending CN109063007A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810751760.4A CN109063007A (en) 2018-07-10 2018-07-10 A kind of exchange medium cleaning method and device

Publications (1)

Publication Number Publication Date
CN109063007A true CN109063007A (en) 2018-12-21

Family

ID=64819420

Country Status (1)

Country Link
CN (1) CN109063007A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104731976A (en) * 2015-04-14 2015-06-24 海量云图(北京)数据技术有限公司 Method for finding and sorting private data in data table
US20160178414A1 (en) * 2014-12-17 2016-06-23 General Electric Company System and methods for addressing data quality issues in industrial data
CN106776951A (en) * 2016-12-02 2017-05-31 航天星图科技(北京)有限公司 One kind cleaning contrast storage method
CN107239573A (en) * 2017-06-28 2017-10-10 环球智达科技(北京)有限公司 Data filtering method
CN107690130A (en) * 2016-08-03 2018-02-13 ***通信集团公司 A kind of information identifying method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200922

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20200922

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20181221
