CN101446970B - Method for censoring and process text contents issued by user and device thereof - Google Patents

Method for censoring and process text contents issued by user and device thereof

Info

Publication number
CN101446970B
Authority
CN
China
Prior art keywords
content
user
text
issue
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN2008102200098A
Other languages
Chinese (zh)
Other versions
CN101446970A (en)
Inventor
刘怀军
刘昌毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Shiji Guangsu Information Technology Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN2008102200098A priority Critical patent/CN101446970B/en
Publication of CN101446970A publication Critical patent/CN101446970A/en
Application granted granted Critical
Publication of CN101446970B publication Critical patent/CN101446970B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for censoring and processing text contents issued by a user. The method comprises the following steps: receiving the text contents issued by the user and checking the user information against a list rule database; if the user information belongs to neither a white list or white rule nor a black list or black rule, calculating a first similarity between a first feature vector of the user's text contents and a second feature vector of pre-established spam sample contents, and judging from the first similarity whether the text contents issued by the user are qualified contents; if they are qualified, publishing the text contents issued by the user; otherwise, sending them for manual censoring. The method and the device can censor and filter the user information and the text contents issued by the user without requiring that everything a user issues be censored manually, thus greatly reducing manual censoring time, saving human resources and correspondingly improving censoring efficiency.

Description

A method and device for reviewing and processing text content published by users
Technical field
The present invention relates to the communications field, and in particular to a method and device for reviewing and processing text content published by users.
Background
At present, the Wenwen community (URL: http://wenwen.soso.com) offers a question-and-answer service similar to Baidu Zhidao and Sina iAsk: users can ask questions on a page or answer questions posed by others, which makes it much easier for users to obtain information. The Wenwen community now receives more than 200,000 new questions every day. If all the information that users submit to the community is reviewed manually, a large amount of manual review time is consumed, human resources are wasted, and review efficiency is low.
Summary of the invention
The invention provides a method and a device for reviewing and processing text content published by users, which can save a large amount of manual review time and improve review efficiency.
The technical solution of the invention is a method for reviewing and processing text content published by users, comprising the steps of:
receiving text content published by a user, and checking the user information against a list rule database, the list rule database comprising a blacklist, black rules, a whitelist and white rules;
if the user information belongs neither to the whitelist or a white rule nor to the blacklist or a black rule, performing format conversion on the user's text content and extracting the content words from the text content;
calculating the inverse document frequency (IDF) weight of each extracted content word in a pre-established document database, to obtain a first feature vector composed of these inverse document frequency weights;
calculating a first similarity between the first feature vector and a second feature vector of pre-established spam sample content, and judging from the first similarity whether the text content published by the user is qualified content; if it is qualified content, publishing the text content published by the user.
The invention also discloses a device for reviewing and processing text content published by users, comprising: an audit module, configured to receive text content published by a user and to check the user information against a list rule database, the list rule database comprising a blacklist, black rules, a whitelist and white rules;
a conversion module, configured to perform format conversion on the text content published by the user and to extract the content words from the text content when the user information belongs neither to the whitelist or a white rule nor to the blacklist or a black rule;
a computing module, configured to calculate the inverse document frequency weight of each extracted content word in a pre-established document database, to obtain a first feature vector composed of these inverse document frequency weights, and also to calculate a first similarity between the first feature vector and a second feature vector of pre-established spam sample content;
a judging module, configured to judge from the first similarity whether the user's text content is qualified content and, if it is qualified content, to publish the text content published by the user.
With the method and device of the invention, review and filtering are applied only to text content published by users who belong neither to the whitelist or a white rule nor to the blacklist or a black rule. Text content published by users who belong to the blacklist or a black rule, together with text content found to be unqualified, can be sent to manual review, while text content published by users who belong to the whitelist or a white rule, together with text content found to be qualified, is published directly. In this way, not everything that users publish has to go through manual review, which saves a large amount of manual review time, saves human resources and correspondingly improves review efficiency.
Description of the drawings
Fig. 1 is a flowchart of the method of the invention for reviewing and processing text content published by users;
Fig. 2 is a first structural block diagram of the device of the invention for reviewing and processing text content published by users;
Fig. 3 is a second structural block diagram of the device of the invention for reviewing and processing text content published by users;
Fig. 4 is a third structural block diagram of the device of the invention for reviewing and processing text content published by users.
Detailed description of the embodiments
With the method and device of the invention, review and filtering are applied only to text content published by users who belong neither to the whitelist or a white rule nor to the blacklist or a black rule. Text content published by users who belong to the blacklist or a black rule, together with text content found to be unqualified, is sent to manual review, while text content published by users who belong to the whitelist or a white rule, together with text content found to be qualified, is published directly. In this way, not everything that users publish has to go through manual review, which saves a large amount of manual review time, saves human resources and correspondingly improves review efficiency.
The invention is described in detail below with reference to the drawings and specific embodiments.
The method of the invention for reviewing and processing text content published by users can be applied to question-and-answer services such as the Wenwen community, Baidu Zhidao and Sina iAsk.
As shown in Fig. 1, the method of the invention for reviewing and processing text content published by users comprises the following steps.
S100: receive the text content published by a user.
S101: check the user information against the list rule database, which comprises a blacklist, black rules, a whitelist and white rules. In one embodiment, the blacklist may be a list of users who are highly likely to post spam, and the whitelist a list of users who are highly likely to post legitimate information. A black rule is set according to the user's level or credit rating and indicates that the user's level is low or that the credit rating is very low; a white rule is likewise set according to the user's level or credit rating and indicates that the user's level is high or that the credit rating is very high.
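The list-and-rule check in S101 can be sketched as a three-way routing decision. This is only a minimal illustration: the `User` fields, the numeric thresholds standing in for the black and white rules, and the choice to test the black side before the white side are all assumptions, since the text does not fix them.

```python
from dataclasses import dataclass, field
from enum import Enum


class Route(Enum):
    PUBLISH = "publish"              # whitelist or white rule
    MANUAL_REVIEW = "manual_review"  # blacklist or black rule
    CONTENT_CHECK = "content_check"  # neither: go on to content analysis


@dataclass
class User:
    user_id: str
    level: int      # hypothetical user level
    credit: float   # hypothetical credit rating in [0, 1]


@dataclass
class ListRuleDatabase:
    blacklist: set = field(default_factory=set)
    whitelist: set = field(default_factory=set)
    # Hypothetical thresholds standing in for the black and white rules.
    black_max_level: int = 1
    black_max_credit: float = 0.2
    white_min_level: int = 8
    white_min_credit: float = 0.9

    def route(self, user: User) -> Route:
        # Assumption: black list/rule is tested first when both could apply.
        if (user.user_id in self.blacklist
                or user.level <= self.black_max_level
                or user.credit <= self.black_max_credit):
            return Route.MANUAL_REVIEW
        if (user.user_id in self.whitelist
                or user.level >= self.white_min_level
                or user.credit >= self.white_min_credit):
            return Route.PUBLISH
        return Route.CONTENT_CHECK
```

Only users routed to `CONTENT_CHECK` would continue with the format conversion and similarity steps.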
S102: if the user information belongs neither to the whitelist or a white rule nor to the blacklist or a black rule, perform format conversion on the text content published by the user and extract the content words from the text content. In one embodiment, format conversion may include converting traditional Chinese characters to simplified characters, converting full-width characters to half-width characters, removing redundant spaces, and so on. Content words are the core words of the text content; function words are not treated as core words.
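The format conversion and content-word extraction in S102 might look like the sketch below. Several parts are assumptions made for illustration: Unicode NFKC normalization is used for the full-width to half-width step (it also folds some other compatibility characters), `T2S` is a three-character stand-in for a real traditional-to-simplified table, `FUNCTION_WORDS` is a hypothetical stop list, and the `\w+` tokenizer only suits space-delimited text (Chinese text would need a word segmenter).

```python
import re
import unicodedata
from typing import List

# Tiny illustrative traditional-to-simplified table (a real table is far larger).
T2S = str.maketrans({"們": "们", "問": "问", "題": "题"})

# Hypothetical function-word list; these are dropped, the rest are content words.
FUNCTION_WORDS = {"the", "a", "of", "is", "to"}


def normalize(text: str) -> str:
    """Apply the three conversions named in the text."""
    text = unicodedata.normalize("NFKC", text)  # full-width -> half-width
    text = text.translate(T2S)                  # traditional -> simplified
    return re.sub(r"\s+", " ", text).strip()    # remove redundant spaces


def content_words(text: str) -> List[str]:
    """Tokenize the normalized text and keep only non-function words."""
    tokens = re.findall(r"\w+", normalize(text).lower())
    return [t for t in tokens if t not in FUNCTION_WORDS]
```

Normalizing before tokenizing means that visually equivalent spellings of the same word end up as the same feature.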
S103: calculate the inverse document frequency (IDF) weight of each extracted content word in a pre-established document database, to obtain a first feature vector composed of these IDF weights. In one embodiment, the document database may consist of the text content published by all users. Specifically, the IDF weight of each content word may be calculated according to the formula
$$wgt = tf \times \lg\frac{U}{V}$$
where $wgt$ is the inverse document frequency (IDF) weight, $tf$ is the frequency with which the content word occurs in the user's text content, $U$ is the total number of documents in the document database, and $V$ is the number of documents in which the content word occurs.
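The weight formula in S103 can be computed as in the following sketch. It assumes that "lg" means the base-10 logarithm, that the frequency is a raw occurrence count, and that words absent from the document database are skipped to avoid dividing by zero; the function name and data layout are illustrative only.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Set


def idf_weights(words: List[str], documents: Iterable[Set[str]]) -> Dict[str, float]:
    """Return {content word: tf * lg(U / V)} for one piece of text content."""
    docs = list(documents)
    total_docs = len(docs)                      # U: total number of documents
    weights = {}
    for word, tf in Counter(words).items():     # tf: occurrences in this text
        doc_count = sum(1 for d in docs if word in d)  # V: documents containing the word
        if doc_count == 0:
            continue  # assumption: ignore words never seen in the database
        weights[word] = tf * math.log10(total_docs / doc_count)
    return weights
```

A word that occurs in every document gets weight 0, so very common words contribute nothing to the feature vector.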
S104: calculate a first similarity between the first feature vector and a second feature vector of pre-established spam sample content. The second feature vector can be obtained in advance by the same process as the first feature vector: take a spam sample, perform format conversion on it, extract its content words, calculate the inverse document frequency weight of each content word in the document database, and form the second feature vector from these weights. In one embodiment, the first similarity is calculated according to the formula
$$\mathrm{Cos}(X,Y)=\frac{\sum_{\alpha=1,\beta=1}^{\alpha=m,\beta=n} x_\alpha y_\beta}{\sqrt{\sum_{\alpha=1}^{m} x_\alpha^{2}}\,\sqrt{\sum_{\beta=1}^{n} y_\beta^{2}}}$$
where $\mathrm{Cos}(X,Y)$ denotes the first similarity, and $X=\{x_1,\ldots,x_m\}$ and $Y=\{y_1,\ldots,y_n\}$ denote the first feature vector and the second feature vector respectively.
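A minimal sketch of the similarity calculation in S104 follows. It assumes the usual cosine-similarity reading of the formula, in which a component of X is multiplied only by the component of Y that belongs to the same content word; the vectors are represented as hypothetical word-to-weight dictionaries, and a zero vector is given similarity 0 by convention.

```python
import math
from typing import Dict


def cosine_similarity(x: Dict[str, float], y: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse word-weight vectors."""
    # Only words present in both vectors contribute to the dot product.
    dot = sum(weight * y[word] for word, weight in x.items() if word in y)
    norm_x = math.sqrt(sum(v * v for v in x.values()))
    norm_y = math.sqrt(sum(v * v for v in y.values()))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # assumption: an empty vector is similar to nothing
    return dot / (norm_x * norm_y)
```

Because the weights are non-negative, the result lies between 0 (no shared content words) and 1 (same direction), which makes a fixed threshold meaningful.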
S105: judge from the first similarity whether the text content published by the user is qualified content. This judgment can be made in many ways and can be configured as required. In one embodiment, a predetermined threshold may be set: if the value of the first similarity is greater than the threshold, the text content published by the user is judged to be unqualified content; otherwise it is judged to be qualified content.
If the content is qualified, step S107 is performed to publish the text content published by the user; otherwise, in one embodiment, step S106 may be performed to send the text content published by the user to manual review.
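The threshold judgment of S105 and the two outcomes S106 and S107 reduce to a few lines. The threshold value 0.8 and the returned labels are placeholders, not values given in the text.

```python
def decide(first_similarity: float, threshold: float = 0.8) -> str:
    """Map the first similarity to an outcome.

    Above the threshold the content resembles the spam sample and goes to
    manual review (S106); otherwise it is published (S107).
    """
    if first_similarity > threshold:
        return "manual_review"
    return "publish"
```

Note that the comparison is strict, matching "greater than the threshold" in the text, so a similarity exactly equal to the threshold is still published.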
In one embodiment, the following steps may also be included after step S101. S102: if the user information belongs to the blacklist or a black rule, send the text content published by the user to manual review. S103: if the user information belongs to the whitelist or a white rule, publish the text content published by the user.
To judge more comprehensively and accurately whether the content published by the user is qualified content, and to reduce the probability of misjudgment, in one embodiment, when the user information is judged to belong neither to the whitelist or a white rule nor to the blacklist or a black rule, the method may further comprise the step of detecting a second similarity between the text content published by the user and a pre-established feature library comprising phone number formats, web page formats, Martian-script formats (an obfuscated style of Internet writing) and the like, and judging from the second similarity and the first similarity whether the text content published by the user is qualified content. When making this judgment, weights may be assigned to the first similarity and the second similarity respectively, and the weighted sum is compared with a predetermined value: if the sum is greater than the predetermined value, the text content published by the user may be judged to be unqualified content; otherwise it is qualified content. Alternatively, only the second similarity may be compared with a predetermined value; if it is greater, the text content published by the user may be judged directly to be unqualified content.
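One way to picture the second similarity and the weighted judgment is the sketch below. The text does not say how the second similarity is computed, so this example assumes it is the fraction of feature patterns that match; the two regular expressions, the weights and the threshold are hypothetical, and no Martian-script pattern is attempted.

```python
import re

# Hypothetical feature library: an 11-digit mobile-style number and a web address.
FEATURE_LIBRARY = {
    "phone": re.compile(r"(?<!\d)1\d{10}(?!\d)"),
    "url": re.compile(r"(https?://|www\.)\S+", re.IGNORECASE),
}


def second_similarity(text: str) -> float:
    """Fraction of feature patterns found in the text (an assumed definition)."""
    hits = sum(1 for pattern in FEATURE_LIBRARY.values() if pattern.search(text))
    return hits / len(FEATURE_LIBRARY)


def is_unqualified(first_sim: float, second_sim: float,
                   w1: float = 0.6, w2: float = 0.4,
                   threshold: float = 0.5) -> bool:
    """Weighted sum of the two similarities compared with a predetermined value."""
    return w1 * first_sim + w2 * second_sim > threshold
```

With these placeholder weights, content that matches both patterns is flagged once its first similarity exceeds about 0.17, while content matching neither pattern needs a first similarity above about 0.83.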
For the same purpose, that is, to judge more comprehensively and accurately whether the content published by the user is qualified content and to reduce the probability of misjudgment, in one embodiment, when the user information is judged to belong neither to the whitelist or a white rule nor to the blacklist or a black rule, the method may further comprise the step of counting the number of characters in the text content published by the user, and judging from this number of characters, the first similarity and the second similarity whether the text content published by the user is qualified content. When making this judgment, weights may be assigned to the number of characters, the first similarity and the second similarity respectively, and the weighted sum is compared with a predetermined value: if the sum is greater than the predetermined value, the text content published by the user may be judged to be unqualified content; otherwise it is qualified content. Alternatively, a predetermined value may be set for the number of characters alone; if the number of characters is found to be less than this value, the text content published by the user may be judged directly to be unqualified content.
For the same purpose, that is, to judge more comprehensively and accurately whether the content published by the user is qualified content and to reduce the probability of misjudgment, in one embodiment, when the user information is judged to belong neither to the whitelist or a white rule nor to the blacklist or a black rule, the method may further comprise the step of detecting a third similarity between the text content published by the user and a pre-established library of unpublishable words (a collection of certain special words and short phrases, content recently required to be blocked, or other configured items), and judging from the third similarity, the number of characters, the first similarity and the second similarity whether the text content published by the user is qualified content. When making this judgment, weights may be assigned to the third similarity, the number of characters, the first similarity and the second similarity respectively, and the weighted sum is compared with a predetermined value: if the sum is greater than the predetermined value, the text content published by the user may be judged to be unqualified content; otherwise it is qualified content. Alternatively, the third similarity alone may be compared with a predetermined value; if it is greater, the text content published by the user may be judged to be unqualified content.
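The combined judgment over the character count and the three similarities could be organized as below. Everything numeric here is an assumption: the third similarity is taken to be 1.0 when any blocked phrase occurs, the character count is turned into a "shortness" score relative to a reference length of 50, and the weights, thresholds and phrase list are placeholders.

```python
from typing import Tuple

# Hypothetical library of unpublishable words and phrases.
BLOCKED_PHRASES = {"blocked phrase", "forbidden term"}


def third_similarity(text: str) -> float:
    """1.0 if any blocked phrase occurs in the text, else 0.0 (assumed definition)."""
    lowered = text.lower()
    return 1.0 if any(phrase in lowered for phrase in BLOCKED_PHRASES) else 0.0


def is_qualified(text: str, first_sim: float, second_sim: float,
                 min_chars: int = 3, reference_length: int = 50,
                 weights: Tuple[float, float, float, float] = (0.4, 0.3, 0.2, 0.1),
                 threshold: float = 0.5) -> bool:
    n_chars = len(text.strip())
    third_sim = third_similarity(text)
    # Standalone checks described in the text: too short, or blocked words found.
    if n_chars < min_chars or third_sim > 0.5:
        return False
    # Shorter text scores higher; text at or above the reference length scores 0.
    length_score = max(0.0, 1.0 - n_chars / reference_length)
    w_first, w_second, w_third, w_length = weights
    score = (w_first * first_sim + w_second * second_sim
             + w_third * third_sim + w_length * length_score)
    return score <= threshold
```

Keeping the standalone checks ahead of the weighted sum mirrors the text, which allows either a single signal or the combined score to mark content as unqualified.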
The invention also discloses a device for reviewing and processing text content published by users. As shown in Fig. 2, it comprises an audit module, a conversion module, a computing module and a judging module connected in sequence.
The audit module is configured to receive the text content published by a user and to check the user information against the list rule database, which comprises a blacklist, black rules, a whitelist and white rules. In one embodiment, the blacklist may be a list of users who are highly likely to post spam, and the whitelist a list of users who are highly likely to post legitimate information. A black rule is set according to the user's level or credit rating and indicates that the user's level is low or that the credit rating is very low; a white rule is likewise set according to the user's level or credit rating and indicates that the user's level is high or that the credit rating is very high.
The conversion module is configured to perform format conversion on the text content published by the user and to extract the content words from the text content when the user information belongs neither to the whitelist or a white rule nor to the blacklist or a black rule. In one embodiment, format conversion may include converting traditional Chinese characters to simplified characters, converting full-width characters to half-width characters, removing redundant spaces, and so on. Content words are the core words of the text content; function words are not treated as core words.
The computing module is configured to calculate the inverse document frequency (IDF) weight of each extracted content word in a pre-established document database, to obtain a first feature vector composed of these IDF weights, and also to calculate a first similarity between the first feature vector and a second feature vector of pre-established spam sample content. In one embodiment, the document database may consist of the text content published by all users. Specifically, the IDF weight of each content word may be calculated according to the formula
$$wgt = tf \times \lg\frac{U}{V}$$
where $wgt$ is the inverse document frequency (IDF) weight, $tf$ is the frequency with which the content word occurs in the user's text content, $U$ is the total number of documents in the document database, and $V$ is the number of documents in which the content word occurs. The second feature vector of the spam sample content can be obtained in advance by the same process as the first feature vector: take a spam sample, perform format conversion on it, extract its content words, calculate the inverse document frequency weight of each content word in the document database, and form the second feature vector from these weights. In one embodiment, the first similarity is calculated according to the formula
$$\mathrm{Cos}(X,Y)=\frac{\sum_{\alpha=1,\beta=1}^{\alpha=m,\beta=n} x_\alpha y_\beta}{\sqrt{\sum_{\alpha=1}^{m} x_\alpha^{2}}\,\sqrt{\sum_{\beta=1}^{n} y_\beta^{2}}}$$
where $\mathrm{Cos}(X,Y)$ denotes the first similarity, and $X=\{x_1,\ldots,x_m\}$ and $Y=\{y_1,\ldots,y_n\}$ denote the first feature vector and the second feature vector respectively.
The judging module is configured to judge from the first similarity whether the user's text content is qualified content and, if it is qualified content, to publish the text content published by the user. In one embodiment, if the user's text content is judged to be unqualified content, the judging module sends the text content published by the user to manual review.
In one embodiment, when the user information belongs to the blacklist or a black rule, the audit module sends the text content published by the user to manual review; when the user information belongs to the whitelist or a white rule, the audit module publishes the text content published by the user.
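The four modules connected in sequence can be pictured as a small pipeline. This is a sketch under simplifying assumptions rather than the device itself: class and method names are invented, the audit module checks only the lists (no level or credit rules), the conversion module uses a plain `\w+` tokenizer, "lg" is taken as base 10, and the threshold is a placeholder.

```python
import math
import re
from collections import Counter


class AuditModule:
    def __init__(self, blacklist, whitelist):
        self.blacklist, self.whitelist = set(blacklist), set(whitelist)

    def route(self, user_id):
        if user_id in self.blacklist:
            return "manual_review"
        if user_id in self.whitelist:
            return "publish"
        return "check"  # neither list: pass on to the conversion module


class ConversionModule:
    def extract(self, text):
        # Simplified stand-in for format conversion and content-word extraction.
        return re.findall(r"\w+", text.lower())


class ComputeModule:
    def __init__(self, documents, spam_sample_words):
        self.documents = [set(d) for d in documents]
        self.spam_vector = self.vector(spam_sample_words)  # second feature vector

    def vector(self, words):
        total = len(self.documents)
        vec = {}
        for word, tf in Counter(words).items():
            seen = sum(1 for d in self.documents if word in d)
            if seen:  # skip words absent from the document database
                vec[word] = tf * math.log10(total / seen)
        return vec

    def first_similarity(self, words):
        x, y = self.vector(words), self.spam_vector
        dot = sum(v * y.get(k, 0.0) for k, v in x.items())
        norm = (math.sqrt(sum(v * v for v in x.values()))
                * math.sqrt(sum(v * v for v in y.values())))
        return dot / norm if norm else 0.0


class JudgeModule:
    def __init__(self, threshold=0.8):
        self.threshold = threshold

    def decide(self, similarity):
        return "manual_review" if similarity > self.threshold else "publish"


class ReviewDevice:
    def __init__(self, audit, convert, compute, judge):
        self.audit, self.convert = audit, convert
        self.compute, self.judge = compute, judge

    def process(self, user_id, text):
        route = self.audit.route(user_id)
        if route != "check":
            return route
        words = self.convert.extract(text)
        return self.judge.decide(self.compute.first_similarity(words))
```

A detection or statistics module, as in the later embodiments, would sit between the audit and judging modules and hand its extra signals to `JudgeModule`.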
To judge more comprehensively and accurately whether the content published by the user is qualified content, and to reduce the probability of misjudgment, as shown in Fig. 3, a detection module is also connected between the audit module and the judging module. When the user information belongs neither to the whitelist or a white rule nor to the blacklist or a black rule, the detection module detects a second similarity between the text content published by the user and a pre-established feature library comprising phone number formats, web page formats and Martian-script formats, and/or detects a third similarity between the user's text content and a pre-established library of unpublishable words, and sends the second similarity and/or the third similarity to the judging module. The judging module judges from the first similarity, the second similarity and/or the third similarity whether the text content published by the user is qualified content. When making this judgment, weights may be assigned to the first similarity, the second similarity and/or the third similarity respectively, and the weighted sum is compared with a predetermined value: if the sum is greater than the predetermined value, the text content published by the user may be judged to be unqualified content; otherwise it is qualified content.
For the same purpose, that is, to judge more comprehensively and accurately whether the content published by the user is qualified content and to reduce the probability of misjudgment, as shown in Fig. 4, a statistics module is also connected between the audit module and the judging module. When the user information belongs neither to the whitelist or a white rule nor to the blacklist or a black rule, the statistics module counts the number of characters in the text content published by the user and sends this number to the judging module. The judging module judges from the number of characters, the first similarity, the second similarity and/or the third similarity whether the text content published by the user is qualified content. When making this judgment, weights may be assigned to the number of characters, the first similarity, the second similarity and/or the third similarity respectively, and the weighted sum is compared with a predetermined value: if the sum is greater than the predetermined value, the text content published by the user may be judged to be unqualified content; otherwise it is qualified content.
In summary, the method and device of the invention can review and filter both the user information and the text content published by the user. Text content published by users who belong to the blacklist or a black rule, together with text content found to be unqualified, is sent to manual review, while text content published by users who belong to the whitelist or a white rule, together with text content found to be qualified, is published directly. In this way, not everything that users publish has to go through manual review, which saves a large amount of manual review time, saves human resources and correspondingly improves review efficiency.
The embodiments of the invention described above do not limit the scope of protection of the invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the claims of the invention.

Claims (8)

1. A method for reviewing and processing text content published by users, characterized in that it comprises the steps of:
receiving text content published by a user, and checking the user information against a list rule database, the list rule database comprising a blacklist, black rules, a whitelist and white rules;
if the user information belongs neither to the whitelist or a white rule nor to the blacklist or a black rule, performing format conversion on the text content published by the user and extracting the content words from the text content, a content word being a core word of the text content;
calculating the inverse document frequency weight of each extracted content word in a pre-established document database, to obtain a first feature vector composed of these inverse document frequency weights;
calculating a first similarity between the first feature vector and a second feature vector of pre-established spam sample content, and judging from the first similarity whether the text content published by the user is qualified content; if it is qualified content, publishing the text content published by the user;
wherein the first similarity is calculated according to the formula
$$\mathrm{Cos}(X,Y)=\frac{\sum_{\alpha=1,\beta=1}^{\alpha=m,\beta=n} x_\alpha y_\beta}{\sqrt{\sum_{\alpha=1}^{m} x_\alpha^{2}}\,\sqrt{\sum_{\beta=1}^{n} y_\beta^{2}}}$$
where $\mathrm{Cos}(X,Y)$ denotes the first similarity, and $X=\{x_1,\ldots,x_m\}$ and $Y=\{y_1,\ldots,y_n\}$ denote the first feature vector and the second feature vector respectively.
2. The method for reviewing and processing text content published by users according to claim 1, characterized in that, when the user information belongs neither to the whitelist or a white rule nor to the blacklist or a black rule, the method further comprises the step of detecting a second similarity between the text content published by the user and a pre-established feature library comprising phone number formats, web page formats and Martian-script formats, and judging from the second similarity and the first similarity whether the text content published by the user is qualified content.
3. The method for reviewing and processing text content published by users according to claim 2, characterized in that, when the user information belongs neither to the whitelist or a white rule nor to the blacklist or a black rule, the method further comprises the step of counting the number of characters in the text content published by the user, and judging from this number of characters, the first similarity and the second similarity whether the text content published by the user is qualified content.
4. The method for reviewing and processing text content published by users according to claim 3, characterized in that, when the user information belongs neither to the whitelist or a white rule nor to the blacklist or a black rule, the method further comprises the step of detecting a third similarity between the text content published by the user and a pre-established library comprising unpublishable words, and judging from the third similarity, the number of characters, the first similarity and the second similarity whether the text content published by the user is qualified content.
5. The method for reviewing and processing text content published by users according to any one of claims 1 to 4, characterized in that calculating the inverse document frequency weight of each extracted content word in the pre-established document database is specifically performed according to the formula
$$wgt = tf \times \lg\frac{U}{V}$$
where $wgt$ is the inverse document frequency weight, $tf$ is the frequency with which the content word occurs in the user's text content, $U$ is the total number of documents in the document database, and $V$ is the number of documents in which the content word occurs.
6. A device for reviewing and processing text content published by users, characterized in that it comprises:
an audit module, configured to receive text content published by a user and to check the user information against a list rule database, the list rule database comprising a blacklist, black rules, a whitelist and white rules;
a conversion module, configured to perform format conversion on the text content published by the user and to extract the content words from the text content when the user information belongs neither to the whitelist or a white rule nor to the blacklist or a black rule, a content word being a core word of the text content;
a computing module, configured to calculate the inverse document frequency weight of each extracted content word in a pre-established document database, to obtain a first feature vector composed of these inverse document frequency weights, and also to calculate a first similarity between the first feature vector and a second feature vector of pre-established spam sample content;
a judging module, configured to judge from the first similarity whether the text content published by the user is qualified content and, if it is qualified content, to publish the text content published by the user;
wherein the computing module calculates the first similarity according to the formula
$$\mathrm{Cos}(X,Y)=\frac{\sum_{\alpha=1,\beta=1}^{\alpha=m,\beta=n} x_\alpha y_\beta}{\sqrt{\sum_{\alpha=1}^{m} x_\alpha^{2}}\,\sqrt{\sum_{\beta=1}^{n} y_\beta^{2}}}$$
where $\mathrm{Cos}(X,Y)$ denotes the first similarity, and $X=\{x_1,\ldots,x_m\}$ and $Y=\{y_1,\ldots,y_n\}$ denote the first feature vector and the second feature vector respectively.
7. The device for reviewing and processing text content published by users according to claim 6, characterized in that it further comprises a detection module which, when the user information belongs neither to the whitelist or a white rule nor to the blacklist or a black rule, is configured to detect a second similarity between the text content published by the user and a pre-established feature library comprising phone number formats, web page formats and Martian-script formats, to detect a third similarity between the user's text content and a pre-established library comprising unpublishable words, and to send the second similarity and the third similarity to the judging module, the judging module judging from the first similarity, the second similarity and the third similarity whether the text content published by the user is qualified content.
8. The device for reviewing and processing text content published by users according to claim 7, characterized in that it further comprises a statistics module which, when the user information belongs neither to the whitelist or a white rule nor to the blacklist or a black rule, is configured to count the number of characters in the text content and to send the number of characters to the judging module, the judging module judging from the number of characters, the second similarity, the third similarity and the first similarity whether the text content published by the user is qualified content.
CN2008102200098A 2008-12-15 2008-12-15 Method for censoring and process text contents issued by user and device thereof Active CN101446970B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008102200098A CN101446970B (en) 2008-12-15 2008-12-15 Method for censoring and process text contents issued by user and device thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2008102200098A CN101446970B (en) 2008-12-15 2008-12-15 Method for censoring and process text contents issued by user and device thereof

Publications (2)

Publication Number Publication Date
CN101446970A CN101446970A (en) 2009-06-03
CN101446970B true CN101446970B (en) 2012-07-04

Family

ID=40742648

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008102200098A Active CN101446970B (en) 2008-12-15 2008-12-15 Method for censoring and process text contents issued by user and device thereof

Country Status (1)

Country Link
CN (1) CN101446970B (en)

Families Citing this family (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102801640B (en) * 2011-05-23 2016-06-01 腾讯科技(深圳)有限公司 A kind of method and apparatus of message examination & verification
CN102982011B (en) * 2011-09-07 2017-05-31 百度在线网络技术(北京)有限公司 A kind of method and apparatus for recognizing out-of-sequence text
WO2013056513A1 (en) * 2011-10-18 2013-04-25 成都竟创科技有限公司 Interactive time-sharing and segmented surface participation method based on spreading media
CN102681979B (en) * 2012-05-15 2015-04-22 北京师范大学 Content editing intelligent verifying method facing to open knowledge community
CN102880636A (en) * 2012-08-03 2013-01-16 深圳证券信息有限公司 Bad information detection method and server
CN103634283B (en) * 2012-08-24 2017-11-28 腾讯科技(深圳)有限公司 The feedback method and cloud server of a kind of auditing result
CN104301341B (en) * 2013-07-16 2019-01-29 腾讯科技(深圳)有限公司 Information processing method, the apparatus and system of information publishing platform
CN104572393A (en) * 2013-10-24 2015-04-29 世纪禾光科技发展(北京)有限公司 Buyer and seller login monitoring method and buyer and seller login monitoring system
CN103647753B (en) * 2013-11-19 2017-05-24 北京奇安信科技有限公司 LAN file security management method, server and system
CN103778226A (en) * 2014-01-23 2014-05-07 北京奇虎科技有限公司 Method for establishing language information recognition model and language information recognition device
CN105376199B (en) * 2014-08-25 2019-09-13 腾讯科技(北京)有限公司 A kind of information processing method and system, server, client
CN104580529B (en) * 2015-02-03 2018-03-23 郑州悉知信息科技股份有限公司 A kind of signal auditing method and device
JP6791641B2 (en) * 2016-03-18 2020-11-25 ヤフー株式会社 Ad review support device, ad review support method and ad review support program
CN105763555A (en) * 2016-03-31 2016-07-13 世纪禾光科技发展(北京)有限公司 Website risk control server and method and client
CN106372057A (en) * 2016-08-25 2017-02-01 乐视控股(北京)有限公司 Content auditing method and apparatus
CN106504082A (en) * 2016-10-21 2017-03-15 百望股份有限公司 A kind of answering method for tax control field
CN107578268A (en) * 2017-07-31 2018-01-12 上海与德科技有限公司 The dispensing content auditing method and server and jettison system of shared billboard
CN108932283B (en) * 2018-05-21 2024-03-05 平安科技(深圳)有限公司 Customer information screening method, system, computer device and storage medium
CN109862062B (en) * 2018-10-24 2022-10-18 平安科技(深圳)有限公司 Content uploading management method and device, electronic equipment and storage medium
CN109271768B (en) * 2018-10-26 2021-02-05 Oppo广东移动通信有限公司 Distribution information management method, distribution information management device, storage medium and terminal
CN111126928B (en) * 2018-10-29 2024-03-22 阿里巴巴集团控股有限公司 Method and device for auditing release content
US11218506B2 (en) * 2018-12-17 2022-01-04 Microsoft Technology Licensing, Llc Session maturity model with trusted sources
CN111651981B (en) * 2019-02-19 2023-04-21 阿里巴巴集团控股有限公司 Data auditing method, device and equipment
CN110334181A (en) * 2019-06-05 2019-10-15 上海易点时空网络有限公司 Original content based on similarity detection declares method and device
CN110377900A (en) * 2019-06-17 2019-10-25 深圳壹账通智能科技有限公司 Checking method, device, computer equipment and the storage medium of Web content publication
CN110659386B (en) * 2019-09-12 2022-11-22 北京达佳互联信息技术有限公司 Digital resource processing method and device, electronic equipment and storage medium
CN110929055B (en) * 2019-11-15 2023-05-02 北京达佳互联信息技术有限公司 Multimedia quality detection method and device, electronic equipment and storage medium
CN114598699B (en) * 2020-12-07 2023-07-28 国家广播电视总局广播电视科学研究院 File content auditing method and device and electronic equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1741012A (en) * 2004-08-23 2006-03-01 富士施乐株式会社 Test search apparatus and method
CN101159704A (en) * 2007-10-23 2008-04-09 浙江大学 Microcontent similarity based antirubbish method

Also Published As

Publication number Publication date
CN101446970A (en) 2009-06-03

Similar Documents

Publication Publication Date Title
CN101446970B (en) Method for censoring and process text contents issued by user and device thereof
CN105005594B (en) Abnormal microblog users recognition methods
CN107609121A (en) Newsletter archive sorting technique based on LDA and word2vec algorithms
KR101716905B1 (en) Method for calculating entity similarities
CN101784022A (en) Method and system for filtering and classifying short messages
CN103064987B (en) A kind of wash sale information identifying method
CN107291780A (en) A kind of user comment information methods of exhibiting and device
CN102158428B (en) Rapid and high-accuracy junk mail filtering method
CN103441924A (en) Method and device for spam filtering based on short text
JP2011227889A (en) Method for calculating semantic similarity between message and conversation based on extended entity extraction
CN102542063B (en) Content filtering method, device and system
CN101661513A (en) Detection method of network focus and public sentiment
CN105389389A (en) Network public opinion transmission situation media linked analysis method
CN107424065A (en) The method and system of electronic invoice in a kind of processing Email
CN105912645A (en) Intelligent question and answer method and apparatus
CN101833579A (en) Method and system for automatically detecting academic misconduct literature
CN112492606B (en) Classification recognition method and device for spam messages, computer equipment and storage medium
CN107341157B (en) Customer service conversation clustering method and device
CN106874448A (en) A kind of method and apparatus that earthquake descriptor is excavated from microblogging
CN111309855A (en) Text information processing method and system
CN104731772A (en) Improved feature evaluation function based Bayesian spam filtering method
CN206451175U (en) A kind of Tibetan language paper copy detection system based on Tibetan language sentence level
CN106649338A (en) Information filtering policy generation method and apparatus
CN101594313A (en) A kind of spam judgement, classification, filter method and system based on potential semantic indexing
CN101329668A (en) Method and apparatus for generating information regulation and method and system for judging information types

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
ASS Succession or assignment of patent right

Owner name: SHENZHEN SHIJI LIGHT SPEED INFORMATION TECHNOLOGY

Free format text: FORMER OWNER: TENGXUN SCI-TECH (SHENZHEN) CO., LTD.

Effective date: 20131015

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 518044 SHENZHEN, GUANGDONG PROVINCE TO: 518057 SHENZHEN, GUANGDONG PROVINCE

TR01 Transfer of patent right

Effective date of registration: 20131015

Address after: A Tencent Building in Shenzhen Nanshan District City, Guangdong streets in Guangdong province science and technology 518057 16

Patentee after: Shenzhen Shiji Guangsu Information Technology Co., Ltd.

Address before: Room 403, Building 2 East, SEG Science Park, Zhenxing Road, Futian District, Shenzhen, Guangdong Province 518044

Patentee before: Tencent Technology (Shenzhen) Co., Ltd.