CN106372057A - Content auditing method and apparatus - Google Patents

Content auditing method and apparatus

Info

Publication number
CN106372057A
Authority
CN
China
Prior art keywords
content
verification
examination
described content
parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610727794.0A
Other languages
Chinese (zh)
Inventor
李术长
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
LeTV Holding Beijing Co Ltd
LeTV Information Technology Beijing Co Ltd
Original Assignee
LeTV Holding Beijing Co Ltd
LeTV Information Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by LeTV Holding Beijing Co Ltd, LeTV Information Technology Beijing Co Ltd filed Critical LeTV Holding Beijing Co Ltd
Priority to CN201610727794.0A priority Critical patent/CN106372057A/en
Publication of CN106372057A publication Critical patent/CN106372057A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a content auditing method and apparatus. According to the method and apparatus provided by embodiments of the invention, content to be submitted by a user is obtained and it is detected whether the content contains preset sensitive words; if the content does not contain any preset sensitive word, the content is processed with a classification model to obtain a spam degree parameter of the content, so that whether the content passes the audit can be determined according to the spam degree parameter. The auditing process requires no manual participation, is simple to operate, and is highly accurate, so the efficiency and reliability of content auditing are improved.

Description

Content auditing method and apparatus
Technical field
The present invention relates to Internet technologies, and more particularly, to a content auditing method and apparatus.
Background
With the development of communication technologies, terminals integrate more and more functions, so that the function lists of terminals contain more and more corresponding applications (application, app). In most applications, a user needs to submit personal user data, for example, the user's nickname, the user's contact information, content the user intends to publish, and so on.
Generally, after such data are submitted, they must go through a manual auditing process, and only data that pass the audit can be successfully submitted as the user's personal data. This approach is cumbersome and error-prone, which reduces the efficiency and reliability of content auditing.
Summary of the invention
Various aspects of the present invention provide a content auditing method and apparatus, so as to improve the efficiency and reliability of content auditing.
One aspect of the present invention provides a content auditing method, comprising:
obtaining content to be submitted by a user, and detecting whether the content contains a preset sensitive word;
if the content does not contain a preset sensitive word, processing the content with a classification model to obtain a spam degree parameter of the content; and
determining, according to the spam degree parameter, whether the content passes the audit.
In the aspect described above and any possible implementation thereof, an implementation is further provided in which, after the obtaining of the content to be audited, the method further comprises:
if the content contains the sensitive word, reporting the content for manual auditing, so as to determine whether the content passes the audit.
In the aspect described above and any possible implementation thereof, an implementation is further provided in which the determining, according to the spam degree parameter, whether the content passes the audit comprises:
if the spam degree parameter is less than or equal to a preset parameter threshold, determining that the content passes the audit; and
if the spam degree parameter is greater than the parameter threshold, determining that the content does not pass the audit.
In the aspect described above and any possible implementation thereof, an implementation is further provided in which the determining, according to the spam degree parameter, whether the content passes the audit comprises:
if the spam degree parameter is less than or equal to a preset parameter threshold, determining that the content passes the audit;
if the spam degree parameter is greater than the parameter threshold and the user is in a blacklist, determining that the content does not pass the audit; and
if the spam degree parameter is greater than the parameter threshold and the user is not in the blacklist, reporting the content for manual auditing, so as to determine whether the content passes the audit.
In the aspect described above and any possible implementation thereof, an implementation is further provided in which, after the reporting of the content for manual auditing when the spam degree parameter is greater than the parameter threshold and the user is not in the blacklist, the method further comprises:
taking the content and the manual auditing result of the content as a training sample; and
updating the classification model with the training sample.
Another aspect of the present invention provides a content auditing apparatus, comprising:
an acquiring unit, configured to obtain content to be submitted by a user and to detect whether the content contains a preset sensitive word;
a classification unit, configured to, if the content does not contain a preset sensitive word, process the content with a classification model to obtain a spam degree parameter of the content; and
an analysis unit, configured to determine, according to the spam degree parameter, whether the content passes the audit.
In the aspect described above and any possible implementation thereof, an implementation is further provided in which the classification unit is further configured to:
if the content contains the sensitive word, report the content for manual auditing, so as to determine whether the content passes the audit.
In the aspect described above and any possible implementation thereof, an implementation is further provided in which the analysis unit is specifically configured to:
if the spam degree parameter is less than or equal to a preset parameter threshold, determine that the content passes the audit; and
if the spam degree parameter is greater than the parameter threshold, determine that the content does not pass the audit.
In the aspect described above and any possible implementation thereof, an implementation is further provided in which the analysis unit is specifically configured to:
if the spam degree parameter is less than or equal to a preset parameter threshold, determine that the content passes the audit;
if the spam degree parameter is greater than the parameter threshold and the user is in a blacklist, determine that the content does not pass the audit; and
if the spam degree parameter is greater than the parameter threshold and the user is not in the blacklist, report the content for manual auditing, so as to determine whether the content passes the audit.
In the aspect described above and any possible implementation thereof, an implementation is further provided in which the analysis unit is further configured to:
take the content and the manual auditing result of the content as a training sample; and
update the classification model with the training sample.
As can be seen from the above technical solutions, the embodiments of the present invention obtain content to be submitted by a user and detect whether the content contains a preset sensitive word; if the content does not contain a preset sensitive word, the content is processed with a classification model to obtain a spam degree parameter of the content, so that whether the content passes the audit can be determined according to the spam degree parameter. The auditing process requires no manual participation, is simple to operate, and is highly accurate, thereby improving the efficiency and reliability of content auditing.
In addition, the technical solutions provided by the present invention can significantly improve user experience.
The above description is only an overview of the technical solutions of the present invention. In order to better understand the technical means of the present invention so that they can be implemented according to the contents of the specification, and in order to make the above and other objects, features and advantages of the present invention more apparent, specific embodiments of the present invention are set forth below.
Brief description of the drawings
By reading the following detailed description of the preferred embodiments, various other advantages and benefits will become clear to those of ordinary skill in the art. The accompanying drawings are only for the purpose of illustrating the preferred embodiments and are not to be construed as limiting the present invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 is a schematic flowchart of a content auditing method provided by an embodiment of the present invention; and
Fig. 2 is a schematic structural diagram of a content auditing apparatus provided by another embodiment of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. On the contrary, these embodiments are provided so that the present disclosure can be better understood and its scope can be fully conveyed to those skilled in the art.
It should be noted that the user terminal devices involved in the embodiments of the present invention may include, but are not limited to, mobile phones, personal digital assistants (personal digital assistant, PDA), wireless handheld devices, tablet computers (tablet computer), personal computers (personal computer, PC), MP3 players, MP4 players, wearable devices (for example, smart glasses, smart watches, smart bracelets) and so on.
In addition, the term "and/or" herein merely describes an association relationship between associated objects and indicates that three kinds of relations may exist; for example, "a and/or b" may represent three cases: a exists alone, a and b exist at the same time, and b exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the preceding and following objects.
Fig. 1 is a schematic flowchart of a content auditing method provided by an embodiment of the present invention, as shown in Fig. 1.
101. Obtain content to be submitted by a user, and detect whether the content contains a preset sensitive word.
102. If the content does not contain a preset sensitive word, process the content with a classification model to obtain a spam degree parameter of the content.
The so-called spam degree parameter is an evaluation metric that describes how authentic and valid the content is.
It should be noted that, before 102, the classification model may be built with an existing training method. The training samples in the training sample set may all be labeled known samples, in which case training can be performed directly on these known samples to build the classification model. Alternatively, part of the samples may be labeled known samples and the rest unlabeled unknown samples; in this case, training is first performed on the known samples to build a preliminary classification model, the preliminary classification model is then used to predict the unknown samples to obtain classification results, the unknown samples are labeled according to the classification results to form new known samples, and training is performed again on the newly added known samples together with the original known samples to build a new classification model, until the classification model or the known samples meet a cut-off condition, for example, the classification accuracy is greater than or equal to a preset accuracy threshold, or the number of known samples is greater than or equal to a preset quantity threshold, which is not specifically limited in this embodiment.
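As a non-limiting illustration of the iterative training described above, the following Python sketch assumes a Naive Bayes text classifier built with scikit-learn; the names labeled_texts, labels, unlabeled_texts and the cut-off thresholds are illustrative and are not prescribed by the patent.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

def build_classifier(labeled_texts, labels, unlabeled_texts,
                     accuracy_threshold=0.95, min_known_samples=10000):
    """Build a classification model by self-training on partially labeled samples."""
    vectorizer = CountVectorizer()
    known_texts, known_labels = list(labeled_texts), list(labels)
    pending = list(unlabeled_texts)
    while True:
        x = vectorizer.fit_transform(known_texts)
        model = MultinomialNB().fit(x, known_labels)
        # Cut-off condition: accuracy or known-sample count reaches its preset
        # threshold, or there are no unknown samples left to absorb.
        if (model.score(x, known_labels) >= accuracy_threshold
                or len(known_texts) >= min_known_samples
                or not pending):
            return vectorizer, model
        # Predict the unknown samples with the preliminary model, label them
        # with the predictions, and add them to the known samples.
        predicted = model.predict(vectorizer.transform(pending))
        known_texts += pending
        known_labels += list(predicted)
        pending = []
```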
103. Determine, according to the spam degree parameter, whether the content passes the audit.
It should be noted that the execution subject of 101 to 103 may be, in whole or in part, an application located in a local terminal, or a functional unit such as a plug-in or a software development kit (software development kit, SDK) set in an application located in the local terminal, or a processing engine in a network-side server, or a distributed system located on the network side, which is not specifically limited in this embodiment.
It is understood that the application may be a native program (nativeApp) installed in the terminal, or a web page program (webApp) of a browser in the terminal, which is not specifically limited in this embodiment.
In this way, by obtaining content to be submitted by a user, detecting whether the content contains a preset sensitive word and, if it does not, processing the content with a classification model to obtain a spam degree parameter of the content, whether the content passes the audit can be determined according to the spam degree parameter. The auditing process requires no manual participation, is simple to operate, and is highly accurate, thereby improving the efficiency and reliability of content auditing.
In the present invention, because the proportion of spam in user data such as basic user information and content to be published by the user is small, content with a very high safety degree can be filtered out by keyword filtering, that is, sensitive-word filtering, and audited automatically by the classification model. This greatly reduces the amount of content that requires manual auditing, saves a large amount of manual auditing time and human resources, and effectively improves the efficiency of content auditing.
Optionally, in a possible implementation of this embodiment, a sensitive word list may be preset, and several sensitive words are maintained in the list. When obtaining the content to be submitted by the user and detecting whether the content contains a preset sensitive word, it may specifically be determined whether the content contains any sensitive word from the sensitive word list.
If the content does not contain a sensitive word, 102 may be executed: the content is processed with the classification model to obtain the spam degree parameter of the content.
If the content contains a sensitive word, the content may be reported for manual auditing, so as to determine whether the content passes the audit. Specifically, the rules and standards of the manual auditing may be any rules and standards in the prior art, which are not specifically limited in this embodiment.
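As a simple illustration of this dispatch, a minimal Python sketch follows; the word list and the classify/report_for_manual_audit helpers are illustrative placeholders rather than part of the patent.

```python
# Preset sensitive word list (illustrative entries only).
SENSITIVE_WORDS = {"forbidden_word_1", "forbidden_word_2"}

def audit_dispatch(content, classify, report_for_manual_audit):
    """Route content either to manual audit or to the classification model (step 102)."""
    if any(word in content for word in SENSITIVE_WORDS):
        # Content containing a preset sensitive word is reported for manual auditing.
        return report_for_manual_audit(content)
    # Otherwise the classification model produces the spam degree parameter.
    return classify(content)
```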
After the manual auditing of the content, the content and its manual auditing result may further be taken as a training sample, and the training sample is then used to update the classification model.
Optionally, in a possible implementation of this embodiment, the classification model used in 102 may include, but is not limited to, a word segmentation module, a Bayes classification module and a training dictionary module, which is not specifically limited in this embodiment. In 102, the following operations may specifically be executed.
First, in the word segmentation module, word segmentation is performed on the content to obtain word segmentation results. After the content is segmented, in order to improve the efficiency of subsequent processing and to reduce noise, each word obtained by the segmentation is filtered. The filtering includes, but is not limited to, filtering out words contained in a preset stop-word list. The stop-word list is compiled in advance, based on word frequency statistics, from function words, auxiliary words, pronouns, articles, adverbs, modal particles and the like, which generally carry no independent meaning. The list may be obtained by collecting words whose frequency of occurrence in existing resources reaches a preset high-frequency condition; for example, an auxiliary word such as "的" has a very high frequency of occurrence but very low expressive power, and is therefore collected into the stop-word list.
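A minimal sketch of this segmentation and stop-word filtering step is given below; it assumes the jieba segmenter as one possible word-segmentation tool (the patent does not name a specific segmenter), and the stop-word list shown is illustrative.

```python
import jieba  # an assumed choice of Chinese word-segmentation library

# Illustrative stop words: high-frequency function words, auxiliaries, modal particles, etc.
STOP_WORDS = {"的", "了", "和", "是", "啊"}

def segment_and_filter(content):
    """Split the content into words and drop words found in the preset stop-word list."""
    words = jieba.lcut(content)
    return [w for w in words if w.strip() and w not in STOP_WORDS]
```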
Secondly, in the Bayes classification module, the category attribute of each word segmentation result may be obtained from the words stored in the training dictionary module and their category attributes; then, according to the category attribute of each word segmentation result, a spam score of each word segmentation result is obtained, and the spam degree parameter of the content is obtained from the spam scores of the word segmentation results. The spam degree parameter of the content may be calculated with any of several methods in the prior art; for a detailed description, reference may be made to the related prior art, and details are not repeated here.
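As a rough illustration of how a Bayes classification module could turn per-word category attributes into a spam degree parameter, the following sketch assumes a training dictionary that maps each word to spam/ham counts; the dictionary format, the prior and the smoothing constant are assumptions, since the patent defers to prior-art methods for this calculation.

```python
import math

def spam_degree(word_segments, training_dict, prior_spam=0.5, alpha=1.0):
    """Return P(spam | words) in [0, 1] as the spam degree parameter of the content."""
    vocab = max(len(training_dict), 1)
    total_spam = sum(c["spam"] for c in training_dict.values()) + alpha * vocab
    total_ham = sum(c["ham"] for c in training_dict.values()) + alpha * vocab
    log_spam = math.log(prior_spam)
    log_ham = math.log(1.0 - prior_spam)
    for word in word_segments:
        counts = training_dict.get(word, {"spam": 0, "ham": 0})
        # Per-word spam score: smoothed class-conditional likelihoods from the dictionary.
        log_spam += math.log((counts["spam"] + alpha) / total_spam)
        log_ham += math.log((counts["ham"] + alpha) / total_ham)
    # Normalise the two joint log-scores into a probability.
    m = max(log_spam, log_ham)
    p_spam, p_ham = math.exp(log_spam - m), math.exp(log_ham - m)
    return p_spam / (p_spam + p_ham)
```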
Optionally, in a possible implementation of this embodiment, a parameter threshold also needs to be preset before 103. Specifically, an empirical value may be set as the parameter threshold, or an optimization algorithm may be used to derive an optimal value as the parameter threshold, which is not specifically limited in this embodiment. After the spam degree parameter of the content to be submitted by the user is obtained, it may specifically be determined whether the spam degree parameter is less than or equal to the preset parameter threshold.
In one specific implementation, if the spam degree parameter is less than or equal to the preset parameter threshold, it can be determined that the content passes the audit.
In another specific implementation, if the spam degree parameter is greater than the parameter threshold, it is determined that the content does not pass the audit.
In another specific implementation, if the spam degree parameter is greater than the parameter threshold and the user is in a blacklist, it can be determined that the content does not pass the audit. In this way, no further auditing of the content is needed and it can be directly determined that the content does not pass the audit, which effectively improves the efficiency of content auditing.
The users stored in the blacklist may be determined according to a specified strategy. The specified strategy may be, for example, that content submitted by a user has gone through manual auditing and the number of times such content failed the audit exceeds a specified threshold, for example, 5 times, which is not specifically limited in this embodiment.
In another specific implementation, if the spam degree parameter is greater than the parameter threshold and the user is not in the blacklist, the content may be reported for manual auditing, so as to determine whether the content passes the audit. Specifically, the rules and standards of the manual auditing may be any rules and standards in the prior art, which are not specifically limited in this embodiment.
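A minimal sketch of the threshold and blacklist decision logic described in the implementations above follows; the threshold value, the blacklist store and the helper names are illustrative assumptions.

```python
PARAMETER_THRESHOLD = 0.8  # an empirical value, or one derived by an optimization algorithm
BLACKLIST = set()          # users whose content failed manual audit more than a specified number of times (e.g. 5)

def decide(user_id, content, spam_degree_param, report_for_manual_audit):
    """Return the audit outcome for one piece of content."""
    if spam_degree_param <= PARAMETER_THRESHOLD:
        return "pass"      # the content passes the audit
    if user_id in BLACKLIST:
        return "reject"    # no further auditing is needed
    # Above the threshold but the user is not blacklisted: manual audit decides.
    return report_for_manual_audit(content)
```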
After the manual auditing of the content, the content and its manual auditing result may further be taken as a training sample, and the training sample is then used to update the classification model.
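A minimal sketch of this update follows, reusing the vectorizer/model pair and sample lists from the training sketch above; retraining on the enlarged sample set is one straightforward way to perform the update and is an assumption rather than the patent's prescribed procedure.

```python
def update_model(vectorizer, model, known_texts, known_labels, content, manual_result):
    """Add the audited content and its manual result as a training sample and retrain."""
    known_texts.append(content)
    known_labels.append(manual_result)         # e.g. 1 for spam, 0 for legitimate content
    x = vectorizer.fit_transform(known_texts)  # rebuild the vocabulary and feature matrix
    model.fit(x, known_labels)                 # retrain the classification model
    return vectorizer, model
```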
In this embodiment, content to be submitted by a user is obtained, and whether the content contains a preset sensitive word is detected; if the content does not contain a preset sensitive word, the content is processed with a classification model to obtain a spam degree parameter of the content, so that whether the content passes the audit can be determined according to the spam degree parameter. The auditing process requires no manual participation, is simple to operate, and is highly accurate, thereby improving the efficiency and reliability of content auditing.
In addition, the technical solutions provided by the present invention can significantly improve user experience.
It should be noted that, for the sake of brevity, each of the foregoing method embodiments is described as a series of combinations of actions; however, those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. Moreover, those skilled in the art should also understand that the embodiments described in this specification are preferred embodiments, and that the actions and modules involved are not necessarily required by the present invention.
In the above embodiments, the description of each embodiment has its own emphasis. For a part not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
Fig. 2 is a schematic structural diagram of a content auditing apparatus provided by another embodiment of the present invention, as shown in Fig. 2. The content auditing apparatus of this embodiment may include an acquiring unit 21, a classification unit 22 and an analysis unit 23. The acquiring unit 21 is configured to obtain content to be submitted by a user and to detect whether the content contains a preset sensitive word; the classification unit 22 is configured to, if the content does not contain a preset sensitive word, process the content with a classification model to obtain a spam degree parameter of the content; and the analysis unit 23 is configured to determine, according to the spam degree parameter, whether the content passes the audit.
It should be noted that the content auditing apparatus provided by this embodiment may be, in whole or in part, an application located in a local terminal, or a functional unit such as a plug-in or a software development kit (software development kit, SDK) set in an application located in the local terminal, or a processing engine in a network-side server, or a distributed system located on the network side, which is not specifically limited in this embodiment.
It is understood that the application may be a native program (nativeApp) installed in the terminal, or a web page program (webApp) of a browser in the terminal, which is not specifically limited in this embodiment.
Optionally, in a possible implementation of this embodiment, the classification unit 22 may be further configured to, if the content contains the sensitive word, report the content for manual auditing, so as to determine whether the content passes the audit.
Further, the classification unit 22 may be further configured to take the content and the manual auditing result of the content as a training sample, and to update the classification model with the training sample.
Optionally, in a possible implementation of this embodiment, the analysis unit 23 may be specifically configured to: if the spam degree parameter is less than or equal to a preset parameter threshold, determine that the content passes the audit; and if the spam degree parameter is greater than the parameter threshold, determine that the content does not pass the audit.
Optionally, in a possible implementation of this embodiment, the analysis unit 23 may be specifically configured to: if the spam degree parameter is less than or equal to the preset parameter threshold, determine that the content passes the audit; if the spam degree parameter is greater than the parameter threshold and the user is in a blacklist, determine that the content does not pass the audit; and if the spam degree parameter is greater than the parameter threshold and the user is not in the blacklist, report the content for manual auditing, so as to determine whether the content passes the audit.
Further, the analysis unit 23 may be further configured to take the content and the manual auditing result of the content as a training sample, and to update the classification model with the training sample.
It should be noted that the method in the embodiment corresponding to Fig. 1 may be implemented by the content auditing apparatus provided by this embodiment. For a detailed description, reference may be made to the related contents in the embodiment corresponding to Fig. 1, and details are not repeated here.
In this embodiment, the acquiring unit obtains content to be submitted by a user and detects whether the content contains a preset sensitive word; if the content does not contain a preset sensitive word, the classification unit processes the content with a classification model to obtain a spam degree parameter of the content, so that the analysis unit can determine, according to the spam degree parameter, whether the content passes the audit. The auditing process requires no manual participation, is simple to operate, and is highly accurate, thereby improving the efficiency and reliability of content auditing.
In addition, the technical solutions provided by the present invention can significantly improve user experience.
The foregoing describes and illustrates some preferred embodiments of the present application. However, as stated above, it should be understood that the present application is not limited to the forms disclosed herein, should not be regarded as excluding other embodiments, may be used in various other combinations, modifications and environments, and may be modified within the scope of the inventive concept described herein by the above teachings or by the technology or knowledge of the related art. Any changes and modifications made by those skilled in the art that do not depart from the spirit and scope of the present application shall fall within the protection scope of the appended claims of this application.

Claims (10)

1. A content auditing method, characterized by comprising:
obtaining content to be submitted by a user, and detecting whether the content contains a preset sensitive word;
if the content does not contain a preset sensitive word, processing the content with a classification model to obtain a spam degree parameter of the content; and
determining, according to the spam degree parameter, whether the content passes the audit.
2. The method according to claim 1, characterized in that, after the obtaining of the content to be audited, the method further comprises:
if the content contains the sensitive word, reporting the content for manual auditing, so as to determine whether the content passes the audit.
3. The method according to claim 1 or 2, characterized in that the determining, according to the spam degree parameter, whether the content passes the audit comprises:
if the spam degree parameter is less than or equal to a preset parameter threshold, determining that the content passes the audit; and
if the spam degree parameter is greater than the parameter threshold, determining that the content does not pass the audit.
4. The method according to claim 1 or 2, characterized in that the determining, according to the spam degree parameter, whether the content passes the audit comprises:
if the spam degree parameter is less than or equal to a preset parameter threshold, determining that the content passes the audit;
if the spam degree parameter is greater than the parameter threshold and the user is in a blacklist, determining that the content does not pass the audit; and
if the spam degree parameter is greater than the parameter threshold and the user is not in the blacklist, reporting the content for manual auditing, so as to determine whether the content passes the audit.
5. The method according to claim 4, characterized in that, after the reporting of the content for manual auditing when the spam degree parameter is greater than the parameter threshold and the user is not in the blacklist, so as to determine whether the content passes the audit, the method further comprises:
taking the content and the manual auditing result of the content as a training sample; and
updating the classification model with the training sample.
6. A content auditing apparatus, characterized by comprising:
an acquiring unit, configured to obtain content to be submitted by a user and to detect whether the content contains a preset sensitive word;
a classification unit, configured to, if the content does not contain a preset sensitive word, process the content with a classification model to obtain a spam degree parameter of the content; and
an analysis unit, configured to determine, according to the spam degree parameter, whether the content passes the audit.
7. The apparatus according to claim 6, characterized in that the classification unit is further configured to:
if the content contains the sensitive word, report the content for manual auditing, so as to determine whether the content passes the audit.
8. The apparatus according to claim 6 or 7, characterized in that the analysis unit is specifically configured to:
if the spam degree parameter is less than or equal to a preset parameter threshold, determine that the content passes the audit; and
if the spam degree parameter is greater than the parameter threshold, determine that the content does not pass the audit.
9. The apparatus according to claim 6 or 7, characterized in that the analysis unit is specifically configured to:
if the spam degree parameter is less than or equal to a preset parameter threshold, determine that the content passes the audit;
if the spam degree parameter is greater than the parameter threshold and the user is in a blacklist, determine that the content does not pass the audit; and
if the spam degree parameter is greater than the parameter threshold and the user is not in the blacklist, report the content for manual auditing, so as to determine whether the content passes the audit.
10. The apparatus according to claim 9, characterized in that the analysis unit is further configured to take the content and the manual auditing result of the content as a training sample, and to update the classification model with the training sample.
CN201610727794.0A 2016-08-25 2016-08-25 Content auditing method and apparatus Pending CN106372057A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610727794.0A CN106372057A (en) 2016-08-25 2016-08-25 Content auditing method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610727794.0A CN106372057A (en) 2016-08-25 2016-08-25 Content auditing method and apparatus

Publications (1)

Publication Number Publication Date
CN106372057A true CN106372057A (en) 2017-02-01

Family

ID=57878279

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610727794.0A Pending CN106372057A (en) 2016-08-25 2016-08-25 Content auditing method and apparatus

Country Status (1)

Country Link
CN (1) CN106372057A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107612893A (en) * 2017-09-01 2018-01-19 北京百悟科技有限公司 The auditing system and method and structure short message examination & verification model method of short message
CN108304537A (en) * 2018-01-30 2018-07-20 上海康斐信息技术有限公司 Retain the method and system of user's message
CN108960782A (en) * 2018-07-10 2018-12-07 北京木瓜移动科技股份有限公司 content auditing method and device
CN109831751A (en) * 2019-01-04 2019-05-31 上海创蓝文化传播有限公司 A kind of short message content air control system and method based on natural language processing
CN109918202A (en) * 2019-03-08 2019-06-21 上海七牛信息技术有限公司 Information processing method, device and storage medium
CN111159354A (en) * 2019-12-31 2020-05-15 中国银行股份有限公司 Sensitive information detection method, device, equipment and system
CN111611312A (en) * 2020-05-19 2020-09-01 四川万网鑫成信息科技有限公司 Data desensitization method based on rule engine and block chain technology
CN112232715A (en) * 2020-11-19 2021-01-15 湖南红网新媒体集团有限公司 Volunteer service online management method, system and device
CN112579771A (en) * 2020-12-08 2021-03-30 腾讯科技(深圳)有限公司 Content title detection method and device
CN113312449A (en) * 2021-05-17 2021-08-27 华南理工大学 Text auditing method, system and medium based on keywords and deep learning
CN113360566A (en) * 2021-08-06 2021-09-07 成都明途科技有限公司 Information content monitoring method and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101446970A (en) * 2008-12-15 2009-06-03 腾讯科技(深圳)有限公司 Method for censoring and process text contents issued by user and device thereof
CN201550138U (en) * 2009-09-10 2010-08-11 北京盛景无限文化传媒有限公司 System for providing mobile stream medium service
CN102098332A (en) * 2010-12-30 2011-06-15 北京新媒传信科技有限公司 Method and device for examining and verifying contents

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101446970A (en) * 2008-12-15 2009-06-03 腾讯科技(深圳)有限公司 Method for censoring and process text contents issued by user and device thereof
CN201550138U (en) * 2009-09-10 2010-08-11 北京盛景无限文化传媒有限公司 System for providing mobile stream medium service
CN102098332A (en) * 2010-12-30 2011-06-15 北京新媒传信科技有限公司 Method and device for examining and verifying contents

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107612893B (en) * 2017-09-01 2020-06-02 北京百悟科技有限公司 Short message auditing system and method and short message auditing model building method
CN107612893A (en) * 2017-09-01 2018-01-19 北京百悟科技有限公司 The auditing system and method and structure short message examination & verification model method of short message
CN108304537A (en) * 2018-01-30 2018-07-20 上海康斐信息技术有限公司 Retain the method and system of user's message
CN108960782A (en) * 2018-07-10 2018-12-07 北京木瓜移动科技股份有限公司 content auditing method and device
CN109831751A (en) * 2019-01-04 2019-05-31 上海创蓝文化传播有限公司 A kind of short message content air control system and method based on natural language processing
CN109918202A (en) * 2019-03-08 2019-06-21 上海七牛信息技术有限公司 Information processing method, device and storage medium
CN111159354A (en) * 2019-12-31 2020-05-15 中国银行股份有限公司 Sensitive information detection method, device, equipment and system
CN111611312A (en) * 2020-05-19 2020-09-01 四川万网鑫成信息科技有限公司 Data desensitization method based on rule engine and block chain technology
CN112232715A (en) * 2020-11-19 2021-01-15 湖南红网新媒体集团有限公司 Volunteer service online management method, system and device
CN112579771A (en) * 2020-12-08 2021-03-30 腾讯科技(深圳)有限公司 Content title detection method and device
CN112579771B (en) * 2020-12-08 2024-05-07 腾讯科技(深圳)有限公司 Content title detection method and device
CN113312449A (en) * 2021-05-17 2021-08-27 华南理工大学 Text auditing method, system and medium based on keywords and deep learning
CN113360566A (en) * 2021-08-06 2021-09-07 成都明途科技有限公司 Information content monitoring method and system

Similar Documents

Publication Publication Date Title
CN106372057A (en) Content auditing method and apparatus
KR101752251B1 (en) Method and device for identificating a file
CN108108902B (en) Risk event warning method and device
CN108121795B (en) User behavior prediction method and device
US10963912B2 (en) Method and system for filtering goods review information
CN108959329B (en) Text classification method, device, medium and equipment
CN108280542A (en) A kind of optimization method, medium and the equipment of user's portrait model
CN110610193A (en) Method and device for processing labeled data
CN112860841A (en) Text emotion analysis method, device and equipment and storage medium
CN106960248B (en) Method and device for predicting user problems based on data driving
CN107305575A (en) The punctuate recognition methods of human-machine intelligence's question answering system and device
CN111460250A (en) Image data cleaning method, image data cleaning device, image data cleaning medium, and electronic apparatus
CN104216876A (en) Informative text filter method and system
CN104915356A (en) Text classification correcting method and device
CN104142912A (en) Accurate corpus category marking method and device
CN109508373A (en) Calculation method, equipment and the computer readable storage medium of enterprise's public opinion index
CN108280164A (en) A kind of short text filtering and sorting technique based on classification related words
CN107491536A (en) A kind of examination question method of calibration, examination question calibration equipment and electronic equipment
CN109902157A (en) A kind of training sample validation checking method and device
CN104809104A (en) Method and system for identifying micro-blog textual emotion
CN109858035A (en) A kind of sensibility classification method, device, electronic equipment and readable storage medium storing program for executing
CN104850540A (en) Sentence recognizing method and sentence recognizing device
CN103886097A (en) Chinese microblog viewpoint sentence recognition feature extraction method based on self-adaption lifting algorithm
CN109933775B (en) UGC content processing method and device
CN105786929A (en) Information monitoring method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20170201