CN114116958A - Auditing method, auditing device, electronic equipment and storage medium - Google Patents

Auditing method, auditing device, electronic equipment and storage medium

Info

Publication number
CN114116958A
Authority
CN
China
Prior art keywords
information
name
unit
audited
abbreviation
Prior art date
Legal status
Pending
Application number
CN202110365261.3A
Other languages
Chinese (zh)
Inventor
郑爱国
郭洋
彭南博
田国刚
Current Assignee
Jingdong Technology Holding Co Ltd
Original Assignee
Jingdong Technology Holding Co Ltd
Priority date
Filing date
Publication date
Application filed by Jingdong Technology Holding Co Ltd
Priority to CN202110365261.3A
Publication of CN114116958A
Pending (current legal status)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374 Thesaurus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/018 Certifying business or products

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Finance (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides an auditing method, an auditing device, electronic equipment and a storage medium. The method includes: when auditing a unit name, if the unit name to be audited is determined to be an abbreviation, acquiring the corresponding full-name unit name from a preset abbreviation library; performing structured information extraction on the full-name unit name to obtain first structured information; acquiring the standard unit name corresponding to the user information, and acquiring second structured information corresponding to the standard unit name; and determining an auditing result for the unit name to be audited according to the first structured information and the second structured information, and feeding the auditing result back to the requesting terminal. By auditing on the basis of the structured information of unit names, the unit name to be audited is audited more accurately, improving the accuracy of unit name auditing.

Description

Auditing method, auditing device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an auditing method and apparatus, an electronic device, and a storage medium.
Background
With the rapid development of information technology, more and more offline services are moving online, and verifying a customer's work unit (employer) is a common step in many auditing scenarios. In the related art, the work unit submitted by a user is typically compared, by text similarity, with the work unit disclosed for that user by a third-party organization, and the two unit names are judged consistent if the text similarity is high enough. However, in the process of implementing the present application, the applicant found that auditing work-unit information by text similarity in this way produces large errors; for example, an abbreviated unit name may share little text with the full name under which the same unit is registered.
Disclosure of Invention
The application provides an auditing method, an auditing device, electronic equipment and a storage medium.
An embodiment of an aspect of the present application provides an auditing method, including: receiving an audit request from a requesting terminal, where the audit request includes the unit name to be audited and the corresponding user information; determining that the unit name to be audited is an abbreviation; acquiring, according to a preset abbreviation library, the full-name unit name corresponding to the unit name to be audited; performing structured information extraction on the full-name unit name to obtain first structured information of the full-name unit name; acquiring the standard unit name corresponding to the user information, and acquiring second structured information corresponding to the standard unit name; and determining an auditing result for the unit name to be audited according to the first structured information and the second structured information, and feeding the auditing result back to the requesting terminal.
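The overall flow above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the abbreviation library and the user-to-standard-name mapping are plain dictionaries, the user information is reduced to a hypothetical `user_id`, the structured information extractor is passed in as a function, and returning a boolean stands in for feeding the result back to the requesting terminal. All names (`AuditRequest`, `audit`, `extract`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Structured information: element type (e.g. "district") -> extracted text.
Structured = Dict[str, Optional[str]]


@dataclass
class AuditRequest:
    unit_name: str  # unit name to be audited, as submitted
    user_id: str    # hypothetical stand-in for "the corresponding user information"


def audit(
    request: AuditRequest,
    abbreviation_library: Dict[str, str],   # abbreviation -> full-name unit name
    standard_names: Dict[str, str],         # user_id -> standard unit name
    extract: Callable[[str], Structured],   # structured information extraction
) -> bool:
    # Steps 102-103: expand a known abbreviation to its full-name unit name.
    # Assumption: a name missing from the library is audited as submitted.
    full_name = abbreviation_library.get(request.unit_name, request.unit_name)
    first = extract(full_name)                      # step 104
    standard = standard_names.get(request.user_id)  # step 105
    if standard is None:
        return False  # assumption: no standard unit name on record fails the audit
    second = extract(standard)
    # Step 106: pass only if every element type carries identical text.
    return first.keys() == second.keys() and all(first[k] == second[k] for k in first)


library = {"Hongxiang Property": "Suzhou Hongxiang Property Management Co Ltd"}
standard = {"u1": "Suzhou Hongxiang Property Management Co Ltd"}
print(audit(AuditRequest("Hongxiang Property", "u1"), library, standard, lambda s: {"full": s}))  # True
```

Passing the extractor in as a parameter keeps the flow independent of whether extraction is model-based or dictionary-based, the two options the description gives for step 104.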
In an embodiment of the present application, determining that the unit name to be audited is an abbreviation includes: querying whether the abbreviation library contains an abbreviation matching the unit name to be audited; and if such a matching abbreviation exists in the abbreviation library, determining that the unit name to be audited is an abbreviation.
In an embodiment of the present application, determining that the unit name to be audited is an abbreviation includes: performing information extraction on the unit name to be audited to obtain its administrative region information and industry attribute information; and if the attribute values of both the administrative region information and the industry attribute information are null, determining that the unit name to be audited is an abbreviation.
In an embodiment of the present application, performing information extraction on the unit name to be audited to obtain its administrative region information and industry attribute information includes: matching the unit name to be audited against a regular expression and extracting the main part that conforms to the format of the regular expression; and performing structured information extraction on the main part to obtain the administrative region information and industry attribute information of the unit name to be audited.
In an embodiment of the present application, the abbreviation library is constructed by: obtaining a distributed representation vector of an abbreviation sample; obtaining distributed representation vectors of a plurality of full-name unit name samples; determining the similarity between the abbreviation sample and each full-name unit name sample according to the distributed representation vector of the abbreviation sample and those of the full-name unit name samples; selecting, according to the similarity, the target full-name unit name sample with the greatest similarity to the abbreviation sample from the plurality of full-name unit name samples; and constructing the abbreviation library according to the correspondence between the abbreviation sample and the target full-name unit name sample.
In an embodiment of the present application, determining the auditing result for the unit name to be audited according to the first structured information and the second structured information includes: comparing the text corresponding to structured elements of the same type in the first structured information and the second structured information; if the text is the same for every element type, determining that the unit name to be audited passes the audit; and if the text is only partially the same, that is, it differs for at least one element type, determining that the unit name to be audited does not pass the audit.
With the auditing method of the embodiments of the present application, when a unit name is audited and the unit name to be audited is determined to be an abbreviation, the corresponding full-name unit name is acquired from a preset abbreviation library; structured information extraction is performed on the full-name unit name to obtain first structured information; the standard unit name corresponding to the user information is acquired, together with its second structured information; and the auditing result for the unit name to be audited is determined according to the first and second structured information and fed back to the requesting terminal. Auditing on the basis of the structured information of unit names thus makes the audit of the unit name more accurate.
An embodiment of another aspect of the present application provides an auditing device, including: a receiving module, configured to receive an audit request from a requesting terminal, where the audit request includes the unit name to be audited and the corresponding user information; a first determining module, configured to determine that the unit name to be audited is an abbreviation; a first acquisition module, configured to acquire, according to a preset abbreviation library, the full-name unit name corresponding to the unit name to be audited; a first extraction module, configured to perform structured information extraction on the full-name unit name to obtain first structured information of the full-name unit name; a second acquisition module, configured to acquire the standard unit name corresponding to the user information and acquire second structured information corresponding to the standard unit name; and a second determining module, configured to determine an auditing result for the unit name to be audited according to the first structured information and the second structured information, and feed the auditing result back to the requesting terminal.
In an embodiment of the present application, the first determining module is specifically configured to: query whether the abbreviation library contains an abbreviation matching the unit name to be audited; and if such a matching abbreviation exists in the abbreviation library, determine that the unit name to be audited is an abbreviation.
In an embodiment of the present application, the first determining module includes: an information extraction unit, configured to perform information extraction on the unit name to be audited to obtain its administrative region information and industry attribute information; and a determining unit, configured to determine that the unit name to be audited is an abbreviation if the attribute values of both the administrative region information and the industry attribute information are null.
In an embodiment of the present application, the information extraction unit is specifically configured to: match the unit name to be audited against a regular expression and extract the main part that conforms to the format of the regular expression; and perform structured information extraction on the main part to obtain the administrative region information and industry attribute information of the unit name to be audited.
In an embodiment of the present application, the device further includes: a third acquisition module, configured to obtain a distributed representation vector of an abbreviation sample; a fourth acquisition module, configured to obtain distributed representation vectors of a plurality of full-name unit name samples; a third determining module, configured to determine the similarity between the abbreviation sample and each full-name unit name sample according to the distributed representation vector of the abbreviation sample and those of the full-name unit name samples; a selection module, configured to select, according to the similarity, the target full-name unit name sample with the greatest similarity to the abbreviation sample from the plurality of full-name unit name samples; and a construction module, configured to construct the abbreviation library according to the correspondence between the abbreviation sample and the target full-name unit name sample.
In an embodiment of the present application, the first structured information and the second structured information contain a plurality of structured elements of the same types and number, and the second determining module is specifically configured to: compare the text corresponding to structured elements of the same type in the first structured information and the second structured information; if the text is the same for every element type, determine that the unit name to be audited passes the audit; and if the text is only partially the same, that is, it differs for at least one element type, determine that the unit name to be audited does not pass the audit.
With the auditing device of the embodiments of the present application, when the unit name to be audited is determined to be an abbreviation, the corresponding full-name unit name is acquired from a preset abbreviation library, its first structured information is extracted and compared with the second structured information of the standard unit name corresponding to the user information, and the resulting auditing result is fed back to the requesting terminal. Auditing on the basis of the structured information of unit names thus makes the audit of the unit name more accurate.
An embodiment of another aspect of the present application provides an electronic device, including a memory and a processor, where the memory stores computer instructions that, when executed by the processor, implement the auditing method of the embodiments of the present application.
Another embodiment of the present application provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to execute the auditing method disclosed in the embodiments of the present application.
Another embodiment of the present application provides a computer program product which, when its instructions are executed by a processor, implements the auditing method of the embodiments of the present application.
Other effects of the above optional implementations are described below with reference to specific embodiments.
Drawings
The drawings are provided for a better understanding of the present solution and do not limit the present application. In the drawings:
FIG. 1 is a schematic flowchart of an auditing method according to one embodiment of the present application.
FIG. 2 is a schematic flowchart of the construction of the abbreviation library.
FIG. 3 is a schematic structural diagram of an auditing apparatus according to an embodiment of the present application.
FIG. 4 is a schematic structural diagram of an auditing apparatus according to another embodiment of the present application.
FIG. 5 is a block diagram of an electronic device according to one embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below, and examples of them are illustrated in the accompanying drawings, where the same or similar reference numerals denote the same or similar elements, or elements having the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary, are intended to explain the present application, and should not be construed as limiting it.
An auditing method, apparatus and electronic device according to an embodiment of the present application are described below with reference to the drawings.
FIG. 1 is a schematic flowchart of an auditing method according to one embodiment of the present application. It should be noted that the auditing method of this embodiment is executed by an auditing device, which may be implemented in software and/or hardware. The auditing device may be configured in an e-commerce platform, the e-commerce platform may be configured in an electronic device, and the electronic device may include a terminal device, a server, or other devices.
As shown in fig. 1, the auditing method may include:
Step 101: receiving an audit request from a requesting terminal, where the audit request includes the unit name to be audited and the corresponding user information.
Step 102: determining that the unit name to be audited is an abbreviation.
It can be understood that the way of determining that the unit name to be audited is an abbreviation differs across application scenarios; examples are as follows.
As an exemplary implementation, the abbreviation library may be queried for an abbreviation matching the unit name to be audited; if such a matching abbreviation exists in the abbreviation library, the unit name to be audited is determined to be an abbreviation.
As another exemplary implementation, information extraction is performed on the unit name to be audited to obtain its administrative region information and industry attribute information; if the attribute values of both the administrative region information and the industry attribute information are null, the unit name to be audited is determined to be an abbreviation.
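Both detection options can be sketched as below, assuming the abbreviation library is a dictionary keyed by abbreviation and that the administrative division and industry attribute dictionaries are plain term lists. Substring search is an illustrative stand-in for the dictionary and regular-expression extraction the description mentions; the function names and sample terms are hypothetical.

```python
from typing import Dict, Iterable, Optional


def is_abbreviation_by_library(name: str, abbreviation_library: Dict[str, str]) -> bool:
    # Option 1: the name is an abbreviation if the library holds a matching entry.
    return name in abbreviation_library


def longest_term_in(name: str, terms: Iterable[str]) -> Optional[str]:
    # Longest dictionary term contained in the name; None plays the role of a null value.
    hits = [term for term in terms if term in name]
    return max(hits, key=len) if hits else None


def is_abbreviation_by_attributes(
    name: str, districts: Iterable[str], industries: Iterable[str]
) -> bool:
    # Option 2: both the administrative region and the industry attribute are null.
    return longest_term_in(name, districts) is None and longest_term_in(name, industries) is None


# Hypothetical sample data for illustration only.
DISTRICTS = ["Suzhou", "Beijing"]
INDUSTRIES = ["Property Management", "Security Service", "Insurance"]
LIBRARY = {"Hongxiang Venus Branch": "Suzhou Hongxiang Property Management Co Ltd"}

print(is_abbreviation_by_attributes("Hongxiang Venus Branch", DISTRICTS, INDUSTRIES))  # True
```

The library lookup is exact and cheap but only recognizes known abbreviations, while the attribute check also flags unseen names that simply lack a region and an industry word.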
Step 103: acquiring, according to the preset abbreviation library, the full-name unit name corresponding to the unit name to be audited.
Step 104: performing structured information extraction on the full-name unit name to obtain first structured information of the full-name unit name.
In some embodiments, one possible way to perform this structured information extraction is to extract the structured information of the full-name unit name through a preset structured information extraction model, thereby obtaining the first structured information of the full-name unit name.
In other embodiments, another possible way is to extract the administrative region information and industry attribute information of the full-name unit name, and then determine the enterprise trade name of the full-name unit name from the text that remains unextracted.
As an exemplary implementation, the administrative region information may be extracted from the full-name unit name based on a regular expression, using a preset administrative division dictionary.
In some embodiments, the industry attribute information may be extracted from the full-name unit name through a preset industry attribute dictionary.
For example, suppose the unit name to be audited is "Venusan division of Hongxiang". This name is determined to be an abbreviation, and the preset abbreviation library gives its full-name unit name as "Suzhou Hongxiang materialonly property management limited company". Extracting the administrative region information and industry attribute information from this full-name unit name yields the administrative region "Suzhou" and the industry attribute "property management", and the enterprise trade name of the full-name unit name can then be determined from the remaining text as "pinwei Hongxing".
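The dictionary-based route of step 104 (administrative region and industry attribute first, trade name from the leftover text) might look like the sketch below. It is illustrative only: the tiny dictionaries, the English sample names, the list of organization-form suffixes stripped before taking the trade name, and the names `extract_structured` and `first_match` are all assumptions, and a real system would work on Chinese text with much larger dictionaries.

```python
import re
from typing import Dict, List, Optional

# Hypothetical dictionaries; real administrative-division and industry
# dictionaries would be far larger.
DISTRICTS: List[str] = ["Suzhou", "Beijing", "Jiangsu"]
INDUSTRIES: List[str] = ["Property Management", "Security Service", "Insurance"]
# Assumption: organization-form suffixes are not part of the trade name.
ORG_FORM = re.compile(r"\s*(?:Co\.?,?\s*Ltd\.?|Limited Company|Company)$", re.IGNORECASE)


def first_match(text: str, terms: List[str]) -> Optional[str]:
    if not terms:
        return None
    # One alternation, longest terms first, so at the same position a longer term wins.
    pattern = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    found = re.search(pattern, text)
    return found.group(0) if found else None


def extract_structured(name: str) -> Dict[str, Optional[str]]:
    district = first_match(name, DISTRICTS)
    industry = first_match(name, INDUSTRIES)
    rest = ORG_FORM.sub("", name)
    # Whatever is left after removing the extracted parts is the trade name.
    for part in (district, industry):
        if part:
            rest = rest.replace(part, " ", 1)
    trade_name = " ".join(rest.split()) or None
    return {"district": district, "industry": industry, "trade_name": trade_name}


print(extract_structured("Suzhou Hongxiang Property Management Co Ltd"))
# {'district': 'Suzhou', 'industry': 'Property Management', 'trade_name': 'Hongxiang'}
```

Applying the same function to the standard unit name yields second structured information with the same element types, which is what the later comparison step relies on.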
Step 105: acquiring the standard unit name corresponding to the user information, and acquiring second structured information corresponding to the standard unit name.
In some embodiments, one possible way of obtaining the second structured information is to extract information from the standard unit name through the preset structured information extraction model, thereby obtaining the second structured information of the standard unit name.
In other embodiments, another possible way is to extract the administrative region information and industry attribute information of the standard unit name, and then determine the enterprise trade name of the standard unit name from the text that remains unextracted.
As an exemplary implementation, the administrative region information may be extracted from the standard unit name based on a regular expression, using a preset administrative division dictionary.
In some embodiments, the industry attribute information may be extracted from the standard unit name through a preset industry attribute dictionary.
The industry attribute dictionary is established in advance and may include terms such as "security service" and "insurance".
In some embodiments, an exemplary way of building the industry attribute dictionary is: segmenting sample texts into words with a word segmentation algorithm, combining words that frequently co-occur, and, after manual verification, using the combined terms as the standard industry attribute dictionary. For example, "security service" is split into "security" and "service" by word segmentation; because combining the results shows that "security" and "service" co-occur frequently, "security service" can be defined as an industry-specific term.
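The co-occurrence step of building the industry attribute dictionary can be sketched as counting adjacent word pairs across segmented sample names and proposing frequent pairs as candidate terms for manual verification. The sketch assumes word segmentation has already been done (the sample names arrive as word lists), treats "co-occurrence" as adjacency, and uses a hypothetical `min_count` threshold; none of these specifics come from the description.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def candidate_industry_terms(
    segmented_names: Iterable[List[str]], min_count: int = 2
) -> List[Tuple[str, str]]:
    # Count how often each pair of adjacent words appears across all samples.
    pair_counts: Counter = Counter()
    for words in segmented_names:
        pair_counts.update(zip(words, words[1:]))
    # Frequent pairs are only candidates; the description requires manual verification.
    return [pair for pair, count in pair_counts.most_common() if count >= min_count]


samples = [
    ["Suzhou", "Hongxiang", "security", "service", "Co", "Ltd"],
    ["Beijing", "Anbao", "security", "service", "Co", "Ltd"],
    ["Jiangsu", "Taiping", "insurance", "Co", "Ltd"],
]
print(candidate_industry_terms(samples))
# [('Co', 'Ltd'), ('security', 'service'), ('service', 'Co')]
```

The output shows why the manual check matters: ('security', 'service') is a genuine industry term, but ('Co', 'Ltd') and ('service', 'Co') are frequent pairs that are not.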
Step 106: determining an auditing result for the unit name to be audited according to the first structured information and the second structured information, and feeding the auditing result back to the requesting terminal.
With the auditing method of this embodiment, when the unit name to be audited is determined to be an abbreviation, the corresponding full-name unit name is acquired from the preset abbreviation library, its first structured information is compared with the second structured information of the standard unit name corresponding to the user information, and the auditing result is fed back to the requesting terminal. Auditing on the basis of the structured information of unit names thus makes the audit of the unit name more accurate.
Based on the above embodiment, in an embodiment of the present application, one possible way of performing information extraction on the unit name to be audited to obtain its administrative region information and industry attribute information is: matching the unit name to be audited against a regular expression and extracting the main part that conforms to the format of the regular expression; and then performing structured information extraction on the main part to obtain the administrative region information and industry attribute information of the unit name to be audited.
The regular expression format is preset; for example, it may describe patterns such as "XX(branch)?", "XX(college|elementary school)(.*division)?" or "...company(.+?)".
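Because the example format above survives only in garbled form, the following is a hypothetical reconstruction of the idea rather than the applicant's pattern: one regular expression captures the main part of a name up to an organization-form word and discards an optional trailing branch or division qualifier. The English keywords, the pattern itself and the name `extract_main_part` are assumptions.

```python
import re
from typing import Optional

# Hypothetical pattern: the main part runs up to an organization-form word;
# an optional trailing "... Branch" / "... Division" qualifier is dropped.
MAIN_PART = re.compile(
    r"^(?P<main>.+?(?:Co\.?,?\s*Ltd\.?|Company|College|Primary School))"
    r"(?:\s+.*?(?:Branch|Division))?$",
    re.IGNORECASE,
)


def extract_main_part(name: str) -> Optional[str]:
    match = MAIN_PART.match(name.strip())
    return match.group("main") if match else None


print(extract_main_part("Suzhou Hongxiang Property Management Co Ltd Venus Branch"))
# Suzhou Hongxiang Property Management Co Ltd
```

A name with no recognizable organization form returns None, which leaves the decision about such names to the abbreviation checks rather than to the regular expression.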
In an embodiment of the present application, to determine the administrative region information and industry attribute information of the unit name to be audited, the unit name to be audited may be preprocessed before it is matched against the regular expression to extract the conforming main part.
The preprocessing may include, but is not limited to, removing leading and trailing spaces and tab characters, converting full-width characters to half-width, uniformly converting digits to Chinese numerals, removing garbled characters, converting traditional Chinese characters to simplified ones, converting letters uniformly to lower case, and the like.
In practical applications, the processing applied to the unit name to be audited may be set according to actual requirements and is not limited to the above.
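A few of the listed preprocessing steps map directly onto the Python standard library, as in the sketch below: NFKC normalization folds full-width letters, digits and spaces to their half-width forms, U+FFFD replacement characters are dropped as one assumed form of garbled text, leading and trailing whitespace is stripped, and letters are lower-cased. Converting digits to Chinese numerals and traditional characters to simplified ones needs extra mapping tables and is deliberately left out; the function name `preprocess` is hypothetical.

```python
import unicodedata


def preprocess(name: str) -> str:
    # NFKC folds full-width ASCII letters/digits and the ideographic space
    # (U+3000) to their half-width equivalents.
    text = unicodedata.normalize("NFKC", name)
    # Assumption: U+FFFD replacement characters are the garbled text to remove.
    text = text.replace("\ufffd", "")
    # Remove leading/trailing spaces and tabs, then unify letter case.
    return text.strip().lower()


print(preprocess("\t ＳＵＺＨＯＵ　Hongxiang\ufffd Co Ltd "))
# suzhou hongxiang co ltd
```

Running the same normalization on both the submitted name and the dictionary terms keeps later matching consistent.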
In an embodiment of the present application, in order to accurately determine the full-name unit name corresponding to an abbreviation, the abbreviation library may be constructed in advance as follows. As shown in FIG. 2, the construction may include:
Step 201: obtaining a distributed representation vector of an abbreviation sample.
In some embodiments, the distributed representation vector of the abbreviation sample may be obtained through federated learning.
Specifically, under federated learning, the distributed representation vector of the abbreviation sample can be obtained through the exchange of encrypted gradients, without the data leaving its owner's database.
In some embodiments, data from multiple data terminals may be used in joint training to obtain a corresponding federated model, and the distributed representation vector of the abbreviation sample is then determined through the federated model.
Step 202, obtaining distributed representation vectors of a plurality of pieces of full-name unit name sample information.
In some embodiments, the distributed representation vector corresponding to each piece of full-name unit name sample information may be obtained by means of federated learning.
Specifically, the distributed representation vector corresponding to the full-name unit name sample information can be obtained through encrypted gradient transmission under federated learning, without the data leaving its local database.
In some embodiments, data obtained from a plurality of data terminals may be used for joint training to obtain a corresponding federated model, and the distributed representation vectors of the full-name unit name sample information are then determined by the federated model.
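Steps 201 and 202 only state that the vectors come from federated learning with encrypted gradient transmission; no model or cryptosystem is specified. The sketch below is therefore a hypothetical stand-in: a toy linear model replaces the embedding model, and pairwise additive masking (a simplified form of secure aggregation) replaces encryption, so each party uploads only a masked gradient while the server still recovers the exact average gradient. All function names are illustrative.

```python
import random


def local_gradient(weights, data):
    """Mean-squared-error gradient of a toy linear model, computed on one party's private data."""
    grad = [0.0] * len(weights)
    for x, y in data:
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for i, xi in enumerate(x):
            grad[i] += 2.0 * err * xi / len(data)
    return grad


def mask_gradients(grads, rng):
    """Pairwise additive masking: party i adds r_ij and party j subtracts it,
    so every mask cancels when the uploads are summed."""
    n, d = len(grads), len(grads[0])
    masked = [list(g) for g in grads]
    for i in range(n):
        for j in range(i + 1, n):
            r = [rng.uniform(-100.0, 100.0) for _ in range(d)]
            for k in range(d):
                masked[i][k] += r[k]
                masked[j][k] -= r[k]
    return masked


def aggregate(masked):
    """Server side: average the masked uploads; only the aggregate is revealed."""
    return [sum(col) / len(masked) for col in zip(*masked)]


def federated_step(weights, party_data, rng, lr=0.1):
    """One training round: raw data never leaves a party, only masked gradients do."""
    grads = [local_gradient(weights, data) for data in party_data]
    avg = aggregate(mask_gradients(grads, rng))
    return [w - lr * g for w, g in zip(weights, avg)]
```

A real deployment would derive the masks from pairwise shared secrets (or use homomorphic encryption) rather than a single shared random generator, which is used here only to keep the sketch self-contained.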
Step 203, determining the similarity between the abbreviation sample and each piece of full-name unit name sample information according to the distributed representation vector of the abbreviation sample and the distributed representation vectors of the full-name unit name sample information.
Step 204, selecting, according to the similarity, the target full-name unit name sample information with the greatest similarity to the abbreviation sample from the plurality of pieces of full-name unit name sample information.
Step 205, constructing the abbreviation library according to the correspondence between the abbreviation samples and the target full-name unit name sample information.
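Steps 203 to 205 can be sketched as below, assuming cosine similarity as the similarity measure (the text does not name one) and assuming the vectors from steps 201 and 202 are already available as plain lists of floats. The function names and the sample names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_abbreviation_library(abbr_vectors, full_name_vectors):
    """Map each abbreviation sample to its most similar full-name sample.

    abbr_vectors: {abbreviation: vector}; full_name_vectors: {full name: vector}.
    """
    library = {}
    for abbr, a_vec in abbr_vectors.items():
        # Steps 203-204: score every full-name sample and keep the most similar one.
        target = max(
            full_name_vectors,
            key=lambda name: cosine_similarity(a_vec, full_name_vectors[name]),
        )
        # Step 205: record the abbreviation -> full-name correspondence.
        library[abbr] = target
    return library
```

With real embeddings, a minimum-similarity threshold would likely be needed so that an abbreviation with no good match is not forced onto an unrelated full name; the text does not mention one.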
Based on the above embodiments, the first structured information and the second structured information contain structured elements of the same types and in the same number, and there are multiple structured elements. One possible way to determine the audit result of the name information of the unit to be audited from the first and second structured information is as follows: compare the text corresponding to each same-type structured element in the first structured information and the second structured information. If the text of every same-type element is identical, the name information of the unit to be audited is determined to pass the audit. If the text of the same-type elements is only partially identical, that is, at least one pair differs, the name information of the unit to be audited is determined not to pass the audit.
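The comparison rule above amounts to an all-elements-equal check. A minimal sketch, assuming each piece of structured information is held as a dictionary from element type to text (the element keys in the usage below are hypothetical); it also reports which elements differ, which the text does not require:

```python
def compare_structured(first: dict[str, str], second: dict[str, str]) -> tuple[bool, list[str]]:
    """Return (passed, mismatched element types) for two pieces of structured information."""
    # The embodiment assumes both sides have the same element types and number.
    if first.keys() != second.keys():
        raise ValueError("structured information must share the same element types")
    # Any same-type element whose text differs makes the audit fail.
    mismatched = [key for key in first if first[key] != second[key]]
    return (not mismatched, mismatched)
```

For example, `{"region": "北京", "core": "示例", "industry": "科技"}` compared with the same dictionary passes, while changing only `"region"` to `"上海"` fails and reports `["region"]`.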
Corresponding to the auditing methods provided in the above embodiments, an embodiment of the present application further provides an auditing apparatus. Since the auditing apparatus corresponds to those auditing methods, the implementations described for the auditing method also apply to the auditing apparatus and are not repeated in this embodiment.
Fig. 3 is a schematic structural diagram of an auditing apparatus according to an embodiment of the present application.
As shown in fig. 3, the auditing apparatus 300 includes a receiving module 301, a first determining module 302, a first obtaining module 303, a first extracting module 304, a second obtaining module 305, and a second determining module 306, where:
the receiving module 301 is configured to receive an audit request from a request end, where the audit request includes name information of a unit to be audited and corresponding user information.
The first determining module 302 is configured to determine that the name information of the unit to be audited is an abbreviation.
The first obtaining module 303 is configured to obtain, according to a preset abbreviation library, the full-name unit name information corresponding to the name information of the unit to be audited.
The first extracting module 304 is configured to perform structured information extraction on the full-name unit name information to obtain first structured information of the full-name unit name information.
A second obtaining module 305, configured to obtain standard unit name information corresponding to the user information, and obtain second structured information corresponding to the standard unit name information.
The second determining module 306 is configured to determine an audit result of the name information of the unit to be audited according to the first structured information and the second structured information, and feed the audit result back to the request end.
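To show how modules 301 to 306 fit together, the following hypothetical end-to-end sketch wires them into one function. The abbreviation library, the user-to-standard-name registry, the name pattern (region + core name + industry + organizational form), and all identifiers are illustrative assumptions; the abbreviation check uses the library lookup of the first determining module, and a non-abbreviation input is simply rejected here.

```python
import re
from dataclasses import dataclass

# Hypothetical stand-ins for the preset abbreviation library and the user registry.
ABBREVIATION_LIBRARY = {"示例科技": "北京示例科技有限公司"}
STANDARD_NAMES = {"user-001": "北京示例科技有限公司", "user-002": "上海示例科技有限公司"}

# Hypothetical name structure: region + core name + industry + organizational form.
NAME_PATTERN = re.compile(
    r"^(?P<region>北京|上海|深圳)?(?P<core>.+?)(?P<industry>科技|金融|物流)?"
    r"(?P<form>控股股份有限公司|股份有限公司|有限公司)$"
)


@dataclass
class AuditRequest:
    """Receiving module 301: unit name to be audited plus the requesting user."""
    unit_name: str
    user_id: str


def structure(name: str) -> dict:
    """Structured information extraction; unmatched optional elements become None."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"cannot structure name: {name}")
    return match.groupdict()


def audit(request: AuditRequest) -> bool:
    # First determining module 302: abbreviation if found in the library.
    if request.unit_name not in ABBREVIATION_LIBRARY:
        raise ValueError("not an abbreviation; outside the scope of this sketch")
    full_name = ABBREVIATION_LIBRARY[request.unit_name]  # first obtaining module 303
    first = structure(full_name)  # first extracting module 304
    second = structure(STANDARD_NAMES[request.user_id])  # second obtaining module 305
    # Second determining module 306: pass only if every same-type element matches.
    return all(first[key] == second[key] for key in first)
```

Here `audit(AuditRequest("示例科技", "user-001"))` passes, while `"user-002"` fails because the administrative region differs.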
In an embodiment of the present application, the first determining module 302 is specifically configured to: query whether an abbreviation matching the name information of the unit to be audited exists in the abbreviation library, and if such an abbreviation exists, determine that the name information of the unit to be audited is an abbreviation.
In an embodiment of the present application, as shown in fig. 4, the first determining module 302 may include:
the information extraction unit 3021 is configured to perform information extraction on the name information of the unit to be audited to obtain the administrative region information and industry attribute information of that name information.
The determining unit 3022 is configured to determine that the name information of the unit to be audited is an abbreviation when the attribute values of both the administrative region information and the industry attribute information are null.
In an embodiment of the present application, the information extraction unit 3021 is specifically configured to: match the name information of the unit to be audited with a regular expression and extract the main-part information that conforms to the regular-expression format; and perform structured information extraction on the main-part information to obtain the administrative region information and industry attribute information of the name information of the unit to be audited.
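The regular-expression extraction and the null-attribute rule can be sketched as follows. The "main part" pattern, the region and industry vocabularies, and the function names are hypothetical placeholders; a real system would load complete administrative-division and industry lists.

```python
import re
from typing import Optional

# Hypothetical vocabularies; real systems would use full administrative-division and industry lists.
REGIONS = ("北京", "上海", "广东", "深圳")
INDUSTRIES = ("科技", "金融", "物流", "贸易")

# Hypothetical "main part" pattern: the leading run of CJK characters and Latin letters,
# which stops before a bracketed branch suffix such as "（分公司）".
MAIN_PART = re.compile(r"^[\u4e00-\u9fffA-Za-z]+")


def extract(name: str) -> tuple[Optional[str], Optional[str]]:
    """Information extraction unit 3021: (administrative region, industry attribute)."""
    match = MAIN_PART.match(name)
    main = match.group(0) if match else ""
    region = next((r for r in REGIONS if r in main), None)
    industry = next((i for i in INDUSTRIES if i in main), None)
    return region, industry


def is_abbreviation(name: str) -> bool:
    """Determining unit 3022: an abbreviation if both attribute values are null."""
    region, industry = extract(name)
    return region is None and industry is None
```

For example, `"北京示例贸易有限公司（分公司）"` yields `("北京", "贸易")` and is not treated as an abbreviation, whereas `"示例"` yields `(None, None)` and is.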
In one embodiment of the present application, as shown in fig. 4, the apparatus may further include:
a third obtaining module 307, configured to obtain a distributed representation vector of the abbreviation sample.
A fourth obtaining module 308, configured to obtain distributed representation vectors of a plurality of pieces of full-name unit name sample information.
A third determining module 309, configured to determine the similarity between the abbreviation sample and each piece of full-name unit name sample information according to the distributed representation vector of the abbreviation sample and the distributed representation vectors of the plurality of pieces of full-name unit name sample information.
A selecting module 310, configured to select, according to the similarity, the target full-name unit name sample information with the greatest similarity to the abbreviation sample from the plurality of pieces of full-name unit name sample information.
A constructing module 311, configured to construct the abbreviation library according to the correspondence between the abbreviation sample and the target full-name unit name sample information.
In an embodiment of the application, the first structured information and the second structured information contain structured elements of the same types and in the same number, and there are multiple structured elements. The second determining module 306 is specifically configured to: compare the text corresponding to each same-type structured element in the first structured information and the second structured information; if the text of every same-type element is identical, determine that the name information of the unit to be audited passes the audit; and if the text of the same-type elements is only partially identical, that is, at least one pair differs, determine that the name information of the unit to be audited does not pass the audit.
With the auditing apparatus of this embodiment, when the name information of the unit to be audited is determined during auditing to be an abbreviation, the full-name unit name information corresponding to it is obtained according to a preset abbreviation library; structured information extraction is performed on the full-name unit name information to obtain first structured information; standard unit name information corresponding to the user information is acquired, together with its second structured information; and the audit result of the name information of the unit to be audited is determined according to the first and second structured information and fed back to the request end. By making use of the structured information of unit names, the name to be audited is thus audited accurately, improving the accuracy of unit-name auditing.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
FIG. 5 is a block diagram of an electronic device according to one embodiment of the present application.
As shown in fig. 5, the electronic device includes:
memory 501, processor 502, and computer instructions stored on memory 501 and executable on processor 502.
The processor 502, when executing the instructions, implements the auditing methods provided in the embodiments described above.
Further, the electronic device includes:
a communication interface 503 for communication between the memory 501 and the processor 502.
Memory 501 for storing computer instructions executable on processor 502.
The memory 501 may include high-speed RAM and may also include non-volatile memory, such as at least one disk storage device.
The processor 502 is configured to implement the auditing method of the above embodiments when executing the program.
If the memory 501, the processor 502 and the communication interface 503 are implemented independently, the communication interface 503, the memory 501 and the processor 502 may be connected to each other through a bus and perform communication with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 5, but this is not intended to represent only one bus or type of bus.
Optionally, in a specific implementation, if the memory 501, the processor 502, and the communication interface 503 are integrated on one chip, they may communicate with one another through an internal interface.
The processor 502 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application.
The present application also provides a computer program product; when the instructions in the computer program product are executed by a processor, the auditing method of the embodiments of the present application is implemented.
In the description herein, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," "some examples," and the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, such schematic uses of these terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples described in this specification, and their features, provided they do not contradict one another.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be carried out by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims (14)

1. An auditing method, characterized in that the method comprises:
receiving an audit request of a request terminal, wherein the audit request comprises name information of a unit to be audited and corresponding user information;
determining the name information of the unit to be audited as an abbreviation;
acquiring full-name unit name information corresponding to the name information of the unit to be audited according to a preset abbreviation library;
carrying out structured information extraction on the full-name unit name information to obtain first structured information of the full-name unit name information;
acquiring standard unit name information corresponding to the user information, and acquiring second structured information corresponding to the standard unit name information;
and determining an auditing result of the name information of the unit to be audited according to the first structured information and the second structured information, and feeding the auditing result back to the request terminal.
2. The method of claim 1, wherein the determining that the name information of the unit to be audited is an abbreviation comprises:
querying whether an abbreviation matching the name information of the unit to be audited exists in the abbreviation library;
and if an abbreviation matching the name information exists in the abbreviation library, determining the name information of the unit to be audited as the abbreviation.
3. The method of claim 1, wherein the determining that the name information of the unit to be audited is an abbreviation comprises:
extracting information of the name information of the unit to be audited to obtain administrative region information and industry attribute information of the name information of the unit to be audited;
and under the condition that the attribute values of the administrative area information and the industry attribute information are both null, determining the name information of the unit to be audited as an abbreviation.
4. The method according to claim 3, wherein the extracting information of the name information of the unit to be audited to obtain administrative area information and industry attribute information of the name information of the unit to be audited comprises:
matching the name information of the unit to be audited by using a regular expression, and extracting main part information which accords with the format of the regular expression;
and performing structured information extraction on the main part information to obtain administrative area information and industry attribute information of the name information of the unit to be audited.
5. The method of claim 2, wherein the abbreviation library is built by:
obtaining a distributed representation vector of the abbreviation sample;
acquiring distributed representation vectors of a plurality of pieces of full-name unit name sample information;
determining similarity between the abbreviation sample and each of the full-name unit name sample information according to the distributed representation vector of the abbreviation sample and the distributed representation vectors of the full-name unit name sample information;
according to the similarity, target full-name unit name sample information with the maximum similarity to the abbreviation sample is selected from the multiple pieces of full-name unit name sample information;
and constructing the abbreviation library according to the corresponding relation between the abbreviation sample and the target full-name unit name sample information.
6. The method according to claim 2, wherein the first structured information and the second structured information have structured elements of the same types and number, the number of structured elements is multiple, and determining the audit result of the name information of the unit to be audited according to the first structured information and the second structured information comprises:
comparing the text information corresponding to the same type of structured elements in the first structured information and the second structured information;
if the text information corresponding to every same type of structured element in the first structured information and the second structured information is the same, determining that the name information of the unit to be audited passes the audit;
and if the text information corresponding to the same type of structured elements in the first structured information and the second structured information is only partially the same, determining that the name information of the unit to be audited does not pass the audit.
7. An auditing apparatus, the apparatus comprising:
a receiving module, configured to receive an auditing request from a request terminal, wherein the auditing request comprises name information of a unit to be audited and corresponding user information;
the first determining module is used for determining that the name information of the unit to be audited is an abbreviation;
the first acquisition module is used for acquiring full-name unit name information corresponding to the name information of the unit to be audited according to a preset abbreviation library;
the first extraction module is used for carrying out structured information extraction on the full-name unit name information to obtain first structured information of the full-name unit name information;
the second acquisition module is used for acquiring standard unit name information corresponding to the user information and acquiring second structured information corresponding to the standard unit name information;
and the second determining module is used for determining an auditing result of the name information of the unit to be audited according to the first structured information and the second structured information, and feeding the auditing result back to the request terminal.
8. The apparatus of claim 7, wherein the first determining module is specifically configured to:
querying whether an abbreviation matching the name information of the unit to be audited exists in the abbreviation library;
and if an abbreviation matching the name information exists in the abbreviation library, determining the name information of the unit to be audited as the abbreviation.
9. The apparatus of claim 7, wherein the first determining module comprises:
the information extraction unit is used for extracting the information of the name information of the unit to be audited so as to obtain administrative region information and industry attribute information of the name information of the unit to be audited;
and the determining unit is used for determining the name information of the unit to be audited as the abbreviation under the condition that the attribute values of the administrative area information and the industry attribute information are both null.
10. The apparatus as claimed in claim 9, wherein the information extraction unit is specifically configured to:
matching the name information of the unit to be audited by using a regular expression, and extracting main part information which accords with the format of the regular expression;
and performing structured information extraction on the main part information to obtain administrative area information and industry attribute information of the name information of the unit to be audited.
11. The apparatus of claim 8, wherein the apparatus further comprises:
the third acquisition module is used for acquiring the distributed representation vectors of the abbreviation samples;
the fourth acquisition module is used for acquiring distributed representation vectors of a plurality of pieces of full-name unit name sample information;
a third determining module, configured to determine, according to the distributed representation vector of the abbreviation sample and the distributed representation vectors of the multiple pieces of full-name unit name sample information, a similarity between the abbreviation sample and each piece of full-name unit name sample information;
the selection module is used for selecting target full-name unit name sample information with the maximum similarity with the abbreviation sample from the plurality of full-name unit name sample information according to the similarity;
and the construction module is used for constructing the abbreviation library according to the corresponding relation between the abbreviation sample and the target full-name unit name sample information.
12. The apparatus of claim 8, wherein the first structured information and the second structured information have the same type and number of structured elements, and wherein the number of structured elements is multiple, and wherein the second determining module is specifically configured to:
comparing the text information corresponding to the same type of structured elements in the first structured information and the second structured information;
if the text information corresponding to every same type of structured element in the first structured information and the second structured information is the same, determining that the name information of the unit to be audited passes the audit;
and if the text information corresponding to the same type of structured elements in the first structured information and the second structured information is only partially the same, determining that the name information of the unit to be audited does not pass the audit.
13. An electronic device, comprising: a memory, a processor; the memory has stored therein computer instructions which, when executed by the processor, carry out an auditing method according to any one of claims 1-6.
14. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the auditing method of any of claims 1-6.
CN202110365261.3A 2021-04-01 2021-04-01 Auditing method, auditing device, electronic equipment and storage medium Pending CN114116958A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110365261.3A CN114116958A (en) 2021-04-01 2021-04-01 Auditing method, auditing device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110365261.3A CN114116958A (en) 2021-04-01 2021-04-01 Auditing method, auditing device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114116958A true CN114116958A (en) 2022-03-01

Family

ID=80359254

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110365261.3A Pending CN114116958A (en) 2021-04-01 2021-04-01 Auditing method, auditing device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114116958A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114328937A (en) * 2022-03-10 2022-04-12 中国医学科学院医学信息研究所 Scientific research institution information processing method and device

Similar Documents

Publication Publication Date Title
CN108427731B (en) Page code processing method and device, terminal equipment and medium
US20110038531A1 (en) Learning string transformations from examples
CN111159329B (en) Sensitive word detection method, device, terminal equipment and computer readable storage medium
CN110737818A (en) Network release data processing method and device, computer equipment and storage medium
CN111506608B (en) Structured text comparison method and device
CN110427375B (en) Method and device for identifying field type
CN112100359A (en) Test case searching method, device, equipment and storage medium
CN115098556A (en) User demand matching method and device, electronic equipment and storage medium
CN111143359A (en) Query statement generation method and device
CN114116958A (en) Auditing method, auditing device, electronic equipment and storage medium
CN111460098A (en) Text matching method and device and terminal equipment
CN113779266A (en) Knowledge graph-based information processing method and device
CN110188033B (en) Data detection device, method, computer device, and computer-readable storage medium
CN111126056B (en) Method and device for identifying trigger words
CN115544132A (en) Data import method and system and electronic equipment
CN112612817B (en) Data processing method, device, terminal equipment and computer readable storage medium
CN112417020B (en) Service expansion realization method, device, computer equipment and storage medium
CN115033496A (en) Data processing method and device and electronic equipment
CN113094415B (en) Data extraction method, data extraction device, computer readable medium and electronic equipment
CN109710651B (en) Data type identification method and device
CN112784596A (en) Method and device for identifying sensitive words
CN113822692B (en) Commodity information processing method, commodity information processing device, electronic equipment and storage medium
CN110263399B (en) Data processing method and device based on Hsps and electronic equipment
CN113766545B (en) Identity recognition method and device for wireless network
US20240223463A1 (en) Method and System for Analysis of Hardware Infrastructure Deployment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination