CN106294433B - Equipment information processing method and device - Google Patents

Info

Publication number
CN106294433B
Authority
CN
China
Prior art keywords
information
input text
information base
base
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510276430.0A
Other languages
Chinese (zh)
Other versions
CN106294433A (en
Inventor
涂建超
程搏
蔡林霖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201510276430.0A priority Critical patent/CN106294433B/en
Publication of CN106294433A publication Critical patent/CN106294433A/en
Application granted granted Critical
Publication of CN106294433B publication Critical patent/CN106294433B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/955Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566URL specific, e.g. using aliases, detecting broken or misspelled links

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a device information processing method, which comprises the following steps: after a first information base and a second information base are established, reading equipment information to be processed in the first information base; splicing the to-be-processed equipment information into a URL address of a search engine; accessing the URL address through a search engine to acquire the to-be-processed equipment information as an input text, and comparing the input text with the input text of the second information base; associating the input text with the input text of the second information repository when the input text matches the input text of the second information repository. The invention also discloses a device information processing device. The invention has the advantages of convenient equipment information reading, high accuracy of the acquired equipment information and improvement of the intelligent degree of equipment information acquisition and maintenance.

Description

Equipment information processing method and device
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method and an apparatus for processing device information.
Background
With the continuous development of terminal technology, more and more terminals have entered people's daily life and work, and as the number of terminals grows, so does the variety of terminal brands, models, and systems. Taking the Android system as an example, because the Android platform is open, once a device has been manually flashed, rooted, or otherwise modified, the hardware parameters of the terminal either cannot be acquired or have been manually altered, so the information reported from Android devices is highly varied and follows no standard.
At present, terminal hardware information is collected by sampling through the API (application program interface) of the smartphone. Besides the insufficient sample size, this sampling is detached from the scenarios in which users actually use their devices, so it cannot cover the complex real-world hardware environments (such as flashed or rooted devices), and the accuracy and coverage of the collected hardware data are low. Model information is mainly collected and maintained manually, and the collected model information cannot be directly matched with the information acquired from the terminal, so its usability is extremely low.
In summary, the device information (hardware information, model information, etc.) acquired in the existing way has poor accuracy and poor readability, and because it must be collected and maintained manually, the degree of intelligence is low.
Disclosure of Invention
The embodiments of the present invention provide a device information processing method and apparatus, which aim to solve the problems that device information (hardware information, model information, etc.) acquired in the existing way has poor accuracy and readability and that manual collection and maintenance gives a low degree of intelligence.
In order to achieve the above object, an embodiment of the present invention provides a device information processing method, including:
after a first information base and a second information base are established, reading equipment information to be processed in the first information base;
splicing the to-be-processed equipment information into a URL address of a search engine;
accessing the URL address through a search engine to acquire the to-be-processed equipment information as an input text, and comparing the input text with the input text of the second information base;
associating the input text with the input text of the second information repository when the input text matches the input text of the second information repository.
In order to achieve the above object, an embodiment of the present invention further provides a device information processing apparatus, including:
the reading module is used for reading the equipment information to be processed in the first information base after the first information base and the second information base are established;
the splicing module is used for splicing the to-be-processed equipment information into a URL (uniform resource locator) address of a search engine;
the acquisition module is used for accessing the URL address through a search engine to acquire the to-be-processed equipment information as an input text;
the comparison module is used for comparing the input text with the input text of the second information base;
and the association module is used for associating the input text with the input text of the second information base when the input text is matched with the input text of the second information base.
According to the method, the information to be processed is spliced into the URL address of the search engine, and the equipment information is associated through comparison of the input texts, namely, the word conversion relation of the equipment information is established. The problems that the acquired equipment information (hardware information, model information and the like) is poor in accuracy and legibility and low in intelligent degree of manual acquisition and maintenance are effectively solved, the equipment information is convenient to read, the acquired equipment information is high in accuracy, and the intelligent degree of equipment information acquisition and maintenance is improved.
Drawings
FIG. 1 is a schematic diagram of a hardware architecture related to a device information acquiring apparatus according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a first embodiment of the device information acquiring method according to the present invention;
FIG. 3 is a schematic flowchart illustrating an embodiment of comparing the input text with the input text in the second information base;
FIG. 4 is a flowchart illustrating a second embodiment of the device information acquiring method according to the present invention;
FIG. 5 is an overall architecture diagram of an embodiment of the lighthouse function library of the present invention;
FIG. 6 is a block diagram of the overall design framework of an embodiment of the data processing portion of the present invention;
FIG. 7 is a flowchart illustrating an embodiment of the data processing portion of the present invention;
FIG. 8 is a diagram illustrating word segmentation result processing and optimization testing in accordance with an embodiment of the present invention;
FIG. 9 is a diagram illustrating an embodiment of brand classification results used as statistics after classification in accordance with the present invention;
FIG. 10 is a diagram illustrating an embodiment of model classification results used as statistics after classification in accordance with the present invention;
FIG. 11 is a functional block diagram of a device information acquiring apparatus according to a first embodiment of the present invention;
FIG. 12 is a schematic diagram of the detailed functional modules of the comparison module in FIG. 11;
FIG. 13 is a functional block diagram of the device information acquiring apparatus according to a second embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The main solution of the embodiments of the present invention is as follows: after a first information base and a second information base are established, the device information to be processed is read from the first information base; the device information to be processed is spliced into a URL address of a search engine; the URL address is accessed through the search engine, the returned result for the device information to be processed is taken as an input text, and the input text is compared with the input text of the second information base; when the input text matches the input text of the second information base, the two are associated. Device information is thus acquired automatically through the search engine, a conversion relationship between textual descriptions is established, and the brand and model fields are semantically normalized, so that the device information is easy to read, the acquired device information is highly accurate, and the degree of intelligence of device information acquisition and maintenance is improved.
Existing device information processing approaches suffer from poor accuracy and poor readability of the acquired device information (hardware information, model information, etc.) and from the low degree of intelligence of manual collection and maintenance.
The embodiment of the invention constructs an equipment information acquisition device, and the equipment information acquisition device associates equipment information by splicing information to be processed into a URL (uniform resource locator) address of a search engine and comparing input texts, namely establishing an equipment information word conversion relation. The problems that the acquired equipment information (hardware information, model information and the like) is poor in accuracy and legibility and low in intelligent degree of manual acquisition and maintenance are effectively solved, the equipment information is convenient to read, the acquired equipment information is high in accuracy, and the intelligent degree of equipment information acquisition and maintenance is improved.
The device information acquiring apparatus of the present embodiment may be carried on a PC, or may be carried on an electronic terminal such as a mobile phone or a tablet computer, which is capable of acquiring and querying device information. The hardware architecture involved in the device information acquisition apparatus may be as shown in fig. 1.
FIG. 1 shows the hardware architecture related to the device information acquiring apparatus according to an embodiment of the present invention. As shown in FIG. 1, the hardware related to the device information acquiring apparatus includes: a processor 301 (e.g., a CPU), a network interface 304, a user interface 303, a memory 305, and a communication bus 302. The communication bus 302 is used for connection and communication between the components of the device information acquiring apparatus. The user interface 303 may include a display, a keyboard, a mouse, and the like, and is configured to receive information input by a user and send the received information to the processor 301 for processing. The display may be an LCD display, an LED display, or a touch screen, and is used for displaying data of the device information acquiring apparatus, such as operation interfaces for device information query and device information acquisition. Optionally, the user interface 303 may also include a standard wired interface and a wireless interface. The network interface 304 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 305 may be a high-speed RAM or a non-volatile memory, such as a disk memory. The memory 305 may alternatively be a storage device separate from the processor 301. As shown in FIG. 1, the memory 305, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a device information acquisition program.
In the hardware related to the device information acquiring apparatus shown in fig. 1, the network interface 304 is mainly used for connecting to an application platform and performing data communication with the application platform; the user interface 303 is mainly used for connecting a client, performing data communication with the client, and receiving information and instructions input by the client; and the processor 301 may be configured to invoke the device information acquisition program stored in the memory 305 and perform the following operations:
after a first information base and a second information base are established, reading equipment information to be processed in the first information base;
splicing the to-be-processed equipment information into a URL address of a search engine;
accessing the URL address through a search engine to acquire the to-be-processed equipment information as an input text, and comparing the input text with the input text of the second information base;
associating the input text with the input text of the second information repository when the input text matches the input text of the second information repository.
Further, in one embodiment, the processor 301 invoking the device information acquisition program stored in the memory 305 may perform the following operations:
segmenting the input text to obtain an input text after segmentation;
and acquiring a word segmentation input text from the second information base, and comparing the input text after word segmentation with the word segmentation input text.
Further, in one embodiment, the processor 301 invoking the device information acquisition program stored in the memory 305 may perform the following operations:
when the input text is not matched with the input text of the second information base, selecting a preset number of input texts from the input text and the input text of the second information base according to a preset mode and storing the input texts;
and receiving a correlation instruction based on the selected input text, and correlating the input text corresponding to the correlation instruction.
Further, in one embodiment, the processor 301 invoking the device information acquisition program stored in the memory 305 may perform the following operations:
and receiving the equipment information reported by the SDK, and storing the reported equipment information as a first information base.
Further, in one embodiment, the processor 301 invoking the device information acquisition program stored in the memory 305 may perform the following operations:
and acquiring equipment information through a third-party website, and segmenting the acquired equipment information to be used as a segmentation input text and storing the segmentation input text as a second information base.
According to the scheme, the information to be processed is spliced into the URL address of the search engine, and the equipment information is associated through comparison of the input texts, namely, the word conversion relation of the equipment information is established. The problems that the acquired equipment information (hardware information, model information and the like) is poor in accuracy and legibility and low in intelligent degree of manual acquisition and maintenance are effectively solved, the equipment information is convenient to read, the acquired equipment information is high in accuracy, and the intelligent degree of equipment information acquisition and maintenance is improved.
Based on the hardware architecture, the embodiment of the equipment information acquisition method is provided.
As shown in fig. 2, a first embodiment of an apparatus information acquiring method according to the present invention is proposed, where the apparatus information acquiring method includes:
step S10, after the first information base and the second information base are established, the device information to be processed in the first information base is read;
In this embodiment, a first information base and a second information base are established in advance. The first information base contains device information, including but not limited to device hardware information such as device brand, model, RAM, and ROM. The second information base also contains device information, including but not limited to brand and model, together with a flag indicating whether the field is a key. The process of constructing the first information base includes: receiving the device information reported by the SDK, and storing the reported device information as the first information base. Specifically, the API of the smart device is called, and device hardware information including brand, model, RAM, ROM, and the like is reported through a fixed event, rqd_model. Daily experience and analysis of actual data show that, in general, the brand + model + ROM + network system of a smart device uniquely determines a model: when these four parameters agree, the other parameters are also the same (except in special cases such as emulators and flashed devices, so this property can serve as one of the factors for detecting a flashed device). Network system information depends on the user's everyday usage scenario and, after analysis, is for now not used as a key value for determining the unique model. The ROM parameter is numeric and its normalization rules are relatively simple, so this embodiment focuses mainly on the automated normalization of the brand and model fields. An example of the first information table is shown in Table 1:
[Table 1 is reproduced as an image in the original publication: Figure BDA0000724767510000061]
TABLE 1
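The key described above (brand + model + ROM, with the network system left out for now) can be sketched as a small normalization routine. This is only an illustrative sketch, not the patent's implementation: the `DeviceRecord` type, its field names, and the normalization rules (trimming, upper-casing, rounding ROM to whole GB) are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceRecord:
    """One row reported by the SDK (hypothetical field names)."""
    brand: str
    model: str
    rom_mb: int
    network: str  # reported, but not part of the key in this sketch


def model_key(rec: DeviceRecord) -> tuple:
    """Build a candidate key from brand + model + ROM.

    Assumed normalization: trim and upper-case the text fields,
    and round the ROM size to whole gigabytes.
    """
    brand = rec.brand.strip().upper()
    model = rec.model.strip().upper()
    rom_gb = round(rec.rom_mb / 1024)
    return (brand, model, rom_gb)


a = DeviceRecord(" Xiaomi", "MI 3", 16384, "WCDMA")
b = DeviceRecord("XIAOMI ", "mi 3", 16000, "TD-SCDMA")
same = model_key(a) == model_key(b)
```

Two reports that differ only in whitespace, letter case, small ROM differences, or network system map to the same key under these assumed rules.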
The process of creating the second information base may include: acquiring device information from third-party websites, segmenting the acquired device information into words, and storing the result as word-segmented input text in the second information base. The third-party websites include the official websites of mainstream mobile phone manufacturers, the website of the Ministry of Industry and Information Technology (MIIT), third-party mobile phone information websites, and the like. Device information is obtained from these websites to form the network model library data, and the obtained device information is processed by a word segmentation tool to serve as the word-segmented input text. An example of the second information table is shown in Table 2:
Field name    Field meaning    Key    Example field value
Brand         Brand            Y      Samsung
Model         Model            Y      GT-I9100
……
TABLE 2
After the first information base and the second information base are established, the device information data to be processed in the first information base is read, namely the data needing normalized processing is read from the first information base. Preferably, the device information to be processed is keyword information of the device information stored in the first information base, and is stored in a text file by line as an input source.
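Reading the keyword file line by line, as described above, might look like the following sketch. The de-duplication and blank-line skipping are assumptions added for illustration, and `read_pending_keywords` is a hypothetical helper name.

```python
import io


def read_pending_keywords(lines):
    """Yield non-empty, de-duplicated keyword lines in their original order."""
    seen = set()
    for line in lines:
        keyword = line.strip()
        if keyword and keyword not in seen:
            seen.add(keyword)
            yield keyword


# In practice `lines` would be an open text file; StringIO stands in here.
source = io.StringIO("XIAOMIMI3\n\nGT-I9100\nXIAOMIMI3\n")
pending = list(read_pending_keywords(source))
```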
Step S20, splicing the to-be-processed equipment information into a URL address of a search engine;
In this embodiment, a ready-made HTTP method provided by Python is used to concatenate the input keyword information, in parameter form, into the URL address of a search engine. For example, if "MI 2 mobile phone" is input, the URL address is concatenated to http: m.***.com/s.
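The splicing step can be sketched with Python's standard `urllib.parse` module. The endpoint `https://search.example.com/s` and the parameter name `word` are placeholders chosen for illustration, since the source does not give the real search engine address or its parameters.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint and parameter name; the source elides the real ones.
SEARCH_ENDPOINT = "https://search.example.com/s"
QUERY_PARAM = "word"


def build_search_url(keywords: str) -> str:
    """Splice device keywords into the search URL as a query parameter."""
    return SEARCH_ENDPOINT + "?" + urlencode({QUERY_PARAM: keywords})


url = build_search_url("MI 2 mobile phone")
# Round-trip check: the keywords can be recovered from the query string.
recovered = parse_qs(urlsplit(url).query)[QUERY_PARAM][0]
```

Using `urlencode` rather than plain string concatenation keeps spaces and non-ASCII brand names (for example Chinese characters) correctly percent-encoded.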
Step S30, accessing the URL address through a search engine to acquire the equipment information to be processed as an input text, and comparing the input text with the input text of the second information base;
After the device information has been spliced into the URL address, the URL is accessed and the returned data packet is captured as the input text for word segmentation analysis. The text is segmented into words, and the input text is compared with the input text in the second information base to determine whether the two match.
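How the returned page becomes an "input text" is not detailed in the source. One plausible approach, sketched below with the standard `html.parser` module, is to keep only the visible text of the response. The HTTP request itself is omitted, and the `TextExtractor` class and sample HTML are illustrative assumptions.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def page_to_input_text(html: str) -> str:
    """Reduce a returned HTML page to plain text for word segmentation."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


sample = ("<html><head><style>p{}</style></head><body>"
          "<p>Xiaomi MI 2</p><script>x=1</script><p>WCDMA</p></body></html>")
input_text = page_to_input_text(sample)
```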
Specifically, referring to fig. 3, the process of comparing the input text with the input text of the second information base includes:
step S31, segmenting the input text to obtain segmented input text;
step S32, obtaining a word segmentation input text from the second information base, and comparing the input text after word segmentation with the word segmentation input text.
The input text is segmented by a word segmentation tool to obtain the segmented input text. The segmentation step uses the ready-made keyword extraction provided by the open-source jieba word segmentation project, for example: jieba.analyse.extract_tags(sentence, topK), where: sentence is the text from which keywords are to be extracted, here the text returned by searching for a keyword from the first information base; topK is the number of keywords with the largest weights to return, preferably 5 in this project; the topK keywords returned are the keyword information later used for manual classification.
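The comparison in steps S31 and S32 can be illustrated with the sketch below. To stay self-contained it uses a simple regular-expression tokenizer as a stand-in for jieba, and the matching rule (every token of a second-base entry must appear in the input text) is an assumption, since the source does not define what counts as a match.

```python
import re


def tokenize(text: str) -> set:
    """Stand-in for a real word segmenter: upper-cased alphanumeric runs."""
    return set(re.findall(r"[A-Z0-9]+", text.upper()))


def find_matches(input_text: str, second_base: dict) -> list:
    """Return ids of second-base entries whose tokens all occur in the input text."""
    tokens = tokenize(input_text)
    return [entry_id for entry_id, entry_tokens in second_base.items()
            if entry_tokens and entry_tokens <= tokens]


# Hypothetical second information base: entry id -> pre-segmented tokens.
second_base = {
    "samsung_gt_i9100": tokenize("Samsung GT-I9100"),
    "xiaomi_mi3": tokenize("Xiaomi MI3"),
}
matches = find_matches("XIAOMI MI3 WCDMA phone specs", second_base)
```

Pre-segmenting the second information base once, as the text describes, means only the freshly fetched input text has to be segmented at comparison time.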
Step S40, when the input text matches the input text of the second information base, associating the input text with the input text of the second information base.
When the input text matches the input text of the second information base, that is, when the segmented input text matches a word-segmented input text in the second information base, the result is written back to the database automatically: the matched input texts are merged and written back to the database.
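The write-back of a matched pair could be sketched as follows, using an in-memory SQLite database. The `association` table, its columns, and the `write_back` helper are hypothetical; the source does not describe the actual database or schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE association ("
    " reported_text TEXT PRIMARY KEY,"
    " brand TEXT NOT NULL,"
    " model TEXT NOT NULL)"
)


def write_back(conn, reported_text, brand, model):
    """Store (or overwrite) the link from a reported string to a normalized entry."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO association VALUES (?, ?, ?)",
            (reported_text, brand, model),
        )


write_back(conn, "XIAOMIMI3", "Xiaomi", "MI3")
row = conn.execute(
    "SELECT brand, model FROM association WHERE reported_text = ?",
    ("XIAOMIMI3",),
).fetchone()
```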
In the embodiment, the information to be processed is spliced into the URL address of the search engine, and the device information is associated by comparing the input texts, that is, the device information word conversion relationship is established. The problems that the acquired equipment information (hardware information, model information and the like) is poor in accuracy and legibility and low in intelligent degree of manual acquisition and maintenance are effectively solved, the equipment information is convenient to read, the acquired equipment information is high in accuracy, and the intelligent degree of equipment information acquisition and maintenance is improved.
Further, a second embodiment of the present invention is proposed based on the first embodiment of the above-described device information acquisition method. As shown in fig. 4, after the step S30, the method may further include:
step S50, when the input text is not matched with the input text of the second information base, selecting a preset number of input texts from the input text and the input text of the second information base according to a preset mode and storing the input texts;
step S60, receiving a correlation instruction based on the selected input text, and correlating the input text corresponding to the correlation instruction.
In this embodiment, the preset manner is TF-IDF (term frequency-inverse document frequency), a statistical method for evaluating how important a word is to one document in a document set or corpus. The importance of a word increases in proportion to the number of times it appears in the document, but decreases in inverse proportion to the frequency with which it appears across the corpus. Various forms of TF-IDF weighting are often applied by search engines as a measure or rating of the relevance between a document and a user query. The preset number is preferably 5; that is, the 5 keywords with the largest weights are selected and written into a manual classification system for manual selection, and a "relationship" is established manually, that is, an association instruction based on the selected input text is received, and the input text corresponding to the association instruction is associated. The above process has been automated with Python scripts, deployed as a routine scheduled task in the data factory, and is executed daily.
In this embodiment, when the input text does not match the input text in the second information base, part of the input text is output in the preset manner so that an association can be established manually, which further ensures the accuracy of the device information.
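The TF-IDF ranking used to pick candidate keywords can be illustrated with a short standard-library sketch. It uses one common variant (tf = count / document length, idf = log(N / df)); the source does not say which variant or corpus is used, and the toy corpus below is invented for illustration.

```python
import math
from collections import Counter


def tfidf_top_k(doc_tokens, corpus, k=5):
    """Rank the tokens of one document by TF-IDF against a small corpus.

    tf = count / len(doc); idf = log(N / df), where df is the number of
    corpus documents containing the term (at least 1 to avoid division by zero).
    """
    counts = Counter(doc_tokens)
    total = len(doc_tokens)
    n_docs = len(corpus)
    scores = {}
    for term, count in counts.items():
        df = sum(1 for doc in corpus if term in doc)
        idf = math.log(n_docs / max(df, 1))
        scores[term] = (count / total) * idf
    # Sort by score (descending), then alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


corpus = [
    ["xiaomi", "mi3", "phone", "mi3", "wcdma"],
    ["samsung", "gt-i9100", "phone"],
    ["huawei", "p7", "phone"],
]
top = tfidf_top_k(corpus[0], corpus, k=2)
```

The word "phone" appears in every document, so its idf is zero and it never ranks above brand or model tokens, which is the behaviour the paragraph above describes.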
To better describe the device information processing procedure of the present invention, Lighthouse is taken as an example. Refer to FIG. 5, which is an overall architecture diagram of the Lighthouse function library. Some terms used in Lighthouse are explained below.
qimei: the ID used in the Lighthouse project to uniquely identify a mobile terminal. Inherent IDs such as IMEI, MAC, and IMSI cannot reliably identify a unique terminal device in complex real-world scenarios, so this ID is calculated from several inherent IDs by a mathematical method, ultimately making it possible to confirm a unique terminal device;
Lighthouse: a terminal-based operations solution with functions such as user analysis, terminal analysis, network analysis, and APP quality optimization; a platform product that provides all-round operation services for mobile APPs;
Lighthouse SDK (beacon SDK): the software development kit of the Lighthouse solution, embedded into an APP on a smart terminal to collect, within the scope authorized by the user, information about the smart terminal and about the APP;
Word bank: a domain-specific vocabulary supplied to the word segmenter to improve the segmentation success rate on corpora from a particular field. Here it refers to the vocabulary of smart terminal brand and model information that is sorted and screened from information obtained by a web crawler and organized into a mobile phone brand word bank and a mobile phone model word bank;
Lighthouse reporting: real, massive user terminal hardware information, which forms the Lighthouse function library;
Model libraries maintained by the Ministry of Industry and Information Technology, review portals, manufacturers' official websites, and businesses: multiple data sources that together cover almost all brand and model information on the market and form the crawler network model library.
The overall design framework of the data processing part is divided into four parts according to the automatic information normalization process (see FIG. 6):
1. constructing the Lighthouse function library and the crawler network model library;
2. building the semantic association of keywords between the Lighthouse function library and the crawler network model library;
3. manual intervention to check for omissions and fill gaps;
4. merging the information of the two libraries.
The detailed step flow is shown in fig. 7:
Word segmentation result processing and optimization testing are shown in FIG. 8. Take the following data as an example: XIAOMIMI3@ millet, MI3, XIAOMI, secret, XIAOMIMI3 WCDMA. The input query keyword is XIAOMIMI3, and the rest are the returned word segmentation results. The top 5 words, sorted by term frequency and inverse document frequency, are matched one by one against the word bank; if the matching succeeds, the relationship is established, and if it does not, the record enters the manual matching stage. The calculation results used as statistics after classification are shown in FIG. 9 (brand classification results) and FIG. 10 (model classification results).
The crawler program in this embodiment may be implemented in different languages (e.g., Perl, Ruby, etc.). For different scenarios and purposes, a customized word bank and corpus can be built to tune the accuracy of word segmentation and of the TF-IDF index. The search engine can be replaced by a different search engine or by a self-built one. The word segmentation tool can be replaced by other similar tools or a self-written one. The manual association rules inevitably reflect human judgment, and more suitable association rules can be formulated for specific application scenarios. The application value of the invention is that, with limited human input, a set of automated "semantic conversion relationship" systems can be built from publicly available techniques and tools, which greatly improves the accuracy and readability of the terminal function library information while reducing the cost of manual maintenance.
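The one-by-one matching of the top keywords against the word banks, with a fall-back to manual matching, can be sketched as follows. The bank contents, the `associate` function, and the rule that both a brand and a model must be found for an automatic match are assumptions for illustration.

```python
def associate(keywords, brand_bank, model_bank):
    """Match ranked keywords against the word banks.

    Returns ("auto", brand, model) when both a brand and a model are found,
    otherwise ("manual", keywords) so that a person can decide.
    """
    brand = next((brand_bank[k] for k in keywords if k in brand_bank), None)
    model = next((model_bank[k] for k in keywords if k in model_bank), None)
    if brand and model:
        return ("auto", brand, model)
    return ("manual", list(keywords))


# Hypothetical word banks: keyword as it appears in text -> normalized name.
brand_bank = {"XIAOMI": "Xiaomi", "SAMSUNG": "Samsung"}
model_bank = {"MI3": "MI3", "GT-I9100": "GT-I9100"}

hit = associate(["MI3", "XIAOMI", "WCDMA"], brand_bank, model_bank)
miss = associate(["FOO", "BAR"], brand_bank, model_bank)
```

Returning the unmatched keywords unchanged keeps the manual step simple: the reviewer sees exactly the candidates that the automatic step considered.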
Correspondingly, the invention provides a preferred embodiment of the device information acquiring apparatus. Referring to FIG. 11, the device information acquiring apparatus includes a reading module 10, a splicing module 20, an acquisition module, a comparison module, and an association module.
The reading module 10 is configured to, after a first information base and a second information base are established, read information of a device to be processed in the first information base;
In this embodiment, a first information base and a second information base are established in advance. The first information base contains device information, including but not limited to device hardware information such as brand, model, RAM and ROM; the second information base also contains device information, including but not limited to device hardware information such as brand, model, and whether a primary key is included. The process of constructing the first information base comprises: receiving the device information reported by the SDK, and storing the reported device information as the first information base. Specifically, by calling an API of the smart device, device hardware information including brand, model, RAM, ROM and the like is reported through a fixed event rqd_model. Daily experience and analysis of actual data show that, in general, the model of a smart device can be uniquely determined by brand + model + ROM + network standard: when these four parameters are consistent, the other parameters are the same (except in special cases such as emulators and ROM flashing, so this information can also serve as one factor in detecting ROM flashing). The network standard depends on the user's everyday usage scenario, so after analysis it is, for now, not used as a key value for determining the unique model; the ROM parameter is numeric and its normalization rules are relatively simple. This embodiment therefore focuses mainly on the automated normalization of the brand and model fields. An example of the first information table is shown in Table 1:
The process of creating the second information base may include: acquiring device information from third-party websites, segmenting the acquired device information into words to serve as word segmentation input text, and storing it as the second information base. The third-party websites include the official websites of mainstream mobile phone makers, the website of the Ministry of Industry and Information Technology, third-party mobile phone information websites and the like. Device information is obtained from these websites to form the network model library data, and the obtained device information is processed by a word segmentation tool to serve as the word segmentation input text. An example of the second information table is shown in Table 2:
After the first information base and the second information base are established, the device information to be processed is read from the first information base, that is, the data requiring normalization is read. Preferably, the device information to be processed is the keyword information of the device information stored in the first information base, and it is stored line by line in a text file as the input source.
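The reading step can be illustrated with a minimal Python sketch. It assumes each reported record is a dictionary with hypothetical `brand`, `model` and `rom` fields, deduplicates records on the brand + model + ROM triple discussed above, and writes one search keyword per line. The field names, the normalization rule and the function names are illustrative assumptions, not part of the disclosed implementation.

```python
from typing import Dict, Iterable, List


def normalize(value: str) -> str:
    """Collapse whitespace and upper-case a raw field (an illustrative rule only)."""
    return " ".join(value.split()).upper()


def pending_keywords(records: Iterable[Dict[str, str]]) -> List[str]:
    """Return one search keyword per distinct (brand, model, ROM) triple."""
    seen = set()
    keywords = []
    for rec in records:
        # Hypothetical field names; the source only lists brand, model, RAM, ROM.
        key = (normalize(rec["brand"]), normalize(rec["model"]), normalize(rec["rom"]))
        if key in seen:
            continue
        seen.add(key)
        keywords.append(f"{key[0]} {key[1]}")
    return keywords


def write_input_source(keywords: List[str], path: str) -> None:
    """Store the keywords line by line in a text file used as the input source."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(keywords) + "\n")
```

Deduplicating before the search step keeps the number of search-engine requests proportional to the number of distinct models rather than to the number of reports.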
The splicing module 20 is configured to splice the to-be-processed device information into a URL address of a search engine;
In this embodiment, a ready-made HTTP method provided by Python is used, and the input keyword information is concatenated, as a parameter, into the URL address of a search engine (for example, if "MI 2 mobile phone" is input, it is spliced into the URL address http: m.***.com/s.).
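A minimal sketch of the splicing step, using only the Python standard library, might look as follows. The base URL `https://search.example.com/s` and the parameter name `q` are placeholders chosen for illustration; the source elides the real search engine address and does not name the query parameter.

```python
from urllib.parse import urlencode

# Placeholder endpoint and parameter name (assumptions, not from the source).
SEARCH_BASE = "https://search.example.com/s"


def build_search_url(keyword: str, base: str = SEARCH_BASE, param: str = "q") -> str:
    """Splice a device keyword into a search-engine URL as a query parameter."""
    # urlencode percent-encodes non-ASCII text as UTF-8 and turns spaces into '+'.
    return f"{base}?{urlencode({param: keyword})}"
```

For example, `build_search_url("MI 2 mobile phone")` yields `https://search.example.com/s?q=MI+2+mobile+phone`.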
The obtaining module 30 is configured to access the URL address through a search engine to obtain the to-be-processed device information as an input text;
The comparison module 40 is configured to compare the input text with the input text of the second information base;
After the device information is spliced into the URL address, the URL is accessed, and the returned data packet is captured as the input text for word segmentation analysis. The text is segmented into words, and the input text is compared with the input text in the second information base to determine whether the two match.
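As a rough sketch of the acquisition step, the returned page can be fetched and reduced to plain text before segmentation. The example below uses only the Python standard library; the helper names `html_to_text` and `fetch_input_text` are hypothetical, and stripping `script` and `style` content is an assumption about what counts as useful input text.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Reduce an HTML document to a single space-joined text string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def fetch_input_text(url: str, timeout: float = 10.0) -> str:
    """Access the spliced URL and return the response body as input text."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return html_to_text(resp.read().decode(charset, errors="replace"))
```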
Specifically, referring to fig. 12, the comparison module 40 includes a word segmentation unit 41 and a comparison unit 42:
The word segmentation unit 41 is configured to segment the input text into words to obtain a segmented input text;
The comparison unit 42 is configured to obtain a word segmentation input text from the second information base, and compare the segmented input text with the word segmentation input text.
The input text is segmented by a word segmentation tool to obtain the segmented input text. The word segmentation operation uses a ready-made tool provided by the open source project jieba to extract keywords, for example: jieba.analyse.extract_tags(sentence, topK), where sentence is the input text to be extracted, here the text returned by searching for a keyword from the first information base, and topK is the number of highest-weighted keywords to return, preferably 5 in this project. The topK keywords are the keyword information required for manual classification.
The associating module 50 is configured to associate the input text with the input text of the second information base when the input text matches the input text of the second information base.
When the input text matches the input text of the second information base, that is, when the segmented input text matches the word segmentation input text in the second information base, the result is automatically written back to the database: the matched input texts are merged and written back.
In this embodiment, the information to be processed is spliced into the URL address of a search engine, and device information is associated by comparing input texts; that is, a conversion relationship between device information terms is established. This effectively addresses the poor accuracy and readability of acquired device information (hardware information, model information and the like) and the low degree of automation of manual acquisition and maintenance: the device information becomes easier to read and more accurate, and its acquisition and maintenance become more automated.
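The matching and write-back logic can be sketched as follows. This is only an illustration under assumptions: the second information base is modelled as a dictionary mapping lower-cased terms to canonical names, the "database" is a plain dictionary of relations, and unmatched items are appended to a list standing in for the manual classification system. None of these names or structures come from the source.

```python
from typing import Dict, List, Optional, Tuple


def match_keywords(keywords: List[str], word_bank: Dict[str, str]) -> Optional[str]:
    """Return the canonical name of the first ranked keyword found in the word bank."""
    for kw in keywords:
        canonical = word_bank.get(kw.strip().lower())
        if canonical is not None:
            return canonical
    return None


def associate(
    raw_value: str,
    keywords: List[str],
    word_bank: Dict[str, str],
    relations: Dict[str, str],
    manual_queue: List[Tuple[str, List[str]]],
) -> bool:
    """Write back a relation on a match; otherwise queue the item for manual review."""
    canonical = match_keywords(keywords, word_bank)
    if canonical is None:
        manual_queue.append((raw_value, list(keywords)))
        return False
    relations[raw_value] = canonical  # stand-in for the database write-back
    return True
```

Keeping the unmatched keywords alongside the raw value gives the manual reviewer the same candidates the automatic step saw.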
Further, a second embodiment of the device information acquisition apparatus of the present invention is proposed based on the first embodiment described above. As shown in fig. 13, the device information acquisition apparatus may further include a selection module 60, a saving module 70 and a receiving module 80:
The selection module 60 is configured to select, in a preset manner, a preset number of input texts from the input text and the input text of the second information base when the input text does not match the input text of the second information base;
The saving module 70 is configured to save the selected input text;
The receiving module 80 is configured to receive an association instruction based on the selected input text;
The association module 50 is further configured to associate the input text corresponding to the association instruction.
In this embodiment, the preset manner is TF-IDF (term frequency-inverse document frequency), a statistical method for evaluating how important a word is to one document in a document set or corpus. The importance of a word increases in proportion to the number of times it appears in the document, but decreases with the frequency with which it appears across the corpus. Various forms of TF-IDF weighting are often applied by search engines as a measure or rating of the relevance between a document and a user query. The preset number is preferably 5; that is, the 5 keywords with the largest weights are selected and written into a manual classification system for manual selection, and a "relationship" is established manually: an association instruction based on the selected input text is received, and the input text corresponding to the association instruction is associated. The above process has been automated with Python scripts, deployed as a routine scheduled task in the data factory, and executed daily.
In this embodiment, when the input text does not match the input text in the second information base, part of the input text is output according to the preset manner so that an association can be established manually, which further ensures the accuracy of the device information.
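To make the TF-IDF selection concrete, the following is a minimal pure-Python sketch that scores the tokens of one document against a small corpus and returns the top k. It uses a smoothed IDF, log((1 + N) / (1 + df)) + 1, which is one common variant and is not necessarily the weighting used by the word segmentation tool in this embodiment; the function name and the tie-breaking rule are likewise assumptions.

```python
import math
from collections import Counter
from typing import Iterable, List, Set


def tf_idf_top_k(doc_tokens: List[str], corpus: Iterable[Set[str]], k: int = 5) -> List[str]:
    """Return the k tokens of doc_tokens with the highest TF-IDF weight."""
    corpus = list(corpus)
    n_docs = len(corpus)
    total = len(doc_tokens)
    scores = {}
    for term, count in Counter(doc_tokens).items():
        # Document frequency: number of corpus documents containing the term.
        df = sum(1 for doc in corpus if term in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed variant (assumption)
        scores[term] = (count / total) * idf
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

With k = 5 this mirrors the preferred preset number: the five highest-weighted keywords are the ones handed to manual classification.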
To better describe the information processing of the device of the present invention, the lighthouse is taken as an example. Refer to fig. 5, which shows the overall architecture of the lighthouse function library. Some lighthouse terms are explained below:
qimei: an identification ID used in the lighthouse project to uniquely identify a mobile terminal. Because inherent IDs such as IMEI, MAC and IMSI cannot effectively identify a unique terminal device in complex real-world scenarios, the ID is computed by a mathematical method from several of these inherent IDs, so that a unique terminal device can ultimately be confirmed;
Lighthouse: a terminal-based operation solution with functions such as user analysis, terminal analysis, network analysis and APP quality optimization; a platform product that provides all-round operation services for mobile APPs;
Lighthouse SDK: the software development kit of the lighthouse solution, embedded into an APP on the smart terminal and used to collect information about the smart terminal and about the APP within the scope authorized by the user;
Word bank: a domain-specific vocabulary supplied to improve the word segmentation success rate on the corpus of a particular field. Here it refers to the vocabulary of smart terminal brand and model information, sorted and screened from information obtained by a web crawler and organized into a mobile phone brand word bank and a mobile phone model word bank.
Lighthouse reporting: the lighthouse function library is formed from the hardware information of a massive number of real user terminals;
Model libraries maintained by the Ministry of Industry and Information Technology, review portals, manufacturers' official websites and e-commerce sites: these data sources cover almost all brand and model information on the market and together form the crawler network model library;
The overall framework of the data processing part is divided into four parts according to the automated information normalization process; refer to fig. 6:
1. Constructing the lighthouse function library and the crawler network model library;
2. Building the semantic association of keywords between the lighthouse function library and the crawler network model library;
3. Manual intervention to find and fill gaps;
4. Merging the information of the two libraries.
The detailed step flow is shown in fig. 7:
The processing and optimization test of the word segmentation results is shown in fig. 8. Take the following data as an example: XIAOMIMI3@ millet, MI3, XIAOMI, secret, XIAOMIMI3 WCDMA. The input query keyword is XIAOMIMI3, and the returned word segmentation result is the top 5 words ranked by term frequency and inverse document frequency. These top 5 words are matched one by one against the word bank; if a match succeeds, the relationship is established, and if no match succeeds, the item enters the manual matching stage. The statistics computed after classification are shown in fig. 9 (brand classification results) and fig. 10 (model classification results). The crawler program in this embodiment may be implemented in other languages (e.g., Perl, Ruby); for different scenarios and purposes, a custom word bank and corpus can be built to tune the word segmentation accuracy and the TF-IDF index; the search engine can be replaced by a different search engine or a self-built one; the word segmentation tool can be replaced by a similar tool or a self-written one; and the manual association rules, which are inevitably somewhat subjective, can be tailored to the specific application scenario. The application value of the invention is that, with limited human input, an automated "semantic conversion relation" system can be built from publicly available technology and tools, which greatly improves the accuracy and readability of the terminal function library information while reducing the manual maintenance cost.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

Claims (12)

1. A device information processing method, characterized by comprising:
after a first information base and a second information base are established, reading equipment information to be processed in the first information base;
splicing the to-be-processed equipment information into a URL address of a search engine;
accessing the URL address through a search engine to acquire the to-be-processed equipment information as an input text, and comparing the input text with the input text of the second information base;
associating the input text with the input text of the second information base when the input text matches the input text of the second information base.
2. The device information processing method according to claim 1, wherein the step of comparing the input text with the input text of the second information base includes:
segmenting the input text to obtain an input text after segmentation;
and acquiring a word segmentation input text from the second information base, and comparing the input text after word segmentation with the word segmentation input text.
3. The device information processing method according to claim 1 or 2, wherein after the step of comparing the input text with the input text of the second information base, further comprising:
when the input text is not matched with the input text of the second information base, selecting a preset number of input texts from the input text and the input text of the second information base according to a preset mode and storing the input texts;
and receiving an association instruction based on the selected input text, and associating the input text corresponding to the association instruction.
4. The device information processing method according to claim 1 or 2, wherein, before the step of reading the device information to be processed in the first information base after the first information base and the second information base are established, the method further comprises:
and receiving the equipment information reported by the SDK, and storing the reported equipment information as a first information base.
5. The device information processing method according to claim 1 or 2, wherein, before the step of reading the device information to be processed in the first information base after the first information base and the second information base are established, the method further comprises:
and acquiring equipment information through a third-party website, and segmenting the acquired equipment information to be used as a segmentation input text and storing the segmentation input text as a second information base.
6. A device information processing apparatus, characterized by comprising:
the reading module is used for reading the equipment information to be processed in the first information base after the first information base and the second information base are established;
the splicing module is used for splicing the to-be-processed equipment information into a URL (uniform resource locator) address of a search engine;
the acquisition module is used for accessing the URL address through a search engine to acquire the to-be-processed equipment information as an input text;
the comparison module is used for comparing the input text with the input text of the second information base;
and the association module is used for associating the input text with the input text of the second information base when the input text is matched with the input text of the second information base.
7. The device information processing apparatus according to claim 6, wherein the comparison module includes:
the word segmentation unit is used for segmenting the input text to obtain a segmented input text;
and the comparison unit is used for acquiring the word segmentation input text from the second information base and comparing the input text after word segmentation with the word segmentation input text.
8. The device information processing apparatus according to claim 7, wherein the device information processing apparatus further comprises:
the selection module is used for selecting a preset number of input texts from the input texts and the input texts of the second information base according to a preset mode when the input texts are not matched with the input texts of the second information base;
the storage module is used for storing the selected input text;
the receiving module is used for receiving the association instruction based on the selected input text;
the association module is further used for associating the input text corresponding to the association instruction.
9. The device information processing apparatus according to claim 8, wherein the receiving module is further configured to receive the device information reported by the SDK;
the storage module is further configured to store the reported device information as a first information base.
10. The device information processing apparatus according to claim 9, wherein the obtaining module is further configured to obtain the device information through a third-party website;
the word segmentation unit is also used for segmenting the acquired equipment information;
and the storage module is also used for storing the word segmentation input text after the obtained device information is segmented into words as a second information base.
11. A computer-readable storage medium comprising a stored program, wherein the program when executed performs the method of any of claims 1 to 5.
12. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method of any of claims 1 to 5 by means of the computer program.
CN201510276430.0A 2015-05-26 2015-05-26 Equipment information processing method and device Active CN106294433B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510276430.0A CN106294433B (en) 2015-05-26 2015-05-26 Equipment information processing method and device

Publications (2)

Publication Number Publication Date
CN106294433A CN106294433A (en) 2017-01-04
CN106294433B true CN106294433B (en) 2020-03-03

Family

ID=57634887

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510276430.0A Active CN106294433B (en) 2015-05-26 2015-05-26 Equipment information processing method and device

Country Status (1)

Country Link
CN (1) CN106294433B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant