CN111949639A - Brand library creating method and system and brand query and analysis platform - Google Patents

Info

Publication number
CN111949639A
Authority
CN
China
Prior art keywords
brand
data
shop
store
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910415359.8A
Other languages
Chinese (zh)
Inventor
翟向东
马晓甦
朱祎
王志永
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiaxing Shurong Data Technology Co.,Ltd.
Original Assignee
Shanghai Shurong Data Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Shurong Data Technology Co Ltd
Priority to CN201910415359.8A
Publication of CN111949639A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0637 Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0203 Market surveys; Market polls

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Data Mining & Analysis (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Educational Administration (AREA)
  • General Business, Economics & Management (AREA)
  • Quality & Reliability (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Remote Sensing (AREA)
  • Operations Research (AREA)
  • Tourism & Hospitality (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a brand library creating method and system and a brand query and analysis platform. The brand library creating method can handle massive data: by screening and fusing data from a plurality of brand-related data sources, it establishes the association between each brand and its corresponding off-line shops, labels the brands and their off-line shops, and creates a brand library that serves as a convenient, scientific and intuitive brand analysis tool supporting subsequent data analysis by the relevant operators.

Description

Brand library creating method and system and brand query and analysis platform
Technical Field
The present application relates to the field of big data processing, and in particular, to a brand library creating method, a brand library creating system, a brand library creating apparatus, a computer-readable storage medium, and a brand query and analysis platform.
Background
Brands are very important assets for enterprises. They contribute directly to an enterprise's product income and can also bring benefits beyond ordinary income, including a more stable market share, more loyal customer advocates, more favorable consumption habits and preferences, and the like.
With the continuous improvement of the market economy and increasingly fierce market competition, the site selection of brand-related off-line shops has become particularly important.
Traditional off-line shop site selection relies mostly on experience. Typically, several candidate addresses are screened according to experience or conventional criteria, each candidate is then visited for field examination, and a decision is made once the indicators have been collected. Factors influencing site selection include location, operating environment, population density and flow, transportation, public facilities and the like. However, such data is difficult to acquire and has poor universality, and it usually has to be processed manually afterwards, so the approach is costly, slow, of limited accuracy, and difficult to apply on a large scale. The results obtained are mostly a stacking of single-dimension data: they are not intuitive, lack multidimensional analysis and standardized rules, and are hard to interpret. The accuracy of the site selection method is therefore difficult to verify, and the results cannot serve as a useful reference for the site selection of other shops.
Disclosure of Invention
In view of the above shortcomings of the related art, the present application aims to provide a brand library creation method, system, device, storage medium and platform, which are used for solving the problems of complicated data processing flow, low efficiency, single analysis, poor guidance and the like related to brands and off-line stores thereof in the related art.
To achieve the above and other objects, a first aspect of the present application discloses a brand library creating method including the steps of:
obtaining multi-source data related to a brand;
performing brand screening on the multi-source data to obtain brand basic information of each brand; the brand basic information comprises a brand name;
obtaining, according to the brand basic information, shop basic information of the brand's off-line shops within a specified area from the multi-source data, and establishing association between the brand and the corresponding off-line shops; the shop basic information comprises shop position information;
and performing labeling operation on the brands and off-line shops thereof based on the multi-source data to create a brand library.
The brand library creating method disclosed by the application can handle massive data. By screening and fusing data from a plurality of brand-related data sources, it establishes the association between each brand and its corresponding off-line shops, performs labeling operations on the brands and their off-line shops, and creates a brand library that serves as a convenient, scientific and intuitive brand analysis tool, providing support for subsequent data analysis by the relevant operators.
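The four steps above can be pictured as a small pipeline. The following Python sketch is illustrative only and is not taken from the application: the record keys (`text`, `address`, `category`), the `Brand` and `Store` classes, the substring-based brand match and the substring-based area test are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    name: str
    address: str
    tags: set = field(default_factory=set)


@dataclass
class Brand:
    name: str
    stores: list = field(default_factory=list)
    tags: set = field(default_factory=set)


def create_brand_library(records, known_brands, region):
    """Run steps 2 to 4 over records that step 1 has already acquired.

    `records` is a list of dicts with hypothetical keys:
    'text' (store name), 'address', and optional 'category'.
    """
    library = {}
    for rec in records:
        # Step 2: brand screening (here: naive match against known names).
        brand_name = next((b for b in known_brands if b in rec["text"]), None)
        if brand_name is None:
            continue
        # Step 3: keep only stores inside the specified area and associate them.
        if region not in rec["address"]:
            continue
        brand = library.setdefault(brand_name, Brand(brand_name))
        store = Store(rec["text"], rec["address"])
        brand.stores.append(store)
        # Step 4: labeling from whatever attributes the source provides.
        if rec.get("category"):
            brand.tags.add(rec["category"])
            store.tags.add(rec["category"])
    return library
```

A real system would replace each step with the more elaborate processing described in the embodiments that follow; the sketch only shows how the outputs of one step feed the next.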
In certain embodiments of the first aspect of the present application, the data source is obtained by at least one of: collecting from network resources through a data collection program; obtaining at least one database, wherein the database comprises a public database and a private database; and self-built data.
In certain embodiments of the first aspect of the present application, the step of brand screening the multi-source data to obtain brand basic information of each brand includes the following steps: cleaning the multi-source data according to a data cleaning rule; performing text recognition on the cleaned data, and extracting text contents containing suspected brand names from the text; and fusing the extracted content containing the suspected brand name to obtain a brand name set.
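As a rough illustration of the clean, recognize and fuse sequence, the sketch below uses regular expressions in place of the text recognition step. The cleaning rule, the extraction pattern and the normalization key are hypothetical choices for this example; the application does not specify them.

```python
import re
from collections import defaultdict


def clean(record: str) -> str:
    """Hypothetical cleaning rule: strip HTML-like tags, collapse whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", record)
    return re.sub(r"\s+", " ", no_tags).strip()


def extract_candidates(text: str) -> list:
    """Hypothetical extraction rule: capitalized words before store/shop/outlet."""
    pattern = r"([A-Z][\w&']*(?: [A-Z][\w&']*)*) (?:store|shop|outlet)\b"
    return re.findall(pattern, text)


def normalize(name: str) -> str:
    """Fusion key: case-insensitive, punctuation and spaces ignored."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def screen_brands(raw_records):
    """Clean each record, extract suspected brand names, fuse spelling variants."""
    groups = defaultdict(list)
    for raw in raw_records:
        for candidate in extract_candidates(clean(raw)):
            groups[normalize(candidate)].append(candidate)
    # Keep the most frequent spelling of each fused group as the brand name.
    return sorted(max(set(v), key=v.count) for v in groups.values())
```

Fusing on a normalized key is only one possible strategy; fuzzy matching or a curated alias table could serve the same purpose.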
In certain embodiments of the first aspect of the present application, before cleansing the multi-source data or before text recognition of the cleansed data, the method further comprises the following step: classifying the multi-source data according to industry field.
In certain embodiments of the first aspect of the present application, the brand library creating method further comprises the step of setting a data cleansing rule.
In certain embodiments of the first aspect of the present application, the respective data cleansing rules are set for at least one of different databases, different industries, and different brands.
In certain embodiments of the first aspect of the present application, the brand library creating method further comprises the step of: generating a brand basic information table containing the brand basic information of each brand.
In certain embodiments of the first aspect of the present application, the step of obtaining, from the multi-source data, shop basic information of the brand's off-line shops within a specified area includes the following steps: extracting shop data corresponding to a brand from the multi-source data according to the confirmed brand basic information of that brand, the shop data comprising shop basic information of off-line shops; fusing the extracted shop data to obtain a shop set; and establishing an association between the brand and the corresponding shop set.
In some embodiments of the first aspect of the present application, the fusing of the extracted store data is performed based on store location information.
In certain embodiments of the first aspect of the present application, the store location information includes at least one of address information and geographic coordinate information.
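One way to fuse shop data on location information is to treat two records as the same shop when their addresses are identical or their coordinates lie within a small distance of each other. The sketch below assumes WGS-84 latitude and longitude, a hypothetical 50 m threshold, and hypothetical record keys (`source`, `address`, `lat`, `lon`); none of these values come from the application.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def same_store(a, b, max_dist_m=50.0):
    """Two records are treated as one store if addresses or coordinates agree."""
    if a.get("address") and a.get("address") == b.get("address"):
        return True
    if "lat" in a and "lat" in b:
        return haversine_m(a["lat"], a["lon"], b["lat"], b["lon"]) <= max_dist_m
    return False


def fuse_stores(records, max_dist_m=50.0):
    """Greedy fusion: merge each record into the first matching fused store."""
    fused = []
    for rec in records:
        for store in fused:
            if same_store(store, rec, max_dist_m):
                store["sources"].append(rec["source"])
                # Fill in fields the fused store is still missing.
                for key, value in rec.items():
                    store.setdefault(key, value)
                break
        else:
            fused.append({**rec, "sources": [rec["source"]]})
    return fused
```

The greedy pass is quadratic in the number of records; for massive data a spatial index or grid bucketing would be a natural substitute.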
In certain embodiments of the first aspect of the present application, the brand library creating method further comprises the step of: generating a store basic information table containing the store basic information of each off-line store corresponding to a brand.
In some embodiments of the first aspect of the present application, when the store basic information in the multi-source data is in a text form, text recognition is performed on the multi-source data, and text content containing suspected store basic information is extracted from a text.
In some embodiments of the first aspect of the present application, when the store basic information in the multi-source data is in a map form, a point in the map where the offline store is displayed is mapped to geographic coordinate information by a mapping relationship.
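The application does not state what the mapping relationship is. As one simple possibility, a point on a north-up map image with a known geographic bounding box can be converted by linear interpolation. The function name, the `bounds` tuple layout and the top-left pixel origin in the sketch below are assumptions for illustration.

```python
def pixel_to_geo(x, y, width, height, bounds):
    """Linearly map a pixel (x, y) to (lat, lon).

    `bounds` = (lat_north, lon_west, lat_south, lon_east) of the map image.
    Assumes a north-up image covering an area small enough that a linear
    approximation is acceptable; the pixel origin is the top-left corner.
    """
    lat_north, lon_west, lat_south, lon_east = bounds
    lon = lon_west + (x / width) * (lon_east - lon_west)
    lat = lat_north - (y / height) * (lat_north - lat_south)
    return lat, lon
```

Maps rendered in a Web Mercator projection over large areas would need the projection's inverse formula instead of this linear approximation.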
In certain embodiments of the first aspect of the present application, the brand library creating method further comprises the step of: associating the extracted shop data with the data source in which it is located, and establishing a tracing information chain for the shop data.
In certain embodiments of the first aspect of the present application, the traceability information chain includes a creation time of the store data in the data source where the store data is located.
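A tracing information chain can be represented as a list of source references attached to each shop record. The sketch below is a minimal data structure under assumed names (`SourceRef`, `StoreRecord`, `record_id`); only the association with the data source and the creation time are taken from the text.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class SourceRef:
    source: str            # hypothetical data-source identifier
    record_id: str         # hypothetical identifier of the record in that source
    created_at: datetime   # creation time of the record in the source


@dataclass
class StoreRecord:
    name: str
    address: str
    trace: list = field(default_factory=list)  # tracing information chain

    def add_source(self, ref: SourceRef) -> None:
        """Append a source reference and keep the chain ordered oldest first."""
        self.trace.append(ref)
        self.trace.sort(key=lambda r: r.created_at)

    def first_seen(self) -> datetime:
        """Earliest creation time among all sources that mention this store."""
        return self.trace[0].created_at
```

Keeping the creation time per source makes it possible to prefer the most recent source when records conflict, or to date a shop's first appearance.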
In certain embodiments of the first aspect of the present application, the tagging of brands and their off-line shops based on the multi-source data comprises the steps of: extracting a brand basic attribute corresponding to the brand or a shop basic attribute corresponding to the off-line shop from the multi-source data; labeling the basic brand attributes or the basic shop attributes based on preset label rules; and establishing association between the label and the corresponding brand or the off-line shop thereof.
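The labeling steps can be sketched as a list of preset rules, each pairing a label with a predicate over the extracted basic attributes. The attribute names, thresholds and label names below are invented for the example and are not defined by the application.

```python
def make_rules():
    """Hypothetical preset label rules: (label, predicate over basic attributes)."""
    return [
        ("chain", lambda a: a.get("store_count", 0) >= 10),
        ("premium", lambda a: a.get("avg_price", 0) >= 100),
        ("late-night", lambda a: a.get("closing_hour", 0) >= 23),
    ]


def tag_entity(attributes, rules):
    """Return the set of labels whose rule matches the entity's attributes."""
    return {tag for tag, predicate in rules if predicate(attributes)}


def tag_library(entities, rules):
    """Associate labels with each brand or store, keyed by entity id."""
    return {eid: tag_entity(attrs, rules) for eid, attrs in entities.items()}
```

Because rules are plain data, separate rule sets could be kept for brands and for off-line shops without changing the labeling functions.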
In certain embodiments of the first aspect of the present application, the brand library creating method further includes the step of presetting tag rules of brands and off-line stores thereof.
In certain embodiments of the first aspect of the present application, the brand library creating method further comprises the step of: establishing a brand relationship between at least two brands, or a store relationship between the off-line stores of at least two brands, according to the store basic information of the off-line stores in the brand library and the labels of the brands and their off-line stores.
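The application leaves the form of the brand relationship open. One plausible instance is co-location: counting how often two brands have stores in the same market. The sketch below assumes each store has already been reduced to a hypothetical `(brand, market)` pair derived from its location information.

```python
from collections import Counter
from itertools import combinations


def co_location_counts(stores):
    """Count, for each brand pair, how many markets host stores of both.

    `stores` is an iterable of (brand, market) pairs; 'market' is a
    hypothetical grouping key derived from store location information.
    """
    brands_by_market = {}
    for brand, market in stores:
        brands_by_market.setdefault(market, set()).add(brand)
    counts = Counter()
    for brands in brands_by_market.values():
        # Sorting makes each pair appear in one canonical order.
        for pair in combinations(sorted(brands), 2):
            counts[pair] += 1
    return counts
```

Other relationships, such as brands sharing labels or stores lying within a given distance of each other, could be computed in the same pairwise fashion.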
In certain embodiments of the first aspect of the present application, the brand library creating method further comprises the step of: acquiring, from the multi-source data and according to the brand basic information and the store basic information, market basic information of the market where the brand's off-line shop is located, and establishing the association between the off-line shop of the brand and the market.
In certain embodiments of the first aspect of the present application, the brand library creating method further comprises the step of: acquiring, from the multi-source data and according to the brand basic information, agent basic information of an agent corresponding to the brand, and establishing the association between the brand and the agent.
In certain embodiments of the first aspect of the present application, the brand library creation method further comprises the step of updating the brand library.
A second aspect of the present application discloses a brand library creating system including:
an information acquisition unit for acquiring multi-source data related to brands;
a brand screening unit for performing brand screening on the multi-source data to obtain brand basic information of each brand; the brand basic information comprises a brand name;
a shop association unit for acquiring, according to the brand basic information, shop basic information of the brand's off-line shops within a specified area from the multi-source data, and establishing association between the brand and the corresponding off-line shops; the shop basic information comprises shop position information; and
a labeling processing unit for performing labeling operation on the brands and their off-line shops based on the multi-source data to create a brand library.
The brand library creating system comprises an information acquisition unit, a brand screening unit, a shop association unit and a labeling processing unit, can deal with the processing of mass data, screens and fuses data in a plurality of data sources related to brands, establishes the association between the brands and corresponding off-line shops, performs labeling operation on the brands and off-line shops thereof, and establishes a brand library to serve as a convenient, scientific and visual brand analysis tool to provide support for the subsequent data analysis of related operators.
In certain embodiments of the second aspect of the present application, the data source is obtained by at least one of: collecting from network resources through a data collection program; obtaining at least one database, wherein the database comprises a public database and a private database; and self-built data.
In certain embodiments of the second aspect of the present application, the brand screening unit may further include: the data cleaning module is used for cleaning the multi-source data according to the data cleaning rule; the text recognition module is used for performing text recognition on the cleaned data and extracting text contents containing suspected brand names from the text; and the brand fusion module is used for fusing the extracted content containing the suspected brand name to obtain a brand name set.
In certain embodiments of the second aspect of the present application, the brand screening unit further comprises: an industry classification module for classifying the multi-source data according to industry field.
In certain embodiments of the second aspect of the present application, the brand screening unit further comprises: a cleaning rule setting module for setting the data cleaning rules.
In some embodiments of the second aspect of the present application, the store associating unit further includes: the shop extraction module is used for extracting shop data corresponding to a certain brand from the multi-source data according to the confirmed brand basic information of the brand; the shop data comprises basic shop information of off-line shops; the shop fusion module is used for fusing the extracted shop data to obtain a shop set; and the shop association module is used for establishing the association between the brand and the corresponding shop set.
In certain embodiments of the second aspect of the present application, the store fusion module fuses the extracted store data based on store location information.
In certain embodiments of the second aspect of the present application, the store location information includes at least one of address information and geographic coordinate information.
In some embodiments of the second aspect of the present application, the store association module is further configured to associate the extracted store data with a data source where the store data is located, and establish a traceability information chain related to the store data.
In certain embodiments of the second aspect of the present application, the creation time of the store data in the data source where the store data is located is included in the traceability information chain.
In certain embodiments of the second aspect of the present application, the labeling processing unit comprises: the basic attribute extraction module is used for extracting a brand basic attribute corresponding to the brand or a shop basic attribute corresponding to the off-line shop from the multi-source data; the label endowing module is used for marking a label on the basic attribute of the brand or the basic attribute of the shop based on a preset label rule; and the label association module is used for establishing association between the label and the corresponding brand or off-line shop thereof.
In certain embodiments of the second aspect of the present application, the labeling processing unit further comprises: a label rule setting module for setting the label rules of brands and their off-line shops.
In certain embodiments of the second aspect of the present application, the brand library creation system further comprises: a brand relation unit for establishing a brand relationship between at least two brands, or a store relationship between the off-line stores of at least two brands, according to the store basic information of the off-line stores in the brand library and the labels of the brands and their off-line stores.
In certain embodiments of the second aspect of the present application, the brand library creation system further comprises: a market association unit for acquiring, from the multi-source data and according to the brand basic information and the store basic information, market basic information of the market where the brand's off-line shop is located, and establishing the association between the off-line shop of the brand and the market.
In certain embodiments of the second aspect of the present application, the brand library creation system further comprises: an agent association unit for acquiring, from the multi-source data and according to the brand basic information, agent basic information of the agent corresponding to the brand, and establishing the association between the brand and the agent.
In certain embodiments of the second aspect of the present application, the brand library creation system further comprises: an updating unit for updating the brand library.
A third aspect of the present application discloses a brand library creating apparatus comprising:
a memory to store instructions;
a processor coupled to the memory, the processor configured to perform the brand library creation method as described above based on instructions stored by the memory.
A fourth aspect of the present application discloses a computer-readable storage medium storing computer instructions that, when executed by a processor, implement the brand library creation method as described above.
A fifth aspect of the present application discloses a brand query and analysis platform, comprising:
the brand library creation system as described above;
a data storage unit for storing brand library information created by the brand library creating system;
a data processing unit for performing data processing on the brand library information in the data storage unit according to an operation instruction; the operation instruction comprises at least one of instant query and multidimensional data analysis.
In certain embodiments of the fifth aspect of the present application, the brand query and analysis platform further comprises: a visual display unit for providing an operation interface and displaying the processing results of the data processing unit through a visual interface.
The brand query and analysis platform comprises a brand library creating system, a data storage unit and a data processing unit. It can handle massive data, screens and fuses data from a plurality of brand-related data sources, associates the brands with their corresponding off-line shops, performs labeling operations on the brands and their off-line shops, creates a brand library, and can perform data processing such as instant query and multidimensional data analysis according to the operation instructions of users. The brand query and analysis platform can be used as a convenient, scientific and intuitive brand analysis tool, and provides support for subsequent data analysis by the relevant operators.
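The two kinds of operation instruction named above, instant query and multidimensional data analysis, can be sketched over an in-memory list of brand library rows. The row fields (`brand`, `city`, `district`) and the function names are assumptions for illustration; an actual platform would more likely run such operations against its data storage unit.

```python
from collections import defaultdict


def query(rows, **conditions):
    """Instant query: rows whose fields equal every given condition."""
    return [r for r in rows if all(r.get(k) == v for k, v in conditions.items())]


def count_by(rows, *dimensions):
    """Multidimensional analysis: store counts grouped by the given fields."""
    counts = defaultdict(int)
    for r in rows:
        counts[tuple(r.get(d) for d in dimensions)] += 1
    return dict(counts)


# Hypothetical brand library rows, one per off-line store.
ROWS = [
    {"brand": "Acme Coffee", "city": "Shanghai", "district": "Huangpu"},
    {"brand": "Acme Coffee", "city": "Shanghai", "district": "Jing'an"},
    {"brand": "Acme Coffee", "city": "Beijing", "district": "Chaoyang"},
    {"brand": "Corner Noodles", "city": "Shanghai", "district": "Huangpu"},
]
```

For example, `query(ROWS, brand="Acme Coffee", city="Shanghai")` returns the two Shanghai stores of that brand, and `count_by(ROWS, "brand", "city")` gives store counts per brand and city.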
Drawings
FIG. 1 shows a schematic deployment diagram of a brand library creation system of the present application in one embodiment.
FIG. 2 is a block diagram of the brand library creation system of FIG. 1 in one embodiment.
FIG. 3 is a schematic diagram of a brand screening unit in one embodiment.
FIG. 4 is a schematic diagram of a store association unit in one embodiment.
FIG. 5 shows a schematic diagram of longitude and latitude coordinate matching for a matching rule.
FIG. 6 is a schematic structural diagram of a labeling processing unit in an embodiment.
FIG. 7 is a block diagram of the brand library creation system of FIG. 1 in another embodiment.
FIG. 8 is a block diagram showing a structure of the brand library creating system of FIG. 1 in a further embodiment.
FIG. 9 is a block diagram of the brand query and analysis platform in one embodiment.
FIG. 10 is a flow diagram illustrating a brand library creation method of the present application in one embodiment.
FIG. 11 is a schematic diagram illustrating a further detailed flow of step S33.
FIG. 12 is a schematic diagram illustrating a further detailed flow of step S35.
FIG. 13 is a schematic diagram illustrating a further detailed flow of step S37.
Detailed Description
The following description of the embodiments of the present application is provided for illustrative purposes, and other advantages and capabilities of the present application will become apparent to those skilled in the art from the present disclosure.
In the following description, reference is made to the accompanying drawings that describe several embodiments of the application. It is to be understood that other embodiments may be utilized and that compositional and operational changes may be made without departing from the spirit and scope of the present disclosure. The following detailed description is not to be taken in a limiting sense, and the scope of embodiments of the present application is defined only by the claims of the patent of the present application. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
Also, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context indicates otherwise. For example, in the present application, the "at least one client" includes one client and a plurality of clients, or the "at least one content presentation device" includes one content presentation device and a plurality of content presentation devices. It will be further understood that the terms "comprises," "comprising," "includes" and/or "including," when used in this specification, specify the presence of stated features, steps, operations, elements, components, items, species, and/or groups, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, items, species, and/or groups thereof. The terms "or" and "and/or" as used herein are to be construed as inclusive, meaning any one or any combination.
In the prior art, obtaining business information about brands and their off-line shops involves a large amount of data. The data comes from a wide range of sources, may be stored on a variety of storage systems and/or devices, and may have different formats. Such data may be derived from internal data, network information, public databases, private databases, and the like. Data may be stored in different locations (e.g., different cities, countries, etc.) and on different types of media (e.g., networks, disk storage, optical disks, etc.). The data may be available in the form of Web services, databases, flat files, log files, and the like, and the types and/or formats of the data may also differ. It is therefore difficult to obtain data that is as complete as possible. More importantly, large amounts of data scattered across different data sources have poor universality, so querying them and extracting useful information involves a heavy workload, high cost and a long cycle, with limited accuracy, which makes wide application difficult. In view of the above, large amounts of data from various data sources need to be integrated in an efficient manner in order to obtain meaningful analysis, for example, data analysis of brands and their off-line shops to support site selection planning for off-line shops.
The brand library creation system and the brand library creation method disclosed by the application can deal with the processing of mass data, establish the association between the brand and the corresponding off-line shop and perform labeling operation on the brand and the off-line shop thereof by screening and fusing data in a plurality of data sources related to the brand, and establish the brand library, so that the brand library is used as a convenient, scientific and intuitive brand analysis tool to provide support for the subsequent data analysis of related operators.
Referring to FIG. 1, a schematic deployment diagram of the brand library creation system of the present application in one embodiment is shown. As shown in FIG. 1, the brand library creation system 1 is configured to acquire and integrate data from one or more brand-related data sources to create a brand library. These data sources may provide various types of data acceptable to the brand library creation system 1; the sources may be, for example, internal data, network information, public databases, private databases, etc., and the data may be, for example, textual data, structured data, unstructured data (e.g., text, pictures, audio, video, etc.), etc. In an embodiment, the brand library creating system 1 may receive data from one or more data sources and perform processes such as integration and labeling on the data to create a brand library related to a brand, which serves as a convenient, scientific and intuitive brand analysis tool supporting subsequent data analysis by the relevant operators. For example, the brand library creating system 1 may receive data on the position information, passenger flow volume, order volume, total sales volume, customer evaluations, and the like of each off-line store under a certain brand, and may thus provide multidimensional analysis services, prediction services, and the like for the off-line stores under that brand.
Referring to FIG. 2, a block diagram of the brand library creation system of FIG. 1 in one embodiment is shown. As shown in FIG. 2, the brand library creating system 1 includes: an information acquisition unit 11, a brand screening unit 13, a store association unit 15, and a labeling processing unit 17.
In an embodiment, the brand library creating system 1 may be implemented in an information processing device (group) such as a desktop computer, a notebook computer, a tablet computer, a smart phone, a cloud server, or a distributed computer cluster.
The information acquisition unit 11 is used to acquire multi-source data related to brands. The sources of the multi-source data are not limited; the data can be derived from one or more data sources.
In particular implementations of the embodiments, the data may be obtained from at least one of the following data sources.
In one exemplary embodiment, data is collected from web pages by a data collection program.
Generally, a data collection program is a script or program that automatically captures network information according to certain rules. The program may reside on a server. Starting from a given set of URLs (Uniform Resource Locators), it reads the corresponding documents using a standard protocol such as HTTP (Hypertext Transfer Protocol), then takes all unvisited URLs contained in those documents as new starting points and continues roaming until no new URL satisfies the crawl condition.
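The roaming behavior described above corresponds to a breadth-first crawl. The sketch below is a minimal illustration in which the caller supplies `fetch` (for instance, an HTTP client call) and `accept` (the crawl condition); both parameters, the `max_pages` limit and the simple `href` regular expression are assumptions made to keep the example self-contained and free of network access.

```python
import re
from collections import deque
from urllib.parse import urljoin

HREF = re.compile(r'href="([^"]+)"')


def crawl(seeds, fetch, accept, max_pages=100):
    """Breadth-first crawl from seeds until no unvisited URL satisfies accept.

    `fetch(url) -> str` returns the document body (e.g. over HTTP);
    `accept(url) -> bool` is the crawl condition. Both are caller-supplied.
    """
    queue = deque(seeds)
    visited = set()
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        if url in visited or not accept(url):
            continue
        visited.add(url)
        html = fetch(url)
        pages[url] = html
        # Every link in the document becomes a new starting point.
        for link in HREF.findall(html):
            queue.append(urljoin(url, link))
    return pages
```

A production collector would also need an HTML parser rather than a regular expression, request throttling, error handling and respect for each site's access rules.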
When such a data collection program is applied to web pages, brand-related data may be collected from, for example, portal sites (e.g., Sina, NetEase), official brand websites, e-commerce websites, life service websites (e.g., Dianping, Meituan, 58.com, Ganji, etc.), social networking sites (e.g., Sina Weibo), forums (e.g., Baidu Tieba, Liba), blogs, and the like. Taking the Dianping website as an example, the brand-related data that may be obtained includes, but is not limited to: the brand name, the industry to which the brand belongs, the labels of the brand, the geographic coordinates of the brand's offline stores, the name of the shopping mall in which a store is located and the store's position within it, business hours, featured recommendations, promotional offers, customer reviews, and the like.
In an exemplary embodiment, the information is obtained from at least one database.
The database may be a public database, for example a public database, map database, or brand database maintained by a government department (e.g., civil affairs, real estate, police, industry, fire protection, transportation, or weather), a university, a research institution, or a commercial organization. The database may also be a private database, such as a partner's database (e.g., one provided by a brand company, a store, or another third party) or a contract database. Taking a map database as an example, point-of-interest data for a city or an area of a city, or the geographic coordinates of the offline stores belonging to a brand, may be collected through the Application Programming Interface (API) of a map application.
In one exemplary embodiment, the data source is self-built data.
For example, the self-built data may be compiled from multiple data sources, or may be obtained by collecting, recording, and organizing data through field surveys, feedback forms, and the like.
In addition, the data may be of different types, for example text data, structured data, or unstructured data (e.g., text, pictures, audio, video, etc.).
The brand-related multi-source data acquired by the information acquisition unit 11 may be stored, for example, in a storage medium (e.g., a hard disk, an optical disk, a magnetic disk, etc.), in the cloud, or on distributed servers.
In some embodiments, the information acquisition unit 11 may further associate each piece of acquired brand-related data with the data source it came from and establish a traceability information chain for the brand data. The traceability information chain includes the creation time of the brand data in its data source, so that each piece of brand data can be traced. Once the brand has been associated with the data sources of its data, a brand data tracing information table for that brand can be generated.
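One possible way to represent such a traceability information chain is sketched below. The `TraceEntry` fields, the `build_trace_table` helper, the source names, and the use of ISO 8601 strings for creation times are all assumptions for illustration; the application does not prescribe a data layout.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class TraceEntry:
    brand: str
    source: str       # data source in which the record was found
    created_at: str   # creation time in that source (ISO 8601 string, assumed)
    payload: str      # the brand data itself


def build_trace_table(entries: Iterable[TraceEntry]) -> Dict[str, List[TraceEntry]]:
    """Group entries by brand and order each chain by creation time."""
    table: Dict[str, List[TraceEntry]] = defaultdict(list)
    for entry in entries:
        table[entry.brand].append(entry)
    for chain in table.values():
        # ISO 8601 strings in a uniform format sort chronologically.
        chain.sort(key=lambda e: e.created_at)
    return dict(table)


# Hypothetical records from two hypothetical sources.
entries = [
    TraceEntry("McDonald's", "review_site", "2019-03-02T10:00:00", "customer review"),
    TraceEntry("McDonald's", "map_api", "2019-01-15T08:30:00", "store coordinates"),
]
trace_table = build_trace_table(entries)
```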
The brand screening unit 13 is used for performing brand screening on the multi-source data.
In some embodiments, different data sources may use different data standards, so descriptions of the same entity (e.g., a city name, brand name, or mall name) may differ. For example, two records from different data sources may in fact describe the same brand yet give its name differently. In addition, the multi-source data may be of diverse types, including text data, structured data, unstructured data (e.g., text, pictures, audio, video, etc.), and so forth.
In some embodiments, even data from a single source may lack a uniform data standard, so descriptions of the same entity (e.g., a city name, brand name, or mall name) may differ. For example, on an e-commerce website or a life service website, different user reviews may refer to the same brand by its full Chinese name, abbreviated Chinese name, full English name, abbreviated English name, or an internet nickname. For McDonald's, for instance, variants such as "McDonald", "macdonald", "macterf", "macbeth", "M", and "McDonald's" may appear. Data that describes the same entity in different ways therefore needs to be processed accordingly; in this embodiment, the brand screening unit 13 performs brand screening on the multi-source data, which includes identifying and integrating the data in the multi-source data that points to the same brand.
Therefore, data processing is required to be performed on the acquired multi-source data to screen out brand basic information related to brands.
In an embodiment, please refer to fig. 3, which shows a schematic structural diagram of brand screening unit 13 in an embodiment. As shown in fig. 3, brand screening unit 13 may further include: a data cleansing module 131, a text recognition module 133, and a brand fusing module 135.
The data cleaning module 131 is used for cleaning the multi-source data according to the data cleaning rule.
Generally, data cleaning (also called data cleansing) refers to the process of reviewing and verifying data in order to remove duplicate data, remove or correct incomplete and erroneous data, and obtain data with high consistency.
The multi-source data may contain a large amount of invalid or abnormal data, such as duplicate, incomplete, or erroneous data. Such data not only increases the subsequent processing load but also interferes with or contaminates subsequent data processing, undermining the reliability and validity of the results. It is therefore necessary to delete the invalid data or to correct some or all of it, which is the task of data cleaning.
At present, small volumes of data are traditionally cleaned manually, without a unified, standardized cleaning process. Manual cleaning has the following main problems: it is time-consuming, because it relies on operators judging the data and then cleaning it step by step; data is easily missed; the results are unstable, because different operators may produce inconsistent results; the process cannot be traced back, so the data cannot be rechecked and corrected when a cleaning error occurs; and verifying the result is laborious, because the data must be tallied again after cleaning. For the large volumes of multi-source data in this application, traditional manual cleaning is clearly not an effective approach.
In an embodiment, the data cleansing module 131 performs data cleansing on part or all of the multi-source data acquired by the information acquisition unit 11, deletes duplicate data, deletes or corrects incomplete data and erroneous data, and retains qualified data and corrected data, which may be stored in a storage medium (e.g., a hard disk, an optical disk, a magnetic disk, etc.), a cloud, or a distributed server, etc.
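A minimal sketch of such a cleaning pass is given below, assuming each record is a dictionary and that "incomplete" means a missing required field. The field names `brand`, `store`, and `address`, the whitespace-trimming correction, and the sample records are illustrative assumptions, not the rules used in the application.

```python
from typing import Dict, Iterable, List, Sequence


def clean(records: Iterable[Dict[str, str]],
          required: Sequence[str] = ("brand", "store", "address")) -> List[Dict[str, str]]:
    """Correct trivially malformed values, drop incomplete records, drop duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        # Correction step (assumed): trim surrounding whitespace from string values.
        rec = {k: v.strip() if isinstance(v, str) else v for k, v in rec.items()}
        # Incomplete data: any required field missing or empty.
        if any(not rec.get(field) for field in required):
            continue
        # Duplicate data: same values in all required fields as an earlier record.
        key = tuple(rec[field] for field in required)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned


RAW = [
    {"brand": " McDonald's ", "store": "Store A", "address": "Road 1"},
    {"brand": "McDonald's", "store": "Store A", "address": "Road 1"},  # duplicate after trimming
    {"brand": "McDonald's", "store": "", "address": "Road 2"},         # incomplete
    {"brand": "KFC", "store": "Store B", "address": "Road 3"},
]
cleaned_records = clean(RAW)
```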
Cleaning the multi-source data by using the data cleaning module 131 may be implemented according to a preset data cleaning rule, and therefore, the brand screening unit 13 may further include a cleaning rule setting module for setting a cleaning rule of the data. In some embodiments, the data cleansing rule file is configured by the cleansing rule setting module, and the data cleansing rule file may include one or more data cleansing rules, where the data cleansing rules may be set according to the type, format, source, and/or industry of the data, i.e., different types of data may set different data cleansing rules, different formats of data may set different data cleansing rules, different sources of data may set different data cleansing rules, and different industries of data may set different data cleansing rules.
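How a cleaning rule file might select different rules by source or industry can be sketched as follows. The JSON layout, the keys `default`, `by_source`, and `by_industry`, and the precedence (industry overrides source, which overrides the default) are assumptions chosen for the example; the application only states that rules may differ by type, format, source, and/or industry.

```python
import json
from typing import Optional

# Hypothetical rule file content; a real system might load this from disk.
RULE_FILE_TEXT = """
{
  "default": {"required": ["brand"], "strip": true},
  "by_source": {"review_site": {"required": ["brand", "comment"]}},
  "by_industry": {"catering": {"required": ["brand", "address"]}}
}
"""


def rules_for(rule_file: dict,
              source: Optional[str] = None,
              industry: Optional[str] = None) -> dict:
    """Start from the default rules, then apply source and industry overrides."""
    rules = dict(rule_file["default"])
    rules.update(rule_file.get("by_source", {}).get(source, {}))
    rules.update(rule_file.get("by_industry", {}).get(industry, {}))
    return rules


rule_file = json.loads(RULE_FILE_TEXT)
review_rules = rules_for(rule_file, source="review_site")
catering_rules = rules_for(rule_file, source="review_site", industry="catering")
```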
In one example, the data cleaning rules may be verified and adjusted according to the verification results. For example, autocorrelation and cross-correlation validity analysis is performed on a set of multi-source data, and the result is used to determine whether the current data cleaning rule for a given data source, data type, and/or data format needs to be modified. If so, the original data cleaning rule is corrected and updated, and the data cleaning module 131 cleans the multi-source data according to the updated rule.
In one example, the data cleaning rules may be obtained by machine learning: the rules are trained on existing data, and the trained rules are then used to clean the multi-source data.
In some embodiments, the brand screening unit 13 may further include a data preprocessing module for preprocessing the multi-source data before it is cleaned by the data cleaning module 131. For example, the data formats in the multi-source data may be standardized and unified into a processable format. Alternatively, the data format of each type of data in the multi-source data may be standardized.
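As one illustration of format standardization, text fields could be normalized before cleaning. The sketch below assumes that Unicode NFKC normalization, whitespace collapsing, and case folding are acceptable preprocessing steps; the application does not name a specific normalization, and the function name `normalize_text` is hypothetical.

```python
import unicodedata


def normalize_text(value: str) -> str:
    """Fold full-width characters to ASCII forms, collapse whitespace, ignore case."""
    # NFKC maps compatibility characters such as full-width letters and the
    # ideographic space to their ordinary counterparts.
    text = unicodedata.normalize("NFKC", value)
    return " ".join(text.split()).casefold()


normalized = normalize_text("ＫＦＣ　Ｓｈａｎｇｈａｉ ")
```

After this step, two records that differ only in character width, spacing, or letter case compare equal, which simplifies the duplicate detection performed during cleaning.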
In some embodiments, the brand screening unit 13 may further include an industry classification module for classifying the multi-source data by industry. This reduces the amount of data the data cleaning module 131 must clean at one time and speeds up cleaning. In addition, because cleaning is performed on the industry classification results, identical brand names belonging to different industries are not confused, which improves cleaning accuracy. In one example, the industry classification may follow a national standard, such as, but not limited to, the national economic industry classification (GB/T 4754-2017). In other examples, it may follow the data source's own industry classification, or be set by the customer as needed. In general, the industries may be, for example: catering, commercial property, convenience stores, training and education, beauty salons, healthcare services, clothing and accessories, luxury goods, pet stores, intermediary services, hardware accessories, auto repair, and the like.
The text recognition module 133 is configured to perform text recognition on the cleaned data, and extract text content containing suspected brand names from the text.
The multi-source data submitted for text recognition may be in any form, such as text content, pictures, animated images, or video streams. Generally, text recognition uses known text content as the input of a text recognition algorithm to train a text recognition model by machine learning, and then uses the model to perform text recognition on the data to be recognized. When training the model, features of brand names may be extracted from the known text content to obtain the feature information of each brand name.
In a specific operation, text recognition may include, but is not limited to, the following steps:
Acquire the text to be recognized and perform word segmentation on it.
Based on the structural characteristics of brand names and a preset standard brand name library, perform brand name recognition on the segmented text to obtain a suspected brand name.
Performing brand name recognition on the segmented text to obtain a suspected brand name may further include:
Match the segmented text against each standard brand name in the standard brand name library to obtain a matching characteristic value between the text and each standard brand name.
According to the matching characteristic values, take the standard brand name whose matching characteristic value is the highest, or exceeds a preset threshold, as the standard brand name corresponding to the suspected brand name, and generate a recognition result.
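The segment-match-threshold steps above can be sketched as follows. This is a hedged illustration only: the application does not say how the matching characteristic value is computed, so the similarity ratio from Python's standard `difflib.SequenceMatcher` is substituted here, and the regular-expression word segmentation, the brand list `STANDARD_BRANDS`, and the 0.8 threshold are all assumptions.

```python
import re
from difflib import SequenceMatcher
from typing import Iterator, List, Optional, Tuple

STANDARD_BRANDS = ["McDonald's", "Starbucks", "KFC"]  # hypothetical standard brand name library


def segment(text: str) -> List[str]:
    """Very simple word segmentation for Latin-script text (assumed)."""
    return re.findall(r"[A-Za-z0-9']+", text)


def candidates(tokens: List[str], max_len: int = 2) -> Iterator[str]:
    """Yield every run of 1..max_len consecutive tokens as a candidate name."""
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def score(a: str, b: str) -> float:
    """Matching characteristic value: similarity ratio in [0, 1] (assumed metric)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def recognize(text: str, threshold: float = 0.8) -> Optional[Tuple[str, str, float]]:
    """Return (suspected name, standard name, score) for the best match, if any."""
    best = None
    for cand in candidates(segment(text)):
        for brand in STANDARD_BRANDS:
            s = score(cand, brand)
            if s >= threshold and (best is None or s > best[2]):
                best = (cand, brand, s)
    return best


match = recognize("Had lunch at Mcdonalds yesterday")
```

Here the sketch combines both criteria from the text: a candidate must exceed the threshold, and among those the highest-scoring one is reported.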
Of course, as described above, before the multi-source data is cleaned by the data cleaning module 131, the multi-source data may be classified by the industry classification module according to the industry field, and accordingly, the standard brand name library may also be divided by the industry field into a corresponding standard brand name library of a specific industry, for example, a catering standard brand name library, a training education standard brand name library, a hairdressing standard brand name library, a clothing accessory standard brand name library, and the like.
By performing text recognition on the cleaned data with the text recognition module 133, suspected brand names can be obtained from the data.
The brand fusing module 135 is configured to fuse the extracted content containing the suspected brand name to obtain a brand name set.
In some embodiments, different data sources use different data standards, or no data standard at all, so data related to the same brand name in the multi-source data may carry different descriptions. The key to fusing data related to the same brand name is determining whether the data to be fused points to the same brand name.
In an example, the brand fusion module 135 fuses the extracted content containing suspected brand names through steps including, but not limited to, the following:
Acquire the content containing the suspected brand name extracted by the text recognition module 133.
Determine whether the suspected brand name in the content matches a brand name in a pre-stored brand library.
When the suspected brand name in the content matches a brand name in the pre-stored brand library, point the suspected brand name to that brand name and add its description information to the data of the corresponding brand name in the brand library.
Still taking McDonald's as an example, the text recognition module 133 may extract suspected brand names such as "McDonald", "makina", "makino", "majiu", "maji", "M store", "McDonald's", and "macrantang" from the recognized text. The pre-stored brand library stores the standard brand name "McDonald's" together with various names that point to it, for example "McDonald", "maotai", "M mark", and "McDonald's". When the brand fusion module 135 performs data fusion, data whose suspected brand name (e.g., "maotai", "M mark", "McDonald's") corresponds to the standard brand name is fused so that it points to the standard brand name "McDonald's".
It is noted that, to improve the accuracy of data fusion and avoid missed or incorrect matches, manual analysis may be added where necessary. In some embodiments, for data that is disputed or highly suspected of being mismatched, manual analysis determines whether the suspected brand name matches the standard brand name. For example, manual analysis may direct "maotai" or "majiu" to the standard brand name "McDonald's".
Additionally, in some embodiments, the standard brand names may be updated or extended as a brand name evolves. For example, "Golden Arches" is added as a name pointing to the standard brand name "McDonald's", so that the suspected brand name "Golden Arches" identified by the text recognition module 133 is pointed to the standard brand name "McDonald's" when the brand fusion module 135 performs data fusion.
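The fusion, manual review, and library update described above might be organized as in the sketch below. The `BrandLibrary` class, its method names, and the idea of returning unmatched records as a manual-review queue are assumptions made for illustration; the alias strings are taken from the examples in the text.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


class BrandLibrary:
    """Maps known names (aliases) to a standard brand name and collects descriptions."""

    def __init__(self) -> None:
        self.alias_to_standard: Dict[str, str] = {}
        self.descriptions: Dict[str, List[str]] = defaultdict(list)

    def add_alias(self, alias: str, standard: str) -> None:
        # Also used to extend the library as a brand name evolves.
        self.alias_to_standard[alias.casefold()] = standard

    def fuse(self, records: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
        """Attach each (suspected name, description) to its standard brand.

        Records whose name is unknown are returned for manual analysis.
        """
        pending = []
        for name, description in records:
            standard = self.alias_to_standard.get(name.casefold())
            if standard is None:
                pending.append((name, description))
            else:
                self.descriptions[standard].append(description)
        return pending


library = BrandLibrary()
for alias in ("McDonald's", "McDonald", "M mark"):
    library.add_alias(alias, "McDonald's")

records = [("mcdonald", "review 1"), ("M mark", "review 2"), ("Golden Arches", "review 3")]
needs_review = library.fuse(records)

# After manual analysis (or a library update), the new name is added and fused.
library.add_alias("Golden Arches", "McDonald's")
still_pending = library.fuse(needs_review)
```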
Thus, the brand name in the multi-source data can be fused through the brand fusion module 135, and the data corresponding to the standard brand name is integrated, so that the single and scattered brand data in the multi-source data, which are originally in different data sources, are integrated into unified brand data.
Based on the brand data, a brand basic information table may be generated, which may include, but is not limited to, the brand name, the industry to which the brand belongs, and the like. Please refer to Table 1 below, which shows an example of a brand basic information table. In this example, the brand "McDonald's" is used.
Table 1
Figure BDA0002064157860000121
Figure BDA0002064157860000131
As described above, the brand screening unit 13 performs brand screening on the multi-source data, obtaining the brand basic information of each brand and integrating the data corresponding to each brand name.
The shop association unit 15 is configured to obtain, from the multi-source data, shop basic information of an offline shop of the brand within a specified area range according to the brand basic information, and establish an association between the brand and a corresponding offline shop.
In an embodiment, please refer to fig. 4, which is a schematic structural diagram of the store association unit 15 in an embodiment. As shown in fig. 4, the store associating unit 15 may further include: a store extraction module 151, a store fusion module 153, and a store association module 155.
The store extraction module 151 is configured to extract, from the multi-source data, store data corresponding to a certain brand based on the brand basic information of the certain brand.
In an example, the store data includes at least store base information for an offline store, which may include, but is not limited to, store name and store location information.
Therefore, the step of extracting the shop data corresponding to the brand from the multi-source data by using the shop extraction module 151 may at least include:
A certain brand is selected from the brand basic information table.
Store name and store location information corresponding to the selected brand are extracted from the multi-source data.
In some embodiments, the store location information representing offline stores in the multi-source data is address information, for example: xx Province, xx City, xx District, xx Road (Street), No. xx.
In some embodiments, the store location information representing offline stores in the multi-source data is geographic coordinate information, for example longitude and latitude coordinates. Longitude and latitude together form a coordinate system, also called a geographic coordinate system: a spherical coordinate system that uses the surface of a sphere in three-dimensional space to define positions on the earth and can identify any location on it.
Since the offline stores of a brand have store location information and attention is usually focused on a particular area (e.g., a province, city, or district), the area range of the offline stores to be extracted may be limited before the store extraction module 151 extracts the store data corresponding to the brand from the multi-source data. In some embodiments, the area range is specified before the multi-source data is acquired: an area limiting condition is set and only multi-source data satisfying it is collected. Taking the Dianping website as an example, the area range of the data may be defined in advance, for example by selecting or entering "Shanghai" to obtain data for the Shanghai area. In some embodiments, the area limiting condition for store data extraction is set after brand screening of the multi-source data is completed and before the store data corresponding to the brand is extracted, so that the store data is extracted from the multi-source data according to the area limiting condition. In some embodiments, the area limiting condition specifies an area, for example "Shanghai". In some embodiments, the area limiting condition excludes a specified area, for example "Shanghai, excluding Chongming District".
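The two kinds of area limiting condition (a specified area, and a specified area with exclusions) can be sketched as a simple filter. The function `in_region`, the substring test on address text, and the sample store records are assumptions for illustration; a production system would more likely compare structured administrative-division fields.

```python
from typing import Optional, Sequence


def in_region(address: str,
              include: Optional[str] = None,
              exclude: Sequence[str] = ()) -> bool:
    """True if the address lies in `include` (when given) and in none of `exclude`."""
    if include is not None and include not in address:
        return False
    return not any(area in address for area in exclude)


# Hypothetical store records with address text.
STORES = [
    {"name": "Store A", "address": "Shanghai, Huangpu District, Road 1"},
    {"name": "Store B", "address": "Shanghai, Chongming District, Road 2"},
    {"name": "Store C", "address": "Beijing, Chaoyang District, Road 3"},
]

shanghai = [s["name"] for s in STORES if in_region(s["address"], include="Shanghai")]
shanghai_without_chongming = [
    s["name"] for s in STORES
    if in_region(s["address"], include="Shanghai", exclude=["Chongming District"])
]
```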
The store fusion module 153 is used for fusing the extracted store data to obtain a store set.
In some embodiments, different data sources use different data standards, or no data standard at all, so the store data related to the same offline store in the multi-source data may carry different descriptions. The key to fusing store data related to the same offline store is determining whether the store data to be fused points to the same offline store.
In an example, the store fusion module 153 fuses the extracted store names and store location information corresponding to the selected brand through steps including, but not limited to, the following:
Acquire the store names and store location information corresponding to the selected brand extracted by the store extraction module 151.
Determine whether the store names and store location information in multiple pieces of data match.
When the store names and store location information in multiple pieces of data are determined to match, fuse the store names and store locations in those pieces of data and add the result to the offline store data of the selected brand in the brand library.
When determining whether the store names and store location information in multiple pieces of data match, in some embodiments the pieces of data to be fused may be matched against one another pairwise; in other embodiments they may be matched in turn against standard store names and standard store location information.
The following describes matching of store location information.
In certain embodiments, store location information matching may include the steps of:
Store location information corresponding to the selected brand is extracted from the multi-source data. The store location information is address information, which may be described level by level according to administrative divisions, for example: xx Province, xx City, xx District, xx Road (Street), No. xx.
The store location information is matched against pre-stored standard store location information of the brand's offline stores, where the standard store location information may be divided level by level according to administrative divisions to form a geographic coordinate information tree.
If the store location information is determined to be the same as the location information of a node of the geographic coordinate information tree in the standard store location information, the store location information is added to the offline store data corresponding to that standard store location information.
Note that, in some examples, the store location information extracted from the multi-source data for the selected brand is not described by administrative divisions but by reference to a landmark or building, for example "xx Building". In this case, before matching, the location of "xx Building" may be looked up and converted into a level-by-level administrative description (xx Province, xx City, xx District, xx Road (Street), No. xx) for subsequent matching against the standard store location information.
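Matching an address against a tree built from administrative divisions can be sketched with nested dictionaries, as below. The helper names `build_tree` and `match_address`, the reserved key `_store`, and the placeholder address levels are assumptions for the example; the application does not specify how the tree is stored.

```python
from typing import Dict, Optional, Sequence

STORE_KEY = "_store"  # reserved key holding the store id at a leaf (assumed convention)


def build_tree(standard: Dict[str, Sequence[str]]) -> dict:
    """Build a nested dict: province -> city -> district -> road -> number."""
    tree: dict = {}
    for store_id, levels in standard.items():
        node = tree
        for level in levels:
            node = node.setdefault(level, {})
        node[STORE_KEY] = store_id
    return tree


def match_address(tree: dict, levels: Sequence[str]) -> Optional[str]:
    """Walk the tree level by level; return the store id if the full path exists."""
    node = tree
    for level in levels:
        node = node.get(level)
        if node is None:
            return None
    return node.get(STORE_KEY)


# Hypothetical standard store locations, already split by administrative level.
STANDARD = {
    "store-001": ["Province P", "City C", "District D1", "Road R1", "No. 10"],
    "store-002": ["Province P", "City C", "District D2", "Road R7", "No. 3"],
}
tree = build_tree(STANDARD)
hit = match_address(tree, ["Province P", "City C", "District D2", "Road R7", "No. 3"])
miss = match_address(tree, ["Province P", "City C", "District D9", "Road R7", "No. 3"])
```

A landmark-style address such as "xx Building" would first have to be converted into this level-by-level form, as noted above, before it can be walked through the tree.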
In certain embodiments, store location information matching may include the steps of:
Store location information corresponding to the selected brand is extracted from the multi-source data. The store location information is address information, which may be described level by level according to administrative divisions, for example: xx Province, xx City, xx District, xx Road (Street), No. xx. The store location information may also be given in a non-standard form, for example "xx Building" or "Room xx of xx Shop in xx Department Store".
The store location information is converted into longitude and latitude coordinates.
The store location information is matched against pre-stored standard store location information of the brand's offline stores, where the standard store location information uses longitude and latitude coordinates; that is, the longitude and latitude coordinates of the store location information are matched against those in the standard store location information.
According to a matching rule, if the deviation between the longitude and latitude coordinates of the store location information and those of the standard store location information is determined to be less than a preset threshold, the store location information is added to the offline store data corresponding to the standard store location information.
The matching rule is as follows: calculate the sum of the squares of the differences between the longitude and latitude values in the store location information to be matched and those in the standard store location information, and compare the result with a preset threshold. If the result is smaller than the preset threshold, the store location information to be matched is determined to match the standard store location information and is added to the offline store data corresponding to the standard store location information. If the result is larger than the preset threshold, the store location information to be matched is determined not to match the standard store location information and is not processed further.
Please refer to FIG. 5, which is a schematic diagram of longitude and latitude coordinate matching according to the matching rule. As shown in FIG. 5, coordinate point A represents the longitude and latitude values in the store location information to be matched, and coordinate point B represents the longitude and latitude values in the standard store location information. During matching, the longitude and latitude differences between the two coordinate points are calculated, namely the difference a between coordinate points A and B in longitude and the difference b between coordinate points A and B in latitude. Then the sum of the squares of the two differences is calculated, i.e.
$$a^2 + b^2$$
The result is compared with a preset threshold; if it is smaller than the preset threshold, the store location information to be matched is determined to match the standard store location information.
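The matching rule can be written out directly. In the sketch below, the function names, the (longitude, latitude) tuple order, the sample coordinates, and the threshold value of 1e-6 (in squared degrees) are assumptions; the application only states that a preset threshold is used. Treating a result exactly equal to the threshold as a non-match is likewise an assumption, since the text does not cover that case.

```python
from typing import Tuple

Coordinate = Tuple[float, float]  # (longitude, latitude) in degrees


def squared_deviation(point_a: Coordinate, point_b: Coordinate) -> float:
    """Sum of squares of the longitude difference a and the latitude difference b."""
    a = point_a[0] - point_b[0]
    b = point_a[1] - point_b[1]
    return a ** 2 + b ** 2


def is_same_location(candidate: Coordinate,
                     standard: Coordinate,
                     threshold: float = 1e-6) -> bool:
    """Match only when the squared deviation is strictly below the threshold."""
    return squared_deviation(candidate, standard) < threshold


STANDARD_POINT = (121.4737, 31.2304)                               # hypothetical standard store
near = is_same_location((121.4739, 31.2305), STANDARD_POINT)       # about 5e-8
far = is_same_location((121.5000, 31.2304), STANDARD_POINT)        # about 6.9e-4
```

Because the rule works on raw degree differences rather than ground distance, the same threshold corresponds to different physical distances at different latitudes; this sketch follows the rule as stated and does not correct for that.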
In certain embodiments, store location information matching may include the steps of:
Store location information corresponding to the selected brand is extracted from the multi-source data. The store location information is geographic coordinate information, for example longitude and latitude coordinates.
The store location information is matched against pre-stored standard store location information of the brand's offline stores, where the standard store location information uses longitude and latitude coordinates; that is, the longitude and latitude coordinates of the store location information are matched against those in the standard store location information.
According to a matching rule, if the deviation between the longitude and latitude coordinates of the store location information and those of the standard store location information is determined to be less than a preset threshold, the store location information is added to the offline store data corresponding to the standard store location information.
In this way, the off-line shop corresponding to a certain brand in the multi-source data can be fused by the shop fusion module 153, and single and scattered shop data in the multi-source data, which are originally in different data sources, can be integrated into uniform shop data.
Based on the store data, a store basic information table can be generated that contains the store basic information of each offline store corresponding to a certain brand. The store basic information table includes, but is not limited to, the brand name and the store location information (address information, longitude and latitude coordinates), and may further include the operation time, the decoration condition, the store area, the business hours, and the like. Please refer to Table 2 below, which shows an example of the store basic information table. In this example, the brand "McDonald's" is used.
Table 2
Figure BDA0002064157860000161
As described above, the store fusion module 153 performs store fusion on the multi-source data, obtaining the store basic information of the stores belonging to each brand.
The store association module 155 is used to establish the association between the brand and the corresponding store set. By associating the store set with the corresponding brand, the offline stores belonging to the brand can be integrated to obtain an offline store list of the brand (see Table 2), and a brand database is established.
After the brand has been associated with the corresponding store set, at least the following is possible: by selecting or entering a brand name, the basic information of all offline stores corresponding to that brand can be obtained; or, by selecting or entering the name of an offline store, the basic information of that store, the brand to which it belongs, and the basic information of the other offline stores of that brand can be obtained.
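The two lookup directions described above (brand to stores, and store to brand plus sibling stores) can be supported by a pair of indexes. The class `BrandStoreIndex`, its method names, and the sample store names are hypothetical and only illustrate one possible arrangement.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class BrandStoreIndex:
    """Keeps the brand-to-stores association and its inverse."""

    def __init__(self) -> None:
        self._stores_by_brand: Dict[str, List[str]] = defaultdict(list)
        self._brand_by_store: Dict[str, str] = {}

    def associate(self, brand: str, store: str) -> None:
        self._stores_by_brand[brand].append(store)
        self._brand_by_store[store] = brand

    def stores_of(self, brand: str) -> List[str]:
        """All offline stores associated with a brand."""
        return list(self._stores_by_brand.get(brand, []))

    def lookup_store(self, store: str) -> Tuple[str, List[str]]:
        """The brand a store belongs to and the other stores of that brand."""
        brand = self._brand_by_store[store]
        siblings = [s for s in self._stores_by_brand[brand] if s != store]
        return brand, siblings


index = BrandStoreIndex()
index.associate("McDonald's", "Store A")
index.associate("McDonald's", "Store B")
index.associate("KFC", "Store C")

brand_stores = index.stores_of("McDonald's")
owner, others = index.lookup_store("Store A")
```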
In some embodiments, the store association module 155 further associates the extracted store data with the data source it came from and establishes a traceability information chain for the store data. The traceability information chain includes the creation time of the store data in its data source, so that the store data of each offline store can be traced and information on the operation, expansion, changes, closure, crawling, and the like of the offline store is recorded. In this way, once the store data of all offline stores belonging to the brand has been associated with its data sources, a store data tracing information table covering all offline stores of the brand can be generated.
The labeling processing unit 17 is configured to perform labeling operation on the brand and its off-line shop based on the multi-source data, and create a brand library.
In an embodiment, please refer to fig. 6, which shows a schematic structural diagram of the labeling processing unit 17 in an embodiment. As shown in fig. 6, the labeling processing unit 17 may further include: a basic attribute extraction module 171, a tag assignment module 173, and a tag association module 175.
The basic attribute extraction module 171 is configured to extract a brand basic attribute corresponding to the brand or a store basic attribute corresponding to the offline store from the multi-source data.
In some embodiments, a base attribute of a brand corresponding to the brand may be extracted from the multi-source data using base attribute extraction module 171. In an example, brand base attributes corresponding to a brand may include, but are not limited to: brand name, industry of business, time of establishment, official website, etc.
In certain embodiments, store base attributes corresponding to offline stores may be extracted from the multi-source data using base attribute extraction module 171. In an example, store base attributes corresponding to a store may include, but are not limited to: the name of the shop, the location, the operation time, the business scale, the evaluation information, etc.
The label assigning module 173 is configured to label the basic brand attribute or the basic store attribute based on a preset label rule.
In some embodiments, the brand base attributes may be labeled based on preset label rules using the label assigning module 173. For example, an industry label may be assigned to the brand base attribute "industry", an establishment time label to the brand base attribute "time of establishment", and a web hyperlink label to the brand base attribute "official website".
In some embodiments, the store base attributes may be labeled based on preset label rules using the label assigning module 173. For example, a store name label may be assigned to the store base attribute "store name", an operation time label to the store base attribute "operation time", and an evaluation label to the store base attribute "evaluation information".
Because the label assigning module 173 assigns labels based on label rules, the labeling processing unit 17 may further include a label rule setting module for setting the label rules of brands and their offline stores. The label rules of a brand can be set according to factors such as the brand's industry and brand positioning; the label rules of an offline store can be set according to those same factors together with factors such as the store's location, environment, and layout.
The tag association module 175 is used to associate the tag with the corresponding brand or its off-line store.
In some embodiments, tags may be associated with corresponding brands using tag association module 175.
Please refer to table three below, which shows an example of a brand label table. In the example, the brand "Starbucks" is used for illustration.
Table three
ID    Name        Relation                Value
001   Starbucks   Industry                Food and beverage
002   Starbucks   Official website        http://www.starbucks.com
003   Starbucks   Time of establishment   March 30, 1971
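The labeling and association steps that produce rows like those of table three can be sketched as follows. The rule dictionary, the attribute keys, and the function `label_entity` are assumptions made for the example; the application does not specify how label rules are encoded.

```python
# Hypothetical label rules: base attribute key -> label (Relation) name.
BRAND_LABEL_RULES = {
    "industry": "Industry",
    "official_website": "Official website",
    "established": "Time of establishment",
}


def label_entity(name, attributes, rules):
    """Turn base attributes into (ID, Name, Relation, Value) tag rows."""
    rows = []
    for attr, relation in rules.items():
        if attr in attributes:  # skip attributes that the data does not provide
            rows.append({
                "ID": f"{len(rows) + 1:03d}",
                "Name": name,
                "Relation": relation,
                "Value": attributes[attr],
            })
    return rows


starbucks = {
    "industry": "Food and beverage",
    "official_website": "http://www.starbucks.com",
    "established": "1971-03-30",
}
brand_tags = label_entity("Starbucks", starbucks, BRAND_LABEL_RULES)
```

The same function could be reused for offline stores by passing a store-specific rule dictionary.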
In certain embodiments, the tags may be associated with corresponding off-line stores using the tag association module 175.
Referring to table four below, an example of a store tag table is shown. In the example, the brand "Starbucks" is used for illustration.
Table four
ID    Name                             Relation                 Value
001   Starbucks (Sichuan North Road)   Evaluation information   High popularity
003   Starbucks (Sichuan North Road)   Time of operation        October 30, 2010
The "evaluation information" need not be limited to a single characteristic dimension. Taking an offline restaurant as an example, the characteristic dimensions may include, but are not limited to, popularity, dining environment, quality of service, and food taste, and each characteristic dimension has a characteristic value used for evaluation. The characteristic value of each dimension can be quantized according to a preset evaluation rule to output the corresponding evaluation information. Taking "popularity" as an example, evaluation information such as "high popularity" may be output by quantizing, according to the evaluation rule, one or more of the following: the payment information of the offline store (e.g., payment amount, payment frequency), the number of people queuing or the queuing time, the number of diner reviews, and the number of related pictures posted on social media.
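One possible evaluation rule for the "popularity" dimension is a capped, weighted sum of the signals listed above, mapped to a label by thresholds. The weights, caps, thresholds, and label wording below are all assumptions chosen for illustration; the application does not specify them.

```python
def popularity_score(payments_per_day, avg_queue_minutes, review_count,
                     weights=(0.5, 0.2, 0.3), caps=(500, 60, 1000)):
    """Combine capped, normalized signals into a score between 0 and 1."""
    signals = (payments_per_day, avg_queue_minutes, review_count)
    # Clamp each signal to [0, cap] and scale it to [0, 1].
    normalized = [min(max(v, 0), cap) / cap for v, cap in zip(signals, caps)]
    return sum(w * n for w, n in zip(weights, normalized))


def popularity_label(score, high=0.7, low=0.3):
    """Map a score to evaluation information using preset thresholds."""
    if score >= high:
        return "high popularity"
    if score >= low:
        return "average popularity"
    return "low popularity"
```

For instance, `popularity_label(popularity_score(250, 30, 500))` yields "average popularity" under these assumed weights and thresholds.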
After the brands and off-line shops are subjected to labeling operation by the labeling processing unit 17 based on the multi-source data, a brand library can be created.
In some embodiments, the brand library creation system of the present application may further include an updating unit for updating the brand library. In one example, the brand library is updated with the update unit by setting an update period. In an example, the brand repository is updated with the update unit by detecting a change in the data source.
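The two update triggers mentioned above (a fixed update period and a detected change in the data source) can be sketched together. Detecting a change by comparing content digests is an assumption made for this sketch, as are the function names.

```python
import hashlib
import json


def fingerprint(records):
    """Stable SHA-256 digest of a list of JSON-serializable records."""
    payload = json.dumps(records, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def needs_update(records, last_fingerprint, seconds_since_last_update, period_seconds):
    """Update when the source content changed or the update period has elapsed."""
    source_changed = fingerprint(records) != last_fingerprint
    period_elapsed = seconds_since_last_update >= period_seconds
    return source_changed or period_elapsed


# Hypothetical snapshot of one data source.
snapshot = [{"brand": "Starbucks", "store": "Store A"}]
baseline = fingerprint(snapshot)
```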
In the present application, the brand library creation system can handle massive data: by screening and fusing data from multiple brand-related data sources, it establishes the association between each brand and its offline stores, labels the brands and their offline stores, and creates the brand library. The brand library serves as a convenient, scientific, and intuitive brand analysis tool that supports subsequent data analysis by the relevant business owners.
With the created brand library, the status and trend of the offline stores of a given brand can be queried quickly, and the relevant operators can make decisions by using the brand data and store data in the brand library together. However, the brand library creation system of the present application is not limited thereto and may be extended beyond a specific brand and its offline stores.
In some embodiments, referring to fig. 7, the brand library creation system may further include a brand relationship unit 12 for establishing a brand relationship between at least two brands, or a store relationship between the offline stores of at least two brands, according to the store basic information of the offline stores in the brand library and the tags of the brands and their offline stores.
Besides site location, operating environment, population density and traffic, transportation, public facilities, and the like, the factors influencing site selection generally also involve market competition, especially direct competitors in the same industry, as well as other co-existing business partners. Therefore, by establishing the correlation between a brand and its offline stores and at least one other related brand and its offline stores, more comprehensive and more targeted information can be obtained than from a single brand and its offline stores alone.
Still taking "McDonald's" as an example: as world-famous catering brands, "McDonald's" and "KFC" are undoubtedly a pair of love-hate rivals. The brand relationship unit establishes a brand relationship between the two brands, or a store relationship between their offline stores, according to the basic information and tag information of "McDonald's" and its offline stores and the basic information and tag information of "KFC" and its offline stores.
Through data analysis, the relationship between the two brands and their offline stores can be obtained for "McDonald's" and "KFC". For example, the number, distribution, and proportion of the two brands' offline stores are similar, and their offline stores are usually located near each other (the distance between them is generally no more than 500 meters). The two brands are thus mutually dependent even as they compete.
Table five below shows an example of a brand relationship between two brands, using the brands "McDonald's" and "KFC" as an example.
Table five
ID    Entity 1     Relation    Entity 2   Value
001   McDonald's   Symbiosis   KFC        0.9
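The application does not state how the value in table five is computed. As one hypothetical way to quantify the "usually located near each other" observation, the sketch below computes the fraction of one brand's stores that have a store of the other brand within 500 meters, using the haversine great-circle distance. The coordinates are made-up sample points.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def colocation_ratio(stores_a, stores_b, max_distance_m=500):
    """Fraction of brand A stores with at least one brand B store nearby."""
    if not stores_a:
        return 0.0
    near = sum(
        1 for (lat_a, lon_a) in stores_a
        if any(haversine_m(lat_a, lon_a, lat_b, lon_b) <= max_distance_m
               for (lat_b, lon_b) in stores_b)
    )
    return near / len(stores_a)


# Hypothetical (latitude, longitude) pairs for two brands.
brand_a = [(31.2304, 121.4737), (31.3000, 121.5000)]
brand_b = [(31.2310, 121.4740)]
```

Here only the first store of `brand_a` has a `brand_b` store within 500 meters, so `colocation_ratio(brand_a, brand_b)` is 0.5.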
In some embodiments, referring to FIG. 8, the brand library creation system may also include a mall association unit 14. The mall association unit 14 is configured to obtain, from the multi-source data, the mall basic information of the mall where an offline store of the brand is located according to the brand basic information and the store basic information, and to establish an association between the offline store of the brand and the mall. After an offline store of the brand has been associated with a particular mall, at least the following can be achieved: by selecting or entering a brand name, information on the mall(s) associated with the brand's offline stores can be obtained; or, by selecting or entering a mall name, information on the offline stores of the brand(s) associated with that mall can be obtained. This provides a reference for the brand owner, the mall operator, or other relevant business owners.
In some embodiments, referring to fig. 8, the brand library creation system may further include an agent association unit 16, which obtains, from the multi-source data, the agent basic information of the agent corresponding to the brand according to the brand basic information, and establishes an association between the brand and the agent. After the brand has been associated with a particular agent, at least the following can be achieved: by selecting or entering a brand name, information on the agent(s) associated with the brand can be obtained; or, by selecting or entering an agent's name, information on the offline stores of the brand(s) associated with that agent can be obtained. This provides a reference for the brand owner, the agent, or other relevant business owners.
The application further discloses a brand query and analysis platform based on the brand library creating system.
Referring to FIG. 9, a block diagram of the brand query and analysis platform in one embodiment is shown. As shown in FIG. 9, the brand query and analysis platform may include: a brand library creation system 1, a data storage unit 2, and a data processing unit 3.
The brand library creation system 1 is as described with reference to fig. 1 and fig. 2 and the corresponding detailed description above, and is not described again here. The brand library creation system can handle massive data: by screening and fusing data from multiple brand-related data sources, it associates each brand with its offline stores, labels the brands and their offline stores, and creates the brand library.
The data storage unit 2 is used to store brand library information created by the brand library creation system. In some embodiments, the data storage unit 2 may be a storage medium (e.g., a hard disk, an optical disk, a magnetic disk, etc.) of the cloud or the distributed server.
The data processing unit 3 is used for performing data processing on the brand library information in the data storage unit according to the operation instruction.
In some embodiments, the operational instructions may include instant queries, multidimensional data analysis, and the like.
Taking instant query as an example, the query targets may include, but are not limited to: a list of the brands in the brand library; a list of the offline stores belonging to a certain brand; the basic information, business conditions, and evaluation information of an offline store; and the cooperative relationship between a certain brand and one or more malls or agents.
Taking multidimensional data analysis as an example, the analysis may cover, but is not limited to: the distribution, operating area, operating mode, and customer traffic of the offline stores of at least two brands in the same region or in different regions.
The brand query and analysis platform of the present application may further comprise a visual display unit 4 for providing an operation interface and displaying the processing results of the data processing unit through a visual interface. In an embodiment, the visual display unit 4 is designed as a client. After a user installs the client of the brand query and analysis platform, a request can be sent through the visual display unit 4; the data processing unit 3 processes the brand library information in the data storage unit according to the operation instruction reflecting the request, and the processing result is displayed to the requesting user through the visual display unit 4.
The brand query and analysis platform comprises a brand library creation system, a data storage unit, and a data processing unit. It can handle massive data: it screens and fuses data from multiple brand-related data sources, associates each brand with its offline stores, labels the brands and their offline stores, and creates a brand library, and it can perform data processing such as instant query and multidimensional data analysis according to the user's operation instructions. The brand query and analysis platform can serve as a convenient, scientific, and intuitive brand analysis tool that supports subsequent data analysis by the relevant businesses.
The present application further discloses a brand library creation method that creates a brand library by acquiring and integrating data from one or more data sources related to brands. These data sources may provide various kinds of data: the sources may be, for example, internal data, network information, public databases, or private databases, and the data may be, for example, textual data, structured data, or unstructured data (e.g., text, pictures, audio, video). In the embodiment, the brand library creation method receives data from one or more data sources and integrates and labels it to create a brand library, so that the brand library serves as a convenient, scientific, and intuitive brand analysis tool that supports subsequent data analysis by the relevant business owners. For example, the created brand library may include the location information, customer volume, order volume, total sales, customer ratings, and the like of each offline store of a certain brand, so that multidimensional analysis services, prediction services, and the like can be provided for the offline stores of the brand.
Please refer to fig. 10, which is a flowchart illustrating a brand library creating method according to an embodiment of the present application.
As shown in fig. 10, the brand library creating method of the present application includes the steps of:
Step S31, multi-source data related to the brand is obtained. The source of the multi-source data is not limited; the data can be derived from one or more data sources.
In particular implementations of the embodiments, the data may be obtained from at least one of the following data sources.
In one exemplary embodiment, data is collected from web pages by a data collection program.
Generally, a data collection program is a script or program that automatically captures network information according to certain rules. The program may reside on a server: starting from a given set of URLs (Uniform Resource Locators), it reads the corresponding documents using a standard protocol such as HTTP (Hypertext Transfer Protocol), then takes all unvisited URLs contained in those documents as new starting points and continues crawling until no new URL satisfying the conditions remains.
When the data collection program is applied to web pages, brand-related data may be collected from, for example, portal sites (e.g., Sina, NetEase), brand official websites, e-commerce websites, life service websites (e.g., Dianping, Meituan, 58.com, Ganji), social networking sites (e.g., Sina Weibo), forums (e.g., Baidu Tieba, Liba), blogs, and the like. Taking the Dianping website as an example, the brand-related data that may be obtained includes, but is not limited to: the brand name, the industry field to which the brand belongs, the labels of the brand, the geographic coordinates of the brand's offline stores, the name of the mall in which a store is located and the store's position within the mall, business hours, featured recommendations, discount information, customer reviews, and the like.
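The crawl loop described above (start from seed URLs, read each document, queue its unvisited links, stop when nothing new qualifies) can be sketched as a breadth-first traversal. To keep the sketch runnable without network access, page retrieval is passed in as a `fetch` callable and demonstrated against an in-memory dictionary; the `accept` filter, the `max_pages` limit, and the `example.test` URLs are assumptions for the example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href values of <a> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value:
                    self.links.append(value)


def crawl(seed_urls, fetch, accept=lambda url: True, max_pages=100):
    """Breadth-first crawl; fetch(url) returns HTML text, or None on failure."""
    queue = deque(seed_urls)
    seen = set(seed_urls)
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link not in seen and accept(link):
                seen.add(link)
                queue.append(link)
    return pages


# In-memory stand-in for a website (hypothetical URLs and content).
SITE = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/">home</a> <a href="/c">C</a>',
    "http://example.test/b": "no links here",
}
pages = crawl(["http://example.test/"], SITE.get)
```

In a real deployment `fetch` would issue HTTP requests, and `accept` would encode the "URL satisfying the condition" test, for example restricting the crawl to one domain.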
In an exemplary embodiment, the information is obtained from at least one database.
The database may be a public database, for example a public database of a government department (e.g., civil affairs, real estate, police, industry and commerce, fire, transportation, weather), a university, a research institution, or a commercial organization, or a map database, a brand database, and the like. The database may also be a private database, for example a database of a partner (e.g., a database provided by a certain brand company, by a certain store, or by another third party) or a contracted database. Taking the map database as an example, the point-of-interest data of a city or of a certain area of the city, or the geographic coordinates of the offline stores belonging to a brand, may be collected through, for example, the Application Programming Interface (API) of a map application.
In one exemplary embodiment, the data source is self-created data.
For example, the self-created data can be compiled from a plurality of data sources, or can be obtained by collecting, recording, and organizing data through field investigation, feedback forms, and the like.
In addition, the types of the data are different, and the types of the data can be, for example, text-type data, structured data, unstructured data (e.g., text, pictures, audio, video, etc.), and the like.
The obtained multi-source data related to the brand can be stored, for example, the multi-source data can be stored in a storage medium (e.g., a hard disk, an optical disk, a magnetic disk, etc.), in a cloud, or in a distributed server, etc.
In some embodiments, the method may further include associating the acquired data related to the brand with the data source where the data related to the brand is located, and establishing a traceability information chain related to the brand data, where the traceability information chain includes a creation time of the brand data in the data source where the brand data is located, so as to realize traceability of each piece of brand data. Thus, after the brand is associated with the data source where the brand data is located, a brand data tracing information table capable of reflecting the brand can be generated.
Step S33, brand screening is performed on the multi-source data to obtain brand basic information of each brand.
In some embodiments, data in different data sources may follow different data standards, and the descriptions of the same entity (e.g., city name, brand name, mall name) may differ. For example, two pieces of data from different data sources may actually describe the same brand but differ in how they write the brand's name. In addition, the types of data in the multi-source data are diverse and may include textual data, structured data, unstructured data (e.g., text, pictures, audio, video), and so forth.
In some embodiments, data from the same source may also lack a uniform data standard, and the descriptions of the same entity (e.g., city name, brand name, mall name) may differ. For example, on an e-commerce website or a life service website, the many user reviews may describe the same brand differently: the Chinese full name, Chinese short name, English full name, English short name, or a network nickname may each appear in different reviews. For example, McDonald's may be described as "McDonald", "macdonald", "macterf", "macbeth", "M", or "McDonald's". Therefore, data with different descriptions pointing to the same entity must be processed accordingly; in this embodiment, this is done by performing brand screening on the multi-source data. Here, brand screening of the multi-source data includes identifying and integrating the data in the multi-source data that point to the same brand.
Therefore, data processing is required to be performed on the acquired multi-source data to screen out brand basic information related to brands.
In an embodiment, please refer to fig. 11, which is a schematic flow chart of step S33. As shown in fig. 11, the step of performing brand filtering on the multi-source data to obtain brand basic information of each brand includes the following steps:
Step S331, the multi-source data is cleaned according to the data cleaning rule.
Generally, Data cleansing (Data cleansing) refers to a process of reviewing and verifying Data, and aims to remove duplicate Data, remove or correct incomplete Data and erroneous Data, and obtain Data with high consistency.
The multi-source data may contain a large amount of invalid or abnormal data, such as repeated data, incomplete data, or erroneous data. Such data not only adds a large amount of subsequent processing but also interferes with or pollutes subsequent data processing and affects the reliability and validity of the results. Therefore, the invalid data must be deleted, or corrected in part or in whole, which is the task of data cleaning.
At present, small volumes of data are traditionally cleaned manually, without a unified, standard cleaning flow. Manual cleaning has the following main problems: it is time-consuming, because it depends on operators judging the data and then cleaning it step by step; data is easily missed; the results are unstable, because different operators may produce inconsistent cleaning results; the process cannot be traced back, so the data cannot be re-checked and corrected when a cleaning error occurs; and checking the results is laborious, because the data must be counted again after cleaning. Therefore, traditional manual cleaning is clearly not an effective approach for the large-volume multi-source data in the present application.
In an embodiment, data cleaning is performed on part or all of the acquired multi-source data, duplicate data is deleted, incomplete data and error data are deleted or corrected, qualified data and corrected data are retained, and the data can be stored in a storage medium (such as a hard disk, an optical disk, a magnetic disk and the like), a cloud end or a distributed server and the like.
The multi-source data can be cleaned according to preset data cleansing rules; accordingly, the method may further include a step of setting the data cleansing rules. In some embodiments, the rules are set by configuring a data cleansing rule file, which may contain one or more data cleansing rules. The rules may be set according to the type, format, source, and/or industry of the data; that is, different rules may be set for data of different types, formats, sources, or industries.
In one example, the data cleansing rules may be verified and adjusted according to the verification results. For example, an autocorrelation and cross-correlation validity analysis is performed on certain multi-source data, and whether the current data cleansing rule for a given data source, data type, and/or data format needs to be modified is determined from the analysis result. If so, the original data cleansing rule is corrected and updated, and the multi-source data is cleaned according to the updated rule.
In one example, the data cleansing rules may be trained on existing data using a machine learning method, and the trained rules are then used to cleanse the multi-source data.
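Rule-driven cleaning with per-source rules can be sketched as below. The rule structure (`required` fields and `dedupe_on` keys), the source labels, and the sample records are assumptions for illustration; an actual rule file could be far richer. The sketch corrects stray whitespace, deletes incomplete records, and deletes duplicates.

```python
def clean(records, rules):
    """Apply per-source cleansing rules and return the retained records."""
    seen = set()
    kept = []
    for rec in records:
        rule = rules.get(rec.get("source"), rules["default"])
        # Correct: strip surrounding whitespace from string fields.
        rec = {k: v.strip() if isinstance(v, str) else v for k, v in rec.items()}
        # Delete incomplete records: every required field must be present.
        if any(rec.get(f) in (None, "") for f in rule["required"]):
            continue
        # Delete duplicates: keep the first record per dedupe key.
        key = tuple(rec.get(f) for f in rule["dedupe_on"])
        if key in seen:
            continue
        seen.add(key)
        kept.append(rec)
    return kept


# Hypothetical rules, keyed by data source.
RULES = {
    "default": {"required": ["brand", "store"], "dedupe_on": ["brand", "store"]},
    "map_api": {"required": ["brand", "store", "lat", "lon"],
                "dedupe_on": ["brand", "store"]},
}
raw = [
    {"source": "review_site", "brand": " Starbucks ", "store": "Store A"},
    {"source": "review_site", "brand": "Starbucks", "store": "Store A"},  # duplicate
    {"source": "map_api", "brand": "Starbucks", "store": "Store B"},      # no coordinates
    {"source": "map_api", "brand": "Starbucks", "store": "Store C",
     "lat": 31.23, "lon": 121.47},
]
cleaned = clean(raw, RULES)
```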
In some embodiments, a data preprocessing step may also be included: and preprocessing the multi-source data in advance before cleaning the multi-source data. For example, the data formats in the multi-source data can be standardized and unified into a processable format. Alternatively, the data format of the same type of data in the multi-source data may be standardized.
In certain embodiments, the method further comprises the step of classifying the multi-source data by industry domain. Therefore, the data volume of data cleaning can be reduced, the cleaning efficiency is accelerated, meanwhile, due to the fact that data cleaning is carried out according to the industry field classification result, confusion of the same brand name belonging to different industries can be avoided, and the cleaning accuracy is improved. In general, the industry may be, for example: catering, commercial property, convenience store business, training education, beauty salon, healthcare services, clothing accessories, luxury goods, pet stores, intermediary services, hardware accessories, auto repair, and the like.
Step S333, text recognition is carried out on the data obtained through cleaning, and text content containing suspected brand names is extracted from the text.
The multi-source data submitted for text recognition can be in any form, such as text content, pictures, animated pictures, or video streams. Generally, text recognition technology uses known text content as the input of a text recognition algorithm, trains a text recognition model by machine learning, and uses the model to perform text recognition on the data to be recognized. In the text recognition model, feature extraction can be performed on the known text content with respect to brand names to obtain the feature information of each brand name.
In a specific operation, when performing text recognition, the following steps may be included, but not limited to:
Acquire a text to be recognized, and perform word segmentation on the text to be recognized.
Perform brand name recognition on the segmented text based on a preset standard brand name library and the structural characteristics of brand names, to obtain suspected brand names.
Performing brand name recognition on the segmented text to obtain suspected brand names may further include:
Match the segmented text against each standard brand name in the standard brand name library to obtain a matching characterization value between the text and each standard brand name.
According to the matching characterization values, determine the standard brand name with the highest matching characterization value, or with a matching characterization value greater than a preset threshold, as the standard brand name corresponding to the suspected brand name, and generate a recognition result.
Of course, as described above, before the multi-source data is cleaned, the multi-source data may be classified in advance according to the industry field, and accordingly, the standard brand name library may also be divided into corresponding standard brand name libraries of specific industries according to the industry field, for example, a catering standard brand name library, a training and education standard brand name library, a hairdressing standard brand name library, a clothing accessory standard brand name library, and the like.
In this way, text recognition is performed on the cleaned data to obtain suspected brand names.
Step S335, the extracted content containing the suspected brand names is fused to obtain a brand name set.
In some embodiments, data in different data sources follow different data standards or no data standard at all, so the data related to the same brand name in the multi-source data may carry different description information. The key to fusing the data related to the same brand name in the multi-source data is to judge whether the data to be fused point to the same brand name.
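The segment-then-match steps above can be sketched with Python's standard `difflib`. Here the similarity ratio from `difflib.SequenceMatcher` stands in for the matching characterization value, and a regular expression stands in for word segmentation; both are substitutions made for this sketch (Chinese text would need a proper word segmenter), and the brand list and threshold are hypothetical.

```python
import re
from difflib import SequenceMatcher

# Hypothetical standard brand name library for the catering industry.
STANDARD_BRANDS = ["McDonald's", "Starbucks", "KFC"]


def segment(text):
    """Naive word segmentation: runs of letters, digits, and apostrophes."""
    return re.findall(r"[A-Za-z0-9']+", text)


def match_value(token, brand):
    """Similarity between a token and a standard brand name, from 0 to 1."""
    return SequenceMatcher(None, token.lower(), brand.lower()).ratio()


def recognize(text, brands=STANDARD_BRANDS, threshold=0.8):
    """Return (suspected name, standard name, value) for tokens above the threshold."""
    results = []
    for token in segment(text):
        best = max(brands, key=lambda b: match_value(token, b))
        value = match_value(token, best)
        if value >= threshold:
            results.append((token, best, value))
    return results


hits = recognize("Lunch at Mcdonalds near the Starbuks on the corner")
```

With these inputs, the misspelled tokens "Mcdonalds" and "Starbuks" are matched to "McDonald's" and "Starbucks", while ordinary words fall below the threshold.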
In an example, fusing the extracted content containing suspected brand names includes, but is not limited to, the steps of:
Acquire the extracted content containing the suspected brand name.
Judge whether the suspected brand name in the content corresponds to a brand name in a pre-stored brand library.
When it does, point the suspected brand name in the content to that brand name and add the description information of the suspected brand name to the corresponding brand name data in the brand library.
Still taking the aforementioned McDonald's as an example, suppose that suspected brand names such as "McDonald", "macdona", "macdonald", "majorda", "majordi", and "McDonald's" are extracted from the recognized text, and that the pre-stored brand library stores the standard brand name "McDonald's". During data fusion, each of these suspected brand names in the data to be fused is pointed to the standard brand name "McDonald's".
Note that, to improve the accuracy of data fusion and avoid missed or mistaken matches, manual analysis may be added where necessary. In some embodiments, for data that is disputed or suspected of being mismatched, manual analysis determines whether the suspected brand name matches the standard brand name. For example, through manual analysis, "maotai" or "majiu" may be pointed to the standard brand name "McDonald's".
Additionally, in some embodiments, the standard brand names may be updated or extended as a brand name evolves. For example, "Golden Arches" is added under the standard brand name "McDonald's", so that an identified suspected brand name "Golden Arches" can be pointed to the standard brand name "McDonald's" during data fusion.
Therefore, by fusing the brand names in the multi-source data, the data corresponding to the standard brand names can be integrated, so that the single and scattered brand data in different data sources in the multi-source data are integrated into uniform brand data.
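The fusion steps can be sketched as an alias table that maps each known spelling to a standard brand name, with unmatched names returned for manual analysis. The class `BrandLibrary`, the case-insensitive normalization, and the alias spellings are assumptions made for the example.

```python
class BrandLibrary:
    """Maps suspected brand names to standard brand names (illustrative)."""

    def __init__(self):
        self._standard_by_alias = {}  # normalized alias -> standard brand name
        self.descriptions = {}        # standard brand name -> fused descriptions

    @staticmethod
    def _norm(name):
        return name.strip().lower()

    def add_standard(self, standard, aliases=()):
        """Register a standard name, or extend an existing one with new aliases."""
        self.descriptions.setdefault(standard, [])
        for name in (standard, *aliases):
            self._standard_by_alias[self._norm(name)] = standard

    def fuse(self, suspected, description):
        """Point a suspected name to its standard name; None means manual review."""
        standard = self._standard_by_alias.get(self._norm(suspected))
        if standard is None:
            return None
        self.descriptions[standard].append(description)
        return standard


library = BrandLibrary()
library.add_standard("McDonald's", aliases=["McDonald", "macdonald"])
# Later update as the brand name evolves.
library.add_standard("McDonald's", aliases=["Golden Arches"])
```

A `None` result from `fuse` marks the cases that the text above hands over to manual analysis.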
Based on the brand data, a brand basic information table may be generated, which may include, but is not limited to, the brand name and the industry to which the brand belongs.
In the foregoing, by performing brand filtering on the multi-source data, brand basic information of each brand is obtained and each data corresponding to a brand name is integrated.
Step S35: based on the brand basic information, acquire from the multi-source data the store basic information of the offline stores of the brand within a specified area range, and establish an association between the brand and the corresponding offline stores.
In an embodiment, please refer to fig. 12, which shows a schematic flow chart of step S35. As shown in fig. 12, the step of acquiring the store basic information of the offline stores within the specified area range from the multi-source data includes the following steps:
In step S351, store data corresponding to a certain brand is extracted from the multi-source data based on the confirmed brand basic information of that brand.
In an example, the store data includes at least store base information for an offline store, which may include, but is not limited to, store name and store location information.
Therefore, the step of extracting the shop data corresponding to the brand from the multi-source data may at least include:
A certain brand is selected from the brand basic information table.
Store name and store location information corresponding to the selected brand are extracted from the multi-source data.
In some embodiments, the store location information used in the multi-source data to represent offline stores is address information, such as: xx province, xx city, xx district, xx road (street), No. xx.
In some embodiments, the store locations in the multi-source data used to represent off-line stores employ geographic coordinate information, such as: latitude and longitude coordinates. The longitude and latitude coordinates are a coordinate system consisting of longitude and latitude, also called a geographical coordinate system, and are a spherical coordinate system which defines the space on the earth by utilizing a spherical surface of a three-dimensional space and can mark any position on the earth.
Off-line shops corresponding to a brand have shop location information, and attention generally focuses on a certain area (e.g., a province, city, or district). The area range of the off-line shops to be extracted can therefore be limited before the shop data corresponding to the brand is extracted from the multi-source data. In some embodiments, the area range is specified before the multi-source data is obtained: an area definition condition is set, and only multi-source data meeting that condition is selected. Taking the "popular comment" website as an example, the area range of the data may be predefined; for example, data for the "Shanghai" area may be obtained by selecting or entering "Shanghai". In some embodiments, the area definition condition for shop data extraction is set after the brand screening of the multi-source data is completed and before the shop data corresponding to the brand is extracted, so that the shop data is extracted from the multi-source data according to the area definition condition. In some embodiments, the area definition condition specifies an area, for example by setting the area definition condition of the shop data to "Shanghai". In some embodiments, it excludes a specified area, for example by setting the area definition condition of the shop data to "Shanghai, excluding Chongming district".
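The two kinds of area definition condition (specifying an area, or excluding a specified area) can be sketched as a simple filter. This is an illustrative assumption, not the application's implementation: the `AreaCondition` class, the `areas` field on each record, and the function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AreaCondition:
    """Hypothetical area definition condition."""
    include: tuple = ()  # record must lie in at least one of these (empty = no limit)
    exclude: tuple = ()  # record must not lie in any of these


def in_area(record_areas, cond):
    """Check one record's administrative areas against the condition."""
    areas = set(record_areas)
    if cond.include and not (areas & set(cond.include)):
        return False
    return not (areas & set(cond.exclude))


def filter_by_area(records, cond):
    """Keep only the records whose 'areas' field satisfies the condition."""
    return [r for r in records if in_area(r["areas"], cond)]
```

With this sketch, "Shanghai, excluding Chongming district" would be written as `AreaCondition(include=("Shanghai",), exclude=("Chongming",))`.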
Step S353: fuse the extracted shop data to obtain a shop set.
In some embodiments, data in different data sources adopt different data standards, or no data standard at all, so that the shop data relating to the same off-line shop may be described differently across the multi-source data. When fusing the shop data relating to the same off-line shop in the multi-source data, the key point is to judge whether the shop data to be fused point to the same off-line shop.
In an example, fusing the extracted store name and store location information corresponding to the selected brand includes, but is not limited to, the steps of:
The extracted shop name and shop position information corresponding to the selected brand are acquired.
Whether the shop names and the shop position information in the plurality of data match is judged.
When the shop names and the shop position information in the plurality of data are judged to match, the shop names and the shop positions in the plurality of data are fused and added to the off-line shop data of the selected brand in the brand library.
When judging whether the shop names and the shop position information in the plurality of data are matched, in some embodiments, the data to be fused can be matched with each other in a pairwise manner, or in some embodiments, the data to be fused can be sequentially matched with the standard shop names and the standard shop position information.
The following describes matching of store location information.
In certain embodiments, store location information matching may include the steps of:
Store location information corresponding to the selected brand is extracted from the multi-source data. The shop location information adopts address information, which can be described by dividing administrative regions step by step, such as: xx province, xx city, xx district, xx road (street), No. xx.
The shop position information is matched with the pre-stored standard shop position information of the off-line shops of the brand, wherein the standard shop position information can be divided step by step according to administrative regions to form a geographical coordinate information tree.
If the shop position information is judged to be the same as the position information of a certain node of the geographical coordinate information tree in the standard shop position information, the shop position information is added to the off-line shop data corresponding to the standard shop position information.
Note that, in some examples, the shop location information extracted from the multi-source data for the selected brand is not described by administrative division but by reference to a landmark or a building, for example, "xx building". In this case, before matching, the location information of "xx building" may be converted by a prior query into a description divided step by step by administrative region, that is, xx province, xx city, xx district, xx road (street), No. xx, for subsequent matching with the standard shop location information.
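The step-by-step matching against a tree of administrative divisions can be sketched with nested dictionaries. This is a simplified assumption about how such a tree might be stored: the level order (province, city, district, street, number), the `_store_id` leaf marker, and both function names are hypothetical.

```python
def build_address_tree(standard_addresses):
    """Build a nested dict keyed by province, city, district, street, number.

    standard_addresses maps a store id to a tuple of address levels.
    """
    tree = {}
    for store_id, levels in standard_addresses.items():
        node = tree
        for level in levels:
            node = node.setdefault(level, {})
        node["_store_id"] = store_id  # hypothetical leaf marker
    return tree


def match_address(tree, levels):
    """Walk the tree level by level; return the matching store id or None."""
    node = tree
    for level in levels:
        node = node.get(level)
        if node is None:
            return None  # no node with the same position information
    return node.get("_store_id")
```

An address given as a landmark such as "xx building" would first have to be converted into the tuple of administrative levels before it could be looked up this way.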
In certain embodiments, store location information matching may include the steps of:
Store location information corresponding to the selected brand is extracted from the multi-source data. The shop location information adopts address information, which can be described by dividing administrative regions step by step, such as: xx province, xx city, xx district, xx road (street), No. xx. The shop location information can also be given in a non-standard description, such as xx building, or xx room of xx shop in xx department.
The shop position information is converted into longitude and latitude coordinates.
The shop position information is matched with the pre-stored standard shop position information of the off-line shops of the brand, wherein the standard shop position information adopts longitude and latitude coordinates. That is, the longitude and latitude coordinates in the shop position information are matched with the longitude and latitude coordinates in the geographic coordinate information of the standard shop position information.
According to a matching rule, if the deviation between the longitude and latitude coordinates in the shop position information and those in the standard shop position information is judged to be less than a preset threshold value, the shop position information is added to the off-line shop data corresponding to the standard shop position information.
The matching rule is as follows: calculate the sum of the squares of the differences between the longitude and latitude values in the shop position information to be matched and those in the standard shop position information, and compare the result with a preset threshold value. If the result is smaller than the preset threshold value, the shop position information to be matched is judged to match the standard shop position information and is added to the off-line shop data corresponding to the standard shop position information. If the result is larger than the preset threshold value, the shop position information to be matched is judged not to match the standard shop position information and is not processed. The matching according to the matching rule is shown in detail in fig. 5.
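The matching rule above can be sketched as follows. The sum of squared differences and the threshold comparison follow the text; the function names, the (latitude, longitude) tuple order, the threshold unit (squared degrees), and the choice to return the closest standard shop when several fall under the threshold are assumptions made for illustration.

```python
def squared_deviation(candidate, standard):
    """Sum of squared latitude and longitude differences (in squared degrees)."""
    d_lat = candidate[0] - standard[0]
    d_lon = candidate[1] - standard[1]
    return d_lat ** 2 + d_lon ** 2


def match_store(candidate, standard_stores, threshold):
    """Return the id of the closest standard shop under the threshold, else None.

    candidate is a (latitude, longitude) tuple; standard_stores maps a shop id
    to its (latitude, longitude). Picking the closest match is an assumption.
    """
    best_id, best_dev = None, threshold
    for store_id, coords in standard_stores.items():
        dev = squared_deviation(candidate, coords)
        if dev < best_dev:
            best_id, best_dev = store_id, dev
    return best_id
```

Because the comparison is made in squared degrees rather than metres, a practical threshold would have to be chosen with the latitude of the target area in mind.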
In certain embodiments, store location information matching may include the steps of:
Store location information corresponding to the selected brand is extracted from the multi-source data. The shop location information is geographical coordinate information, such as longitude and latitude coordinates.
The shop position information is matched with the pre-stored standard shop position information of the off-line shops of the brand, wherein the standard shop position information adopts longitude and latitude coordinates. That is, the longitude and latitude coordinates in the shop position information are matched with the longitude and latitude coordinates in the geographic coordinate information of the standard shop position information.
According to a matching rule, if the deviation between the longitude and latitude coordinates in the shop position information and those in the standard shop position information is judged to be less than a preset threshold value, the shop position information is added to the off-line shop data corresponding to the standard shop position information.
Therefore, by fusing off-line shops corresponding to a certain brand in the multi-source data, single and scattered shop data in different data sources in the multi-source data are integrated into uniform shop data.
Based on the store data, a store basic information table containing the store basic information of each offline store corresponding to a certain brand can be generated. The store basic information table includes, but is not limited to, the brand name and the store location information (address information, latitude and longitude coordinates), and may further include the operation time, the decoration condition, the store area, the business hours and the like. In the foregoing, by performing store fusion on the multi-source data, the store basic information of the stores belonging to a brand is obtained.
Step S355 establishes an association between the brand and the corresponding store set.
By associating the store set with the corresponding brand, off-line stores to which the brand belongs can be integrated to obtain an off-line store list of the brand, and a brand database is established.
After the brand has been associated with the corresponding set of stores, at least: the basic information of all off-line shops corresponding to a certain brand can be obtained by selecting or inputting the brand name; or, by selecting or inputting the name of a certain off-line shop, the basic information of the off-line shop, the brand to which the off-line shop belongs, and the basic information of other off-line shops to which the brand belongs can be obtained.
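The two queries described above (from a brand name to its off-line shops, and from a shop name to its brand and sibling shops) can be sketched with a two-way index. The class `BrandStoreIndex` and its method names are hypothetical, and shop names are assumed to be unique for simplicity.

```python
from collections import defaultdict


class BrandStoreIndex:
    """Hypothetical two-way association between brands and off-line shops."""

    def __init__(self):
        self._stores_by_brand = defaultdict(dict)  # brand -> {shop name: info}
        self._brand_by_store = {}                  # shop name -> brand

    def associate(self, brand, store_name, info):
        """Associate one off-line shop and its basic information with a brand."""
        self._stores_by_brand[brand][store_name] = info
        self._brand_by_store[store_name] = brand

    def stores_of_brand(self, brand):
        """Basic information of all off-line shops of the given brand."""
        return dict(self._stores_by_brand.get(brand, {}))

    def lookup_store(self, store_name):
        """Return (shop info, brand, other shops of that brand), or None."""
        brand = self._brand_by_store.get(store_name)
        if brand is None:
            return None
        stores = self._stores_by_brand[brand]
        siblings = {n: i for n, i in stores.items() if n != store_name}
        return stores[store_name], brand, siblings
```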
In some embodiments, step S355 may further include associating the extracted store data with the data source where the store data is located and establishing a traceability information chain related to the store data, where the traceability information chain includes the creation time of the store data in the data source where it is located. This makes the store data of each offline store traceable and records information such as the operation, expansion, change, closure and transfer of the offline store. In this way, when the store data of all offline stores belonging to the brand are associated with the data sources in which they are located, a store data tracing information table of all offline stores belonging to the brand can be generated.
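One possible shape for such a traceability information chain is sketched below. The application only states that the chain links store data to its data source and includes the creation time; the `TraceEntry` and `StoreTrace` classes, the `event` field, and the ordering by creation time are illustrative assumptions.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class TraceEntry:
    source: str    # data source in which the store data was found
    created: date  # creation time of the store data in that source
    event: str     # hypothetical event tag, e.g. "opened", "changed", "closed"


@dataclass
class StoreTrace:
    """Hypothetical traceability chain for one offline store."""
    store_id: str
    entries: list = field(default_factory=list)

    def record(self, source, created, event):
        """Append one traced observation of the store data."""
        self.entries.append(TraceEntry(source, created, event))

    def history(self):
        """Entries ordered by creation time, oldest first."""
        return sorted(self.entries, key=lambda e: e.created)
```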
Step S37: perform a labeling operation on the brand and its off-line shops based on the multi-source data to create a brand library.
In an embodiment, please refer to fig. 13, which is a schematic flow chart of step S37. As shown in fig. 13, the step of performing the labeling operation on the brand and its off-line shops based on the multi-source data includes the following steps:
In step S371, a brand base attribute corresponding to the brand or a store base attribute corresponding to the off-line store is extracted from the multi-source data.
In some embodiments, brand base attributes corresponding to a brand may be extracted from the multi-source data. In an example, brand base attributes corresponding to a brand may include, but are not limited to: brand name, industry of business, time of establishment, official website, etc.
In some embodiments, store base attributes corresponding to off-line stores may be extracted from the multi-source data. In an example, store base attributes corresponding to a store may include, but are not limited to: the name of the shop, the location, the operation time, the business scale, the evaluation information, etc.
Step S373: label the brand base attribute or the store base attribute based on a preset label rule.
In some embodiments, the brand base attribute may be labeled based on the preset label rule. For example, in one example, an industry label is applied to the brand base attribute "industry of business". In one example, an establishment time label is applied to the brand base attribute "time of establishment". In one example, a web hyperlink label is applied to the brand base attribute "official website".
In some embodiments, the store base attribute may be labeled based on the preset label rule. For example, in one example, a store name label is applied to the store base attribute "store name". In one example, an operation time label is applied to the store base attribute "operation time". In one example, an evaluation label is applied to the store base attribute "evaluation information".
In step S373, the labeling of the brand base attribute or the store base attribute is based on a labeling rule, and therefore, a step of setting a labeling rule for a brand and an offline store thereof may be further included. The label rules of the brands can be set according to the corresponding factors such as the industries of the brands, the brand positioning and the like, and the label rules of the off-line shops can be set according to the corresponding factors such as the industries of the brands, the brand positioning and the like, and the factors such as the locations, the environments and the layout of the off-line shops.
Step S375, associate the label with the corresponding brand or its off-line shop.
In some embodiments, tags may be associated with corresponding brands.
In some embodiments, the tags may be associated with corresponding off-line stores.
By performing the labeling operation on the brands and their off-line shops based on the multi-source data, the brand library is created.
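Steps S371 to S375 can be sketched as rule tables that map a base attribute to a label, followed by attaching the resulting labels to the brand or shop. The rule tables, attribute keys, label formats, and function names below are hypothetical and only illustrate the flow.

```python
# Hypothetical preset label rules: attribute name -> function producing a label.
BRAND_LABEL_RULES = {
    "industry": lambda value: f"industry:{value}",
    "founded": lambda value: f"founded:{value}",
}
STORE_LABEL_RULES = {
    "name": lambda value: f"store:{value}",
    "rating": lambda value: "rating:high" if value >= 4.0 else "rating:normal",
}


def label_entity(attributes, rules):
    """Apply each rule whose attribute is present; return the labels in rule order."""
    return [rule(attributes[name]) for name, rule in rules.items() if name in attributes]


def attach_labels(library, entity_id, attributes, rules):
    """Associate the generated labels with a brand or shop id in the library."""
    library.setdefault(entity_id, set()).update(label_entity(attributes, rules))
    return library
```

Keeping the rules in separate tables for brands and shops mirrors the text, where label rules may be set per industry, brand positioning, or shop location.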
In some embodiments, the brand library creation method of the present application may further include the step of updating the brand library. In one example, the brand library is updated by setting an update period. In one example, the brand repository is updated by detecting changes to the data sources.
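The two update triggers mentioned above (an update period, or a detected change in a data source) can be sketched as follows. Fingerprinting the source records with SHA-256 and combining both triggers in one function are assumptions made for illustration; the application does not specify how changes are detected.

```python
import hashlib
import json


def fingerprint(records):
    """Stable hash of a data source's records (assumes JSON-serializable records)."""
    payload = json.dumps(records, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def needs_update(last_fingerprint, records, last_update, now, period_seconds):
    """True if the update period has elapsed or the source content has changed.

    last_update and now are timestamps in seconds.
    """
    period_elapsed = (now - last_update) >= period_seconds
    changed = fingerprint(records) != last_fingerprint
    return period_elapsed or changed
```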
With the brand library creation method, massive data can be processed: data related to brands in a plurality of data sources is screened and fused, the association between each brand and its corresponding off-line shops is established, and a labeling operation is performed on the brands and their off-line shops. The resulting brand library serves as a convenient, scientific and visual brand analysis tool that supports subsequent data analysis by the relevant parties.
By using the created brand library, the status and trend of the off-line shops under a certain brand can be quickly queried, and the relevant operators can make corresponding decisions by comprehensively using the brand data and shop data in the brand library. However, the brand library creation system of the present application is not limited thereto and may be further extended beyond a specific brand and its off-line shops.
In some embodiments, the brand library creation method may further include the steps of: and establishing a brand relationship between at least two brands or a store relationship of the off-line stores of at least two brands according to the store basic information of the off-line stores in the brand library, the brands and the labels of the off-line stores.
Besides site location, operating environment, population density and flow, transportation, public facilities and the like, the factors influencing site selection generally also involve market competition, especially direct competitors in the same industry, as well as other co-existing business partners. Therefore, by establishing the correlation between a brand and its off-line shops and at least one other related brand and its off-line shops, more comprehensive and more targeted information can be obtained than from a single brand and its off-line shops.
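One simple way to derive a brand relationship from shop data is to count how often two brands have shops in the same area. This is only one hypothetical measure; the application does not define how relationships are computed, and the `(brand, area)` input shape and the function name are assumptions.

```python
from collections import Counter
from itertools import combinations


def brand_cooccurrence(stores):
    """Count, for each pair of brands, the number of areas that contain both.

    stores is an iterable of (brand, area) pairs, where area might be a mall
    or a district. Pair keys are sorted alphabetically.
    """
    brands_by_area = {}
    for brand, area in stores:
        brands_by_area.setdefault(area, set()).add(brand)
    pairs = Counter()
    for brands in brands_by_area.values():
        for a, b in combinations(sorted(brands), 2):
            pairs[(a, b)] += 1
    return pairs
```

A high count between two brands in the same industry might indicate direct competitors, while a high count across industries might indicate co-existing business partners.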
In some embodiments, the brand library creation method may further include the steps of: acquiring, from the multi-source data according to the brand basic information and the shop basic information, the basic information of the mall in which an off-line shop of the brand is located, and establishing the association between the off-line shop of the brand and the mall. After the off-line shops of the brand have been associated with specific malls, at least the following is possible: by selecting or entering a brand name, information on the mall(s) associated with the off-line shops of the brand may be obtained; alternatively, by selecting or entering a mall name, information on the off-line shops of the brand(s) associated with the mall may be obtained. This provides a reference for the owner of the brand, the mall operator, or other related parties.
In some embodiments, the brand library creation method may further include the steps of: acquiring, from the multi-source data according to the brand basic information, the agent basic information of the agent corresponding to the brand, and establishing the association between the brand and the agent. After the brand has been associated with a specific agent, at least the following is possible: by selecting or entering a brand name, information on the agent(s) associated with the brand may be obtained; alternatively, by selecting or entering the name of an agent, information on the off-line shops of the brand(s) associated with the agent may be obtained. This provides a reference for the owner of the brand, the agent, or other related parties.
The present application further discloses a brand library creation apparatus, including but not limited to a processor and a memory.
The processor may be a Central Processing Unit (CPU), or may be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The central processing unit may employ the x86 processor architecture developed by Intel Corporation or the Advanced RISC Machine (ARM) processor architecture developed by ARM.
The memory may be used to store computer instructions and/or modules, and the processor may implement the various functions of brand library creation system 1 by running or executing the computer instructions and/or modules stored in the memory and by invoking data stored in the memory. The memory may mainly include an instruction storage area and a data storage area, wherein the instruction storage area may store an operating system, a software program or an application program required for at least one function, and the like; the data storage area may store various types of data used to create the brand library (e.g., brand name, offline store location, etc.), and so forth. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), at least one magnetic disk storage device, a flash memory device, or other volatile solid state storage device.
If the integrated unit of the brand library creation apparatus is implemented in the form of a software functional unit and sold or used as a stand-alone product, it can be stored in a computer-readable storage medium. Based on such understanding, all or part of the processes in the method according to the embodiments of the present invention may also be implemented by a computer program controlling the related hardware, where the computer program may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.
The above embodiments are merely illustrative of the principles and utilities of the present application and are not intended to limit the application. Any person skilled in the art can modify or change the above-described embodiments without departing from the spirit and scope of the present application. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical concepts disclosed in the present application shall be covered by the claims of the present application.

Claims (41)

1. A brand library creation method, comprising the steps of:
obtaining multi-source data related to a brand;
performing brand screening on the multi-source data to obtain brand basic information of each brand; the brand basic information comprises a brand name;
acquiring, from the multi-source data according to the brand basic information, shop basic information of off-line shops of the brand within a specified area range, and establishing an association between the brand and the corresponding off-line shops; the shop basic information comprises shop position information; and
performing a labeling operation on the brands and off-line shops thereof based on the multi-source data to create a brand library.
2. The brand library creation method of claim 1, wherein the multi-source data is obtained by at least one of:
collecting from network resources through a data collection program;
obtaining at least one database, wherein the database comprises a public database and a private database; and
self-built data.
3. The brand library creating method of claim 1, wherein the step of brand screening the multi-source data to obtain brand basic information of each brand comprises the steps of:
cleaning the multi-source data according to a data cleaning rule;
performing text recognition on the cleaned data, and extracting text contents containing suspected brand names from the text;
fusing the extracted content containing the suspected brand name to obtain a brand name set.
4. The brand library creation method of claim 3, further comprising, before cleansing the multi-source data or before text recognition of the cleansed data, the steps of:
classifying the multi-source data according to the industry field.
5. The brand library creation method of claim 4, further comprising the step of setting a data cleansing rule.
6. The brand library creation method of claim 5, wherein the respective data cleansing rules are set for at least one of different databases, different industries, and different brands.
7. The brand library creation method of claim 3, further comprising the steps of: a brand basic information table containing brand basic information of each brand is generated.
8. The brand library creating method according to claim 1, wherein the step of acquiring, from the multi-source data, shop basic information of off-line shops of the brand within a specified area range includes the steps of:
extracting shop data corresponding to a certain brand from the multi-source data according to confirmed brand basic information of the brand; the shop data comprises basic shop information of off-line shops;
fusing the extracted shop data to obtain a shop set;
an association of the brand with a corresponding set of stores is established.
9. The brand library creation method of claim 8, wherein fusing the extracted store data is implemented based on store location information.
10. The brand library creation method of claim 9, wherein the store location information includes at least one of address information and geographic coordinate information.
11. The brand library creation method of claim 8, further comprising the steps of: a store basic information table containing the store basic information of each offline store corresponding to a certain brand is generated.
12. The brand library creation method of claim 11, wherein: when the basic information of the shops in the multi-source data is in a text form, text recognition is performed on the multi-source data, and text contents containing suspected basic information of the shops are extracted from the text.
13. The brand library creation method of claim 11, wherein: when the basic information of the shops in the multi-source data is in a map form, the point of the off-line shop displayed in the map is mapped into geographic coordinate information through a mapping relation.
14. The brand library creation method of claim 8, further comprising the steps of: associating the extracted shop data with the data source where the shop data is located, and establishing a tracing information chain related to the shop data.
15. The brand library creation method of claim 8, wherein the traceability information chain includes a creation time of the store data in a data source where the store data is located.
16. The brand library creation method of claim 1, wherein: the labeling operation of brands and off-line shops based on the multi-source data comprises the following steps:
extracting a brand basic attribute corresponding to the brand or a shop basic attribute corresponding to the off-line shop from the multi-source data;
labeling the basic brand attributes or the basic shop attributes based on preset label rules;
establishing an association between the label and the corresponding brand or the off-line shop thereof.
17. The brand library creation method of claim 16, further comprising the step of presetting tag rules for brands and their off-line stores.
18. The brand library creation method of claim 1, further comprising the steps of: establishing a brand relationship between at least two brands or a store relationship between the off-line stores of at least two brands according to the store basic information of the off-line stores in the brand library and the labels of the brands and the off-line stores.
19. The brand library creation method of claim 1, further comprising the steps of: acquiring, from the multi-source data according to the brand basic information and the shop basic information, basic information of the mall in which the off-line shop of the brand is located, and establishing the association between the off-line shop of the brand and the mall.
20. The brand library creation method of claim 1, further comprising the steps of: acquiring agent basic information of an agent corresponding to the brand from the multi-source data according to the brand basic information, and establishing an association between the brand and the agent.
21. The brand library creation method of claim 1, further comprising the step of updating the brand library.
22. A brand library creation system, comprising:
an information acquisition unit, used for acquiring multi-source data related to brands;
a brand screening unit, used for carrying out brand screening on the multi-source data to obtain brand basic information of each brand; the brand basic information comprises a brand name;
a shop association unit, used for acquiring shop basic information of off-line shops of the brand in a specified area range from the multi-source data according to the brand basic information and establishing association between the brand and the corresponding off-line shop; the shop basic information comprises shop position information; and
a labeling processing unit, used for performing a labeling operation on the brands and off-line shops thereof based on the multi-source data to create a brand library.
23. The brand library creation system of claim 22, wherein the data source is obtained by at least one of:
collecting from network resources through a data collection program;
obtaining at least one database, wherein the database comprises a public database and a private database; and
self-built data.
24. The brand library creation system of claim 22, wherein the brand filtering unit further comprises:
a data cleaning module, used for cleaning the multi-source data according to the data cleaning rule;
a text recognition module, used for performing text recognition on the cleaned data and extracting text contents containing suspected brand names from the text;
a brand fusion module, used for fusing the extracted content containing the suspected brand name to obtain a brand name set.
25. The brand library creation system of claim 24, wherein the brand filtering unit further comprises: an industry classification module, used for classifying the multi-source data according to the industry field.
26. The brand library creation system of claim 25, wherein the brand filtering unit further comprises: a cleaning rule setting module, used for setting a cleaning rule of the data.
27. The brand library creation system of claim 26, wherein the store association unit further comprises:
a store extraction module configured to extract, from the multi-source data, store data corresponding to a brand according to the confirmed brand basic information of that brand, the store data comprising store basic information of offline stores;
a store fusion module configured to fuse the extracted store data to obtain a store set; and
a store association module configured to establish an association between the brand and its corresponding store set.
28. The brand library creation system of claim 27, wherein the store fusion module fuses the extracted store data based on store location information.
29. The brand library creation system of claim 28, wherein the store location information comprises at least one of address information and geographic coordinate information.
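Claims 28 and 29 describe fusing store data by location, using address information, geographic coordinates, or both. The sketch below is one possible reading and not the application's own algorithm: it treats two records as the same store when their address strings are identical or when their coordinates lie within a distance threshold. The field names (`address`, `lat`, `lon`, `source`) and the 50 m default are assumptions made for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def same_place(a, b, max_distance_m):
    """True if two records share an address or lie within the threshold."""
    if a.get("address") and a.get("address") == b.get("address"):
        return True
    coords = (a.get("lat"), a.get("lon"), b.get("lat"), b.get("lon"))
    if None in coords:
        return False  # cannot compare coordinates that are missing
    return haversine_m(*coords) <= max_distance_m

def fuse_stores(records, max_distance_m=50.0):
    """Merge records that refer to the same place; remember every source."""
    fused = []
    for rec in records:
        match = next((s for s in fused if same_place(s, rec, max_distance_m)), None)
        if match is None:
            fused.append({**rec, "sources": [rec["source"]]})
        else:
            match["sources"].append(rec["source"])
    return fused
```

The greedy first-match strategy keeps the example short; real address data would normally need normalization or geocoding before comparison.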
30. The brand library creation system of claim 27, wherein the store association module is further configured to associate the extracted store data with a data source in which the store data is located, and to establish a traceability information chain associated with the store data.
31. The brand library creation system of claim 30, wherein the traceability information chain includes the time at which the store data was created in its data source.
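One way to picture the traceability information chain of claims 30 and 31 is as a list of links attached to each store record, each link naming the data source and the creation time of the record in that source. The class and field names below are hypothetical; the claims do not prescribe a data structure.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass(frozen=True)
class TraceLink:
    """One hop of the chain: where a piece of store data came from."""
    source_name: str        # hypothetical source label, e.g. "map_api"
    source_record_id: str   # identifier of the record inside that source
    created_at: datetime    # creation time of the record in its source

@dataclass
class StoreRecord:
    brand: str
    address: str
    trace: list = field(default_factory=list)

    def add_trace(self, source_name, source_record_id, created_at):
        """Associate this store data with one more originating source."""
        self.trace.append(TraceLink(source_name, source_record_id, created_at))

    def latest_source(self):
        """Return the link with the most recent creation time."""
        return max(self.trace, key=lambda link: link.created_at)
```

Keeping the creation time on every link makes it possible to prefer the freshest source when two sources disagree about a store.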
32. The brand library creation system of claim 22, wherein the labeling processing unit comprises:
a basic attribute extraction module configured to extract, from the multi-source data, a brand basic attribute corresponding to the brand or a store basic attribute corresponding to the offline store;
a label assignment module configured to attach a label to the brand basic attribute or the store basic attribute based on a preset label rule; and
a label association module configured to establish an association between the label and the corresponding brand or its offline store.
33. The brand library creation system of claim 32, wherein the labeling processing unit further comprises a label rule setting module configured to set label rules for brands and their offline stores.
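The rule-based labeling of claims 32 and 33 might be sketched as a list of preset rules, each pairing a label with a predicate over the extracted basic attributes. The rule set, attribute names, and thresholds here are invented for illustration and do not come from the application.

```python
def assign_labels(attributes, rules):
    """Return the labels whose rule predicate accepts the given attributes."""
    return [label for label, predicate in rules if predicate(attributes)]

# Hypothetical preset label rules: (label, predicate over basic attributes).
STORE_RULES = [
    ("flagship", lambda a: a.get("floor_area_m2", 0) >= 500),
    ("in_mall", lambda a: bool(a.get("mall"))),
]

def label_library(entities, rules):
    """Associate each brand or store identifier with its labels."""
    return {entity_id: assign_labels(attrs, rules)
            for entity_id, attrs in entities.items()}
```

Because the rules are plain data, a rule setting module could add or replace entries in the list without touching the labeling logic.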
34. The brand library creation system of claim 22, further comprising a brand relation unit configured to establish a brand relation between at least two brands, or a store relation between the offline stores of at least two brands, according to the store basic information of the offline stores in the brand library and the labels of the brands and their offline stores.
35. The brand library creation system of claim 22, further comprising a market association unit configured to acquire, from the multi-source data and according to the brand basic information and the store basic information, market basic information of the market in which an offline store of the brand is located, and to establish an association between the offline store of the brand and the market.
36. The brand library creation system of claim 22, further comprising an agent association unit configured to acquire, from the multi-source data and according to the brand basic information, agent basic information of the agent corresponding to the brand, and to establish an association between the brand and the agent.
37. The brand library creation system of claim 22, further comprising an updating unit configured to update the brand library.
38. A brand library creating apparatus, comprising:
a memory to store instructions;
a processor coupled to the memory, the processor configured to perform the brand library creation method of any one of claims 1 to 21 based on the instructions stored in the memory.
39. A computer-readable storage medium storing computer instructions which, when executed by a processor, implement the brand library creation method of any one of claims 1 to 21.
40. A brand query and analysis platform, comprising:
the brand library creation system of any one of claims 22 to 37;
a data storage unit configured to store brand library information created by the brand library creation system; and
a data processing unit configured to process the brand library information in the data storage unit according to an operation instruction, the operation instruction comprising at least one of instant query and multidimensional data analysis.
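The two operation instructions named in claim 40 can be illustrated on an in-memory list of store records. This is only a sketch under assumptions: the record fields (`brand`, `city`, `label`) are hypothetical, and a real platform would run such operations against the data storage unit rather than a Python list.

```python
from collections import Counter

def query_stores(library, **filters):
    """Instant query: return records whose fields equal every given filter."""
    return [s for s in library
            if all(s.get(key) == value for key, value in filters.items())]

def count_by(library, *dimensions):
    """Multidimensional analysis: count records per combination of dimensions."""
    return Counter(tuple(s.get(d) for d in dimensions) for s in library)
```

For example, `count_by(library, "city", "brand")` yields the number of stores for each city and brand pair, which a visual display unit could then chart.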
41. The brand query and analysis platform of claim 40, further comprising:
a visual display unit configured to provide an operation interface and to display the processing results of the data processing unit in a visual interface.
CN201910415359.8A 2019-05-17 2019-05-17 Brand library creating method and system and brand query and analysis platform Pending CN111949639A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910415359.8A CN111949639A (en) 2019-05-17 2019-05-17 Brand library creating method and system and brand query and analysis platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910415359.8A CN111949639A (en) 2019-05-17 2019-05-17 Brand library creating method and system and brand query and analysis platform

Publications (1)

Publication Number Publication Date
CN111949639A (en) 2020-11-17

Family

ID=73336374

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910415359.8A Pending CN111949639A (en) 2019-05-17 2019-05-17 Brand library creating method and system and brand query and analysis platform

Country Status (1)

Country Link
CN (1) CN111949639A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113515559A (en) * 2021-07-14 2021-10-19 浪潮卓数大数据产业发展有限公司 Method for forming brand pool by selling commodity brands on e-commerce platform

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103999108A (en) * 2011-12-08 2014-08-20 普莱瑟公司 Method and arrangement for electronic signs
CN105956694A (en) * 2016-04-20 2016-09-21 杭州维氪科技有限公司 Heterogeneous data source integrated modeling and optimizing method for interior space value of commercial real estate
CN106339899A (en) * 2016-08-19 2017-01-18 上海宝尊电子商务有限公司 Visual decision support method of offline Trade Zone design based on online trade data
CN109409964A (en) * 2018-11-27 2019-03-01 口碑(上海)信息技术有限公司 The recognition methods of Premium Brands and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20231013

Address after: Room 901-A, Block B, Building 19, No. 36 Changsheng South Road, Economic and Technological Development Zone, Jiaxing City, Zhejiang Province, 314001

Applicant after: Jiaxing Shurong Data Technology Co.,Ltd.

Address before: Room 1301, Building 2, Chuangzhi Tiandi Technology Center, No. 477 Zhengli Road, Yangpu District, Shanghai, 200433

Applicant before: Shanghai Shurong Data Technology Co.,Ltd.
