CN111931497A - Optimization method for language of questionnaire for automobile consumer - Google Patents
- Publication number
- CN111931497A
- Authority
- CN
- China
- Prior art keywords
- word
- words
- word segmentation
- language
- public praise
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Class or cluster creation or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
- G06Q30/0203—Market surveys; Market polls
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- Accounting & Taxation (AREA)
- Development Economics (AREA)
- Finance (AREA)
- Strategic Management (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Entrepreneurship & Innovation (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Marketing (AREA)
- General Business, Economics & Management (AREA)
- Economics (AREA)
- Game Theory and Decision Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention relates to a method for optimizing the language of questionnaires for automobile consumers. The method comprises the following steps: S1, acquiring word-of-mouth review data from the automobile industry; S2, segmenting the word-of-mouth review data with the jieba and pyhanlp word segmentation libraries to obtain a first word-segmentation lexicon; S3, removing meaningless words and stop words from the first word-segmentation lexicon to obtain a second word-segmentation lexicon; S4, calculating the semantic similarity between the words in the second word-segmentation lexicon and the second-level technical indicators; S5, clustering and grouping the words according to the semantic similarity to form a mapping table; S6, using a statistical method to compute the weights of all keywords under each second-level indicator across all word-of-mouth reviews of each vehicle model; and S7, optimizing the questionnaire language according to the weights. The method can effectively optimize the questionnaire language and make the questions easier to understand, so that vehicle models can be evaluated more accurately and efficiently.
Description
Technical Field
The invention relates to the field of data processing, in particular to an optimization method for a questionnaire language of automobile consumers.
Background
In today's fiercely competitive environment, an accurate understanding of market and user requirements is urgently needed. In consumer research, questionnaire design directly affects the research results, and the key issue is whether respondents can accurately understand the questions. Across the industry, questionnaire wording currently tends to be too specialized, so respondents either cannot understand the questions or misunderstand them. Enterprises therefore have a strong demand for optimized questionnaire language.
On the other hand, as NLP (natural language processing) technology has matured, this work introduces NLP and, drawing on the first-level technical indicators of the automobile industry, compiles new statistics on the indicators that users mention in word-of-mouth reviews and the degree of attention each receives. The aim is to gain a deeper understanding of how automobile consumers express themselves and to optimize the questionnaire language used in current user research, so that vehicle models are evaluated more accurately and efficiently. This helps enterprises understand their own strengths and weaknesses and the gap between their products and user requirements, and it supports both the improvement of existing products and the planning and development of new vehicle models.
In view of the above, the present invention is particularly proposed.
Disclosure of Invention
The invention aims to provide a method for optimizing the language of automobile consumer research questionnaires, which can effectively optimize the questionnaire language and make it easier to understand, so that vehicle models are evaluated more accurately and efficiently.
In order to achieve the above purpose of the present invention, the following technical solutions are adopted:
According to one aspect of the present invention, there is provided a method for optimizing the language of a questionnaire for automobile consumers, comprising the following steps:
S1, acquiring word-of-mouth review data from the automobile industry;
S2, segmenting the word-of-mouth review data separately with the jieba word segmentation library and the pyhanlp word segmentation library to obtain two word segmentation results, comparing the two results, and performing computer-assisted proofreading and verification to obtain a first word-segmentation lexicon;
S3, removing meaningless words and stop words from the first word-segmentation lexicon to obtain a second word-segmentation lexicon;
S4, calculating the semantic similarity between the words in the second word-segmentation lexicon and the second-level technical indicators of the automobile industry;
S5, clustering and grouping the words according to the semantic similarity between each word and the second-level technical indicators, the words serving as keywords of each indicator, to form a tree-structured mapping table;
S6, using a statistical method to compute the weights of all keywords under each second-level indicator across all word-of-mouth reviews of each vehicle model;
S7, optimizing the questionnaire language according to the weights.
It should be noted that:
The meaningless words or stop words in S3 include, but are not limited to: "very", "also", "cala", "o", etc.
The second-level technical indicators in S4 include, but are not limited to: appearance, interior, power, etc.
The cluster labels of the groups in S5 are the corresponding second-level technical indicators.
In a preferred embodiment, in S1, after the automobile industry word-of-mouth review data are acquired, they are imported into a database. Storing the data in a database makes them convenient to read and write, which saves time and cost.
As a further preferable technical solution, the word-of-mouth review data are imported into a MySQL database using the Python language.
As a further preferable technical solution, in S3, meaningless words are removed using a regular expression.
As a further preferable technical solution, in S3, stop words are removed using a stop-word list.
Removing meaningless words and stop words speeds up the solving of the model.
As a more preferable embodiment, in S6, the keywords are sorted by weight in descending order.
Compared with the prior art, the invention has the beneficial effects that:
The optimization method for automobile consumer questionnaire language performs word segmentation with both the pyhanlp and jieba word segmentation libraries, which gives good segmentation results. It combines the second-level indicators of the automobile industry with the word segmentation results for the first time, which improves the reliability of the optimization result, brings the language of consumers and automobile manufacturers closer together, and can promote the healthy and stable development of the automobile industry.
Drawings
Fig. 1 is a schematic flow chart of embodiment 1 of the present invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to examples, but it will be understood by those skilled in the art that the following examples are only illustrative of the present invention and should not be construed as limiting the scope of the present invention. The examples, in which specific conditions are not specified, were conducted under conventional conditions or conditions recommended by the manufacturer.
Example 1
As shown in fig. 1, the present embodiment provides a method for optimizing a questionnaire language of a car consumer, comprising the following steps:
Step 1: Relevant automobile word-of-mouth review data are purchased. Each record contains fields such as brand, vehicle series, vehicle model, purchase time, and review details; the specific format is shown in Table 1 below. Using the Python language, the review data are imported into a MySQL database organized as a tree of vehicle series and vehicle models, so that the data can be conveniently read and written later, which saves time.
TABLE 1 Detailed word-of-mouth data format
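The import step can be sketched as follows. This is only an illustration under stated assumptions: the standard-library `sqlite3` module stands in for MySQL so that the sketch runs without a database server, and the table name, column names, and sample rows are hypothetical, chosen to mirror the fields named above (brand, vehicle series, vehicle model, purchase time, review details).

```python
import sqlite3

# Hypothetical sample rows: (brand, series, model, purchase_time, review).
ROWS = [
    ("BrandA", "SeriesX", "Model 1.5T", "2019-06",
     "The exterior looks imposing and the cabin is roomy."),
    ("BrandA", "SeriesX", "Model 2.0T", "2020-01",
     "Strong power, but fuel consumption is a bit high."),
]


def load_reviews(conn, rows):
    """Create the (hypothetical) reviews table if needed and bulk-insert rows."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS reviews (
               id INTEGER PRIMARY KEY,
               brand TEXT, series TEXT, model TEXT,
               purchase_time TEXT, review TEXT)"""
    )
    conn.executemany(
        "INSERT INTO reviews (brand, series, model, purchase_time, review) "
        "VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()


def reviews_by_model(conn, series, model):
    """Return all review texts for one vehicle series / model pair."""
    cur = conn.execute(
        "SELECT review FROM reviews WHERE series = ? AND model = ? ORDER BY id",
        (series, model),
    )
    return [row[0] for row in cur.fetchall()]


# sqlite3 in-memory database used as a stand-in for the MySQL database.
conn = sqlite3.connect(":memory:")
load_reviews(conn, ROWS)
```

Querying by series and model mirrors the series/model tree mentioned in the text; with a real MySQL server the same statements would go through a MySQL client library instead, using that library's own parameter placeholder style.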
Step 2: Based on the Python platform, the review data for the relevant vehicle models are split into sentences and segmented into words with the pyhanlp and jieba word segmentation libraries. The segmentation results of the two libraries are compared, proofread and verified with computer assistance, and finally merged. The merged word segmentation results are shown in Table 2 below.
TABLE 2 verified word segmentation results
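The comparison of two segmentation results can be sketched as below. The source does not say how the comparison or proofreading is carried out, so this is an assumption: each token list is converted into character spans, spans produced by both segmenters are accepted, and the rest are flagged for review. The function names and the sample sentence are hypothetical, and the token lists are written by hand so the sketch runs without jieba or pyhanlp installed.

```python
def to_spans(tokens):
    """Convert a token list into a set of (start, end) character spans."""
    spans = set()
    pos = 0
    for tok in tokens:
        spans.add((pos, pos + len(tok)))
        pos += len(tok)
    return spans


def compare_segmentations(text, tokens_a, tokens_b):
    """Return (agreed, disputed) tokens for two segmentations of `text`.

    agreed: tokens whose span appears in both segmentations.
    disputed: tokens whose span appears in only one of them.
    """
    if "".join(tokens_a) != text or "".join(tokens_b) != text:
        raise ValueError("each token list must concatenate back to the text")
    spans_a, spans_b = to_spans(tokens_a), to_spans(tokens_b)
    agreed = [text[s:e] for s, e in sorted(spans_a & spans_b)]
    disputed = [text[s:e] for s, e in sorted(spans_a ^ spans_b)]
    return agreed, disputed


# Hand-written token lists standing in for the two libraries' outputs.
# With the libraries installed they could come from, for example,
# jieba.lcut(text) and [term.word for term in HanLP.segment(text)].
text = "外观很霸气"
tokens_a = ["外观", "很", "霸气"]
tokens_b = ["外观", "很霸", "气"]
agreed, disputed = compare_segmentations(text, tokens_a, tokens_b)
```

Here the agreed tokens go straight into the lexicon, while the disputed ones are the candidates for proofreading.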
Step 3: The word segmentation result obtained in step 2 may contain irrelevant words (for example, "very", "too", "or"). These words only add emphasis and do not affect the extraction of indicator keywords. To speed up model solving, they are deleted using regular expressions and a stop-word list. The cleaned word segmentation results are shown in Table 3 below.
TABLE 3 regularized word segmentation results
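A minimal sketch of this cleaning step, assuming that "meaningless" tokens are those made up only of punctuation, whitespace, digits, or underscores, and that the stop-word list is a plain set. Both the regular expression and the stop words are hypothetical stand-ins; English glosses are used in place of the original Chinese tokens.

```python
import re

# Hypothetical stop-word list (English glosses of emphasis words).
STOP_WORDS = {"very", "also", "too", "or"}

# Tokens consisting only of non-word characters, digits, or underscores
# are treated as noise in this sketch.
NOISE = re.compile(r"^[\W\d_]+$")


def clean_tokens(tokens, stop_words=STOP_WORDS):
    """Drop empty tokens, noise tokens, and stop words; keep the order."""
    return [
        tok for tok in tokens
        if tok and not NOISE.match(tok) and tok not in stop_words
    ]


tokens = ["appearance", "very", "imposing", ",", "space", "also", "large", "!", "123"]
cleaned = clean_tokens(tokens)
```

In Python 3, `\W` is Unicode-aware for `str` patterns, so the same pattern also removes full-width Chinese punctuation while leaving Chinese words untouched.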
Step 4: Based on the existing second-level technical indicators of the automobile industry, the semantic similarity between the words in the lexicon from step 3 and the second-level indicators is calculated. Table 4 below lists the second-level technical indicators of the automobile industry, and Table 5 below shows the relationship between the first-level and second-level indicators.
TABLE 4 Second-level technical indicators of the automobile industry
Comfort | Interior | Configuration | Cost performance | Quality | Economy of use | Safety | Endurance |
Appearance | Space | Steering and control | Brand | Fuel consumption | Environmental friendliness | Power |
TABLE 5 Relationship between the first-level and second-level indicators of the automobile industry
Semantic similarity is calculated as the cosine of the angle between two vectors:

$$\cos\theta = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$

where $A_i$ and $B_i$ are the components of $A$ and $B$, and $A$ and $B$ are the one-hot encodings of the two words.
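The cosine measure described above can be written as a short function. This is a sketch with hypothetical vectors; the function name and the zero-vector convention (returning 0.0) are assumptions. Note that the one-hot encodings of two different words are orthogonal, so the example also uses small dense toy vectors to show a non-trivial value.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen for this sketch
    return dot / (norm_a * norm_b)


# One-hot vectors of two different words are orthogonal.
sim_one_hot = cosine_similarity([1, 0, 0], [0, 1, 0])

# Hypothetical dense vectors give a graded similarity between 0 and 1.
sim_dense = cosine_similarity([1.0, 1.0], [1.0, 0.0])
```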
Step 5: Each word in the lexicon is clustered and grouped by similarity using a clustering method, with the corresponding second-level technical indicator serving as the label of each cluster. That is, words are classified according to their similarity to the second-level indicators and become the keywords of each indicator. Part of the clustering result is shown in Table 6 below (only two examples are given for reasons of space).
TABLE 6 partial clustering grouping results
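One simple way to realize this grouping is to assign each word to the indicator it is most similar to. The source does not name the clustering method, so nearest-indicator assignment, the similarity threshold, and all vectors below are assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def build_mapping(word_vectors, indicator_vectors, threshold=0.5):
    """Group words under their most similar indicator.

    Returns {indicator: [keywords]}; words whose best similarity is
    below `threshold` are left out of the mapping.
    """
    mapping = {name: [] for name in indicator_vectors}
    for word, vec in word_vectors.items():
        scores = {name: cosine(vec, ivec)
                  for name, ivec in indicator_vectors.items()}
        best = max(scores, key=scores.get)
        if scores[best] >= threshold:
            mapping[best].append(word)
    return mapping


# Hypothetical 2-D vectors for two indicators and four words.
indicators = {"appearance": [1.0, 0.0], "power": [0.0, 1.0]}
words = {
    "imposing": [0.9, 0.1],
    "stylish": [0.8, 0.3],
    "acceleration": [0.1, 0.95],
    "weather": [-1.0, -1.0],  # unrelated to both indicators
}
mapping = build_mapping(words, indicators)
```

The resulting dictionary is the second-level part of the tree; nesting it under first-level indicators would give the full tree-structured mapping table.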
Step 6: Using a statistical method, the weights of all keywords under each second-level indicator are computed across all word-of-mouth reviews of each vehicle model, and the keywords are ranked by weight in descending order. The keyword weights for one word-of-mouth example are shown in Table 7 below.
TABLE 7 Keyword weights for one word-of-mouth example
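The weighting step can be illustrated as follows. The source only says "a statistical method", so this sketch assumes the weight of a keyword is its relative frequency among all keyword occurrences of the same indicator; the function name, the mapping, and the tokenized reviews are hypothetical.

```python
from collections import Counter


def keyword_weights(reviews_tokens, mapping):
    """Compute per-indicator keyword weights as relative frequencies.

    reviews_tokens: list of token lists, one per review.
    mapping: {indicator: [keywords]}.
    Returns {indicator: [(keyword, weight), ...]} sorted by descending weight.
    """
    keyword_to_indicator = {
        kw: ind for ind, kws in mapping.items() for kw in kws
    }
    counts = {ind: Counter() for ind in mapping}
    for tokens in reviews_tokens:
        for tok in tokens:
            ind = keyword_to_indicator.get(tok)
            if ind is not None:
                counts[ind][tok] += 1

    weights = {}
    for ind, counter in counts.items():
        total = sum(counter.values())
        # Sort by count (descending), then alphabetically for stable ties.
        ranked = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
        weights[ind] = [(kw, n / total) for kw, n in ranked] if total else []
    return weights


# Hypothetical mapping and tokenized reviews for one vehicle model.
mapping = {"appearance": ["imposing", "stylish", "beautiful"]}
reviews = [
    ["imposing", "stylish"],
    ["imposing", "beautiful"],
    ["imposing", "stylish", "engine"],
]
weights = keyword_weights(reviews, mapping)
```

In this toy data "imposing" occurs three times out of six keyword occurrences, so it ranks first with weight 0.5.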
Step 7: Optimize the questionnaire language according to the weights of the second-level indicator keywords
Previous questionnaire:
How do you feel about the appearance of the vehicle? Please give a score from 1 to 5 (the higher the score, the more you like it).
A. 1  B. 2  C. 3  D. 4  E. 5
Analysis: Consumers given only a numeric scale may be unsure how to answer, and a bare score does not intuitively express how much they like the vehicle.
Optimized questionnaire: How do you feel about the appearance of the vehicle?
A. Imposing  B. Beautiful  C. Fashionable  D. Good  E. Grand
When the consumer selects an option, each option carries a weight, and the weights can be normalized into scores (on a 1-5 or 1-10 scale) according to their magnitude.
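The score normalization can be sketched as a linear rescaling. The source does not specify the normalization, so min-max scaling is an assumption here, as are the function name and the option weights.

```python
def scale_scores(weights, low=1.0, high=5.0):
    """Linearly map option weights onto the range [low, high].

    weights: {option: weight}. The smallest weight maps to `low` and the
    largest to `high`. If all weights are equal, every option gets `high`.
    """
    if not weights:
        return {}
    w_min, w_max = min(weights.values()), max(weights.values())
    if w_max == w_min:
        return {opt: high for opt in weights}
    span = w_max - w_min
    return {
        opt: low + (w - w_min) / span * (high - low)
        for opt, w in weights.items()
    }


# Hypothetical weights for three answer options.
option_weights = {"imposing": 0.5, "stylish": 0.3, "beautiful": 0.2}
scores_5 = scale_scores(option_weights)             # 1-5 scale
scores_10 = scale_scores(option_weights, 1.0, 10.0)  # 1-10 scale
```

With these weights the highest-weighted option maps to the top of the scale and the lowest to the bottom, and the same function covers both the 1-5 and 1-10 scales mentioned in the text.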
While particular embodiments of the present invention have been illustrated and described, it would be obvious that various other changes and modifications can be made without departing from the spirit and scope of the invention. It is therefore intended to cover in the appended claims all such changes and modifications that are within the scope of this invention.
Claims (6)
1. A method for optimizing the language of a questionnaire for automobile consumers, comprising the following steps:
S1, acquiring word-of-mouth review data from the automobile industry;
S2, segmenting the word-of-mouth review data separately with the jieba word segmentation library and the pyhanlp word segmentation library to obtain two word segmentation results, comparing the two results, and performing computer-assisted proofreading and verification to obtain a first word-segmentation lexicon;
S3, removing meaningless words and stop words from the first word-segmentation lexicon to obtain a second word-segmentation lexicon;
S4, calculating the semantic similarity between the words in the second word-segmentation lexicon and the second-level technical indicators of the automobile industry;
S5, clustering and grouping the words according to the semantic similarity between each word and the second-level technical indicators, the words serving as keywords of each indicator, to form a tree-structured mapping table;
S6, using a statistical method to compute the weights of all keywords under each second-level indicator across all word-of-mouth reviews of each vehicle model;
S7, optimizing the questionnaire language according to the weights.
2. The optimization method according to claim 1, wherein in S1, after the automobile industry word-of-mouth review data are acquired, the review data are imported into a database.
3. The optimization method according to claim 2, wherein the word-of-mouth review data are imported into a MySQL database using the Python language.
4. The optimization method according to claim 1, wherein in S3, meaningless words are removed using a regular expression.
5. The optimization method according to claim 1, wherein in S3, stop words are removed using a stop-word list.
6. The optimization method according to any one of claims 1 to 5, wherein in S6, the keywords are sorted by weight in descending order.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010667389.0A CN111931497A (en) | 2020-07-16 | 2020-07-16 | Optimization method for language of questionnaire for automobile consumer |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010667389.0A CN111931497A (en) | 2020-07-16 | 2020-07-16 | Optimization method for language of questionnaire for automobile consumer |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111931497A true CN111931497A (en) | 2020-11-13 |
Family
ID=73312808
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010667389.0A Pending CN111931497A (en) | 2020-07-16 | 2020-07-16 | Optimization method for language of questionnaire for automobile consumer |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111931497A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109800307A (en) * | 2019-01-18 | 2019-05-24 | 深圳壹账通智能科技有限公司 | Analysis method, device, computer equipment and the storage medium of product evaluation |
CN110442728A (en) * | 2019-06-28 | 2019-11-12 | 天津大学 | Sentiment dictionary construction method based on word2vec automobile product field |
CN110543547A (en) * | 2019-08-13 | 2019-12-06 | 广东数鼎科技有限公司 | automobile public praise semantic emotion analysis system |
CN111160017A (en) * | 2019-12-12 | 2020-05-15 | 北京文思海辉金信软件有限公司 | Keyword extraction method, phonetics scoring method and phonetics recommendation method |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113342931A (en) * | 2021-05-27 | 2021-09-03 | 东风柳州汽车有限公司 | Big data based user demand analysis method, device, equipment and storage medium |
CN113342931B (en) * | 2021-05-27 | 2022-11-01 | 东风柳州汽车有限公司 | Big data based user demand analysis method, device, equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9116985B2 (en) | Computer-implemented systems and methods for taxonomy development | |
EP2866421A1 (en) | Method and apparatus for identifying a same user in multiple social networks | |
CN107908753B (en) | Client demand mining method and device based on social media comment data | |
CN106156023B (en) | Semantic matching method, device and system | |
CN106294500B (en) | Content item pushing method, device and system | |
JP2015518220A (en) | Online product search method and system | |
CN109522412B (en) | Text emotion analysis method, device and medium | |
CN103365867A (en) | Method and device for emotion analysis of user evaluation | |
CN103823896A (en) | Subject characteristic value algorithm and subject characteristic value algorithm-based project evaluation expert recommendation algorithm | |
CN113312461A (en) | Intelligent question-answering method, device, equipment and medium based on natural language processing | |
CN109190121A (en) | Car review sentiment analysis method based on automobile body and part-of-speech rule | |
CN104252456A (en) | Method, device and system for weight estimation | |
Fouzia Sayeedunnissa et al. | Supervised opinion mining of social network data using a bag-of-words approach on the cloud | |
CN113268667B (en) | Chinese comment emotion guidance-based sequence recommendation method and system | |
CN111858922A (en) | Service side information query method and device, electronic equipment and storage medium | |
Cai et al. | PURA: a product-and-user oriented approach for requirement analysis from online reviews | |
CN112348417A (en) | Marketing value evaluation method and device based on principal component analysis algorithm | |
CN114266443A (en) | Data evaluation method and device, electronic equipment and storage medium | |
US11693886B2 (en) | Methods, systems, articles of manufacture, and apparatus to map client specifications with standardized characteristics | |
CN114840766A (en) | User portrait construction method, system, equipment and storage medium | |
CN108563647A (en) | A kind of automobile Method for Sales Forecast method based on comment sentiment analysis | |
CN104572915A (en) | User event relevance calculation method based on content environment enhancement | |
CN111931497A (en) | Optimization method for language of questionnaire for automobile consumer | |
CN115908060A (en) | Technical scheme creativity evaluation method, medium and device | |
US20220222715A1 (en) | System and method for detecting and analyzing discussion points from written reviews |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20201113 |