CN111931497A - Optimization method for language of questionnaire for automobile consumer - Google Patents

Optimization method for language of questionnaire for automobile consumer

Info

Publication number
CN111931497A
Authority
CN
China
Prior art keywords
word
words
word segmentation
language
public praise
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010667389.0A
Other languages
Chinese (zh)
Inventor
杨靖
顾洪建
张帆
李斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cnr Tianjin Automobile Information Consulting Co ltd
China Automotive Technology and Research Center Co Ltd
Original Assignee
Cnr Tianjin Automobile Information Consulting Co ltd
China Automotive Technology and Research Center Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cnr Tianjin Automobile Information Consulting Co ltd, China Automotive Technology and Research Center Co Ltd
Priority to CN202010667389.0A
Publication of CN111931497A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0203 Market surveys; Market polls
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a method for optimizing the questionnaire language used in automobile consumer research. The method comprises the following steps: S1, acquiring public praise (word-of-mouth) comment data of the automobile industry; S2, performing word segmentation on the public praise comment data with the jieba and pyhanlp word segmentation libraries to obtain a first word segmentation word bank; S3, removing nonsense words and stop words from the first word segmentation word bank to obtain a second word segmentation word bank; S4, calculating the semantic similarity between the words in the second word segmentation word bank and the secondary technical indexes; S5, clustering and grouping the words according to the semantic similarity to form a mapping table; S6, counting, by a statistical method, the weight of all keywords under each secondary index in all public praise comments of each vehicle type; and S7, optimizing the questionnaire language according to the weights. The method effectively optimizes the questionnaire language and makes the question wording easier to understand, so that vehicle types can be evaluated more accurately and efficiently.

Description

Optimization method for language of questionnaire for automobile consumer
Technical Field
The invention relates to the field of data processing, in particular to an optimization method for a questionnaire language of automobile consumers.
Background
At present, in a fiercely competitive environment, enterprises urgently need an accurate understanding of the market and of user requirements. In consumer research, the design of the questionnaire directly influences the research results, and whether users can accurately understand the questions is the key point. Across the industry, questionnaire design currently suffers from question wording that is too specialized, so that users cannot understand the questions or misunderstand them. Enterprises therefore have a strong demand for optimizing the questionnaire language.
On the other hand, as NLP (natural language processing) technology has matured, this work introduces NLP technology and combines it with the first-level technical index framework of the automobile industry to compile new statistics on the indexes that users' public praise comments focus on and on the degree of attention each index receives. The aim is to gain a deep understanding of the language characteristics of automobile consumers and thereby optimize the questionnaire language used in current user research, so that vehicle types are evaluated more accurately and efficiently. This helps enterprises understand their own strengths and weaknesses and the gap between their products and user requirements, and it plays an important role in improving existing products and in planning and developing new vehicle types.
In view of the above, the present invention is particularly proposed.
Disclosure of Invention
The invention aims to provide a method for optimizing the questionnaire language used in automobile consumer research, which can effectively optimize the questionnaire language and make it easier to understand, so that vehicle types are evaluated more accurately and efficiently.
To achieve the above purpose, the invention adopts the following technical solution:
According to one aspect of the present invention, there is provided a method for optimizing a questionnaire language for automobile consumers, comprising the following steps:
S1, acquiring public praise comment data of the automobile industry;
S2, performing word segmentation on the public praise comment data with a jieba word segmentation library and a pyhanlp word segmentation library respectively to obtain two word segmentation results, comparing the two results, and proofreading and verifying them with the aid of a computer to obtain a first word segmentation word bank;
S3, removing nonsense words and stop words from the first word segmentation word bank to obtain a second word segmentation word bank;
S4, calculating the semantic similarity between the words in the second word segmentation word bank and the secondary technical indexes of the automobile industry;
S5, clustering and grouping the words according to the semantic similarity between each word and the secondary technical indexes, the words serving as keywords of each index, to form a mapping table with a tree structure;
S6, counting, by a statistical method, the weight of all keywords under each secondary index in all public praise comments of each vehicle type;
and S7, optimizing the questionnaire language according to the weights.
It should be noted that:
The nonsense words or stop words in S3 include, but are not limited to: "very", "also", "cala", "o", etc.
The secondary technical indexes in S4 include, but are not limited to: appearance, interior, power, etc.
The cluster labels of the cluster groups in S5 are the corresponding secondary technical indexes.
In a preferred embodiment, in S1, after the public praise comment data of the automobile industry is acquired, it is imported into a database. Importing the data into a database makes it convenient to read and write the data and saves time and cost.
As a further preferred technical solution, the public praise comment data is imported into a MySQL database using the Python language.
As a further preferred technical solution, in S3, nonsense words are removed using regular expressions.
As a further preferred technical solution, in S3, stop words are removed using a stop-word bank.
Removing nonsense words and stop words speeds up the solving of the model.
As a more preferred embodiment, in S6, the keywords are sorted in descending order of weight.
Compared with the prior art, the invention has the beneficial effects that:
The optimization method for the automobile consumer questionnaire language performs word segmentation based on the pyhanlp and jieba word segmentation libraries, and the segmentation effect is good. The secondary indexes of the automobile industry are combined with the word segmentation results for the first time, which improves the reliability of the optimization result, brings consumer language and automobile manufacturers closer together, and can promote the healthy and stable development of the automobile industry.
Drawings
Fig. 1 is a schematic flow chart of embodiment 1 of the present invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to examples, but it will be understood by those skilled in the art that the following examples are only illustrative of the present invention and should not be construed as limiting the scope of the present invention. The examples, in which specific conditions are not specified, were conducted under conventional conditions or conditions recommended by the manufacturer.
Example 1
As shown in fig. 1, the present embodiment provides a method for optimizing a questionnaire language of a car consumer, comprising the following steps:
Step 1: Relevant public praise comment data on automobiles is purchased. The public praise data includes fields such as brand, vehicle series, vehicle type, purchase time and public praise details; the specific format is shown in Table 1 below. The public praise comment data is imported into a MySQL database with the Python language, organized in a tree format by vehicle series and vehicle type, so that the stored data can be conveniently read and written later and time cost is saved.
TABLE 1 Detailed public praise data format
[Table 1: image not reproduced]
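The import step can be sketched as follows. This is a minimal illustration only: it uses Python's built-in `sqlite3` module as a stand-in so that it runs without a database server, whereas the embodiment uses MySQL. The table name `reviews`, the column names and the sample rows are assumptions made for the example and do not come from the patent.

```python
import sqlite3

# Hypothetical records: (brand, series, model, purchase_time, review_detail).
REVIEWS = [
    ("BrandA", "SeriesX", "ModelX1", "2019-05", "The appearance is very stylish."),
    ("BrandA", "SeriesX", "ModelX2", "2020-01", "Fuel consumption is a bit high."),
]

def import_reviews(conn, rows):
    """Create the (assumed) reviews table if needed and bulk-insert the rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reviews ("
        "brand TEXT, series TEXT, model TEXT, purchase_time TEXT, detail TEXT)"
    )
    conn.executemany("INSERT INTO reviews VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()

def reviews_by_model(conn, series):
    """Return {model: [review, ...]} for one series (series -> model tree)."""
    tree = {}
    cursor = conn.execute(
        "SELECT model, detail FROM reviews WHERE series = ? ORDER BY model",
        (series,),
    )
    for model, detail in cursor:
        tree.setdefault(model, []).append(detail)
    return tree

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the MySQL database
import_reviews(conn, REVIEWS)
print(reviews_by_model(conn, "SeriesX"))
```

With a real MySQL server, the same two functions would be driven through a MySQL client library instead of `sqlite3`, and the placeholder syntax in the SQL strings would need to follow that library's conventions.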
Step 2: Based on the Python platform, the related vehicle type public praise data is split into sentences and segmented into words with the pyhanlp and jieba word segmentation libraries. The segmentation results of the two libraries are compared, proofread and verified with the aid of a computer, and finally merged. The resulting word segmentation is shown in Table 2 below.
TABLE 2 verified word segmentation results
[Table 2: image not reproduced]
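One possible way to compare two segmentations of the same sentence is to turn each token list into character spans and separate the spans both libraries agree on from those that need proofreading. The sketch below is an assumption about how such a comparison could be done, not the patent's procedure. The two token lists are hand-written stand-ins; in practice the first could come from `jieba.lcut(text)` and the second from the pyhanlp output converted to a list of strings.

```python
def spans(tokens):
    """Convert a token list into a set of (start, end) character spans."""
    result, position = set(), 0
    for token in tokens:
        result.add((position, position + len(token)))
        position += len(token)
    return result

def compare_segmentations(text, seg_a, seg_b):
    """Return (agreed, disputed) tokens for two segmentations of the same text."""
    if "".join(seg_a) != text or "".join(seg_b) != text:
        raise ValueError("both segmentations must cover the text exactly")
    a, b = spans(seg_a), spans(seg_b)
    agreed = [text[s:e] for s, e in sorted(a & b)]    # identical spans in both
    disputed = [text[s:e] for s, e in sorted(a ^ b)]  # spans found in only one
    return agreed, disputed

text = "外观很大气"
seg_a = ["外观", "很", "大气"]  # stand-in for the jieba result
seg_b = ["外观", "很大", "气"]  # stand-in for the pyhanlp result
agreed, disputed = compare_segmentations(text, seg_a, seg_b)
print(agreed)
print(disputed)
```

Agreed tokens can go straight into the first word segmentation word bank, while disputed tokens are the candidates for the proofreading and verification pass.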
Step 3: The word segmentation result obtained in step 2 may contain irrelevant words (for example, "very", "too", "or"). These words only add emphasis and do not affect the extraction of index keywords, so to speed up model solving they are deleted using regular expressions and a stop-word bank. The regularized word segmentation result is shown in Table 3 below.
TABLE 3 regularized word segmentation results
[Table 3: image not reproduced]
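The filtering step can be illustrated with a short sketch that combines a regular expression with a stop-word set. The stop-word list and the noise pattern below are assumptions chosen for the example; a real run would load a full stop-word bank from a file.

```python
import re

# Hypothetical stop-word bank; a real one would be much larger.
STOP_WORDS = {"很", "也", "太", "的", "了"}

# Tokens consisting only of punctuation, symbols, digits or underscores are noise.
NOISE = re.compile(r"^[\W\d_]+$")

def clean_tokens(tokens):
    """Drop empty tokens, stop words and noise tokens, keeping the original order."""
    return [
        token
        for token in tokens
        if token and token not in STOP_WORDS and not NOISE.match(token)
    ]

tokens = ["外观", "很", "大气", "，", "油耗", "也", "低", "！！", "2"]
print(clean_tokens(tokens))
```

Because `\W` is Unicode-aware for Python strings, Chinese words are kept while full-width punctuation such as "，" and "！！" is removed.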
Step 4: Based on the existing secondary technical indexes of the automobile industry, semantic similarity is calculated between the word segmentation word bank from step 3 and the secondary indexes. Table 4 below lists the secondary technical indexes of the automobile industry, and Table 5 below shows the relationship between the first-level and secondary indexes of the automobile industry.
TABLE 4 Secondary technical indexes of the automobile industry
Comfort | Interior | Configuration | Cost performance | Quality | Economy of use | Safety | Endurance
Appearance | Space | Steering and control | Brand | Fuel consumption | Environmental friendliness | Power
TABLE 5 Relationship between the first-level and secondary indexes of the automobile industry
[Table 5: image not reproduced]
Semantic similarity is calculated as the cosine of the angle between two vectors, which measures how similar the two vectors are:
$$\cos\theta = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}} \, \sqrt{\sum_{i=1}^{n} B_i^{2}}}$$
where $A_i$ and $B_i$ denote the components of A and B respectively, and A and B are the one-hot encodings of the two words.
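The cosine calculation can be written out directly. The sketch below is a plain-Python illustration; the helper `one_hot` and the three-word vocabulary are assumptions introduced for the example. Note that for strict one-hot vectors the cosine is 1 for identical words and 0 for different words, so graded similarity values only arise when the vectors share non-zero components.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)

def one_hot(word, vocabulary):
    """Hypothetical helper: one-hot encode a word over a fixed vocabulary."""
    return [1 if entry == word else 0 for entry in vocabulary]

vocabulary = ["appearance", "interior", "power"]
same = cosine_similarity(one_hot("power", vocabulary), one_hot("power", vocabulary))
different = cosine_similarity(one_hot("power", vocabulary), one_hot("interior", vocabulary))
parallel = cosine_similarity([1, 2, 0], [2, 4, 0])
print(same, different, parallel)
```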
Step 5: Each word in the word bank is clustered and grouped by similarity using a clustering method, and the label of each cluster group is the corresponding secondary technical index. The words are classified according to their similarity to the secondary indexes and serve as the keywords of each index. Partial clustering results are shown in Table 6 below (only 2 examples are given for reasons of space).
TABLE 6 partial clustering grouping results
[Table 6: image not reproduced]
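Since the cluster labels are fixed in advance as the secondary indexes, the grouping can be approximated by assigning each word to the index it is most similar to. The sketch below takes this simplification; it is not necessarily the clustering method used in the embodiment. The similarity scores, the English keywords and the `threshold` parameter are all hypothetical.

```python
def build_mapping(similarity, threshold=0.5):
    """Group words under the secondary index with the highest similarity score.

    similarity: {word: {index_name: score}}
    Words whose best score is below the (assumed) threshold are left out.
    """
    mapping = {}
    for word, scores in similarity.items():
        index_name, best = max(scores.items(), key=lambda item: item[1])
        if best >= threshold:
            mapping.setdefault(index_name, []).append(word)
    return mapping

# Hypothetical similarity scores between words and two secondary indexes.
similarity = {
    "stylish": {"Appearance": 0.82, "Interior": 0.40},
    "leather": {"Appearance": 0.30, "Interior": 0.77},
    "grand": {"Appearance": 0.74, "Interior": 0.35},
    "today": {"Appearance": 0.10, "Interior": 0.12},
}
mapping = build_mapping(similarity)
print(mapping)
```

The resulting dictionary plays the role of the mapping table: each secondary index is a parent node and its keywords are the children, and it can be nested under the first-level indexes to obtain the full tree structure.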
Step 6: A statistical method is used to count the weight of all keywords under each secondary index across all public praise comments of each vehicle type, and the keywords are ranked in descending order of weight. The keyword weights for one public praise comment are shown in Table 7 below.
TABLE 7 Keyword weights for one public praise comment
[Table 7: image not reproduced]
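The text does not specify which statistical method produces the weights. As one plausible reading, the sketch below assumes that a keyword's weight is its relative frequency among all keyword occurrences of one secondary index in the comments of one vehicle type. The function name, the sample tokens and this definition of weight are assumptions.

```python
from collections import Counter

def keyword_weights(reviews, keywords):
    """Rank the keywords of one secondary index by relative frequency.

    reviews:  list of token lists, one per comment of a single vehicle type
    keywords: set of keywords mapped to the secondary index
    Returns [(keyword, weight), ...] sorted by descending weight.
    """
    counts = Counter(
        token for review in reviews for token in review if token in keywords
    )
    total = sum(counts.values())
    if total == 0:
        return []
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(word, count / total) for word, count in ranked]

# Hypothetical tokenized comments for one vehicle type.
reviews = [
    ["stylish", "grand", "fast"],
    ["stylish", "beautiful"],
    ["stylish", "grand"],
]
appearance_keywords = {"stylish", "grand", "beautiful"}
weights = keyword_weights(reviews, appearance_keywords)
print(weights)
```

Ties are broken alphabetically so that the ranking is deterministic.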
Step 7: The questionnaire language is optimized according to the weights of the secondary index keywords.
Previous questionnaire:
How do you feel about the appearance of the vehicle? Please give a score from 1 to 5 (a higher score indicates a stronger liking).
A. 1  B. 2  C. 3  D. 4  E. 5
Analysis: When only a numeric score is offered, consumers may feel at a loss, and the score does not intuitively express how much they like the vehicle.
Optimized questionnaire: How do you feel about the appearance of the vehicle?
A. Imposing  B. Beautiful  C. Fashionable  D. Good-looking  E. Grand
When the consumer chooses among the options, each option carries a weight, and the weights can be normalized into scores (on a 1-5 or 1-10 scale) according to their size.
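The normalization is not spelled out in the text. A minimal sketch, assuming a linear min-max rescaling of the option weights onto the target score range, is given below. The option labels, the weights and the function name are hypothetical.

```python
def weights_to_scores(weights, low=1.0, high=5.0):
    """Linearly rescale option weights onto the score range [low, high].

    weights: {option: weight}. If all weights are equal, every option
    receives the top score (a convention chosen for this sketch).
    """
    if not weights:
        return {}
    w_min, w_max = min(weights.values()), max(weights.values())
    if w_max == w_min:
        return {option: high for option in weights}
    scale = (high - low) / (w_max - w_min)
    return {option: low + (w - w_min) * scale for option, w in weights.items()}

# Hypothetical keyword weights behind the five answer options.
option_weights = {"A": 0.40, "B": 0.25, "C": 0.20, "D": 0.10, "E": 0.05}
scores = weights_to_scores(option_weights)
print({option: round(score, 2) for option, score in scores.items()})
```

Passing `high=10.0` gives the 1-10 variant; the option with the largest weight always maps to the top of the scale and the smallest to the bottom.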
While particular embodiments of the present invention have been illustrated and described, it would be obvious that various other changes and modifications can be made without departing from the spirit and scope of the invention. It is therefore intended to cover in the appended claims all such changes and modifications that are within the scope of this invention.

Claims (6)

1. A method for optimizing a questionnaire language for automotive consumers, comprising the steps of:
S1, acquiring public praise comment data of the automobile industry;
S2, performing word segmentation on the public praise comment data with a jieba word segmentation library and a pyhanlp word segmentation library respectively to obtain two word segmentation results, comparing the two results, and proofreading and verifying them with the aid of a computer to obtain a first word segmentation word bank;
S3, removing nonsense words and stop words from the first word segmentation word bank to obtain a second word segmentation word bank;
S4, calculating the semantic similarity between the words in the second word segmentation word bank and the secondary technical indexes of the automobile industry;
S5, clustering and grouping the words according to the semantic similarity between each word and the secondary technical indexes, the words serving as keywords of each index, to form a mapping table with a tree structure;
S6, counting, by a statistical method, the weight of all keywords under each secondary index in all public praise comments of each vehicle type;
and S7, optimizing the questionnaire language according to the weights.
2. The optimization method according to claim 1, wherein in S1, after the automobile industry public praise comment data is acquired, the public praise comment data is imported into a database.
3. The optimization method according to claim 2, wherein the public praise comment data is imported into a MySQL database using the Python language.
4. The optimization method according to claim 1, wherein in S3, nonsense words are eliminated by using a regular expression.
5. The optimization method of claim 1, wherein in S3, the stop word is eliminated by using a stop word bank.
6. The optimization method according to any one of claims 1 to 5, wherein in S6, the keywords are sorted from large to small according to their weights.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010667389.0A CN111931497A (en) 2020-07-16 2020-07-16 Optimization method for language of questionnaire for automobile consumer

Publications (1)

Publication Number Publication Date
CN111931497A (en) 2020-11-13

Family

ID=73312808

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010667389.0A Pending CN111931497A (en) 2020-07-16 2020-07-16 Optimization method for language of questionnaire for automobile consumer

Country Status (1)

Country Link
CN (1) CN111931497A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109800307A (en) * 2019-01-18 2019-05-24 深圳壹账通智能科技有限公司 Analysis method, device, computer equipment and the storage medium of product evaluation
CN110442728A (en) * 2019-06-28 2019-11-12 天津大学 Sentiment dictionary construction method based on word2vec automobile product field
CN110543547A (en) * 2019-08-13 2019-12-06 广东数鼎科技有限公司 automobile public praise semantic emotion analysis system
CN111160017A (en) * 2019-12-12 2020-05-15 北京文思海辉金信软件有限公司 Keyword extraction method, phonetics scoring method and phonetics recommendation method

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113342931A (en) * 2021-05-27 2021-09-03 东风柳州汽车有限公司 Big data based user demand analysis method, device, equipment and storage medium
CN113342931B (en) * 2021-05-27 2022-11-01 东风柳州汽车有限公司 Big data based user demand analysis method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
US9116985B2 (en) Computer-implemented systems and methods for taxonomy development
EP2866421A1 (en) Method and apparatus for identifying a same user in multiple social networks
CN107908753B (en) Client demand mining method and device based on social media comment data
CN106156023B (en) Semantic matching method, device and system
CN106294500B (en) Content item pushing method, device and system
JP2015518220A (en) Online product search method and system
CN109522412B (en) Text emotion analysis method, device and medium
CN103365867A (en) Method and device for emotion analysis of user evaluation
CN103823896A (en) Subject characteristic value algorithm and subject characteristic value algorithm-based project evaluation expert recommendation algorithm
CN113312461A (en) Intelligent question-answering method, device, equipment and medium based on natural language processing
CN109190121A (en) Car review sentiment analysis method based on automobile body and part-of-speech rule
CN104252456A (en) Method, device and system for weight estimation
Fouzia Sayeedunnissa et al. Supervised opinion mining of social network data using a bag-of-words approach on the cloud
CN113268667B (en) Chinese comment emotion guidance-based sequence recommendation method and system
CN111858922A (en) Service side information query method and device, electronic equipment and storage medium
Cai et al. PURA: a product-and-user oriented approach for requirement analysis from online reviews
CN112348417A (en) Marketing value evaluation method and device based on principal component analysis algorithm
CN114266443A (en) Data evaluation method and device, electronic equipment and storage medium
US11693886B2 (en) Methods, systems, articles of manufacture, and apparatus to map client specifications with standardized characteristics
CN114840766A (en) User portrait construction method, system, equipment and storage medium
CN108563647A (en) A kind of automobile Method for Sales Forecast method based on comment sentiment analysis
CN104572915A (en) User event relevance calculation method based on content environment enhancement
CN111931497A (en) Optimization method for language of questionnaire for automobile consumer
CN115908060A (en) Technical scheme creativity evaluation method, medium and device
US20220222715A1 (en) System and method for detecting and analyzing discussion points from written reviews

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20201113