CN111931497A

CN111931497A - Optimization method for language of questionnaire for automobile consumer

Info

Publication number: CN111931497A
Application number: CN202010667389.0A
Authority: CN
Inventors: 杨靖; 顾洪建; 张帆; 李斌
Original assignee: Cnr Tianjin Automobile Information Consulting Co ltd; China Automotive Technology and Research Center Co Ltd
Current assignee: Cnr Tianjin Automobile Information Consulting Co ltd; China Automotive Technology and Research Center Co Ltd
Priority date: 2020-07-16
Filing date: 2020-07-16
Publication date: 2020-11-13

Abstract

The invention relates to a method for optimizing the questionnaire language used in automobile consumer surveys. The method comprises the following steps: S1, acquiring word-of-mouth comment data of the automobile industry; S2, performing word segmentation on the word-of-mouth comment data using the jieba word segmentation library and the pyhanlp word segmentation library to obtain a first word segmentation lexicon; S3, removing nonsense words and stop words from the first word segmentation lexicon to obtain a second word segmentation lexicon; S4, calculating the semantic similarity between the words in the second word segmentation lexicon and the second-level technical indicators; S5, clustering and grouping the words according to the semantic similarity to form a mapping table; S6, using a statistical method to compute the weight of every keyword under each second-level indicator across all word-of-mouth comments for each vehicle model; and S7, optimizing the questionnaire language according to the weights. The method effectively optimizes the questionnaire language and makes the question wording easier to understand, so that vehicle models can be evaluated more accurately and efficiently.

Description

Optimization method for language of questionnaire for automobile consumer

Technical Field

The invention relates to the field of data processing, in particular to an optimization method for a questionnaire language of automobile consumers.

Background

In today's fiercely competitive market, enterprises urgently need an accurate understanding of the market and of user requirements. In consumer research, questionnaire design directly influences the research results, and the key question is whether users can accurately understand the questions asked. Across the industry, questionnaire wording currently tends to be too specialized, so users either cannot understand the questions or misunderstand them. Enterprises therefore have a strong need to optimize questionnaire language.

Meanwhile, natural language processing (NLP) technology has matured. This work introduces NLP and combines it with the first-level technical indicator framework of the automobile industry to compile new statistics on the indicators that users' word-of-mouth comments focus on and on the degree of attention each receives. The aim is to gain a deep understanding of the language characteristics of automobile consumers and thereby optimize the questionnaire language used in current user research, so that vehicle models are evaluated more accurately and efficiently. This helps enterprises understand their own strengths and weaknesses and the gap between their products and user requirements, and it plays an important role in improving existing products and in planning and developing new vehicle models.

In view of the above, the present invention is particularly proposed.

Disclosure of Invention

The invention aims to provide a method for optimizing the questionnaire language used in automobile consumer research. The method effectively optimizes the questionnaire language and makes it easier to understand, so that vehicle models are evaluated more accurately and efficiently.

In order to achieve the above purpose of the present invention, the following technical solutions are adopted:

According to one aspect of the present invention, there is provided a method for optimizing the questionnaire language for automobile consumers, comprising the following steps:

S1, acquiring word-of-mouth comment data of the automobile industry;

S2, performing word segmentation on the word-of-mouth comment data with the jieba word segmentation library and with the pyhanlp word segmentation library to obtain two word segmentation results, comparing the two results, and proofreading and verifying them by computer to obtain a first word segmentation lexicon;

S3, removing nonsense words and stop words from the first word segmentation lexicon to obtain a second word segmentation lexicon;

S4, calculating the semantic similarity between the words in the second word segmentation lexicon and the second-level technical indicators of the automobile industry;

S5, clustering and grouping the words according to the semantic similarity between each word and the second-level technical indicators, the words serving as keywords of each indicator, to form a tree-structured mapping table;

S6, using a statistical method to compute the weight of every keyword under each second-level indicator across all word-of-mouth comments for each vehicle model;

and S7, optimizing the questionnaire language according to the weights.

It should be noted that:

The nonsense words or stop words in S3 include, but are not limited to: very, also, cala, o, etc.

The second-level technical indicators in S4 include, but are not limited to: appearance, interior, power, etc.

The cluster labels of the cluster groups in S5 are the corresponding second-level technical indicators.

In a preferred embodiment, in S1, after the word-of-mouth comment data of the automobile industry is acquired, it is imported into a database. Importing the data into a database makes it convenient to read and write, which saves time and cost.

As a further preferred technical solution, the word-of-mouth comment data is imported into a MySQL database using the Python language.

As a further preferred technical solution, in S3, nonsense words are removed using regular expressions.

As a further preferred technical solution, in S3, stop words are removed using a stop-word lexicon.

Removing nonsense words and stop words speeds up model solving.

As a further preferred technical solution, in S6, the keywords are sorted in descending order of weight.

Compared with the prior art, the invention has the beneficial effects that:

The optimization method for automobile consumer questionnaire language performs word segmentation with both the pyhanlp and jieba word segmentation libraries, which gives good segmentation results. It combines the second-level indicators of the automobile industry with the word segmentation results for the first time, which improves the reliability of the optimization result, brings the language of consumers and automobile manufacturers closer together, and can promote the healthy and stable development of the automobile industry.

Drawings

Fig. 1 is a schematic flow chart of embodiment 1 of the present invention.

Detailed Description

Embodiments of the present invention will be described in detail below with reference to examples, but it will be understood by those skilled in the art that the following examples are only illustrative of the present invention and should not be construed as limiting the scope of the present invention. The examples, in which specific conditions are not specified, were conducted under conventional conditions or conditions recommended by the manufacturer.

Example 1

As shown in fig. 1, the present embodiment provides a method for optimizing a questionnaire language of a car consumer, comprising the following steps:

Step 1: Relevant automobile word-of-mouth comment data is purchased. The data format includes parameters such as brand, vehicle series, vehicle model, purchase time, and word-of-mouth details; the specific format is shown in Table 1 below. Using the Python language, the word-of-mouth comment data is imported into a MySQL database following the tree structure of vehicle series and vehicle models, so that the stored data can be read and written conveniently later, which saves time.

TABLE 1 Word-of-mouth data format details
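
The import step can be sketched as follows. This is a minimal illustration, not the implementation used in the embodiment: the table name, column names, and sample rows are hypothetical, chosen only to mirror the parameters listed above. The standard-library `sqlite3` module stands in for MySQL so that the sketch runs without a database server; a MySQL driver would follow the same pattern with minor dialect changes.

```python
import sqlite3

# Hypothetical sample rows: brand, vehicle series, vehicle model,
# purchase time, word-of-mouth detail text.
REVIEWS = [
    ("BrandA", "SeriesX", "ModelX 1.5T", "2020-03", "外观很大气，油耗也不高"),
    ("BrandA", "SeriesX", "ModelX 2.0T", "2020-05", "动力很足，内饰一般"),
]

def import_reviews(conn, rows):
    """Create the review table if needed and bulk-insert the rows."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS reviews (
               id INTEGER PRIMARY KEY,
               brand TEXT, series TEXT, model TEXT,
               purchase_time TEXT, detail TEXT)"""
    )
    conn.executemany(
        "INSERT INTO reviews (brand, series, model, purchase_time, detail) "
        "VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()

def reviews_by_series(conn, series):
    """Read back the review texts of one vehicle series, grouped by model."""
    cursor = conn.execute(
        "SELECT model, detail FROM reviews WHERE series = ? ORDER BY model, id",
        (series,),
    )
    grouped = {}
    for model, detail in cursor:
        grouped.setdefault(model, []).append(detail)
    return grouped

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the MySQL database
import_reviews(conn, REVIEWS)
tree = reviews_by_series(conn, "SeriesX")
```

Grouping the rows by series and then by model reflects the series/model tree described in this step.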

Step 2: On the Python platform, the word-of-mouth data for the relevant vehicle models is split into sentences and segmented into words using the pyhanlp and jieba word segmentation libraries. The segmentation results of the two libraries are compared, proofread and verified by computer, and finally merged. The word segmentation results are shown in Table 2 below.

TABLE 2 Verified word segmentation results
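
The comparison of the two segmentation results can be sketched as below. The sketch assumes the two token lists have already been produced (for example with `jieba.lcut(text)` and with pyhanlp's `HanLP.segment(text)`); the function name and the sample tokens are hypothetical, and the source does not specify how disagreements are resolved, so they are only collected for review here.

```python
def compare_segmentations(tokens_a, tokens_b):
    """Split two token lists into agreed tokens and tokens needing review.

    agreed:   tokens from the first list that the second list also produced
    disputed: tokens produced by only one of the two segmenters
    """
    set_a, set_b = set(tokens_a), set(tokens_b)
    agreed = [token for token in tokens_a if token in set_b]
    disputed = sorted(set_a ^ set_b)
    return agreed, disputed

# Hypothetical outputs of two segmenters for the same sentence.
tokens_first = ["外观", "很", "大气", "油耗", "也", "不高"]
tokens_second = ["外观", "很", "大气", "油耗", "也", "不", "高"]

agreed, disputed = compare_segmentations(tokens_first, tokens_second)
```

Tokens in `agreed` can go straight into the first word segmentation lexicon, while tokens in `disputed` are the ones that need proofreading.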

Step 3: The word segmentation result obtained in step 2 may contain irrelevant words (for example, very, too, or, etc.). These words only emphasize tone and do not affect the extraction of indicator keywords. To speed up model solving, the irrelevant words are deleted using regular expressions and a stop-word lexicon. The regularized word segmentation results are shown in Table 3 below.

TABLE 3 Regularized word segmentation results
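
A minimal sketch of this cleaning step is given below. The regular expression and the stop-word set are assumptions made for illustration: the source does not list the patterns or the stop-word lexicon it uses.

```python
import re

# Hypothetical stop-word lexicon; a real one would be loaded from a file.
STOP_WORDS = {"很", "也", "太", "啦", "哦"}

# Tokens consisting only of punctuation, digits, or underscores are noise.
NOISE_PATTERN = re.compile(r"^[\W\d_]+$")

def clean_tokens(tokens, stop_words=STOP_WORDS):
    """Drop noise tokens (via regex) and stop words (via lexicon lookup)."""
    return [
        token
        for token in tokens
        if not NOISE_PATTERN.match(token) and token not in stop_words
    ]

raw_tokens = ["外观", "很", "大气", "，", "油耗", "也", "不高", "！", "123"]
cleaned = clean_tokens(raw_tokens)
```

Keeping the two filters separate makes it easy to extend the stop-word lexicon without touching the pattern.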

Step 4: Based on the existing second-level technical indicators of the automobile industry, the semantic similarity between the words in the word segmentation lexicon from step 3 and the second-level indicators is calculated. Table 4 below lists the second-level technical indicators of the automobile industry, and Table 5 below is a relationship diagram of the first-level and second-level indicators of the automobile industry.

TABLE 4 Second-level technical indicators of the automobile industry

Comfort

Interior

Configuration

Cost performance

Quality

Economy of use

Safety

Endurance

Appearance

Space

Steering and control

Brand

Fuel consumption

Environmental friendliness

Power

TABLE 5 Relationship diagram of the first-level and second-level indicators of the automobile industry

The semantic similarity between two words is measured by the cosine of the angle between their vectors:

$$\cos\theta = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\,\sqrt{\sum_{i=1}^{n} B_i^{2}}}$$

where $A_i$ and $B_i$ denote the components of the vectors $A$ and $B$, and $A$ and $B$ are the one-hot encodings of the two words.
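
The cosine measure can be sketched in a few lines of Python. The function name and the sample vectors are illustrative only; how the word vectors are built is left to the encoding step described above.

```python
import math

def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(vec_a) != len(vec_b):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here: a zero vector is similar to nothing
    return dot / (norm_a * norm_b)

same_direction = cosine_similarity([1, 0, 0], [2, 0, 0])
orthogonal = cosine_similarity([1, 0], [0, 1])
partial = cosine_similarity([1, 1], [1, 0])
```

Values close to 1 indicate vectors pointing in nearly the same direction, and 0 indicates orthogonal vectors.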

Step 5: Each word in the lexicon is clustered and grouped by similarity using a clustering method, and the label of each cluster group is the corresponding second-level technical indicator. The words are classified according to their similarity to the second-level indicators and serve as the keywords of each indicator. Partial clustering and grouping results are shown in Table 6 below (only two examples are given for reasons of space).

TABLE 6 Partial clustering and grouping results
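
Because the source does not name the clustering method, the sketch below substitutes a simple stand-in: each word is assigned to the second-level indicator it is most similar to, provided the similarity reaches a threshold. The function name, the threshold value, and the similarity scores are all hypothetical.

```python
def group_by_indicator(similarities, threshold=0.5):
    """Build an indicator -> keywords mapping from word/indicator similarities.

    similarities: {word: {indicator: similarity score}}
    Words whose best score is below the threshold are left ungrouped.
    """
    mapping = {}
    for word, scores in similarities.items():
        best_indicator = max(scores, key=scores.get)
        if scores[best_indicator] >= threshold:
            mapping.setdefault(best_indicator, []).append(word)
    return mapping

# Hypothetical similarity scores between words and second-level indicators.
sample_similarities = {
    "大气": {"Appearance": 0.82, "Interior": 0.40, "Power": 0.05},
    "时尚": {"Appearance": 0.77, "Interior": 0.35, "Power": 0.02},
    "加速": {"Appearance": 0.03, "Interior": 0.01, "Power": 0.90},
    "还行": {"Appearance": 0.20, "Interior": 0.10, "Power": 0.15},
}

mapping_table = group_by_indicator(sample_similarities)
```

Nesting this mapping under the first-level indicators would give the tree-structured mapping table described in the method.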

Step 6: A statistical method is used to compute the weight of every keyword under each second-level indicator across all word-of-mouth comments for each vehicle model, and the keywords are ranked in descending order of weight. The keyword weights for a certain word-of-mouth comment are shown in Table 7 below.

TABLE 7 Keyword weights for a certain word-of-mouth comment
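
The source does not specify the statistical method, so the sketch below assumes the simplest option: the weight of a keyword is its share of all keyword occurrences under one indicator. The function name and the sample reviews are hypothetical.

```python
from collections import Counter

def keyword_weights(tokenized_reviews, keywords):
    """Return (keyword, weight) pairs sorted by descending weight.

    The weight is the relative frequency of the keyword among all
    occurrences of the indicator's keywords in the reviews.
    """
    counts = Counter(
        token
        for review in tokenized_reviews
        for token in review
        if token in keywords
    )
    total = sum(counts.values())
    if total == 0:
        return []
    pairs = [(word, count / total) for word, count in counts.items()]
    # Sort by weight (largest first), breaking ties alphabetically.
    return sorted(pairs, key=lambda pair: (-pair[1], pair[0]))

# Hypothetical tokenized reviews of one vehicle model.
reviews = [
    ["外观", "大气", "时尚"],
    ["大气", "漂亮"],
    ["大气", "时尚", "动力"],
]
appearance_keywords = {"大气", "时尚", "漂亮", "霸气"}

ranked = keyword_weights(reviews, appearance_keywords)
```

Because the weights under one indicator sum to 1, they can be compared directly across vehicle models.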

Step 7: The questionnaire language is optimized according to the weights of the keywords under each second-level indicator.

Previous questionnaires:

How do you feel about the appearance of the vehicle? Please give a score from 1 to 5 (a higher score means you like it more).

A. 1  B. 2  C. 3  D. 4  E. 5

Analysis: When asked only for a score, the consumer may feel at a loss, and a bare score does not intuitively express how much the consumer likes the vehicle.

Optimized questionnaire: How do you feel about the appearance of the vehicle?

A. Domineering  B. Beautiful  C. Fashionable  D. Good-looking  E. Stately

Each option carries a weight, so when the consumer selects an option, the weights can be normalized into scores (on a 1-5 or 1-10 scale) according to their magnitudes.
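
One way to normalize the option weights is sketched below. Min-max scaling is an assumption made for illustration, since the source does not state which normalization is used; the function name and the sample weights are hypothetical.

```python
def normalize_weights(weights, low=1.0, high=5.0):
    """Map option weights linearly onto the score range [low, high]."""
    w_min, w_max = min(weights.values()), max(weights.values())
    if w_max == w_min:
        # All options weigh the same, so they all receive the top score.
        return {option: high for option in weights}
    scale = (high - low) / (w_max - w_min)
    return {
        option: round(low + (weight - w_min) * scale, 2)
        for option, weight in weights.items()
    }

# Hypothetical keyword weights for three answer options.
option_weights = {"A": 0.5, "B": 0.3, "C": 0.1}

scores_1_to_5 = normalize_weights(option_weights)
scores_1_to_10 = normalize_weights(option_weights, low=1.0, high=10.0)
```

The `low` and `high` parameters switch between the 1-5 and 1-10 scales mentioned above.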

While particular embodiments of the present invention have been illustrated and described, it would be obvious that various other changes and modifications can be made without departing from the spirit and scope of the invention. It is therefore intended to cover in the appended claims all such changes and modifications that are within the scope of this invention.

Claims

1. A method for optimizing a questionnaire language for automotive consumers, comprising the steps of:

S1, acquiring word-of-mouth comment data of the automobile industry;

and S7, optimizing the questionnaire language according to the weight.

2. The optimization method according to claim 1, wherein in S1, after the word-of-mouth comment data of the automobile industry is acquired, the word-of-mouth comment data is imported into a database.

3. The optimization method according to claim 2, wherein the word-of-mouth comment data is imported into a MySQL database using the Python language.

4. The optimization method according to claim 1, wherein in S3, nonsense words are removed using regular expressions.

5. The optimization method according to claim 1, wherein in S3, stop words are removed using a stop-word lexicon.

6. The optimization method according to any one of claims 1 to 5, wherein in S6, the keywords are sorted in descending order of weight.