CN109710304B - Format adjustment method and device - Google Patents


Info

Publication number
CN109710304B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811609730.6A
Other languages
Chinese (zh)
Other versions
CN109710304A (en)
Inventor
刘硕
史家涛
李峰
何晓明
潘文卿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Weichai Power Co Ltd
Original Assignee
Weichai Power Co Ltd

Landscapes

  • Stored Programmes (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The invention provides a format adjustment method and device. A source code requiring code format adjustment is acquired; during the code compiling process of the source code, all characters in the source code are divided to obtain a plurality of independent components and the type corresponding to each independent component; and the format of the source code is adjusted based on those types to obtain a target code with a specific format, the specific format being the same for the target codes obtained from different source codes. Uniform format adjustment of the source code is thereby achieved, which improves the readability of the code and the efficiency of reading it during later modification and maintenance. Because the characters are divided during the code compiling process, code compiling and code format adjustment are both carried out with a single input of the characters, which reduces the time they consume.

Description

Format adjustment method and device
Technical Field
The present invention belongs to the field of code processing technologies, and in particular, to a format adjustment method and apparatus.
Background
With the development of the computer industry, software projects are becoming more numerous. A software project usually requires several programmers to develop it cooperatively, and the code written by each programmer is formatted differently, so code in different formats needs to be adjusted into code in a uniform format to facilitate later modification and maintenance.
Currently, code in different formats is adjusted into a uniform format with the help of a pre-established code format specification and a code editor. However, different programmers understand the pre-established specification differently, so the code they write to that specification still differs in format; likewise, different programmers configure the format-related menus of the code editor differently, so the code produced by the editor also differs in format.
Disclosure of Invention
In view of the above, the present invention provides a format adjustment method and apparatus for adjusting source codes with different formats into target codes with the same specific format.
The invention provides a format adjusting method, which comprises the following steps:
acquiring a source code which needs to be subjected to code format adjustment;
dividing all characters in the source code in the code compiling process of the source code to obtain a plurality of independent components, and obtaining types corresponding to the independent components respectively, wherein any independent component in the independent components is a character string or a character;
and adjusting the format of the source code based on the type corresponding to each of the independent components to obtain the target code with a specific format, wherein the specific formats of the target codes obtained by adjusting the formats of different source codes are the same.
Preferably, the format adjustment of the source code based on the type corresponding to each of the plurality of independent components to obtain the object code having the specific format includes:
and adjusting the format of the source code based on the type and the preset adjustment rule corresponding to each of the independent components to obtain the target code with the specific format, so that the specific formats of the target codes obtained after different source codes are adjusted by the preset adjustment rule are the same.
Preferably, the performing format adjustment on the source code based on the type and the preset adjustment rule corresponding to each of the plurality of independent components to obtain the target code with a specific format includes:
based on the types corresponding to the independent components, selecting a type adjustment rule corresponding to the type from preset adjustment rules;
selecting a common adjusting rule from the preset adjusting rules, wherein the common adjusting rule is used for carrying out format limitation on parts except for independent components in each line of codes;
and carrying out format adjustment on the source code based on the selected type adjustment rule and the common adjustment rule to obtain the target code with the format required by the type adjustment rule and the common adjustment rule.
Preferably, the dividing all characters in the source code in the code compiling process of the source code to obtain a plurality of independent components, and the obtaining of the types corresponding to the independent components includes:
in the code compiling process of the source code, inputting each line of characters of the source code one by one;
obtaining a character group included by each line of characters, wherein the character group included by any line of characters comprises: a character group between the first character and the 1 st space in the line of characters, a character group between the last character and the nth space in the line of characters, and a character group between the ith space and the (i + 1) th space, wherein the value of i is 2 to n-1, and n is the total number of spaces included in the source code;
for any character set: and determining whether the character included in the character group is a character string or a character based on a first preset matching rule, and obtaining the type based on a second preset matching rule.
Preferably, for any character set: determining whether the character included in the character group is a character string or a character based on a first preset matching rule, and obtaining the type based on a second preset matching rule comprises:
the following steps are performed for any character group:
if all contents carried by the character group are one of numbers, operators, punctuation marks and preset reserved words, the type of the character group is the type corresponding to all the contents, if the number of characters in the character group is more than 1, the character group is determined to be a character string, and if the number of characters in the character group is equal to 1, the character group is determined to be a character;
if all contents carried by the character group do not comprise numbers, operators, punctuation marks and preset reserved words, the type of the character group is an independent variable type, if the number of characters in the character group is more than 1, the character group is determined to be a character string, and if the number of characters in the character group is equal to 1, the character group is determined to be a character;
if the number of the characters in the character group is larger than 1 and the character group comprises at least one preset character, splitting the character group through the preset character, determining the preset character as one character, wherein the type of the character is the same as that of the preset character, and the preset character comprises at least one of an operator, a punctuation mark and a number;
for any part obtained by splitting at the preset character: if the number of characters in the part is equal to 1, determining that the part is a character and that the type of the character is the independent variable type; if the number of characters in the part is greater than 1, determining that the part is a character string, and if the character string is the same as one of the preset reserved words, determining that the type of the character string is the reserved word type, otherwise determining that the type of the character string is the independent variable type.
The invention provides a format adjusting device, comprising:
the acquisition module is used for acquiring a source code which needs to be subjected to code format adjustment;
the type determining module is used for segmenting all characters in the source code in the code compiling process of the source code to obtain a plurality of independent components and obtain respective corresponding types of the independent components, and any one of the independent components is a character string or a character;
and the adjusting module is used for carrying out format adjustment on the source code based on the types corresponding to the independent components to obtain the target code with a specific format, wherein the specific formats of the target codes obtained after different source codes are subjected to format adjustment are the same.
Preferably, the adjusting module is configured to perform format adjustment on the source code based on the type and preset adjustment rule corresponding to each of the plurality of independent components to obtain an object code with a specific format, so that the specific formats of the object codes obtained by adjusting different source codes by the preset adjustment rule are the same.
Preferably, the adjusting module includes:
a type adjustment rule determining unit, configured to select, based on respective types corresponding to the plurality of independent components, a type adjustment rule corresponding to the type from preset adjustment rules;
a common adjustment rule determining unit, configured to select a common adjustment rule from the preset adjustment rules, where the common adjustment rule is used to perform format limitation on a part of each line of codes except for an independent component;
and the adjusting unit is used for carrying out format adjustment on the source code based on the selected type adjusting rule and the common adjusting rule to obtain the target code with the format required by the type adjusting rule and the common adjusting rule.
Preferably, the type determining module includes:
the reading unit is used for inputting each line of characters of the source code one by one in the code compiling process of the source code;
the segmentation unit is used for obtaining a character group included by each line of characters, wherein the character group included by any line of characters comprises: a character group between the first character and the 1 st space in the line of characters, a character group between the last character and the nth space in the line of characters, and a character group between the ith space and the (i + 1) th space, wherein the value of i is 2 to n-1, and n is the total number of spaces included in the source code;
a type determination unit for, for any character group: and determining whether the character included in the character group is a character string or a character based on a first preset matching rule, and obtaining the type based on a second preset matching rule.
Preferably, the type determining unit is configured to perform the following steps for any character group:
if all contents carried by the character group are one of numbers, operators, punctuation marks and preset reserved words, the type of the character group is the type corresponding to all the contents, if the number of characters in the character group is more than 1, the character group is determined to be a character string, and if the number of characters in the character group is equal to 1, the character group is determined to be a character;
if all contents carried by the character group do not comprise numbers, operators, punctuation marks and preset reserved words, the type of the character group is an independent variable type, if the number of characters in the character group is more than 1, the character group is determined to be a character string, and if the number of characters in the character group is equal to 1, the character group is determined to be a character;
if the number of the characters in the character group is more than 1 and the character group comprises at least one preset character, splitting the character group through the preset character, determining the preset character as one character, wherein the type of the character is the same as that of the preset character, and the preset character comprises at least one of an operator, a punctuation mark and a number;
for any part obtained by splitting at the preset character: if the number of characters in the part is equal to 1, determining that the part is a character and that the type of the character is the independent variable type; if the number of characters in the part is greater than 1, determining that the part is a character string, and if the character string is the same as one of the preset reserved words, determining that the type of the character string is the reserved word type, otherwise determining that the type of the character string is the independent variable type.
According to the above technical solution, a source code requiring code format adjustment is obtained; all characters in the source code are segmented during the code compiling process of the source code to obtain a plurality of independent components and the type corresponding to each independent component; and the format of the source code is adjusted based on those types to obtain a target code with a specific format, the specific format being the same for the target codes obtained from different source codes. Uniform format adjustment of the source code is thereby achieved, which improves the readability of the code. A uniform format also enables each programmer to understand the idea and intent of the code accurately and, on that basis, to change the code accurately, which facilitates modification and maintenance. Because the character segmentation is carried out during the code compiling process of the source code, code compiling and code format adjustment are performed together with a single input of the characters, which reduces the time they consume.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a flowchart of a format adjustment method according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a format adjustment apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, a flowchart of a format adjustment method provided in an embodiment of the present invention is shown, where the method is used to perform format adjustment on a source code that needs to be subjected to code format adjustment to obtain an object code with a specific format, and specifically, the format adjustment method shown in fig. 1 may include the following steps:
S101: A source code that needs code format adjustment is obtained. This can be understood as follows: code written by different programmers differs in format, and to improve readability and make the code structure neater, the format of the source code (i.e. the code written by the programmers) needs to be adjusted. The source code that needs code format adjustment may therefore be, but is not limited to, source code that has not yet been format-adjusted and requires format adjustment.
One way to obtain the source code that needs code format adjustment is for the programmer to write the code directly in the compiler; another way is to write the code as text in text-editing software and then import that text into the compiler.
It should be noted that the source code needs to meet the writing requirements of the compiler; that is, the programmer needs to comply with the writing rules when writing code, so as to avoid compile errors during the code compiling process, which would cause errors in the subsequent character splitting.
S102: all characters in the source code are segmented in the code compiling process of the source code to obtain a plurality of independent components, and types corresponding to the independent components are obtained, wherein any independent component in the independent components is a character or a character string.
That is to say, the source code consists of two kinds of independent components, character strings and characters. In this embodiment the independent components are obtained during the code compiling process, for example in the lexical analysis stage. Lexical analysis requires the characters to be input one by one for analysis, and obtaining the independent components also requires the characters to be input one by one, so the segmentation can be performed in the lexical analysis stage. A single input of the characters then serves both to obtain the independent components and to perform lexical analysis, which reduces the number of times the characters are input and reduces the time consumed by code compiling and code format adjustment.
In this embodiment, one possible way to obtain the type corresponding to each of the plurality of independent components is as follows. In the code compiling process of the source code, each line of characters of the source code is input one by one to obtain the character groups included in each line. The character groups included in any line of characters comprise: the character group between the first character and the 1st space in the line, the character group between the last character and the nth space in the line, and the character group between the ith space and the (i+1)th space, where i takes values from 2 to n-1 and n is the total number of spaces included in the source code. For any character group, whether it is a character string or a character is determined based on a first preset matching rule, and its type is obtained based on a second preset matching rule.
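The space-based division into character groups can be illustrated with a minimal Python sketch. The function name `split_into_groups` is hypothetical and not part of the embodiment; the sketch simply cuts each line at its spaces and discards empty groups.

```python
def split_into_groups(line: str) -> list[str]:
    """Split one line of source code into character groups at spaces.

    Empty groups, which arise from leading, trailing or consecutive
    spaces, are discarded.
    """
    return [group for group in line.split(" ") if group]


print(split_into_groups("int c =a+b;"))  # ['int', 'c', '=a+b;']
```

A group such as "=a+b;" still mixes several components, so it would be split further in a later step.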
The first preset matching rule may be, but is not limited to, based on the number of characters in a character group: if the number is equal to 1, the character group is determined to be a character; if it is greater than 1, the character group is determined to be a character string. The second preset matching rule may match against preset reserved words, operators, punctuation marks and numbers. For any programming language the reserved words, operators and punctuation marks are specified, so types can be determined from them; numbers are generally used for assignment in a programming language, so types can also be determined from numbers. The following steps are performed for any character group:
If all the contents carried by the character group are one of numbers, operators, punctuation marks and preset reserved words, the type of the character group is the type corresponding to all the contents; if the number of characters in the character group is greater than 1, the character group is determined to be a character string, and if the number of characters in the character group is equal to 1, the character group is determined to be a character.
If all contents carried by the character group do not comprise numbers, operators, punctuation marks and preset reserved words, the type of the character group is an independent variable type, if the number of characters in the character group is more than 1, the character group is determined to be a character string, and if the number of characters in the character group is equal to 1, the character group is determined to be a character.
If the number of the characters in the character group is larger than 1 and the character group comprises at least one preset character, splitting the character group through the preset character, determining the preset character as a character, wherein the type of the character is the same as that of the preset character, and the preset character comprises: at least one of operators, punctuation marks and numbers.
For any part obtained by splitting at the preset characters: if the number of characters in the part is equal to 1, the part is determined to be a character of the independent variable type; if the number of characters in the part is greater than 1, the part is determined to be a character string, and if the character string is the same as one of the preset reserved words, its type is determined to be the reserved word type, otherwise its type is determined to be the independent variable type.
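The two matching rules for a character group that needs no splitting can be sketched in Python as follows. The sets `RESERVED_WORDS`, `OPERATORS` and `PUNCTUATION` are small assumed subsets chosen for illustration, and the function names are hypothetical; underscores are treated like letters, as this embodiment does.

```python
from typing import Optional

RESERVED_WORDS = {"int", "void", "if", "else", "return"}  # assumed subset
OPERATORS = {"=", "+", "-", "*", "/", "!"}                # assumed subset
PUNCTUATION = {";", ",", "(", ")", "{", "}"}              # assumed subset


def kind_of(group: str) -> str:
    """First rule: one character is a character, more is a character string."""
    return "character" if len(group) == 1 else "character string"


def type_of(group: str) -> Optional[str]:
    """Second rule for a character group that needs no splitting.

    Returns None when the group mixes preset characters with other
    content and therefore has to be split before it can be typed.
    """
    if group.isdigit():
        return "number"
    if group in OPERATORS:
        return "operator"
    if group in PUNCTUATION:
        return "punctuation mark"
    if group in RESERVED_WORDS:
        return "reserved word"
    # Only letters and underscores: no number, operator or punctuation inside.
    if all(ch.isalpha() or ch == "_" for ch in group):
        return "independent variable"
    return None


for group in ["35", "int", "a", "=", "b=4;"]:
    print(group, kind_of(group), type_of(group))
```

Returning `None` for mixed groups is a design choice of this sketch: it keeps the decision to split separate from the typing of whole groups.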
It should be noted that the underscore "_", although a punctuation mark in everyday usage, is not classified as a punctuation mark in the present application. It is classified as a special mark of the same type as a letter; that is, a character string consisting of letters and underscores does not need to be split.
For example, if a character group has more than one character and includes one preset character, such as the operator "=", the character group is split at the operator "=", and "=" is determined to be one character whose type is the same as that of the preset character.
If a character group has more than one character and includes at least two preset characters, such as an operator and a punctuation mark, the character group is split at the operator and at the punctuation mark; the operator and the punctuation mark are each determined to be one character whose type is the same as that of the preset character, i.e. the operator is of the operator type and the punctuation mark is of the punctuation mark type.
If a character group has more than one character and includes at least one preset character, and the at least one preset character includes at least one number, the type of the character preceding the number is determined, and splitting is performed based on that type.
One possible way to split based on the type of the character preceding the number is: if that character is a letter, the character group is not split at the number; otherwise it is split at the number. That is, when a number follows a letter, the letter and the number are treated as a whole; when a number follows an operator or a punctuation mark, the number is split from the operator or punctuation mark.
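The splitting of a mixed character group can be sketched in Python as below. The character sets are assumed subsets and `split_group` is a hypothetical name. As a further assumption of this sketch, consecutive digits are kept together as one number, so that "35" is not cut into "3" and "5".

```python
OPERATORS = set("=+-*/!<>")   # assumed subset
PUNCTUATION = set(";,(){}")   # assumed subset


def split_group(group: str) -> list[str]:
    """Split one character group at operators and punctuation marks.

    Every operator or punctuation mark becomes a part of its own.
    Letters, underscores and digits are collected into the current part,
    so a digit that follows a letter stays attached to it ("a1"), while a
    digit that follows an operator or punctuation mark starts a new part.
    """
    parts: list[str] = []
    current = ""
    for ch in group:
        if ch in OPERATORS or ch in PUNCTUATION:
            if current:
                parts.append(current)
                current = ""
            parts.append(ch)
        else:
            current += ch
    if current:
        parts.append(current)
    return parts


print(split_group("b=4;"))       # ['b', '=', '4', ';']
print(split_group("a1=b2+35;"))  # ['a1', '=', 'b2', '+', '35', ';']
```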
For ease of understanding, the above-mentioned division is exemplified by the C language:
First, reserved words are character groups that are predefined in the C language and cannot be used as variable names. For example, "int" is predefined in the C language, so "int" can only be used as a reserved word and not as an independent variable; a programmer cannot use "int" when defining a name. An independent variable is a character group that is not predefined in the C language, namely a character string or character defined by the programmer, such as "pwd_type"; the properties of such a string in subsequent use are defined by the programmer. Operators are used to perform operations in program code and are predefined in the C language. They include unary, binary and ternary operators: a unary operator operates on only one variable, e.g. "!" (the logical NOT operator); a binary operator operates on two variables, e.g. "=" (the assignment operator); and a ternary operator operates on three variables, e.g. "?:" (the conditional operator). Punctuation marks are likewise predefined in the C language and are not described one by one in this embodiment.
For example, the source code that needs to be formatted is as follows:
[Figure BDA0001924418950000091: source code listing to be formatted; image not reproduced]
First, the source code is segmented, line by line, according to the first character and the 1st space, the last character and the nth space, and every two adjacent spaces. The obtained character groups are "void", "main(){", "int", "a", "=", "35", ";", "int", "b=4;", "int", "c" and "=a+b;". If there is nothing between two adjacent spaces, that empty character group is discarded.
All characters in the character group "35" are numbers, so the character group is determined to be of the number type; since it has more than one character, it is determined to be a character string.
The character groups "void", "int" and "int" are each identical to a reserved word, so they are determined to be of the reserved word type; since each has more than one character, each is determined to be a character string.
The character groups "a", "c", "=" and ";" each have exactly one character, so each is determined to be one character. "a" and "c" are letters and are determined to be of the independent variable type; "=" is an operator and is determined to be of the operator type; ";" is a punctuation mark and is determined to be of the punctuation mark type.
The character groups "b=4;", "main(){" and "=a+b;" each have more than one character and include preset characters, so they are split. Taking "main(){" as an example, splitting yields "main", "()" and "{"; the type of each resulting part is then determined by the same process as described above, which is not repeated here.
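For illustration, the division and type determination of one line can be combined in a short Python sketch. Note that it swaps in a regular expression for the step-by-step splitting described above, so it is an equivalent shortcut under stated assumptions rather than the procedure of this embodiment. The reserved-word and operator sets are assumed subsets, and `components_of` is a hypothetical name.

```python
import re

RESERVED_WORDS = {"int", "void", "if", "else", "return"}  # assumed subset
OPERATORS = set("=+-*/!<>")                               # assumed subset

# A name (letters, digits, underscores, not starting with a digit),
# or a run of digits, or any other single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def components_of(line: str) -> list[tuple[str, str]]:
    """Return (text, type) pairs for the independent components of a line."""
    result = []
    for text in TOKEN_RE.findall(line):
        if text.isdigit():
            type_name = "number"
        elif text in OPERATORS:
            type_name = "operator"
        elif text in RESERVED_WORDS:
            type_name = "reserved word"
        elif text[0].isalpha() or text[0] == "_":
            type_name = "independent variable"
        else:
            # Anything left is treated as a punctuation mark in this sketch.
            type_name = "punctuation mark"
        result.append((text, type_name))
    return result


for text, type_name in components_of("int b=4;"):
    print(text, type_name)
```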
S103: Based on the types corresponding to the independent components, the format of the source code is adjusted to obtain the target code with a specific format, where the specific format is the same for the target codes obtained from different source codes. This can be understood as follows: independent components of different types are adjusted to be displayed in different formats, and independent components of the same type in different source codes are adjusted to be displayed in the same format, so the target codes obtained from different source codes share the same specific format.
Based on the respective corresponding types of the plurality of independent components, a feasible way to obtain the target code with a specific format is to perform format adjustment on the source code: and adjusting the format of the source code based on the type and the preset adjustment rule corresponding to each of the independent components to obtain the target code with the specific format, so that the specific formats of the target codes obtained after different source codes are adjusted by the preset adjustment rule are the same.
The preset adjustment rule is a rule that defines the format of the source code, i.e. it specifies the format in which the code is to be written, so the specific format of the finally output target code is indicated by the preset adjustment rule. The preset adjustment rule may include, but is not limited to, format requirements for the parts that make up the code: each line, each type of character or character string, and the header file. For example, each line is required to be indented by a first preset space, characters or character strings of each type are separated by spaces, and the header file is required to be indented by a second preset space. The first preset space and the second preset space may be the same or different and are not limited in this embodiment.
In this embodiment, one possible way to obtain the target code in the specific format based on the preset adjustment rule is: based on the types corresponding to the independent components, selecting a type adjustment rule corresponding to the type from preset adjustment rules, and selecting a common adjustment rule from the preset adjustment rules, wherein the common adjustment rule is used for limiting the formats of the components except for the independent components in each line of codes, and based on the selected type adjustment rule and the common adjustment rule, performing format adjustment on the source codes to obtain target codes with the formats required by the type adjustment rule and the common adjustment rule.
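The selection of rules can be pictured with a small Python sketch. The layout of `PRESET_RULES`, its keys and its values are all assumptions made for illustration; this embodiment does not prescribe how the preset adjustment rules are stored.

```python
# Hypothetical preset adjustment rules: per-type rules plus common rules.
PRESET_RULES = {
    "type": {
        "operator": {"space_before": True, "space_after": True},
        "punctuation mark": {"space_before": False, "space_after": False},
        "reserved word": {"space_before": False, "space_after": True},
    },
    "common": {"indent_spaces": 4, "one_statement_per_line": True},
}


def select_rules(component_types):
    """Pick the type adjustment rules for the given types and the common rules.

    Types without an entry in the preset rules are simply skipped.
    """
    type_rules = {
        type_name: PRESET_RULES["type"][type_name]
        for type_name in set(component_types)
        if type_name in PRESET_RULES["type"]
    }
    return type_rules, PRESET_RULES["common"]


type_rules, common_rules = select_rules(["reserved word", "operator", "number"])
print(sorted(type_rules))  # ['operator', 'reserved word']
print(common_rules["indent_spaces"])  # 4
```

Because every source code is adjusted against the same rule set, the resulting formats agree regardless of who wrote the code.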
The common adjustment rule defines formats that are common to different codes, such as the format of each line of code and the format of the header file, for example the amount of indentation of each line and whether the header file may be indented. Each line of code is also formatted according to its logical position in the code; for example, different levels of nesting are indented by different amounts, so that the code as a whole is neat and its hierarchy is clear. Common adjustment rules may include, but are not limited to: code indentation, deletion of redundant empty lines and addition of empty lines, alignment of curly braces, and only one program statement per line. A type adjustment rule is an adjustment for a certain type of character or character string and may include, but is not limited to, adjusting the spaces before and after the character or character string.
In this embodiment, the amount of code indentation depends on the logical position of the corresponding program statement in the source code, and program statements at different logical positions are indented by different amounts. For example, for nested statements such as if and else with multiple levels of nesting, each level of nesting uses a different amount of indentation. Deletion of redundant empty lines and addition of empty lines may be adjusted according to the specific format desired.
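As a rough illustration of indentation that follows nesting depth, the sketch below re-indents lines by counting curly braces and collapses runs of empty lines. The function name `reindent`, the default width of four spaces and the choice to keep one empty line per run are assumptions; braces inside string literals or comments are not handled.

```python
def reindent(lines, indent_width=4):
    """Indent each line by its brace nesting depth and collapse empty-line runs."""
    result = []
    depth = 0
    previous_blank = False
    for raw in lines:
        text = raw.strip()
        if not text:
            # Assumption: keep at most one empty line per run, none at the start.
            if not previous_blank and result:
                result.append("")
            previous_blank = True
            continue
        previous_blank = False
        starts_with_close = text.startswith("}")
        if starts_with_close:
            # A leading closing brace belongs to the enclosing level.
            depth = max(depth - 1, 0)
        result.append(" " * (indent_width * depth) + text)
        opens = text.count("{")
        closes = text.count("}") - (1 if starts_with_close else 0)
        depth = max(depth + opens - closes, 0)
    return result


source = ["if (a)", "{", "if (b)", "{", "x = 1;", "", "", "}", "}"]
print("\n".join(reindent(source)))
```

Each nesting level adds one `indent_width` step, so statements at different logical positions receive different indentation.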
Taking the C language as an example, the target code in a specific format is described below. The source code is as follows:
Figure BDA0001924418950000111
The types of the independent components of the source code obtained through step S102 are the reserved word type, the number type, the punctuation type and the operator type. The type adjustment rules corresponding to these types can be extracted from the preset adjustment rules, and the source code is adjusted by combining the type adjustment rules with the common adjustment rules to obtain the following target code:
Figure BDA0001924418950000112
The rules involved in obtaining the target code are as follows: each curly brace occupies a line by itself, the two braces of a pair are aligned, the content inside the curly braces is indented by four characters, redundant empty lines are deleted, and spaces are added before and after the independent components. Adding spaces before and after the independent components means that any two adjacent characters or character strings in each line of code are separated by a space; deleting redundant empty lines means deleting the redundant empty lines between any two adjacent lines of code.
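The rules listed above can be sketched on an already segmented token list. This is an illustrative reading, not the patented implementation: the function name `layout_tokens` is hypothetical, the input is assumed to be a flat list of independent components, and ending a statement at a semicolon outside parentheses is an added assumption so that a `for` header stays on one line.

```python
def layout_tokens(tokens, indent_width=4):
    """Lay out tokens: braces alone on a line, one space between neighbours."""
    lines = []
    current = []
    depth = 0
    paren_depth = 0

    def emit(text, level):
        lines.append(" " * (indent_width * level) + text)

    for token in tokens:
        if token in ("{", "}"):
            if current:
                # Finish the pending line before placing the brace.
                emit(" ".join(current), depth)
                current = []
            if token == "}":
                depth = max(depth - 1, 0)
            emit(token, depth)  # a pair of braces lands at the same depth
            if token == "{":
                depth += 1
            continue
        current.append(token)
        if token == "(":
            paren_depth += 1
        elif token == ")":
            paren_depth = max(paren_depth - 1, 0)
        elif token == ";" and paren_depth == 0:
            # Assumption: a semicolon outside parentheses ends the statement.
            emit(" ".join(current), depth)
            current = []
    if current:
        emit(" ".join(current), depth)
    return lines


tokens = ["if", "(", "a", ">", "0", ")", "{", "b", "=", "a", "+", "1", ";", "}"]
print("\n".join(layout_tokens(tokens)))
```

Separating every pair of adjacent tokens with a space follows the rule as literally stated; a practical formatter would likely apply finer type adjustment rules, for instance around parentheses.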
It should be noted that the preset adjustment rule may be set in advance by a person or specified by a system, and in either case new adjustment rules may be added as required. Whatever preset adjustment rule is adopted, it must ensure that the code written within a project or a company has a uniform specific format. Manually set rules depend on the user's preference and are not described further here.
According to the above technical solution, source code requiring code format adjustment is acquired; all characters in the source code are segmented during compilation of the source code to obtain a plurality of independent components and the type corresponding to each; and the format of the source code is adjusted based on those types to obtain target code with a specific format, where the target codes obtained from different source codes have the same specific format. Source code requiring format adjustment is thus formatted uniformly, which improves the readability of the code. A uniform format helps each programmer accurately understand the idea and intent of the code and change it accurately on that basis, which facilitates later modification and maintenance. Moreover, because character segmentation is performed during compilation of the source code, the characters need to be input only once for both code compilation and code format adjustment, which reduces the total time consumed by the two.
While, for purposes of simplicity of explanation, the foregoing method embodiments have been described as a series of acts or combination of acts, it will be appreciated by those skilled in the art that the present invention is not limited by the illustrated ordering of acts, as some steps may occur in other orders or concurrently with other steps in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
Corresponding to the foregoing method embodiment, an embodiment of the present invention further provides a format adjustment apparatus, whose schematic structure is shown in fig. 2. The format adjustment apparatus may include an acquisition module 11, a type determination module 12 and an adjustment module 13.
The acquisition module 11 is configured to acquire source code that requires code format adjustment. For the description of the source code and the manner of acquiring it, refer to the relevant description in the method embodiment; it is not repeated in this embodiment.
The type determining module 12 is configured to segment all characters in the source code in a code compiling process of the source code to obtain a plurality of independent components, and obtain types corresponding to the plurality of independent components, where any one of the plurality of independent components is a character string or a character.
That is, the source code is composed of two kinds of independent components, character strings and characters. In this embodiment, the independent components are obtained during compilation of the code, for example in the lexical analysis stage. Lexical analysis requires the characters to be input one by one, and so does obtaining the independent components, so the segmentation can be performed in the lexical analysis stage: a single pass over the input characters both yields the independent components and performs lexical analysis. This reduces the number of times the characters are input and therefore reduces the time consumed by code compilation and code format adjustment.
In this embodiment, one possible structure of the type determination module 12 includes a reading unit, a segmentation unit and a type determination unit.
The reading unit is configured to input each line of characters of the source code one by one during compilation of the source code. The segmentation unit is configured to obtain the character groups included in each line of characters, where the character groups included in any line of characters comprise: the character group between the first character and the 1st space in the line, the character group between the last character and the nth space in the line, and the character group between the ith space and the (i + 1)th space, where i takes values from 2 to n-1 and n is the total number of spaces included in the source code. The type determination unit is configured to, for any character group, determine whether the character group is a character string or a character based on a first preset matching rule, and obtain its type based on a second preset matching rule.
The first preset matching rule may be, but is not limited to, the number of characters included in a character group: if the number is equal to 1, the character group is determined to be a character; if it is greater than 1, the character group is determined to be a character string. The second preset matching rule may match against preset reserved words, operators, punctuation marks and numbers. For any programming language, the reserved words, operators and punctuation marks are specified, so types can be determined from them; numbers are generally used for assignment in a programming language, so types can also be determined from numbers. On this basis, the type determination unit performs the following steps on any character group to determine its type:
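A minimal sketch of the space-based segmentation of one line, assuming that a character group is the run of characters between two neighbouring spaces (or between a line boundary and the nearest space). The function name `character_groups` is hypothetical, and skipping the empty groups produced by consecutive spaces is an assumption.

```python
def character_groups(line):
    """Split one line into the character groups delimited by spaces."""
    space_positions = [i for i, ch in enumerate(line) if ch == " "]
    # Treat the line start and end as virtual boundaries around the spaces.
    boundaries = [-1] + space_positions + [len(line)]
    groups = []
    for start, end in zip(boundaries, boundaries[1:]):
        group = line[start + 1:end]
        if group:  # consecutive or leading spaces yield empty groups
            groups.append(group)
    return groups


print(character_groups("int a = b+1;"))
```

A group such as `b+1;` still mixes several components; it is separated further when the type of each group is determined.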
if all the contents carried by the character group are one of numbers, operators, punctuation marks and preset reserved words, the type of the character group is the type corresponding to all the contents, if the number of the characters in the character group is more than 1, the character group is determined to be a character string, and if the number of the characters in the character group is equal to 1, the character group is determined to be a character.
If all contents carried by the character group do not comprise numbers, operators, punctuation marks and preset reserved words, the type of the character group is an independent variable type, if the number of characters in the character group is more than 1, the character group is determined to be a character string, and if the number of characters in the character group is equal to 1, the character group is determined to be a character.
If the number of the characters in the character group is larger than 1 and the character group comprises at least one preset character, splitting the character group through the preset character, determining the preset character as a character, wherein the type of the character is the same as that of the preset character, and the preset character comprises: at least one of operators, punctuation marks and numbers.
For each of the parts obtained by splitting at the preset characters: if the number of characters in the part is equal to 1, the part is determined to be a character whose type is the argument type; if the number of characters in the part is greater than 1, the part is determined to be a character string, and if the character string is the same as one of the preset reserved words, its type is determined to be the reserved word type, otherwise its type is determined to be the argument type.
It should be noted that the underscore "_", although an everyday punctuation mark, is not treated as a punctuation mark in the present application. Instead it is treated as a special mark of the same type as letters, so that a character string consisting of letters and underscores does not need to be split. For a detailed description of code segmentation and the resulting types, refer to the related description in the method embodiment; it is not repeated in this embodiment.
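The type determination steps above can be sketched as follows. The reserved word, operator and punctuation sets are small hypothetical samples for C; the label `variable` stands for the type the text calls the independent variable or argument type; and keeping a run of digits together as one number when splitting is an assumption of this sketch.

```python
import re

# Hypothetical sample sets; a real tool would load the full lists for its language.
RESERVED_WORDS = {"int", "if", "else", "for", "while", "return"}
OPERATORS = {"+", "-", "*", "/", "=", "<", ">", "==", "!=", "<=", ">="}
PUNCTUATION = {";", ",", "(", ")", "{", "}", "[", "]"}


def form_of(text):
    """First matching rule: one character is a character, more is a character string."""
    return "character" if len(text) == 1 else "character string"


def type_of(text):
    """Second matching rule, applied to a piece that needs no further splitting."""
    if text.isdigit():
        return "number"
    if text in OPERATORS:
        return "operator"
    if text in PUNCTUATION:
        return "punctuation"
    if text in RESERVED_WORDS:
        return "reserved word"
    return "variable"


def classify_group(group):
    """Return (text, form, type) triples for one space-delimited character group."""
    if type_of(group) != "variable" or re.fullmatch(r"[A-Za-z_]+", group):
        # The whole group is a single component; letters and underscores stay together.
        pieces = [group]
    else:
        # Split at preset characters: digit runs, letter/underscore runs, single symbols.
        pieces = re.findall(r"\d+|[A-Za-z_]+|.", group)
    return [(piece, form_of(piece), type_of(piece)) for piece in pieces]


for group in ["int", "my_var", "b+1;"]:
    print(classify_group(group))
```

Because underscores are matched together with letters, an identifier such as `my_var` is kept whole, in line with the note on the underscore above.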
The adjustment module 13 is configured to adjust the format of the source code based on the types corresponding to the independent components to obtain target code with a specific format, where the target codes obtained by adjusting the formats of different source codes have the same specific format. That is, independent components of different types are adjusted so as to be displayed in different formats, while independent components of the same type in different source codes are adjusted so as to be displayed in the same format, so that the target codes obtained from different source codes have the same specific format.
One feasible way for the adjustment module 13 to obtain target code with a specific format is to adjust the format of the source code based on the type of each independent component and a preset adjustment rule, so that the target codes obtained from different source codes adjusted under the preset adjustment rule share the same specific format.
The preset adjustment rule defines the format of the source code, that is, it specifies the format in which code is to be written, so the specific format of the final target code is determined by the preset adjustment rule. The preset adjustment rule may include, but is not limited to, format requirements for each line, each type of character or character string, and the header file of the code. For example, each line must be indented by a first preset number of spaces, characters or character strings of each type must be separated by spaces, and the header file must be indented by a second preset number of spaces. The first and second preset numbers of spaces may be the same or different; this embodiment does not limit them, and they are not described further here.
In this embodiment, an optional structure of the adjustment module 13 includes a type adjustment rule determination unit, a common adjustment rule determination unit and an adjustment unit.
The type adjustment rule determining unit is used for selecting a type adjustment rule corresponding to the type from preset adjustment rules based on the type corresponding to each of the independent components; the common adjustment rule determining unit is used for selecting a common adjustment rule from preset adjustment rules, and the common adjustment rule is used for limiting the format of the part except the independent component in each line of codes; and the adjusting unit is used for carrying out format adjustment on the source code based on the selected type adjusting rule and the common adjusting rule to obtain the target code with the format required by the type adjusting rule and the common adjusting rule.
The common adjustment rule defines formats that are common to different codes, such as the format of each line of code and the format of the header file, for example the amount of indentation of each line and whether the header file may be indented. Common adjustment rules may include, but are not limited to: code indentation, deletion of redundant empty lines and addition of empty lines, alignment of curly braces, and only one program statement per line. A type adjustment rule is an adjustment for a certain type of character or character string and may include, but is not limited to, adjusting the spaces before and after the character or character string.
In this embodiment, the amount of code indentation depends on the logical position of the corresponding program statement in the source code, and program statements at different logical positions are indented by different amounts. For example, for nested statements such as if and else with multiple levels of nesting, each level of nesting uses a different amount of indentation. Deletion of redundant empty lines and addition of empty lines may be adjusted according to the specific format desired.
The specific implementation process and principle are the same as those in the above method embodiments, and are not described herein again.
According to the above technical solution, source code requiring code format adjustment is acquired; all characters in the source code are segmented during compilation of the source code to obtain a plurality of independent components and the type corresponding to each; and the format of the source code is adjusted based on those types to obtain target code with a specific format, where the target codes obtained from different source codes have the same specific format. Source code requiring format adjustment is thus formatted uniformly, which improves the readability of the code. A uniform format helps each programmer accurately understand the idea and intent of the code and change it accurately on that basis, which facilitates later modification and maintenance. Moreover, because character segmentation is performed during compilation of the source code, the characters need to be input only once for both code compilation and code format adjustment, which reduces the total time consumed by the two.
It should be noted that, in this specification, each embodiment is described in a progressive manner, and each embodiment focuses on differences from other embodiments, and portions that are the same as and similar to each other in each embodiment may be referred to. For the device-like embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations should also be regarded as the protection scope of the present invention.

Claims (6)

1. A method of format adjustment, the method comprising:
acquiring a source code which needs to be subjected to code format adjustment;
dividing all characters in the source code in the code compiling process of the source code to obtain a plurality of independent components, and obtaining types corresponding to the independent components respectively, wherein any independent component in the independent components is a character string or a character;
based on the types corresponding to the independent components, carrying out format adjustment on the source code to obtain target codes with specific formats, wherein the specific formats of the target codes obtained after different source codes are subjected to format adjustment are the same;
the dividing all characters in the source code in the code compiling process of the source code to obtain a plurality of independent components, and the obtaining of the types corresponding to the independent components comprises:
in the code compiling process of the source code, inputting each line of characters of the source code one by one;
obtaining a character group included by each line of characters, wherein the character group included by any line of characters comprises: a character group between the first character and the 1 st space in the line of characters, a character group between the last character and the nth space in the line of characters, and a character group between the ith space and the (i + 1) th space, wherein the value of i is 2 to n-1, and n is the total number of spaces included in the source code;
for any character group: determining whether the characters included in the character group are character strings or characters based on a first preset matching rule, and obtaining the type based on a second preset matching rule;
the first preset matching rule is the number of characters included in the character group, and if the number of characters in the character group is greater than 1, the character group is determined to be a character string; if the number of the characters in the character group is equal to 1, determining that the character group is a character;
the second preset matching rule is matched based on preset reserved words, operators, punctuation marks and numbers;
wherein, for any character group, determining whether the characters included in the character group are character strings or characters based on the first preset matching rule, and obtaining the type based on the second preset matching rule, comprises:
the following steps are performed for any character group:
if all contents carried by the character group are one of numbers, operators, punctuations and preset reserved words, the type of the character group is the type corresponding to all contents, if the number of characters in the character group is more than 1, the character group is determined to be a character string, and if the number of characters in the character group is equal to 1, the character group is determined to be a character;
if all contents carried by the character group do not comprise numbers, operators, punctuation marks and preset reserved words, the type of the character group is an independent variable type, if the number of characters in the character group is more than 1, the character group is determined to be a character string, and if the number of characters in the character group is equal to 1, the character group is determined to be a character;
if the number of the characters in the character group is more than 1 and the character group comprises at least one preset character, splitting the character group through the preset character, determining the preset character as one character, wherein the type of the character is the same as that of the preset character, and the preset character comprises at least one of an operator, a punctuation mark and a number;
for any one of the parts obtained by splitting at the preset character: if the number of characters in the part is equal to 1, determining that the part is a character and the type of the character is an argument type, if the number of characters in the part is greater than 1, determining that the part is a character string, if the character string is the same as one of preset reserved words, determining that the type of the character string is a reserved word type, otherwise, determining that the type of the character string is an argument type.
2. The method of claim 1, wherein formatting the source code based on the type of each of the plurality of independent components to obtain object code having a specific format comprises:
and adjusting the format of the source code based on the type and the preset adjustment rule corresponding to each of the independent components to obtain the target code with the specific format, so that the specific formats of the target codes obtained after different source codes are adjusted by the preset adjustment rule are the same.
3. The method of claim 2, wherein the format adjusting the source code based on the type and the preset adjustment rule corresponding to each of the plurality of independent components to obtain the target code having a specific format comprises:
based on the types corresponding to the independent components, selecting a type adjusting rule corresponding to the type from preset adjusting rules;
selecting a common adjusting rule from the preset adjusting rules, wherein the common adjusting rule is used for carrying out format limitation on parts except for independent components in each line of codes;
and carrying out format adjustment on the source code based on the selected type adjustment rule and the common adjustment rule to obtain a target code with a format required by the type adjustment rule and the common adjustment rule.
4. A format adjustment apparatus, the apparatus comprising:
the acquisition module is used for acquiring a source code which needs to be subjected to code format adjustment;
the type determining module is used for segmenting all characters in the source code in the code compiling process of the source code to obtain a plurality of independent components and obtain types corresponding to the independent components respectively, wherein any independent component in the independent components is a character string or a character;
the adjusting module is used for carrying out format adjustment on the source code based on the types corresponding to the independent components to obtain target codes with specific formats, wherein the specific formats of the target codes obtained after format adjustment is carried out on different source codes are the same;
the type determination module includes:
the reading unit is used for inputting each line of characters of the source code one by one in the code compiling process of the source code;
the segmentation unit is used for obtaining a character group included by each line of characters, wherein the character group included by any line of characters comprises: a character group between the first character and the 1 st space in the line of characters, a character group between the last character and the nth space in the line of characters, and a character group between the ith space and the (i + 1) th space, wherein the value of i is 2 to n-1, and n is the total number of spaces included in the source code;
a type determination unit for, for any character group: determining whether the characters included in the character group are character strings or characters based on a first preset matching rule, and obtaining the type based on a second preset matching rule;
the first preset matching rule is the number of characters included in the character group, and if the number of characters in the character group is greater than 1, the character group is determined to be a character string; if the number of the characters in the character group is equal to 1, determining that the character group is a character;
the second preset matching rule is matched based on preset reserved words, operators, punctuation marks and numbers;
the type determining unit is used for executing the following steps on any character group:
if all contents carried by the character group are one of numbers, operators, punctuation marks and preset reserved words, the type of the character group is the type corresponding to all the contents, if the number of characters in the character group is more than 1, the character group is determined to be a character string, and if the number of characters in the character group is equal to 1, the character group is determined to be a character;
if all contents carried by the character group do not comprise numbers, operators, punctuation marks and preset reserved words, the type of the character group is an independent variable type, if the number of characters in the character group is more than 1, the character group is determined to be a character string, and if the number of characters in the character group is equal to 1, the character group is determined to be a character;
if the number of the characters in the character group is larger than 1 and the character group comprises at least one preset character, splitting the character group through the preset character, determining the preset character as one character, wherein the type of the character is the same as that of the preset character, and the preset character comprises at least one of an operator, a punctuation mark and a number;
for any one of the parts obtained by splitting at the preset character: if the number of characters in the part is equal to 1, determining that the part is a character and the type of the character is an argument type, if the number of characters in the part is greater than 1, determining that the part is a character string, if the character string is the same as one of preset reserved words, determining that the type of the character string is a reserved word type, otherwise, determining that the type of the character string is an argument type.
5. The apparatus according to claim 4, wherein the adjusting module is configured to perform format adjustment on the source code based on a type and a preset adjusting rule corresponding to each of the plurality of independent components to obtain an object code with a specific format, so that the specific formats of the object codes obtained after different source codes are adjusted by the preset adjusting rule are the same.
6. The apparatus of claim 4, wherein the adjustment module comprises:
a type adjustment rule determining unit, configured to select, based on respective types corresponding to the plurality of independent components, a type adjustment rule corresponding to the type from preset adjustment rules;
a common adjustment rule determining unit, configured to select a common adjustment rule from the preset adjustment rules, where the common adjustment rule is used to perform format limitation on a part of each line of codes except for an independent component;
and the adjusting unit is used for carrying out format adjustment on the source code based on the selected type adjusting rule and the common adjusting rule to obtain the target code with the format required by the type adjusting rule and the common adjusting rule.
CN201811609730.6A 2018-12-27 2018-12-27 Format adjustment method and device Active CN109710304B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811609730.6A CN109710304B (en) 2018-12-27 2018-12-27 Format adjustment method and device


Publications (2)

Publication Number Publication Date
CN109710304A CN109710304A (en) 2019-05-03
CN109710304B true CN109710304B (en) 2022-06-24

Family

ID=66258551

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811609730.6A Active CN109710304B (en) 2018-12-27 2018-12-27 Format adjustment method and device

Country Status (1)

Country Link
CN (1) CN109710304B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112631606B (en) * 2020-12-31 2024-07-09 中国农业银行股份有限公司 Script formatting method and device
CN113220306A (en) * 2021-05-31 2021-08-06 支付宝(杭州)信息技术有限公司 Operation execution method and device and electronic equipment

Citations (4)

Publication number Priority date Publication date Assignee Title
CN104050085A (en) * 2014-06-25 2014-09-17 北京思特奇信息技术股份有限公司 Forced code standard inspection method and system
CN104636320A (en) * 2015-01-29 2015-05-20 小米科技有限责任公司 Data processing method and device
CN107357733A (en) * 2017-07-17 2017-11-17 万帮充电设备有限公司 Improve the method and device of code quality
CN107515739A (en) * 2016-06-16 2017-12-26 阿里巴巴集团控股有限公司 Improve the method and device of code execution performance

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
JP2012128690A (en) * 2010-12-15 2012-07-05 Canon Inc Information processor and method for controlling information processor

Also Published As

Publication number Publication date
CN109710304A (en) 2019-05-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant