9.3 Lexical analysis
The input production defines the lexical structure of a C# source file. Each
source file in a C# program must
conform to this lexical grammar production.
input::
    input-section opt
input-section::
    input-section-part
    input-section input-section-part
input-section-part::
    input-elements opt new-line
    pp-directive
input-elements::
    input-element
    input-elements input-element
input-element::
    whitespace
    comment
    token
Five basic elements make up the lexical structure of a C# source file: line
terminators (§9.3.1), white space (§9.3.3), comments (§9.3.2), tokens (§9.4),
and pre-processing directives (§9.5). Of these basic elements, only tokens
are significant in the syntactic grammar of a C# program (§9.2.2).
The lexical processing of a C# source file consists of reducing the file
into a sequence of tokens which becomes
the input to the syntactic analysis. Line terminators, white space, and
comments can serve to separate tokens,
and pre-processing directives can cause sections of the source file to be
skipped, but otherwise these lexical
elements have no impact on the syntactic structure of a C# program.
When several lexical grammar productions match a sequence of characters in
a source file, the lexical processing always forms the longest possible
lexical element. [Example: The character sequence // is processed as the
beginning of a single-line comment because that lexical element is longer
than a single / token. end example]
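The longest-match rule can be illustrated with a minimal sketch. The class
LongestMatchSketch and the method ClassifySlash are hypothetical names
introduced only for this illustration, and the sketch considers just a few
of the elements that can begin with /. It is not how any particular
compiler is implemented.

```csharp
using System;

static class LongestMatchSketch
{
    // Hypothetical helper: names the lexical element that starts at `pos`,
    // given that the character at `pos` is '/'. Only a few candidates are
    // considered; a real lexer would cover every lexical production.
    public static string ClassifySlash(string source, int pos)
    {
        if (pos < 0 || pos >= source.Length || source[pos] != '/')
            throw new ArgumentException("Expected '/' at the given position.");

        // Look one character ahead, if there is one.
        bool hasNext = pos + 1 < source.Length;
        char next = hasNext ? source[pos + 1] : '\0';

        // Try the longer elements first; fall back to the single '/' token.
        if (hasNext && next == '/') return "single-line-comment";
        if (hasNext && next == '*') return "delimited-comment";
        if (hasNext && next == '=') return "operator /=";
        return "operator /";
    }
}
```

With this sketch, ClassifySlash("a // b", 2) reports a single-line comment,
whereas ClassifySlash("a / b", 2) reports the / token.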
9.3.1 Line terminators
Line terminators divide the characters of a C# source file into lines.
new-line::
    Carriage return character (U+000D)
    Line feed character (U+000A)
    Carriage return character (U+000D) followed by line feed character (U+000A)
    Next line character (U+0085)
    Line separator character (U+2028)
    Paragraph separator character (U+2029)
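As a rough sketch of how the new-line production divides a source file into
lines, the following helper splits a string at each line terminator and
treats a carriage return followed by a line feed as one terminator. The
names LineSplitterSketch and SplitLines are hypothetical and exist only for
this illustration.

```csharp
using System.Collections.Generic;

static class LineSplitterSketch
{
    // The individual characters that can begin a new-line.
    static bool IsNewLineChar(char c) =>
        c == '\r' || c == '\n' || c == '\u0085' || c == '\u2028' || c == '\u2029';

    // Splits `source` into lines; the terminators themselves are dropped.
    public static List<string> SplitLines(string source)
    {
        var lines = new List<string>();
        int start = 0;
        int i = 0;
        while (i < source.Length)
        {
            if (IsNewLineChar(source[i]))
            {
                lines.Add(source.Substring(start, i - start));
                // CR immediately followed by LF counts as a single terminator.
                if (source[i] == '\r' && i + 1 < source.Length && source[i + 1] == '\n')
                    i++;
                i++;
                start = i;
            }
            else
            {
                i++;
            }
        }
        // Keep any trailing text that has no terminator after it.
        if (start < source.Length)
            lines.Add(source.Substring(start));
        return lines;
    }
}
```

For the input "a\r\nb\nc\u2028d" this sketch yields the four lines a, b, c,
and d.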
For compatibility with source code editing tools that add end-of-file
markers, and to enable a source file to be
viewed as a sequence of properly terminated lines, the following
transformations are applied, in order, to every
source file in a C# program:
• If the last character of the source file is a Control-Z character
  (U+001A), this character is deleted.
• A carriage-return character (U+000D) is added to the end of the source
  file if that source file is non-empty and if the last character of the
  source file is not a carriage return (U+000D), a line feed (U+000A), a
  line separator (U+2028), or a paragraph separator (U+2029).
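The two transformations can be sketched as follows. This is only an
illustration under stated assumptions: the names SourceEndSketch and
NormalizeEnd are hypothetical, the source file is assumed to be already
decoded into a string, and the set of terminating characters is taken
exactly as listed in the second transformation above.

```csharp
static class SourceEndSketch
{
    // Applies the two end-of-file transformations, in order.
    public static string NormalizeEnd(string source)
    {
        // Step 1: delete a trailing Control-Z (U+001A), if present.
        if (source.Length > 0 && source[source.Length - 1] == '\u001A')
            source = source.Substring(0, source.Length - 1);

        // Step 2: for a non-empty file, append a carriage return unless the
        // file already ends in one of the listed terminating characters.
        if (source.Length > 0)
        {
            char last = source[source.Length - 1];
            bool terminated = last == '\r' || last == '\n'
                           || last == '\u2028' || last == '\u2029';
            if (!terminated)
                source += "\r";
        }
        return source;
    }
}
```

Because the steps run in order, a file that consists solely of a Control-Z
character becomes empty after the first step and is left empty by the
second.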
9.3.2 Comments
Two forms of comments are supported: delimited comments and single-line
comments.
A delimited comment begins with the characters /* and ends with the
characters */. Delimited comments can
occupy a portion of a line, a single line, or multiple lines. [Example: The
example
    /* Hello, world program
       This program writes "hello, world" to the console
    */
    class Hello
    {
        static void Main() {
            System.Console.WriteLine("hello, world");
        }
    }
includes a delimited comment. end example]
A single-line comment begins with the characters // and extends to the end
of the line. [Example: The example
    // Hello, world program
    // This program writes "hello, world" to the console
    //
    class Hello // any name will do for this class
    {
        static void Main() { // this method must be named "Main"
            System.Console.WriteLine("hello, world");
        }
    }
shows several single-line comments. end example]
comment::
    single-line-comment
    delimited-comment
single-line-comment::
    // input-characters opt
input-characters::
    input-character
    input-characters input-character
input-character::
    Any Unicode character except a new-line-character
new-line-character::
    Carriage return character (U+000D)
    Line feed character (U+000A)
    Next line character (U+0085)
    Line separator character (U+2028)
    Paragraph separator character (U+2029)
delimited-comment::
    /* delimited-comment-text opt asterisks /
delimited-comment-text::
    delimited-comment-section
    delimited-comment-text delimited-comment-section
delimited-comment-section::
    not-asterisk
    asterisks not-slash
asterisks::
    *
    asterisks *
not-asterisk::
    Any Unicode character except *
not-slash::
    Any Unicode character except /
Comments do not nest. The character sequences /* and */ have no special
meaning within a single-line
comment, and the character sequences // and /* have no special meaning
within a delimited comment.
Comments are not processed within character and string literals.
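The comment grammar and the no-nesting rule can be illustrated with a small
scanner. This is a sketch only: CommentScannerSketch and SkipComment are
hypothetical names, the return value of -1 for "no comment here" or "no
closing */ found" is an assumption of the sketch, and the sketch does not
track character or string literals, so the caller is assumed to invoke it
only outside such literals.

```csharp
using System;

static class CommentScannerSketch
{
    // The characters that end a single-line comment (new-line-character).
    static bool IsNewLineChar(char c) =>
        c == '\r' || c == '\n' || c == '\u0085' || c == '\u2028' || c == '\u2029';

    // Returns the index just past the comment that starts at `start`, or -1
    // if no comment starts there or a delimited comment is never closed.
    public static int SkipComment(string s, int start)
    {
        if (start < 0 || start + 1 >= s.Length || s[start] != '/')
            return -1;

        if (s[start + 1] == '/')
        {
            // Single-line comment: runs up to, but not including, the line
            // terminator. Any /* or */ inside it is ordinary text.
            int i = start + 2;
            while (i < s.Length && !IsNewLineChar(s[i]))
                i++;
            return i;
        }

        if (s[start + 1] == '*')
        {
            // Delimited comment: ends at the first */ after the opening /*.
            // Searching from start + 2 keeps "/*/" from closing itself, and
            // stopping at the first */ is what makes comments non-nesting.
            int end = s.IndexOf("*/", start + 2, StringComparison.Ordinal);
            return end < 0 ? -1 : end + 2;
        }

        return -1;
    }
}
```

For the input "/* a /* b */ c */" the sketch returns 12, the position just
after the first */, so the trailing " c */" is not part of the comment.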
9.3.3 White space
White space is defined as any character with Unicode class Zs (which
includes the space character) as well as the
horizontal tab character, the vertical tab character, and the form feed
character.
whitespace::
    Any character with Unicode class Zs
    Horizontal tab character (U+0009)
    Vertical tab character (U+000B)
    Form feed character (U+000C)
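The whitespace production can be checked with a one-line predicate. The
sketch below assumes that the .NET category UnicodeCategory.SpaceSeparator
corresponds to Unicode class Zs; WhitespaceSketch and IsWhitespace are
hypothetical names used only for illustration.

```csharp
using System.Globalization;

static class WhitespaceSketch
{
    // True for characters matched by the whitespace production: class Zs,
    // horizontal tab, vertical tab, and form feed. Line terminators are
    // deliberately excluded, since they belong to the new-line production.
    public static bool IsWhitespace(char c) =>
        char.GetUnicodeCategory(c) == UnicodeCategory.SpaceSeparator
        || c == '\u0009'
        || c == '\u000B'
        || c == '\u000C';
}
```

Note that this predicate differs from char.IsWhiteSpace, which also accepts
line terminators.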