9.3 Lexical analysis

The input production defines the lexical structure of a C# source file. Each source file in a C# program must conform to this lexical grammar production.

input::
    input-section_opt

input-section::
    input-section-part
    input-section input-section-part

input-section-part::
    input-elements_opt new-line
    pp-directive

input-elements::
    input-element
    input-elements input-element

input-element::
    whitespace
    comment
    token

Five basic elements make up the lexical structure of a C# source file: line terminators (§9.3.1), white space (§9.3.3), comments (§9.3.2), tokens (§9.4), and pre-processing directives (§9.5). Of these basic elements, only tokens are significant in the syntactic grammar of a C# program (§9.2.2).

The lexical processing of a C# source file consists of reducing the file into a sequence of tokens that becomes the input to the syntactic analysis. Line terminators, white space, and comments can serve to separate tokens, and pre-processing directives can cause sections of the source file to be skipped, but otherwise these lexical elements have no impact on the syntactic structure of a C# program.

When several lexical grammar productions match a sequence of characters in a source file, the lexical processing always forms the longest possible lexical element. [Example: The character sequence // is processed as the beginning of a single-line comment because that lexical element is longer than a single / token. end example]
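The longest-match rule can be illustrated with a minimal sketch. The class and method names below (LongestMatch, ClassifySlash) are hypothetical and not part of the specification, and the sketch assumes that only the elements named in this subclause compete at a / character.

```csharp
using System;

// Hypothetical helper, not part of the specification: decides which lexical
// element begins at a '/' character by preferring the longest candidate.
public static class LongestMatch
{
    public static string ClassifySlash(string source, int pos)
    {
        if (pos < 0 || pos >= source.Length || source[pos] != '/')
            throw new ArgumentException("Expected '/' at pos.", nameof(pos));

        // Look one character ahead; '\0' stands in for "end of input".
        char next = pos + 1 < source.Length ? source[pos + 1] : '\0';

        // The two-character openers win over the one-character "/" token.
        if (next == '/') return "single-line-comment";
        if (next == '*') return "delimited-comment";
        return "/ token";
    }
}
```

Under these assumptions, ClassifySlash("a // b", 2) reports a single-line comment, whereas ClassifySlash("a / b", 2) reports a / token.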

9.3.1 Line terminators

Line terminators divide the characters of a C# source file into lines.

new-line::
    Carriage return character (U+000D)
    Line feed character (U+000A)
    Carriage return character (U+000D) followed by line feed character (U+000A)
    Next line character (U+0085)
    Line separator character (U+2028)
    Paragraph separator character (U+2029)
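As an illustration of the new-line production, the following sketch divides a source string into lines, treating a carriage return followed by a line feed as a single terminator. LineSplitter and SplitLines are hypothetical names chosen for this sketch and are not defined by the specification.

```csharp
using System.Collections.Generic;

// Hypothetical helper, not part of the specification.
public static class LineSplitter
{
    // True for the characters that can begin a new-line.
    private static bool IsNewLineChar(char c) =>
        c == '\u000D' || c == '\u000A' || c == '\u0085' ||
        c == '\u2028' || c == '\u2029';

    public static List<string> SplitLines(string source)
    {
        var lines = new List<string>();
        int start = 0;
        int i = 0;
        while (i < source.Length)
        {
            if (IsNewLineChar(source[i]))
            {
                lines.Add(source.Substring(start, i - start));
                // CR immediately followed by LF counts as one terminator.
                if (source[i] == '\u000D' && i + 1 < source.Length && source[i + 1] == '\u000A')
                    i++;
                i++;
                start = i;
            }
            else
            {
                i++;
            }
        }
        // Keep any trailing text that has no terminator.
        if (start < source.Length)
            lines.Add(source.Substring(start));
        return lines;
    }
}
```

For the input "a\r\nb\rc\u2028d\n" this sketch yields the four lines a, b, c, and d.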

For compatibility with source code editing tools that add end-of-file markers, and to enable a source file to be viewed as a sequence of properly terminated lines, the following transformations are applied, in order, to every source file in a C# program:

• If the last character of the source file is a Control-Z character (U+001A), this character is deleted.

• A carriage-return character (U+000D) is added to the end of the source file if that source file is non-empty and if the last character of the source file is not a carriage return (U+000D), a line feed (U+000A), a line separator (U+2028), or a paragraph separator (U+2029).
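The two transformations above can be sketched as follows. SourceNormalizer and Normalize are hypothetical names, not part of the specification. The sketch follows the list exactly as written, so it checks only the four terminating characters named in the second item.

```csharp
// Hypothetical helper, not part of the specification.
public static class SourceNormalizer
{
    public static string Normalize(string source)
    {
        // Step 1: delete a trailing Control-Z character (U+001A).
        if (source.Length > 0 && source[source.Length - 1] == '\u001A')
            source = source.Substring(0, source.Length - 1);

        // Step 2: append a carriage return (U+000D) if the file is non-empty
        // and does not already end in one of the listed characters.
        if (source.Length > 0)
        {
            char last = source[source.Length - 1];
            bool terminated = last == '\u000D' || last == '\u000A' ||
                              last == '\u2028' || last == '\u2029';
            if (!terminated)
                source += "\u000D";
        }
        return source;
    }
}
```

The order matters: a file consisting of "class A{}" followed by Control-Z first loses the Control-Z and then gains a carriage return, while a file containing only Control-Z becomes empty and is left empty.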


9.3.2 Comments

Two forms of comments are supported: delimited comments and single-line comments.

A delimited comment begins with the characters /* and ends with the characters */. Delimited comments can occupy a portion of a line, a single line, or multiple lines. [Example: The example

/* Hello, world program
   This program writes "hello, world" to the console
*/
class Hello
{
    static void Main() {
        System.Console.WriteLine("hello, world");
    }
}

includes a delimited comment. end example]

A single-line comment begins with the characters // and extends to the end of the line. [Example: The example

// Hello, world program
// This program writes "hello, world" to the console
//
class Hello // any name will do for this class
{
    static void Main() { // this method must be named "Main"
        System.Console.WriteLine("hello, world");
    }
}

shows several single-line comments. end example]

comment::
    single-line-comment
    delimited-comment

single-line-comment::
    // input-characters_opt

input-characters::
    input-character
    input-characters input-character

input-character::
    Any Unicode character except a new-line-character

new-line-character::
    Carriage return character (U+000D)
    Line feed character (U+000A)
    Next line character (U+0085)
    Line separator character (U+2028)
    Paragraph separator character (U+2029)

delimited-comment::
    /* delimited-comment-text_opt asterisks /

delimited-comment-text::
    delimited-comment-section
    delimited-comment-text delimited-comment-section

delimited-comment-section::
    not-asterisk
    asterisks not-slash


asterisks::
    *
    asterisks *

not-asterisk::
    Any Unicode character except *

not-slash::
    Any Unicode character except /

Comments do not nest. The character sequences /* and */ have no special meaning within a single-line comment, and the character sequences // and /* have no special meaning within a delimited comment.

Comments are not processed within character and string literals.
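The non-nesting rule can be illustrated with a minimal comment skipper. CommentScanner and SkipComment are hypothetical names, not part of the specification. The sketch assumes that the caller invokes it only outside character and string literals, and it reports an unterminated delimited comment by returning -1.

```csharp
using System;

// Hypothetical helper, not part of the specification.
public static class CommentScanner
{
    private static bool IsNewLineChar(char c) =>
        c == '\u000D' || c == '\u000A' || c == '\u0085' ||
        c == '\u2028' || c == '\u2029';

    // Returns the index just past the comment that starts at pos, or -1 if
    // no comment starts there or a delimited comment is never closed.
    public static int SkipComment(string s, int pos)
    {
        if (pos < 0 || pos + 1 >= s.Length || s[pos] != '/')
            return -1;

        if (s[pos + 1] == '/')
        {
            // Single-line comment: runs to the next new-line character,
            // so any /* or */ inside it is ordinary comment text.
            int i = pos + 2;
            while (i < s.Length && !IsNewLineChar(s[i]))
                i++;
            return i;
        }

        if (s[pos + 1] == '*')
        {
            // Delimited comment: ends at the first */ after the opener,
            // so an inner /* does not start a nested comment.
            int end = s.IndexOf("*/", pos + 2, StringComparison.Ordinal);
            return end < 0 ? -1 : end + 2;
        }

        return -1;
    }
}
```

Applied at index 0 of "/* a /* b */ c */", the sketch stops after the first */ (index 12), leaving " c */" as ordinary input, which shows that the inner /* did not open a second comment.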

9.3.3 White space

White space is defined as any character with Unicode class Zs (which includes the space character) as well as the horizontal tab character, the vertical tab character, and the form feed character.

whitespace::
    Any character with Unicode class Zs
    Horizontal tab character (U+0009)
    Vertical tab character (U+000B)
    Form feed character (U+000C)
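A minimal sketch of the whitespace production follows. WhitespaceClassifier and IsLexicalWhitespace are hypothetical names, not part of the specification. The sketch relies on the .NET library method char.GetUnicodeCategory, whose SpaceSeparator category corresponds to Unicode class Zs.

```csharp
using System.Globalization;

// Hypothetical helper, not part of the specification.
public static class WhitespaceClassifier
{
    public static bool IsLexicalWhitespace(char c) =>
        c == '\u0009' ||   // horizontal tab
        c == '\u000B' ||   // vertical tab
        c == '\u000C' ||   // form feed
        char.GetUnicodeCategory(c) == UnicodeCategory.SpaceSeparator; // class Zs
}
```

This classification is narrower than the library method char.IsWhiteSpace, which also accepts line terminators such as U+000A. Under the grammar above those characters are new-line characters rather than white space.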