9.1 Programs
A C# program consists of one or more source files, known formally as
compilation units (§16.1). A source file is
an ordered sequence of Unicode characters. Source files typically have a
one-to-one correspondence with files in
a file system, but this correspondence is not required.
Conceptually speaking, a program is compiled using three steps:
1. Transformation, which converts a file from a particular character
repertoire and encoding scheme into a
sequence of Unicode characters.
2. Lexical analysis, which translates a stream of Unicode input characters
into a stream of tokens.
3. Syntactic analysis, which translates the stream of tokens into
executable code.
Conforming implementations must accept Unicode source files encoded with
the UTF-8 encoding form (as
defined by the Unicode standard), and transform them into a sequence of
Unicode characters. Implementations
may choose to accept and transform additional character encoding schemes
(such as UTF-16, UTF-32, or non-
Unicode character mappings).
[Note: It is beyond the scope of this standard to define how a file using a
character representation other than
Unicode might be transformed into a sequence of Unicode characters. During
such transformation, however, it is
recommended that the usual line-separating character (or sequence) in the
other character set be translated to the
two-character sequence consisting of the Unicode carriage-return character
followed by Unicode line-feed
character. For the most part this transformation will have no visible
effects; however, it will affect the
interpretation of verbatim string literal tokens (§9.4.4.5). The purpose
of this recommendation is to allow a
verbatim string literal to produce the same character sequence when its
source file is moved between systems that
support differing non-Unicode character sets, in particular, those using
differing character sequences for lineseparation.
end note]