compiler, linker, loader


=====> COMPILATION PROCESS <======

                     |
                     |---->  Input is Source file(.c)
                     |
                     V
            +=================+
            |                 |
            | C Preprocessor  |
            |                 |
            +=================+
                     |
                     | ---> Pure C file ( comd:cc -E <file.name> )
                     |
                     V
            +=================+
            |                 |
            | Lexical Analyzer|
            |                 |
            +-----------------+
            |                 |
            | Syntax Analyzer |
            |                 |
            +-----------------+
            |                 |
            | Semantic Analyze|
            |                 |
            +-----------------+
            |                 |
            | Pre Optimization|
            |                 |
            +-----------------+
            |                 |
            | Code generation |
            |                 |
            +-----------------+
            |                 |
            | Post Optimize   |
            |                 |
            +=================+
                     |
                     |--->  Assembly code (comd: cc -S <file.name> )
                     |
                     V
            +=================+
            |                 |
            |   Assembler     |
            |                 |
            +=================+
                     |
                     |--->  Object file (.obj) (comd: cc -c <file.name>)
                     |
                     V
            +=================+
            |     Linker      |
            |      and        |
            |     loader      |
            +=================+
                     |
                     |--->  Executable (.Exe/a.out) (com:cc <file.name> ) 
                     |
                     V
            Executable file(a.out)

C preprocessor :-

C preprocessing is the first step in the compilation. It handles:
  1. define statements.

  2. include statements.

  3. Conditional statements.
  4. Macros
The purpose of the unit is to convert the C source file into Pure C code file.

C compilation :

There are Six steps in the unit :

1) Lexical Analyzer:

It combines characters in the source file, to form a "TOKEN". A token is a set of characters that does not have 'space', 'tab' and 'new line'. Therefore this unit of compilation is also called "TOKENIZER". It also removes the comments, generates symbol table and relocation table entries.

2) Syntactic Analyzer:

This unit check for the syntax in the code. For ex:
{
    int a;
    int b;
    int c;
    int d;

    d = a + b - c *   ;
}
The above code will generate the parse error because the equation is not balanced. This unit checks this internally by generating the parser tree as follows:
                            =
                          /   \
                        d       -
                              /     \
                            +           *
                          /   \       /   \
                        a       b   c       ?
Therefore this unit is also called PARSER.

3) Semantic Analyzer:

This unit checks the meaning in the statements. For ex:
{
    int i;
    int *p;

    p = i;
    -----
    -----
    -----
}
The above code generates the error "Assignment of incompatible type".

4) Pre-Optimization:

This unit is independent of the CPU, i.e., there are two types of optimization
  1. Preoptimization (CPU independent)
  2. Postoptimization (CPU dependent)
This unit optimizes the code in following forms:
  • I) Dead code elimination
  • II) Sub code elimination
  • III) Loop optimization

I) Dead code elimination:

For ex:
{
    int a = 10;
    if ( a > 5 ) {
        /*
        ...
        */
    } else {
       /*
       ...
       */
    }
}
Here, the compiler knows the value of 'a' at compile time, therefore it also knows that the if condition is always true. Hence it eliminates the else part in the code.

II) Sub code elimination:

For ex:
{
    int a, b, c;
    int x, y;

    /*
    ...
    */

    x = a + b;
    y = a + b + c;

    /*
    ...
    */
}
can be optimized as follows:
{
    int a, b, c;
    int x, y;

    /*
     ...
    */

    x = a + b;
    y = x + c;      // a + b is replaced by x

    /*
     ...
    */
}

III) Loop optimization:

For ex:
{
    int a;
    for (i = 0; i < 1000; i++ ) {

    /*
     ...
    */

    a = 10;

    /*
     ...
    */
    }
}
In the above code, if 'a' is local and not used in the loop, then it can be optimized as follows:
{
    int a;
    a = 10;
    for (i = 0; i < 1000; i++ ) {
        /*
        ...
        */
    }
}
5) Code generation:
Here, the compiler generates the assembly code so that the more frequently used variables are stored in the registers.
6) Post-Optimization:
Here the optimization is CPU dependent. Suppose if there are more than one jumps in the code then they are converted to one as:
            -----
        jmp:<addr1>
<addr1> jmp:<addr2>
            -----
            -----
The control jumps to the directly.
Then the last phase is Linking (which creates executable or library). When the executable is run, the libraries it requires are Loaded.

To understand linkers, it helps to first understand what happens "under the hood" when you convert a source file (such as a C or C++ file) into an executable file (an executable file is a file that can be executed on your machine or someone else's machine running the same machine architecture).
Under the hood, when a program is compiled, the compiler converts the source file into object byte code. This byte code (sometimes called object code) is mnemonic instructions that only your computer architecture understands. Traditionally, these files have an .OBJ extension.
After the object file is created, the linker comes into play. More often than not, a real program that does anything useful will need to reference other files. In C, for example, a simple program to print your name to the screen would consist of:
printf("Hello Kristina!\n");
When the compiler compiled your program into an obj file, it simply puts a reference to the printffunction. The linker resolves this reference. Most programming languages have a standard library of routines to cover the basic stuff expected from that language. The linker links your OBJ file with this standard library. The linker can also link your OBJ file with other OBJ files. You can create other OBJ files that have functions that can be called by another OBJ file. The linker works almost like a word processor's copy and paste. It "copies" out all the necessary functions that your program references and creates a single executable. Sometimes other libraries that are copied out are dependent on yet other OBJ or library files. Sometimes a linker has to get pretty recursive to do its job.
Note that not all operating systems create a single executable. Windows, for example, uses DLLs that keep all these functions together in a single file. This reduces the size of your executable, but makes your executable dependent on these specific DLLs. DOS used to use things called Overlays (.OVL files). This had many purposes, but one was to keep commonly used functions together in 1 file (another purpose it served, in case you're wondering, was to be able to fit large programs into memory. DOS has a limitation in memory and overlays could be "unloaded" from memory and other overlays could be "loaded" on top of that memory, hence the name, "overlays"). Linux has shared libraries, which is basically the same idea as DLLs (hard core Linux guys I know would tell me there are MANY BIG differences).

Comments

Popular posts from this blog

Smart Pointers in C++ and How to Use Them

Inter-Process Communication

C++ Interview Questions