Lecture 8 - Linking & Formal Languages

Thursday, June 19, 2014

5:01 PM



    Important Recall:

    Need to change the assembler to support linking.

    When the assembler encounters .word id where label :id not found, it puts 0 and indicates that the program requires the value of id before it can run.


    Question How?

    Make an entry in the MERL file.


    But: we lose a valuable error-check:



    lis $3

    .word abc

    abd: ...



    What if we meant abd when we said abc?

    • assembler asks abc to be linked in instead of signaling an error.
    • How can the assembler tell between errors and intentional behaviour?


    New assembler directive:

    .import id

    • tells the assembler to ask for id to be linked in
    • does NOT assemble to a word of MIPS.


    So when the assembler encounters .word abc,

    • if label abc: does not occur and no .import abc directive, then error.


    Format code: 0x11 means external symbol reference (ESR)


    What information needs to be recorded?

    • Name of the symbol
    • where (at what address) is the symbol being used?
      • where is the blank that needs to be filled in?


    Format of an ESR entry:

    word 1: 0x11

    word 2: location where the symbol is used

    word 3: length of the name in chars(n)

    word 4: ASCII characters in the symbol’s name (one word per char)


    word 3+n: ASCII characters in the symbol’s name


    The other side:




    .import abc

    lis $3

    .word abc


      sw $3, -4($30)


      jr $31


      add $3, $3, $2


       beq $2, $0, abc

    external reference to abc

    abc is the beginning of a library procedure

    abc is referring to a loop (random internal label)


    How can the linker know which abc to link to?

    Can’t assume labels won’t be duplicated.

    How can we make abc available in two.asm and unavailable in three.asm?

    • Solution: another assembler directive + MERL entry type.


    Assembler directive: .export abc

    • make abc available for linking
    • does not assemble to a word of MIPS
    • tells assembler to make an entry in the MERL table


    MERL entry type: External Symbol Definition (ESD)


    word 1: 0x05 (format code for ESD)

    word 2: address the symbol represents

    word 3: length of the name in chars(n)

    word 4: ASCII characters in the symbol’s name (one word per char)


    word 3+n: ASCII characters in the symbol’s name


    MERL now contains (ie everything the linker needs to do its job):

    • the code
    • all addresses that need relocating
    • address + names of every ESR
    • address + names of every ESD


    Important Linker Algorithm

    Input: merl files m1 and m2

    Output: single merl file with m2 linked after m1


    alpha <- m1.codelen – 12

    relocate m2.code by alpha

    add alpha to every address to m2.symtbl


    if m1.exports.labels intersect m2.exports.labels not empty:



    for each <addr1, label> in m1.imports:

    if there exists <addr2, label> in m2.exports:

    m1.code[addr1] <- addr2

    remove <addr1, label> from m1.imports

    add addr1 to m1.relocates

    for each <addr2, label> in m2.imports:

    if there exists <addr1, label> in m1.exports:

    m2.code[addr2] <- addr1

    remove <addr2, label> from m2.imports

    add addr2 to m2.relocates


    imports = m1.imports union m2.imports

    exports = m1.exports union m2.exports

    relocates = m1.relocates union m2.relocates


    output MERL cookie

    output total codeLen + total(imports, exports, relocates) + 12

    output total codeLen + 12

    output m1.code

    output m2.code

    output imports, exports, relocates



    Important Formal Languages

    Assembly Language

    • simple structure;
    • easy to recognize (parse);
    • straight forward, unambiguous translation to machine language.

    High-Level Languages

    • more complex structure
    • harder to recognize
    • no single translation to machine language


    How can we handle the complexity?


    Want a formal theory of string recognition – general principles that work in the context of any programming language




    • finite set of symbols (eg: {a, b, c})
    • typically denoted Σ, as in Σ = {a, b, c}

    string (or word)

    • finite sequence of symbols (from Σ)
    • eg) a, aba, cbca, abc

    length of a word, |w| = # of chars in w, eg |aba| = 3

    empty string

    • an empty sequence of symbols
    • ε denotes the empty string

    ε is not a symbol, ε not equal Σ, |ε| = 0


    • a set of strings
    • eg {a2nb | n >= 0}
      • b, aab, aaaab, …
      • {all words with an even # of a’s followed by b}
    • Note: ε – the empty word
      • {} or 0 – empty language (contains no words
      • {ε – singleton language that contains only ε





Created with Microsoft OneNote 2010
One place for all your notes and information