Lecture 8 - Linking & Formal Languages

Thursday, June 19, 2014

5:01 PM

Recall:

Need to change the assembler to support linking.

When the assembler encounters .word id where label :id not found, it puts 0 and indicates that the program requires the value of id before it can run.

How?

Make an entry in the MERL file.

But: we lose a valuable error-check:

Example

lis $3

.word abc

…

abd: ...

...

What if we meant abd when we said abc?

assembler asks abc to be linked in instead of signaling an error.
How can the assembler tell between errors and intentional behaviour?

New assembler directive:

.import id

tells the assembler to ask for id to be linked in

does NOT assemble to a word of MIPS.

So when the assembler encounters .word abc,

if label abc: does not occur and no .import abc directive, then error.

Format code: 0x11 means external symbol reference (ESR)

What information needs to be recorded?

Name of the symbol

where (at what address) is the symbol being used?

where is the blank that needs to be filled in?

Format of an ESR entry:

word 1: 0x11

word 2: location where the symbol is used

word 3: length of the name in chars(n)

word 4: ASCII characters in the symbol’s name (one word per char)

...

word 3+n: ASCII characters in the symbol’s name

The other side:

one.asm

two.asm

three.asm

.import abc

lis $3

.word abc

abc:

sw $3, -4($30)

…

jr $31

…

abc:

add $3, $3, $2

…

beq $2, $0, abc

external reference to abc

abc is the beginning of a library procedure

abc is referring to a loop (random internal label)

How can the linker know which abc to link to?

Can’t assume labels won’t be duplicated.

How can we make abc available in two.asm and unavailable in three.asm?

Solution: another assembler directive + MERL entry type.

Assembler directive: .export abc

make abc available for linking
does not assemble to a word of MIPS
tells assembler to make an entry in the MERL table

MERL entry type: External Symbol Definition (ESD)

word 1: 0x05 (format code for ESD)

word 2: address the symbol represents

word 3: length of the name in chars(n)

word 4: ASCII characters in the symbol’s name (one word per char)

...

word 3+n: ASCII characters in the symbol’s name

MERL now contains (ie everything the linker needs to do its job):

the code

all addresses that need relocating

address + names of every ESR

address + names of every ESD

Linker Algorithm

Input: merl files m1 and m2

Output: single merl file with m2 linked after m1

alpha <- m1.codelen – 12

relocate m2.code by alpha

add alpha to every address to m2.symtbl

if m1.exports.labels intersect m2.exports.labels not empty:

ERROR

for each <addr1, label> in m1.imports:

if there exists <addr2, label> in m2.exports:

m1.code[addr1] <- addr2

remove <addr1, label> from m1.imports

add addr1 to m1.relocates

for each <addr2, label> in m2.imports:

if there exists <addr1, label> in m1.exports:

m2.code[addr2] <- addr1

remove <addr2, label> from m2.imports

add addr2 to m2.relocates

imports = m1.imports union m2.imports

exports = m1.exports union m2.exports

relocates = m1.relocates union m2.relocates

output MERL cookie

output total codeLen + total(imports, exports, relocates) + 12

output total codeLen + 12

output m1.code

output m2.code

output imports, exports, relocates

Formal Languages

Assembly Language

simple structure;

easy to recognize (parse);

straight forward, unambiguous translation to machine language.

High-Level Languages

more complex structure

harder to recognize

no single translation to machine language

How can we handle the complexity?

Want a formal theory of string recognition – general principles that work in the context of any programming language

Definitions:

alphabet

finite set of symbols (eg: {a, b, c})

typically denoted Σ, as in Σ = {a, b, c}

string (or word)

finite sequence of symbols (from Σ)

eg) a, aba, cbca, abc

length of a word, |w| = # of chars in w, eg |aba| = 3

empty string

an empty sequence of symbols

ε denotes the empty string

ε is not a symbol, ε not equal Σ, |ε| = 0

language

a set of strings

eg {a2nb | n >= 0}

b, aab, aaaab, …
{all words with an even # of a’s followed by b}

Note: ε – the empty word

{} or 0 – empty language (contains no words

{ε – singleton language that contains only ε