Recall:
Need
to change the assembler to support linking.
When
the assembler encounters .word id where label :id not found, it puts 0 and
indicates that the program requires the value of id before it can run.
How?
Make
an entry in the MERL file.
But:
we lose a valuable error-check:
Example
lis
$3
.word
abc
…
abd:
...
...
|
What
if we meant abd when we said abc?
- assembler asks abc to be
linked in instead of signaling an error.
- How can the assembler tell
between errors and intentional behaviour?
New
assembler directive:
.import
id
- tells the assembler to ask
for id to be linked in
- does NOT assemble to a
word of MIPS.
So
when the assembler encounters .word abc,
- if label abc: does not
occur and no .import abc directive, then error.
Format
code: 0x11 means external symbol reference
(ESR)
What
information needs to be recorded?
- where (at what address) is
the symbol being used?
- where is the blank that
needs to be filled in?
Format
of an ESR entry:
word
1: 0x11
word
2: location where the symbol is used
word
3: length of the name in chars(n)
word
4: ASCII characters in the symbol’s name (one word per char)
...
word
3+n: ASCII characters in the symbol’s name
|
The
other side:
one.asm
|
two.asm
|
three.asm
|
.import
abc
lis
$3
.word
abc
|
abc:
sw $3, -4($30)
…
jr $31
|
…
abc:
add $3, $3, $2
…
beq $2, $0, abc
|
external
reference to abc
|
abc
is the beginning of a library procedure
|
abc
is referring to a loop (random internal label)
|
How
can the linker know which abc to link to?
Can’t
assume labels won’t be duplicated.
How
can we make abc available in two.asm and unavailable in three.asm?
- Solution: another assembler
directive + MERL entry type.
Assembler
directive: .export abc
- make abc available for
linking
- does not assemble to a
word of MIPS
- tells assembler to make an
entry in the MERL table
MERL
entry type: External Symbol Definition (ESD)
word
1: 0x05 (format code for ESD)
word
2: address the symbol represents
word
3: length of the name in chars(n)
word
4: ASCII characters in the symbol’s name (one word per char)
...
word
3+n: ASCII characters in the symbol’s name
|
MERL
now contains (ie everything the linker needs to do its job):
- all addresses that need
relocating
- address + names of every
ESR
- address + names of every
ESD
Linker Algorithm
Input:
merl files m1 and m2
Output:
single merl file with m2 linked after m1
alpha
<- m1.codelen – 12
relocate
m2.code by alpha
add
alpha to every address to m2.symtbl
if
m1.exports.labels intersect m2.exports.labels not empty:
ERROR
for
each <addr1, label> in m1.imports:
if there exists <addr2, label> in m2.exports:
m1.code[addr1] <- addr2
remove <addr1, label> from m1.imports
add addr1 to m1.relocates
for
each <addr2, label> in m2.imports:
if there exists <addr1, label> in m1.exports:
m2.code[addr2] <- addr1
remove <addr2, label> from m2.imports
add addr2 to m2.relocates
imports
= m1.imports union m2.imports
exports
= m1.exports union m2.exports
relocates
= m1.relocates union m2.relocates
output
MERL cookie
output
total codeLen + total(imports, exports, relocates) + 12
output
total codeLen + 12
output
m1.code
output
m2.code
output
imports, exports, relocates
|
Formal Languages
Assembly
Language
- easy to recognize (parse);
- straight forward,
unambiguous translation to machine language.
High-Level
Languages
- no single translation to
machine language
How
can we handle the complexity?
Want a
formal theory of string recognition – general principles that work in the
context of any programming language
Definitions:
alphabet
- finite set of symbols (eg:
{a, b, c})
- typically
denoted Σ, as in Σ = {a, b, c}
string
(or word)
- finite sequence
of symbols (from Σ)
length
of a word, |w| = # of chars in w, eg |aba| = 3
empty
string
- an empty sequence of
symbols
- ε denotes the empty string
ε is not
a symbol, ε not equal Σ, |ε| = 0
language
- eg {a2nb | n >= 0}
- b, aab, aaaab, …
- {all words with an even #
of a’s followed by b}
- Note: ε – the empty word
- {} or 0 – empty language
(contains no words
- {ε – singleton
language that contains only ε