Steves Computer Vision Blog: April 2012

Wednesday, April 25, 2012

SURF correspondence and Matching

Today I'm going through using OpenCV (2.1) or Blepo to create a nice and easy to use class for feature correspondence and correspondence matching. SURF stances for Speeded up Robust Features and is an improvement to SIFT (Scale Invariant Feature Transform). It isn't too bad to implement (and a good exercise) if you have the time but OpenCV has a nice version that does multiple octaves and octave layers.
One of the drawbacks of the OpenCV implementation is that they do not have a good correspondence matching algorithm, which is a shame, since a decent one can be done extremely easily. The code I have was written courtesy of Brian Peasley, which I have edited heavily (much to his probable dismay).

Click here for full code (blepo people look in the blepo folder)
Make sure your OpenCV lib and header files are set up properly

I build two classes for this. The first is a class for the feature, descriptor, and match information and is shown below:

class SURFDescriptor{
public:
    bool            valid;
    float             x, y, radius;
    //128 or 64 floats depending on whether extended is chosen or not
    float             *descriptor;
    SURFDescriptor        *match;
};

The second class is the main SURF class for the feature extraction and putative matching. Note that this is not a necessary class and can be done with a series of functions instead. However, since I use a very large library, it is often more convenient for me to group these within classes. Feel free to take code out of context (though be careful about memory leaks).

class SURF{
public:
    typedef SURFDescriptor* Iterator;
    typedef SURFDescriptor* ConstIterator;

    SURF(double hessianThreshold = 500, bool extended = true) //good values are (500, true)
        : m_threshold(hessianThreshold), m_extended(extended), m_CvSeqKeypoints(NULL), m_CvSeqDescriptors(NULL), m_descriptors(NULL), m_size(0) {}
    ~SURF() {}

    //must release before done using, not included in the destructor for shallow copying purposes
    void Release() {
        if(m_storage != NULL) {
            cvClearMemStorage(m_storage);
            cvReleaseMemStorage(&m_storage);
            m_storage = NULL;
        }
        if(m_descriptors != NULL) {
            free(m_descriptors);
            m_descriptors = NULL;
        };
        m_size = 0;
    }
    inline Iterator Begin() { return m_descriptors; }
    inline Iterator End() { return m_descriptors + m_size; }
    inline ConstIterator Begin() const { return m_descriptors; }
    inline ConstIterator End() const { return m_descriptors + m_size; }
    int ExtractFeatures(IplImage *img);
    void OverlayFeatures(IplImage *img);
    IplImage* DisplayMatches(IplImage *img, IplImage *img2);
    int size() const {return m_size;}
    SURFDescriptor &operator()(int index){ return m_descriptors[index]; }
    SURFDescriptor operator()(int index) const { return m_descriptors[index]; }
    SURFDescriptor *operator[](int index) const { return &(m_descriptors[index]); }
    bool extended() const { return m_extended; }
    float PutativeMatch(const SURF &surf2, int prune_dist, float percent = 0.1f);

private:
    double         m_threshold;
    bool        m_extended;
    SURFDescriptor    *m_descriptors;
    CvSeq         *m_CvSeqKeypoints, *m_CvSeqDescriptors;
    CvMemStorage* m_storage;
    int        m_size;
};

There are a couple of nice feature display and matching display functions in the class as well for debugging. If you want those, just download the original file at the top or bottom.
For now, I will just show the extract features and matching algorithms:

#define SQR(X) ((X) * (X))

inline float SAD_Descriptors(const SURFDescriptor &desc1, const SURFDescriptor &desc2, int length){
    float SAD = 0;
    float *p1 = desc1.descriptor, *p2 = desc2.descriptor;
    for(int i = 0; i < length; i++)
        SAD += fabs( *p1++ - *p2++ );

    return SAD;
}

int SURF::ExtractFeatures(IplImage *img){
    m_storage = cvCreateMemStorage(0);
    CvMat *gray_img = cvCreateMat(img->height, img->width, CV_8UC1); //create matrix which is what CV SURF operates on
    cvCvtColor(img, gray_img, CV_BGR2GRAY); //convert color IplImage to a gray scale matrix

    //Find surf features and calculate their descriptors
    CvSURFParams params = cvSURFParams(m_threshold, m_extended ? 1 : 0);
    cvExtractSURF(gray_img, 0, &m_CvSeqKeypoints, &m_CvSeqDescriptors, m_storage, params );

    m_descriptors = (SURFDescriptor *)malloc(m_CvSeqDescriptors->total*sizeof(SURFDescriptor));

    CvSURFPoint *r;
    SURFDescriptor *desc = m_descriptors;
    for(int i = 0; i < m_CvSeqKeypoints->total; i++){
        r = (CvSURFPoint*)cvGetSeqElem( m_CvSeqKeypoints, i );

        desc->x = r->pt.x;
        desc->y = r->pt.y;
        desc->radius = (float)r->size;

        desc->descriptor =    (float *)cvGetSeqElem( m_CvSeqDescriptors, i);
        desc++;
    }
    m_size = m_CvSeqDescriptors->total;
    cvReleaseMat(&gray_img);
    return m_CvSeqDescriptors->total;
}

float SURF::PutativeMatch(const SURF &surf2, int prune_dist, float percent) {
    vector<int>     index1(m_size);
    vector<float>    minval1(m_size);

    vector<int>     index2(surf2.size());
    vector<float>    minval2(surf2.size());

    float val;
    //for every feature in surf1, compare descriptor to every feature in surf2 and save the index
    //of the minimum SAD.
    vector<float>::iterator min1 = minval1.begin(), min2 = minval2.begin();
    vector<int>::iterator in1 = index1.begin(), in2 = index2.begin();
    SURFDescriptor *desc1 = m_descriptors;
    for(unsigned int i = 0; i < minval2.size(); i++)
        *min2++ = 999.0;

    int tempcnt,ext=0;
    while(1)
    {
        tempcnt=0;
        min1 = minval1.begin();
        in1 = index1.begin();
        for(int i = 0; i < m_size; i++){
            *min1 = 999.0;
            SURF::Iterator desc2 = surf2.Begin();
            min2 = minval2.begin();
            in2 = index2.begin();
            for(int j = 0; j < surf2.size(); j++) {
                if(sqrt(SQR(desc1->x-desc2->x) + SQR(desc1->y-desc2->y)) <= (prune_dist+ext))
                    tempcnt++;
                val = SAD_Descriptors( *desc1, *desc2, m_extended ? 128 : 64 );

                //comparing ith point in surf1 to every point in surf2
                if(val < *min1){
                    *min1 = val;
                    *in1 = j;
                }

                //comparing jth point in surf2 to ith point in surf1
                if(val < *min2){
                    *min2 = val;
                    *in2 = i;
                }
                desc2++;
                in2++;
                min2++;
            }
            desc1++;
            in1++;
            min1++;
        }
        if(tempcnt==0)
            ext++;
        else break;
    }

    //sort minvals to get top percent
    int minIndex;
    float temp;
    float avg = 0;
    min1 = minval1.begin();
    //find the average of minval1
    while(min1 != minval1.end())
        avg += *min1++;
    avg /= minval1.size();
    assert(percent > 0 && percent < 1);
    int endSort = (int)(percent*minval1.size());
    min1 = minval1.begin();
    for(unsigned int i = 0; i < endSort; i++) {
        minIndex = i;
        float currMin = *min1;
        vector<float>::const_iterator tmpMin = (minval1.begin()+i+1);
        for(unsigned int j = i+1; j < minval1.size(); j++) {
            if(*tmpMin < currMin) {
                minIndex = j;
                currMin =*tmpMin;
            }
            tmpMin++;
        }
        if(minIndex != i) {
            temp = *min1;
            *min1 = currMin;
            minval1[minIndex] = temp;
        }
        min1++;
    }

    float T = minval1[endSort];
    //see if minimum comparison is putative between feature point sets
    //dirty code, a little bit better now
    desc1 = m_descriptors;
    in1 = index1.begin();
    for(unsigned int i = 0; i < index1.size(); i++)
    {
        if(index2[*in1] == i && minval2[*in1] < T)
            desc1->match = surf2[*in1];
        else
            desc1->match = NULL;
        in1++;
        desc1++;
    }
    return avg;
}

Consider making the matching algorithm better or more efficient, possibly with an implementation involving RANSAC.

If you want the full source, including an example of how it works, download the source below:

Click here for full code (blepo people look in the blepo folder)
Make sure your OpenCV lib and header files are set up properly

Consider donating to further my tinkering.

Places you can find me

Monday, April 16, 2012

Assembly Language

This is a quick tutorial on Assembly Language and some related lab work I got from my TA way back when and have edited:

General Info:

To compile a C program, use the following command:
gcc -g -c -o drv.o drv.c
gcc is the compiler command. -g tells the compiler to prepare for the debugger. -c says to compile, but do not link. -o tells the compiler to rename the output to the following string. So this command is compiling the file drv.c into an object file but not linking it, readying it for debugging, and renaming the output to drv.o.

To assemble an assembly program, use the following command:
as -g -o asm.o asm.s
Similarly, as is the assembler command. -g prepares it for debugging, and -o renames the object file. So this command assembles asm.s, readies it for debugging, and renames the output to asm.o.

You can link those outputs using:
gcc -g -o myprog asm.o drv.o
This uses the gcc command to link two object files, ready it for debugging, and renames the output executable to myprog.

Or, you can do it the easy way:
gcc -g -o myprog drv.c asm.s
This does everything in one step. It compiles, links, and readies for debugging both files, puts the output executable in myprog.

You can change input/output names as needed, but C programs need a .c extension, and Assembly programs need a .s extension.

If these commands work, you'll get dropped back to the command line with no output. If there are errors, they'll be listed for you. You can either try to fix errors by reading through the code, or you can run the debugger.

You run the program itself by typing either the name of the program (myprog from above) and hitting enter, or typing ./myprog, which tells the terminal to look in the current directory (./) for the executable name (myprog).

To run the debugger, type "gdb myprog" without quotes and substituting whatever program name you used. From there, you'll be brought to another prompt for the debugger. To run the program as normal from here, type "run". You can set break points by typing "break [line number]". You can step through the code by typing "step". There are a bunch more options here as well, which you can read about by typing "help" or getting more information at one of the links at the beginning of this email. Typing "quit" exits the debugger. Stepping through the program can be really useful when trying to locate the place where a control structure acted incorrectly.

Lab2:

Lab 2 introduces simple assignments, variables, registers, and instructions. The eventual goal here is to be able to implement C commands such as "a = ((b + c) - (d + e)) - 10;" in Assembly. For each program you'll be working on, you will be given a piece of C driver code, and you will have to implement an Assembly function that implements the a C function described in the assignment.

In most computer programs, you deal with memory locations that store data (variables) and instructions that manipulate that data. In Assembly, you also deal with registers. You can store values in registers, as well as use registers in computations. Registers are fast, temporary variables located within the microprocessor itself, where normal variables exist in memory. There's a limited number of registers that can be used, but if those can be used, it is desirable due to speed. You can't just use memory variables either. The 8086 processor limits the use of memory variables to a MAXIMUM of ONE memory variable per computation. I'm going to repeat that for emphasis: you cannot use two memory variables in a computation. You must use
AT LEAST one register or a constant.

In Intel 8086 assembly language, you can only do one computation at a time, and you can only use one memory variable per computation. So the command from above, "a = ((b + c) - (d + e)) - 10;" cannot be implemented in a single line. In fact, Assembly is so low level that this takes many lines.

For the processor we're using, there are four general purpose registers: A, B, C, and D. That previous line is implemented in Assembly like so:

.comm a, 4
.comm b, 4
.comm c, 4
.comm d, 4
.comm e, 4
.text
movl b,%eax ; move b into register ax
addl c,%eax ; add c to register ax
movl d,%ebx ; move d into register bx
addl e,%ebx ; add e to register bx
subl %ebx,%eax ; subtract register bx from register ax
subl $10,%eax ; subtract 10 from register ax
movl %eax,a ; move register ax out to a

Lets go through this bit by bit. Like in C, you need to declare variables before you can use them. There are several assembler directives you can use, depending on what you're trying to do, but the most common one (and the only one we should need this semester) is the .comm command. It takes two arguments. The first is the variable name, and the second is the number of bytes the variables is. Note, there is no type information. The command format for most of what we'll be doing is "instruction firstarg, secondarg".

A note here about writing in assembly: there is no symbol to end a line of code (in C, you end all lines with a ";" ). To comment your code, multi-line comments use the C-Style /*comment goes here*/, and a single line comment should be able to use either ";" or "#", but I know for a fact that "#" works.

The above commands do not initialize the data either. If you want to initialize your variables, you can use a command like "b: .int 10", which declares an integer b and initializes it to 10. More examples can be found in the manual. So all those .comm commands up above declare variables a,b,c,d, and e of size 4 bytes (integers). For most of the programs I'll be sending you, there should be sections at the bottom labeled "declare variables here." That is where you should put the declarations.

The next bit is arithmetic operators. As you may have noticed, assembly doesn't use standard symbols (+,-,*,/,etc). Each operation has its own instruction. Addition uses the add instruction, subtraction uses sub, multiplication mul, and division div. The closest thing to an "=" is the move command, or mov. It allows you to move the value from one register/variable to another, essentially setting them equal. However, there are suffixes that are attached to each of these depending on the size of the data being operated on. Each piece of data being used above is 4 bytes, which is a "long-word". So each command has an "l" attached to the end of it. For 2 byte data, you would attach a "w" for "word". For 1 byte data, you would use "b" for "byte".

The mov instruction is formatted thus (for 4 byte data): "movl source, destination". This is the equivalent of "destination = source". Only one of the source or destination can be a variable, so the other must most likely be a register. One use of this command is to move data between variables and registers.

Arithmetic operands always operate on a value. So "addl c, %eax" is the equivalent of "%eax = %eax + c" or "%eax += c". For these math instructions, one of the arguments will always be the destination.

The add command is used thus (for 4 byte data): "addl source, destination". This is equivalent to "destination += source".
Similarly, subtraction is "subl source, destination" which is the equivalent of "destination -= source". This is the most complicated you can get with math commands, one addition/subtraction/equals per line.

I mentioned earlier that there are registers that we can use. The general purpose registers are A, B, C, and D. You can access these registers by saying %eax, %ebx, %ecx, and %edx, respectively. These are the 4 Byte registers. However, we can operate on several different sizes of data. 4 byte, 2 byte, and 1 byte. The 2 Byte registers are similarly accessed using %ax, %bx, %cx, and %dx. These are the lower 16 bits of the corresponding 32 bit registers. They are NOT separate. Each of those 16 bit registers can be thought of as a pair of 8 bit registers. So %ax is split into %ah and %al (a, high and low bytes), %bx into %bh, %bl, then you have %ch, %cl, %dh, %dl similarly. These are NOT different registers, they just allow you to access smaller chunks of the larger registers. So, for instance, "addb $2, %al" is an 8 bit instruction, and "addw $2, %ax" is a 16 bit instruction.

Hopefully, by now you can look at the C code and Assembly equivalents above and figure out how they are equivalent. Note, $10 denotes a constant number 10.

Multiplication and division get a little bit more complicated, however. If you take two 32 bit numbers and multiply them, what do you get? A 64 bit number. So you would need two registers to store the result. The multiplication operand we'll be using is "mul". This takes ONE argument. That argument is multiplied against the A register (it's ALWAYS the A register). The result is placed in the A register (and if the number is big enough, the D register). Two 32 bit numbers being multiplied together will put the lower 32 bits of the result into the A register and the upper 32 bits of the result into the D register. To multiply two values, you must first move one of them into the A register. For an 8bit multiplication, the result goes in %ah:%al. For 16 bits, %dx:%ax. 32, %edx:%eax.

Formulas:
mulb X8bit ⇒ %ax = %al * X 8bit

mulw X16bit ⇒ %dx:%ax = %ax * X 16bit

mull X32bit ⇒ %edx:%eax = %eax * X 32bit

Those formulas show the results of each possible operand. For instance, if you use "mull some32bitnumber", the result gets put in the concatenation of %edx:%eax, and that result is %eax * that32bitnumber.

Divide is similar. It takes one operand, and the source and destination registers are chosen automatically. Again, the operation is performed against the A register. However, it also pulls in the D register. So for a 32 bit division, you're actually dividing the 64bit %edx:%eax by a 32 bit number. The 32 bit result gets placed in %eax, and the remainder gets put in %edx. Note, before dividing, you need to zero the second register (D), unless you have something there that you want. If you're storing a totally different value in the D register, it could throw off your division.

Division formulas:
divb X8bit ⇒ %al = %ax / X 8bit , %ah = remainder

divw X16bit ⇒ %ax = %dx:%ax / X 16bit , %dx = remainder

divl X32bit ⇒ %eax = %edx:%eax / X 32bit , %edx = remainder

Example use of formula: If you use the command "divl some32bitnumber", the result that is placed in the register %eax is the concatenation %edx:%eax divided by that32bitnumber. %edx gets the remainder of the division. Note, the remainder is how you do modulus.

Lab3:

Lab 4:

Lab 4 is about different addressing modes, arrays, and pointers. Addressing modes are how the computer selects the data being used by an instruction. They are determined by how you specify the operand of the instruction.

Data are the numerical values used in computations. For instance, if you have the value 3 in the register %eax, your data value is 3. If you used %eax, the operand would be %eax, while the data is 3. The addressing mode is the relationship between the operands and the data.

The most basic versions of addressing are register addressing and immediate addressing. In register addressing, the data is in a register. In immediate addressing, the data is supplied in the instruction.

The next type of addressing is direct addressing, where the memory address is supplied with the instruction. In most cases, the address will be supplied at compile time when a memory variable is given an address. Thus, the operand is the memory address corresponding to your variable. The data is the contents of that memory address.

So we know how to address normal variables, such as ints, chars, or floats. What about arrays? How does one use an array in assembly? It's almost the exact same as declaring a single variable in assembly.

.comm a, 4

declares a single integer.

.comm a, 40

declares an array of ten integers. The only difference is how much space is declared. There are 4 bytes per integer. We want 10 integers, so we declare 40 bytes. The symbol 'a' is the equivalent of the address of the first byte of data. This also initializes all ten integers to zero.

To initialize the values to something else, you could also use the .fill directive.

a: .fill 10, 4, 0

should create 10 integers of size 4, with the value 0.

To initialize the first values to 1,2,3, and the rest to zero, we would use

a: .int 1,2,3
.fill 7,4,0

which creates 3 ints with the values 1, 2, and 3. And then 7 variables of size 4, with the value 0.

So how does one access the elements of the array? Let's take a look at a piece of C code here:

int a[10]; /* an array of 10 integers */
int i;
main()
{
a[0] = 10;
a[4] = 20;
a[i] = 30;
}

We've already discussed declaring the array, so here's how to access the different pieces.

a[0] is the first integer in the array. Since a is the address of the first integer, we can use direct addressing:

movl $10, a

The second example is just a touch trickier. We're trying to access the fifth element in the array. However, this will still be direct addressing, because we'll be using an address. The command is

movl $20, a+16

Remember, a is the address of the first integer. Each integer is 4 bytes. So if you increment the address by 4 bytes, you get the second integer. 8, the third. 12, the fourth. Incrementing the address by 16 gets you the fifth integer. Again, this is direct addressing. At compile-time, a is translated into an address and 16 is added to it, which results in another address.

If you want to reference an address directly, you can simply write:

movl 4096, %eax

A number by itself, with no $, is considered an address.

The hardest bit of the short example above is the a[i] = 30. This is different because the i is a variable, so this address isn't a constant (it changes during execution). What we're going to be using to access this is called indexed (or direct indexed) addressing. We're going to use a register along with a constant to compute the memory address during run-time. The constant (the displacement) is going to be a number (address) or a label (variable name, equivalent to an address). To use indexed addressing, you can use two special index registers: %esi and %edi. %esi is the source index, and %edi is the destination index. For the most part, you can use these interchangeably.

To implement the C command a[i] = 30; you would write:

movl $30, a(,%edi,4)

You should also be able to use %esi for that if you want. The code above uses a (the displacement), and then increments the address of a by the value of %edi multiplied by 4. 4 is the size of the variables we have (ints). So the i from the C command would be in %edi in the assembly command. If you were using, say, chars, the 4 above would be a 1 instead. Note, there is an initial comma after the opening parenthesis. This is there because there is actually a third field that we are leaving blank for this addressing mode.

The entirety of the C snippet translated into assembly would look like this:

.comm a, 40
.comm i, 4
movl $10, a # a[0] = 10;
movl $20, a+16 # a[4] = 20;
movl i, %edi
movl $30, a(,%edi,4) # a[i] = 30;

Another situation in which the address of the data is not known at assembly time is pointers. Pointers are variables that hold an address of another piece of data. They do not hold the piece of data themselves, but they point to somewhere else. To access pointers (dereferencing), use %ebx.

You can use pointers as below:

C:

int *p;
main()
{
*p = 40;
}

Assembly:

.comm p, 4
movl p, %ebx
movl $40, (%ebx)

The addresses we're using are four byte addresses. Therefore, the pointer p needs to be four bytes. To dereference the pointer p, move it to %ebx, and surround the register with parentheses. The final command above says to move 40 into the data that %ebx points to. This is called register indirect addressing. You're indirectly addressing the data through a pointer held in a register.

You cannot dereference a pointer held in a variable, it must be in a register first.

There is one final case to worry about: What if we have a pointer to an array? We use base indexed addressing, which is similar to indexed addressing. We'll be using two registers, %ebx (this one must be %ebx), which holds the base index, and %esi or %edi, which will hold the index. You can also use a constant offset to index into a struct, depending on the situation.

Example:

int *ap;
struct {int a,b;} *asp; //struct has two vars, a and b
int i;
main()
{
ap[3] = 50;
ap[i] = 60;
asp[i].b = 70;
}

which is coded in assembly language as:

.comm ap, 4
.comm asp, 4
.comm i, 4
. . .
movl ap, %ebx
movl $50, 12(%ebx) # ap[3] = 50;
movl i, %edi
movl $60, (%ebx, %edi, 4) # ap[i] = 60;
movl asp, %ebx # i is still in edi
movl $70, 4(%ebx, %edi, 8) # asp[i].b = 70;

How does this work? ap is a pointer. The second command accesses the integer at a 12 byte offset from where ap is pointing (12 byte offset is the fourth integer, ap[3]). For the next command, the index i is moved into %edi, one of the index registers. Now, the next command is using base indexed addressing: movl $60, (%ebx, %edi, 4). This command figures out where %ebx (the pointer ap) is pointing to, and adds an offset equal to %edi (index i) * 4 (size of each variable) to it to get the address that is needed to access that slot in the array.

The next command is actually even more complicated, since it's accessing an array of structs. Again, you're using %ebx (pointer to array), and adding %edi * 8 (pretty sure the 4 in the manual is a typo, as the struct is two ints, or size 8) to the address. However, int b in the struct is desired, so there's another 4 byte offset (grab the second byte in the struct, b), which goes outside the parentheses. Thus the final command is:

movl $70, 4(%ebx, %edi, 8) # asp[i].b = 70;

Note, for most of these addressing modes (indexed, register indirect, etc), %esi, %edi, and %ebx can be used interchangeably. However, when specifying two registers (base indexed) one of the registers MUST BE %ebx.

Recap (straight from the manual):

immediate addressing
The instruction actually contains the operand itself rather than an address or other information describing where the operand is.

movl $4, %eax

register addressing
The data is in a register.

movl %eax, %ebx

direct addressing
The memory address of the data is supplied with the instruction. The assembler does this by assigning an address to the symbol variable with respect to an offset.

movl %eax, mem_variable
or
movl %eax, mem_variable+16

indexed addressing
The contents of a register along with a constant are used to compute the memory address of the data. The constant is called the displacement.

movl $30, a(,%edi,4)

register indirect addressing
The %ebx register holds the address of the data to be addressed.

movl $40, (%ebx)

base-indexed addressing
The %ebx register holds the base of the address (like that of an array) and %esi or %edi hold the index. The constant (4 in the following example) corresponds to the number of bytes of each element in the array.

movl $60, (%ebx, %edi, 4)

Lab 5:

Lab 5 is all about subroutines (functions). Any program label can be a subroutine. This means any of the labels you've been able to jump to. To make a label into a subroutine, you simply use the "call" command rather than the jmp (or variations thereof) command. The only difference between a jmp and a call is that call stores the return address of the next instruction in line to be executed. That way, if you use the "ret" command, the program knows where to go back to: the next instruction in line before the call. ret, by the way, is how you return from a subroutine. So "call labelname" to start a subroutine, "ret" to return.

You can also nest subroutines, where one subroutine calls another which calls another, and so on. The return addresses for all these nested calls are all stored on something called the "stack."

The stack is special data storage that has two basic functions: "push" and "pop." Push adds something onto the stack, pop removes something from the stack. Order, however, is important here. Like a stack of papers, or a stack of plates, if you push something onto the stack, it goes on top. Likewise, if you pop (remove) something from the stack, you remove the object that is on top. You wouldn't want to yank the bottom plate out of a stack of fine china, would you? This is a last-in, first-out way of doing things. If you push 1, 2, 3, onto the stack, and then pop three times, you get 3, 2, 1.

So what does this have to do with subroutines? When you do a "call" instruction, the return address (the address of the next instruction) is pushed onto the top of the stack. The call command is, essentially, a combination of a push and a jmp command. If ret is used, it pops the return address (stored by call). ret is a combination of pop and jmp. So, let's look at the example from the manual.

Your program executes a call instruction at address 100 (each instruction exists in memory and has its own address) that jumps to an instruction at address 200. The return address, 104 (the next instruction to be executed after the call, 4 bytes ahead), is automatically pushed onto the stack by the call command. Now, this subroutine has another call command at address 220 that jumps to a third subroutine at address 300. This automatically pushes the return address 224 onto the stack. This subroutine uses ret, which pops 224 off the stack and then jumps to it. The previous subroutine continues executing where it left off. The second subroutine executes a ret command, which pops the previous return address, 104, off the stack and jumps to it. The first routine continues executing.

To implement subroutines in assembly, you really only need labels, call, and ret. There are different things we can do, though.

Let's look at the stack. The stack is just a section of memory. Which section of memory it is is defined by the %ess (stack segment) register. The top of the stack is pointed to by the %esp (stack pointer) register. This points to the location where the last value was pushed, which also contains the next value to be popped. The stack pointer register if an offset from %ess. The ordering of the stack is kind of weird, as you can see from the diagrams below. %ess is the the current section, and %esp is an offset from that current section. %ess is at the top of the diagram, and as you push stuff onto the stack, %esp actually gets closer. This may be counter-intuitive, as it makes more sense that pushing would make the stack larger, meaning higher addresses. This is not so. %esp initially points to the highest address (offset) in the stack segment. You may notice that the numbers are counting up as you go down the page.

When something is pushed onto the stack, the stack pointer is FIRST decremented, THEN the value is written into the new location being pointed to. For a pop, the value is read FIRST, and then %esp is incremented. This keeps values from being lost or overwritten (move to an empty space before writing, or read before moving away and losing the location). As more and more things get added to the stack, the closer %esp gets to %ess.

Here is the sequence of steps if we
pushw $1000:

0 | | <- %ess
4 | |
8 | |
12| |
16| XXXX | <- %esp before push

________________________________________

0 | | <- %ess
4 | |
8 | |
12| | <- %esp first decrement SP
16| XXXX |

________________________________________

0 | | <- %ess
4 | |
8 | |
12| 1000 | <- %esp finally, write 1000
16| XXXX |

You can see that the decrement occurs first, and then the write, keeping XXXX from being overwritten.

Pops work just the opposite:

0 | | <- %ess
4 | |
8 | |
12| 1000 | <- %esp before pop
16| XXXX |

________________________________________

0 | | <- %ess
4 | |
8 | |
12| | <- %esp first read 1000
16| XXXX |

_________________________________________

0 | | <- %ess
4 | |
8 | |
12| | finally, increment SP
16| XXXX | <- %esp

You can see here that the value is read out of the stack (and stored, presumably), and then the stack pointer is incremented, losing the location of the data we just read. The 1000 is actually still there in memory, and will remain there until it is overwritten.

The stack is used to store return addresses, but can also be used to store data for your program. You can directly call both push and pop, the syntax being as follows:

pushl variablename
popl variablename

The first command takes the value of the variable and pushes it onto the top of the stack. The second removes the value at the top of the stack and puts in in variablename. You need to be careful here, because call and ret don't necessarily know if you're pushing and popping. This could cause ret to pop the wrong address and jump to the wrong instruction.

So what uses could the stack have as memory storage? Sometimes you need to store data for a subroutine will another is executing. Suppose we call a subroutine from a program, but have some value in the registers we need saved? We don't want to store it in a variable because it's an intermediate value, and the registers will be needed in the next subroutine. What do we do? Push it onto the stack. Then call the subroutine. When the function returns, pop the variables back off in reverse order. Like so:

.
.
pushl %eax
pushl %ebx
call my_routine
popl %ebx
popl %eax
.
.

Just like that, you've saved %eax and %ebx. Now it doesn't matter what my_routine does with the registers, because you've got copies in case it overwrites something. You can use the stack the same way if you run out of registers in the middle of a computation.

So now we get to the complicated part: local variables and parameters. In C, you can have different functions or different loops all with their own set of variables, inaccessible to the other code blocks. There can be more than one "instance" of a variable with the same name, because they each have a different scope. This often happens when a function calls itself (recursion). This can be done by putting the variables on the stack.

The processor we're using, the 8086, has a special register to access the stack: the %ebp (base pointer) register. It points the the stack just like %esp, and can be used as a base register just like %ebx. %ebp is very often used to set up something called a "stack frame" that contains all of the data for a single subroutine.

Basically, when a subroutine is entered, any space needed for local variables can be allocated on the stack. If you think of the stack as a pile of boxes, allocating space for local variables can be thought of as putting empty milk crates (that are on their side) on top of the stack of boxes. You can put more boxes on top of the milk crates, but you can still fill the crates up later. You allocate space in a subroutine by decrementing the stack pointer. The base pointer is set to point to the allocated space, and each local variable is accessed using an offset from the base pointer. Example:

main()
{
int a, b;
a = 10;
doubleit();
b = a;
}

doubleit()
{
int c, d;
c = 20;
halfit();
d = c * 2;
}

halfit()
{
int e, f;
e = 20;
f = e / 2;
}

Translated into assembly code, this looks as follows:

main:
pushl %ebp
movl %esp, %ebp
subl $8, %esp ;

movl $10, -4(%ebp) ; a
call doubleit
movl -4(%ebp), %eax ; a
movl %eax, -8(%ebp) ; b

movl %ebp, %esp
popl %ebp
ret

doubleit:
pushl %ebp
movl %esp, %ebp
subl $8, %esp ;

movl $20, -4(%ebp) ; c
call halfit
movl -4(%ebp), %eax ; c
shll $1, %eax ; c * 2
movl %eax, -8(%ebp) ; d

movl %ebp, %esp
popl %ebp
ret

halfit:
pushl %ebp
movl %esp, %ebp
subl $8, %esp

movl $20, -4(%ebp) ; e
movl -4(%ebp), %eax
shrl $1, %eax ; e / 2
movl %eax, -8(%ebp) ; f

movl %ebp, %esp
popl %ebp
ret

All references to the variables local to each subroutine are accessed by using an offset from %ebp. As in, -4(%ebp), -8(%ebp), etc. The first local var is at -4, the second at -8, and so on. So what does the stack look like for this example?

| |
| RA | <- %esp
| |
| | <- %ebp before main begins

As with all subroutines, %esp starts by pointing to the return address. This is because of the push to get the return address on the stack. So the first thing to do is save the old base pointer location, so that we keep our old local variable section accessible (it'd be correct for the previous subroutine). We do this by pushing %ebp. As %ebp is a pointer, this is storing an address. Thus:

| OLD_BP| <- %esp
| RA |
| |
| | <- %ebp before main begins

Now, we set the value of %ebp to the current value of %esp. These are both pointers, so setting the value of one to the other value CHANGES THE LOCATION TO WHICH IT IS POINTING, RATHER THAN THE ACTUAL VALUE BEING STORED THERE. IMPORTANT. We set them equal by moving the value of %esp into %ebp. At this point, %esp and %ebp are both pointing to the same location on the stack, thus:

| OLD_BP| <- %esp <- %ebp
| RA |
| |
| |

The local variable locations will be right on top of the base pointer here. But that's also the space that pushed values will go, because of the stack pointer (%esp)! What do we do? We allocated local variable room by decrementing the stack pointer. Main has two ints, so we need eight bytes of space. To allocated space, we subtract 8 from %esp (moving where it points to). The stack now looks like this:

low addresses
| |
| |
b | | <- %esp
a | |
|OLD_BP| <- %ebp
| RA |
| |
high addresses

OLD_BP is the address that was previous stored in %ebp, and RA is the return address. Decrementing %esp by 8 has moved the %esp pointer to two slots up from %ebp, as each slot is 4 bytes. Now, if something is pushed, nothing will be overwritten (%esp is decremented BEFORE the write). The next thing we want to do is set a to 10. a is the first local variable, and can thus be accessed by using the address 4 less than %ebp (the next slot). Thus our next command is "movl $10, -4(%ebp)"
This works, because you're using a constant offset from a pointer (which we learned how to access in the previous lab).

So now what does the stack look like? We just filled one of our empty milk crates with the value "10".

low addresses
| |
| |
b | | <- %esp
a | 10 |
|OLD_BP| <- %ebp
| RA |
| |
high addresses

Now what? The next bit in the C code calls another subroutine, so that's what we're going to do using "call doubleit". As mentioned earlier, call is a combo push/jmp. So the return address is pushed onto the next slot in the stack.

low addresses
| |
| RA | <- %esp
b | |
a | 10 |
|OLD_BP| <- %ebp
| RA |
| |
high addresses

Now we're actually inside of doubleit. doubleit has its own local variables, though, so we need to build the stack frame for this subroutine (the whole section, RA, BP, local variable space is the "stack frame"). So what do we do? Save %ebp (so we can get it again later), set %ebp to %esp (moving where it points, not the value contained at that address), and allocate space by decrementing %esp. Now we have this:

low addresses
d | | <- %esp
c | |
|OLD_BP| <- %ebp
| RA |
b | |
a | 10 |
|OLD_BP|
| RA |
| |
high addresses

Where we've stored the old base pointer (OLD_BP), the return address is there (RA), and we have space for local vars (c,d). Now doubleit can start doing calculations. We want to store the number 20 in c, the first local variable. The first local variable is, again at four addresses less than the base pointer: movl $20, -4(%ebp)

low addresses
d | | <- %esp
c | 20 |
|OLD_BP| <- %ebp
| RA |
b | |
a | 10 |
|OLD_BP|
| RA |
| |
high addresses

We then start the process over, when we call halfit, which requires its own stack frame. The call pushes the return address:

| |
| RA | <- %esp
d | |
c | 20 |
|OLD_BP| <- %ebp
| RA |
b | |
a | 10 |
|OLD_BP|
| RA |
| |

The base pointer is saved, moved, and then space is allocated, leaving us with:

| |
f | | <- %esp
e | |
|OLD_BP| <- %ebp
| RA |
d | |
c | 20 |
|OLD_BP|
| RA |
b | |
a | 10 |
|OLD_BP|
| RA |
| |
| |

halfit then stores 20 in the first local variable (-4), and stores 10 (20, right shifted once), in the second local variable (-8), f. Then we have a bunch of stuff that needs to occur to leave the stack frame exactly as it was before halfit was called, so everything works in doubleit.

First things first: restore the stack pointer (remove the allocated space). You can easily do this by moving the value of %ebp to %esp. This moves the %esp pointer down to point back right at where %ebp is pointing.

| |
f | |
e | |
|OLD_BP| <- %ebp <- %esp
| RA |
d | |
c | 20 |
|OLD_BP|
| RA |
b | |
a | 10 |
|OLD_BP|
| RA |
| |
| |

Next, pop the old base pointer (popl %ebp). As you may recall, the memory is read and THEN %esp is incremented. This pops the value of OLD_BP into %ebp, moving %ebp to point to where it used to point before halfit was called. This pop also moves %esp down one (incremented) to point at the return address.

| |
f | |
e | |
|OLD_BP|
| RA | <- %esp
d | |
c | 20 |
|OLD_BP| <- %ebp
| RA |
b | |
a | 10 |
|OLD_BP|
| RA |
| |
| |

The last thing that occurs in halfit is ret, which is a combo pop/jump. It pops the return address, which increments %esp, and then the program is back in doubleit.

| RA |
d | | <- %esp
c | 20 |
|OLD_BP| <- %ebp
| RA |
b | |
a | 10 |
|OLD_BP|
| RA |
| |
| |

The rest is quite similar: finish computations, tear down the stack frame, and then return.

movl -4(%ebp), %eax ; c
shll $1, %eax ; c * 2
movl %eax, -8(%ebp) ; d

movl %ebp, %esp
popl %ebp
ret

The first command moves the first local variable back into a register. The second command shifts it left by one bit (multiply by two). The third stores it in the second local variable (-8). Then the tearing down of the stack frame begins:

| RA |
d | 40 | <- %esp
c | 20 |
|OLD_BP| <- %ebp
| RA |
b | |
a | 10 |
|OLD_BP|
| RA |
| |
| |

Deallocate local variables by moving %ebp into %esp (move the location to which it points):

| RA |
d | 40 |
c | 20 |
|OLD_BP| <- %ebp <- %esp
| RA |
b | |
a | 10 |
|OLD_BP|
| RA |
| |
| |

Pop the old base pointer into %ebp, moving %ebp to point to where it used to and %esp down one.

| RA |
d | 40 |
c | 20 |
|OLD_BP|
| RA | <- %esp
b | |
a | 10 |
|OLD_BP| <- %ebp
| RA |
| |
| |

ret. This pops the old return address, moving %esp down one again, and returning to main.

b | | <- %esp
a | 10 |
|OLD_BP| <- %ebp
| RA |
| |
| |

The rest of main is as follows:

movl -4(%ebp), %eax ; a
movl %eax, -8(%ebp) ; b

movl %ebp, %esp
popl %ebp
ret

The first local variable is moved into a register. That register is copied into the second local variable (one memory address per command max). Now main's stack frame is getting torn down. Local vars are de-allocated, and the old base pointer is popped.

The final stack looks like it did before main started:

| |
| RA | <- %esp
| |
| | <- %ebp before main begins

And then ret occurs, returning to whatever process called main.

It is generally assumed that any calling functions will have saved any of the register information they need stored. However, the %ebx register is not saved this way. In all of the programs thus far, %ebx has been manually saved in the prolog. Without this, the program can randomly crash. The epilogs also restore %ebx. These two things are NOT part of setting up the stack frame (which is what the prolog and epilog do), but can help with random crashes on our lab machines.

In summary, all of our routines get started with a prolog that sets up the stack frame:

subr:
pushl %ebp
pushl %ebx
movl %esp, %ebp
subl $SIZE_OF_LOCAL_VARS, %esp

Similarly, at the end, the stack frame is torn down.

movl %ebp, %esp
popl %ebx
popl %ebp
ret

Also, any registers to be changed in the subroutine should be pushed and popped (after allocating local var space). Such as %esi or %edi, if they're changed.

Congratulations. If you followed all of that, you now understand what the prolog and epilog in all our programs are doing (setting up and tearing down the stack frame).

Lab 6:

Lab 6 is basically about parameter passing. How can we pass parameters to a function, and how can we return parameters from it? You may have noticed that last lab, we didn't use parameters; we used globals. This time, we learn how to use parameters.

There are two basic ways to pass parameters: registers, and the stack. Let's look at the former first. Passing the parameters via registers is a simple method, and often quite fast. However, functions/languages must agree precisely on what registers are being used. Variables are stored before a call, and during the called functions the variables can be read straight out of the registers again.

Here's an example:

extern int a,b;

foo()
{
bar(a);
}

bar(p1)
int p1;
{
b = b + p1;
}

Now in assembly language, we will pass the argument a in the %eax register as follows:

foo:
pushl %ebp
movl %esp, %ebp

; begin foo
movl a, %eax
call bar
; end foo

movl %ebp, %esp
popl %ebp
ret

bar:
pushl %ebp
movl %esp, %ebp

; begin bar
addl %eax, b
; end bar

movl %ebp, %esp
popl %ebp
ret

As you can see, a is stored in the %eax register and then when bar runs, the %eax register is added to b. That's it. Here's an example with returns:

extern int a,b;

foo()
{
b = bar(a);
}

bar(p1)
int p1;
{
return p1 + 2;
}
Here, %eax is used both to pass in the argument, and to pass out the return value, thus making the implementation of bar very simple.

foo:
pushl %ebp
movl %esp,%ebp

; begin foo
movl a, %eax
call bar
movl %eax, b
; end foo

movl %ebp,%esp
popl %ebp
ret

bar:
pushl %ebp
movl %esp, %ebp

; begin bar
addl $2, %eax
; end bar

movl %ebp, %esp
popl %ebp
ret

This is almost as simple as the first example. a is placed in %eax, and then bar is called. bar then adds 2 to %eax and returns. foo then reads %eax, which is where the return value lies, and stores it in b.

So why not always use registers to pass arguments? What if you need more arguments than you have registers, or if an argument won't fit in a register?

The other option in passing, as mentioned earlier, is the stack. This is how you can pass many arguments or large arguments.

How do we do this? Before calling the function that you're passing variables to, push the arguments onto the stack IN REVERSE ORDER. If you do this, they'll be in order when you read from the called function. Once the prolog executes, you can find the parameters at a fixed offset from the base pointer, just like local variables. The only difference is that it will be a positive offset, rather than a negative offset.

Example:

extern int a,b,c,d,e;

foo()
{
e = bar(a, b, c, d);
}

bar(p1, p2, p3, p4)
int p1, p2, p3, p4;
{
return (p1 + p2) * (p3 + p4);
}

Here, you can see there are four arguments to be passed into bar. We can still use the %eax register to return the result.

foo:
pushl %ebp
movl %esp, %ebp

; begin foo
pushl d
pushl c
pushl b
pushl a
call bar
addl $16, %esp ; remove arguments from stack
movl %eax, e
; end foo

movl %ebp, %esp
popl %ebp
ret

bar:
pushl %ebp
pushl %ebx
movl %esp, %ebp

; begin bar
movl 12(%ebp), %eax ; p1
addl 16(%ebp), %eax ; p2
movl 20(%ebp), %ebx ; p3
addl 24(%ebp), %ebx ; p4
mull %ebx ; result in ax
; end bar

movl %ebp, %esp
popl %ebx
popl %ebp
ret

You can see that foo pushes all four arguments onto the stack in reverse order. bar is called. At this point, I'm going to put in a picture of the stack so you can see what's going on. I've also modified the example from the lab slightly to match more closely what you'll be doing. Addresses are arbitrary.

32| %ebx | <-%esp <-%ebp
36| old_base_ptr|
40| foo_ret_addr|
44| a |
48| b |
52| c |
56| d |

This is what the stack looks like after bar's prolog. As you can see, %ebp is pointing at address 32. The parameters that were passed are in addresses 44, 48, 52, and 56. We can access these off of a constant offset from %ebp. You can look at the prolog, figure out what the stack looks like, and determine the offset of the first variable. From there, it's a matter of adding 4 to get the second variable, and so on.

So as you can see, the difference between where %ebp is pointing (32) and where a is on the stack (44) is 12. Thus, to move a (p1) into %eax, we just access 12(%ebp), the same way we used local variables last time. If you want the second variable, simply use 16(%ebp). etc.

This is a general mechanism and can be used in most situations. Returns are usually put in a register (in our case, %eax). Some languages can return complex types by putting the return value on the stack and putting a pointer to it in a register. However, that's not something we're going to go into.

Based on a given prolog, you should be able to draw what the stack looks like and figure out any offsets you need.

Short version: parameters go on the stack, returns go in %eax.

For the assembly files referenced in the labs:

Assembly Code

Consider donating to further my tinkering.

Places you can find me