
Some time ago I was experimenting with writing assembly routines and linking them with C programs, and I found that I can just skip the standard C-call prologue and epilogue:

    push ebp
    mov ebp, esp
    sub esp, 4                ;; optional: reserve space for locals
    ...
    mov esp, ebp              ;; optional: release the locals
    pop ebp

I just skip all of it and address everything through esp, like this:

    mov eax, [esp+4]          ;; take the first argument ([esp] holds the return address)
    mov [esp-4], eax          ;; use memory below esp as local variable storage

It seems to work quite well. So why is ebp used at all? Is addressing through ebp faster, or is there some other reason?

Joe
user2214913

3 Answers


There's no requirement to use a stack frame, but there are certainly some advantages:

Firstly, if every function uses this same process, we can use this knowledge to easily determine the sequence of calls (the call stack) by reversing the process. We know that after a call instruction, ESP points to the return address, and that the first thing the called function will do is push the current EBP and then copy ESP into EBP. So, at any point, the value at [EBP] is the previous (caller's) EBP, and the value at [EBP+4] is the return address of the last function call. We can therefore print the call stack (assuming 32-bit) using something like this (excuse the rusty C++):

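    #include <windows.h>
    #include <cstdio>

    // Walk the chain of saved EBP values (32-bit x86, every frame built with push ebp / mov ebp, esp).
    void LogStack(DWORD ebp)
    {
        while (ebp != 0)
        {
            DWORD prevEBP = *((DWORD*)ebp);        // caller's saved EBP
            DWORD retAddr = *((DWORD*)(ebp + 4));  // return address into the caller
            if (retAddr == 0) return;

            // Find the module (EXE/DLL) containing the return address,
            // without bumping that module's reference count.
            HMODULE module = NULL;
            char fileName[MAX_PATH] = "<unknown>";
            if (GetModuleHandleExA(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS |
                                   GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT,
                                   (LPCSTR)retAddr, &module))
            {
                GetModuleFileNameA(module, fileName, MAX_PATH - 1);  // last byte stays 0
            }
            printf("0x%08lx: %s\n", retAddr, fileName);
            ebp = prevEBP;
        }
    }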

This will then print out the entire sequence of calls (well, their return addresses) up to that point, as long as every function in the chain sets up EBP this way.
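To make the "reversing the process" idea concrete without Windows APIs or the live stack, here is a minimal sketch that walks a *simulated* 32-bit stack. The `Memory` map, the `WalkFrames` and `BuildExampleStack` helpers, and all addresses are hypothetical, chosen for illustration; the only thing taken from the answer is the frame layout ([EBP] = saved EBP, [EBP+4] = return address).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Toy model of 32-bit stack memory: address -> 32-bit word.
// Hypothetical simulation; a real walker reads the live stack instead.
using Memory = std::map<std::uint32_t, std::uint32_t>;

// Follow the chain of saved frame pointers starting at `ebp`.
// Each frame record is laid out as:
//   [ebp]     = caller's saved EBP (0 ends the chain)
//   [ebp + 4] = return address into the caller
std::vector<std::uint32_t> WalkFrames(const Memory& mem, std::uint32_t ebp)
{
    std::vector<std::uint32_t> returnAddresses;
    while (ebp != 0) {
        auto savedEbp = mem.find(ebp);
        auto retAddr  = mem.find(ebp + 4);
        if (savedEbp == mem.end() || retAddr == mem.end())
            break;                              // unmapped slot: stop instead of crashing
        returnAddresses.push_back(retAddr->second);

        std::uint32_t next = savedEbp->second;
        if (next != 0 && next <= ebp)
            break;                              // callers must sit at higher addresses
        ebp = next;
    }
    return returnAddresses;
}

// Fake stack for main -> f -> g, with g currently executing.
// All addresses are made up for illustration.
Memory BuildExampleStack(std::uint32_t& gEbp)
{
    const std::uint32_t mainEbp = 0x0012FF80;
    const std::uint32_t fEbp    = 0x0012FF40;
    gEbp                        = 0x0012FF10;

    Memory mem;
    mem[mainEbp]     = 0;           // outermost frame: end of the chain
    mem[mainEbp + 4] = 0x00401500;  // return address into the startup code
    mem[fEbp]        = mainEbp;     // f saved main's EBP
    mem[fEbp + 4]    = 0x00401234;  // return address into main
    mem[gEbp]        = fEbp;        // g saved f's EBP
    mem[gEbp + 4]    = 0x00401100;  // return address into f
    return mem;
}
```

On the fake stack, `WalkFrames(mem, gEbp)` returns 0x00401100, 0x00401234 and 0x00401500, innermost call first. The `next <= ebp` check is a design choice: since the stack grows downwards, each caller's frame must lie at a higher address, so anything else means the chain is corrupt and the walk stops rather than looping forever.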

Furthermore, since EBP doesn't change unless you explicitly update it (unlike ESP, which changes with every push and pop), it's usually easier to reference data on the stack relative to EBP rather than relative to ESP: with ESP-relative addressing, you have to account for every push/pop executed between the start of the function and the reference.
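The point about ESP moving can be illustrated with a toy model that only tracks the two register values through a typical call. `Regs`, `Offsets`, `Disp` and `ArgumentOffsets` are hypothetical names, and the instruction sequence (one pushed argument, `call`, `push ebp`, `mov ebp, esp`, `sub esp, 8`, `push esi`) is an assumed example rather than something taken from the answer.

```cpp
#include <cassert>
#include <cstdint>

// Toy model: only the register values are tracked, not memory contents.
struct Regs {
    std::uint32_t esp;
    std::uint32_t ebp;
};

struct Offsets {
    std::int32_t fromEspAtEntry;  // [esp + ?] right after the call
    std::int32_t fromEspLater;    // [esp + ?] after the callee pushed more data
    std::int32_t fromEbp;         // [ebp + ?] at any point after mov ebp, esp
};

// Displacement you would write in [reg + disp] to reach `addr`.
std::int32_t Disp(std::uint32_t reg, std::uint32_t addr)
{
    return static_cast<std::int32_t>(addr - reg);
}

Offsets ArgumentOffsets(std::uint32_t espBeforeCall)
{
    assert(espBeforeCall % 4 == 0);       // keep the toy stack 4-byte aligned
    Regs r{espBeforeCall, 0};

    r.esp -= 4;                           // caller: push argument
    const std::uint32_t argAddr = r.esp;
    r.esp -= 4;                           // call: push return address
    const std::int32_t atEntry = Disp(r.esp, argAddr);

    r.esp -= 4;                           // callee: push ebp
    r.ebp = r.esp;                        // mov ebp, esp
    r.esp -= 8;                           // sub esp, 8   (two locals)
    r.esp -= 4;                           // push esi     (save a register)

    return { atEntry, Disp(r.esp, argAddr), Disp(r.ebp, argAddr) };
}
```

For any aligned starting ESP this yields 4, 20 and 8: the same argument is [esp+4] on entry but [esp+20] a few instructions later, while it stays [ebp+8] throughout the function.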

As others have mentioned, you should avoid using stack addresses below ESP, as any calls you make to other functions are likely to overwrite the data at those addresses. Instead, reserve space on the stack for your function in the usual way:

    sub esp, N          ; N = number of bytes to reserve

After this, the N bytes between the new ESP and the initial ESP are safe to use. Before exiting your function, you must release the reserved stack space with a matching:

    add esp, N          ; release the N bytes reserved earlier
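The danger of writing below ESP can be reproduced with a toy stack in which a `call` is modelled as pushing a return address. `ToyStack` and the two helper functions are hypothetical, 0xDEADBEEF is just a placeholder return address, and only the clobbering caused by a call is modelled; as the comments further down note, asynchronous events such as signals can overwrite the same memory at any moment.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy 32-bit stack of 64 words. Addresses are made up for illustration.
class ToyStack {
public:
    static constexpr std::uint32_t kBase = 0x1000;          // lowest valid address
    static constexpr std::uint32_t kTop  = kBase + 64 * 4;  // one past the highest word

    std::uint32_t esp = kTop;                               // empty stack

    void Store(std::uint32_t addr, std::uint32_t value) { words_[Index(addr)] = value; }
    std::uint32_t Load(std::uint32_t addr) const { return words_[Index(addr)]; }

    void Push(std::uint32_t value) { esp -= 4; Store(esp, value); }
    void Drop() { esp += 4; }                               // pop and discard

private:
    static std::size_t Index(std::uint32_t addr)
    {
        assert(addr >= kBase && addr < kTop && addr % 4 == 0);
        return (addr - kBase) / 4;
    }
    std::array<std::uint32_t, 64> words_{};
};

// Store a "local" at [esp-4] without reserving it, then make a call
// (modelled as pushing a return address). Returns what the local holds afterwards.
std::uint32_t LocalBelowEspAfterCall(std::uint32_t value)
{
    ToyStack s;
    const std::uint32_t local = s.esp - 4;  // mov [esp-4], eax
    s.Store(local, value);
    s.Push(0xDEADBEEF);                      // call: the return address lands in that slot
    return s.Load(local);
}

// Same, but reserve the slot first (sub esp, 4) and release it afterwards (add esp, 4).
std::uint32_t ReservedLocalAfterCall(std::uint32_t value)
{
    ToyStack s;
    s.esp -= 4;                              // sub esp, 4
    const std::uint32_t local = s.esp;       // mov [esp], eax
    s.Store(local, value);
    s.Push(0xDEADBEEF);                      // call: pushes below the reserved slot
    const std::uint32_t result = s.Load(local);
    s.Drop();                                // callee's ret pops the return address
    s.esp += 4;                              // add esp, 4
    return result;
}
```

In this model, `LocalBelowEspAfterCall(42)` returns 0xDEADBEEF because the pushed return address overwrote the unreserved local, while `ReservedLocalAfterCall(42)` returns 42.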
Iridium
  • Haha :) this is good (I mean especially the LogStack example). I'm sorry I can't accept all three answers, because they are all very good :) As for the issue itself: personally I like to be an optimization maniac (I want to write the fastest possible asm routines), so the key question is whether these esp-4 overwrites really occur or not. (Is there any text on this, anybody?) If they do, I could still skip the whole prologue as I wrote, but use a static buffer for locals instead of esp-4. That should be top speed :U yes? – user2214913 Mar 27 '13 at 10:22
    @user2214913: you're getting into premature optimization. Any meaningful work your code does will, in a non-trivial program, dwarf the cycles you shave off that way. And people who need to debug your code will not appreciate this. – DCoder Mar 27 '13 at 10:24
    @user2214913 As DCoder says, you're probably not producing faster code by writing in assembly, and any speed gains you get from leaving out stack frames and not reserving stack space before using it will be dwarfed by the inefficiencies you're introducing. The fact that you're asking this very basic question suggests your code probably isn't written to avoid slowdowns from cache misses, or to take advantage of CPU-specific instruction ordering to achieve parallel execution. A decent compiler will be aware of these things and will generally produce much faster code than hand-rolled assembly. – Iridium Mar 27 '13 at 10:36
    It would be easy to experiment with a C/C++ codebase by having the compiler not use a frame pointer (`-fomit-frame-pointer` and the like with gcc-style compilers) and see what kind of real-world performance difference it makes. On i386, there can be performance gains of up to high single-digit percentages with the right code base, primarily because an extra general-purpose register becomes available (not so much because of the extra instructions in the prologue/epilogue); i386 is register-poor. With the x86-64 ISA and its larger number of registers, there is little benefit in messing with this stuff. – Jason Molenda Mar 28 '13 at 09:40

The use of EBP is of great help when debugging code, as it allows debuggers to traverse the stack frames in a call chain.

It [creates] a singly linked list that [links] the frame pointer for each of the callers to a function. From the EBP for a routine, you [can] recover the entire call stack for a function.

See http://en.wikibooks.org/wiki/X86_Disassembly/Functions_and_Stack_Frames
In particular, see the page it links to, which covers your question: http://blogs.msdn.com/b/larryosterman/archive/2007/03/12/fpo.aspx

Michael
  • Many thanks, this page about FPO is indeed the answer to that, but what mellowcandla said is also very interesting – user2214913 Mar 27 '13 at 10:01
  • Though I should add that I don't understand how this linked list works, and it seems that this information could also be retrieved without ebp: if a function can return (and its caller can return, and so on), then you could trace that path without ebp as well – user2214913 Mar 27 '13 at 10:06
  • @user2214913: the function can return back to its caller, because the return address is stored on the stack next to the pushed parameters. However, a debugger cannot distinguish the return address from any other value on the stack, so it cannot scan the stack to find the next return address after that. The debugger cannot reliably determine how much stack space that function used locally and skip over it, either. Seriously, read the linked pages. – DCoder Mar 27 '13 at 10:12
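DCoder's point can be illustrated with a toy stack snapshot in which the innermost function keeps a function pointer in a local variable. A naive scan that treats every code-looking value as a return address reports that pointer as a frame, whereas following the EBP chain only looks at the [EBP+4] slots. `StackSnapshot`, the helper functions, the code range 0x00401000 to 0x00402000 and all addresses are made up for illustration; real debuggers also use symbol and unwind information, which this sketch ignores.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy snapshot of a 32-bit stack: words[i] lives at address base + 4*i.
struct StackSnapshot {
    std::uint32_t base;
    std::vector<std::uint32_t> words;

    bool Contains(std::uint32_t addr) const {
        return addr >= base && addr < base + 4 * words.size() && (addr - base) % 4 == 0;
    }
    std::uint32_t At(std::uint32_t addr) const { return words[(addr - base) / 4]; }
};

// Hypothetical code range of the program's executable image.
bool LooksLikeCode(std::uint32_t v) { return v >= 0x00401000 && v < 0x00402000; }

// Heuristic: treat every word that points into the code range as a return address.
std::vector<std::uint32_t> ScanForReturnAddresses(const StackSnapshot& s)
{
    std::vector<std::uint32_t> found;
    for (std::uint32_t w : s.words)
        if (LooksLikeCode(w)) found.push_back(w);
    return found;
}

// Frame-pointer walk: only the [ebp + 4] slots are taken as return addresses.
std::vector<std::uint32_t> WalkEbpChain(const StackSnapshot& s, std::uint32_t ebp)
{
    std::vector<std::uint32_t> found;
    while (ebp != 0 && s.Contains(ebp) && s.Contains(ebp + 4)) {
        found.push_back(s.At(ebp + 4));
        std::uint32_t next = s.At(ebp);
        if (next != 0 && next <= ebp) break;   // corrupt chain: stop
        ebp = next;
    }
    return found;
}

// g (innermost, EBP = 0x2008) keeps a function pointer in a local variable.
StackSnapshot ExampleSnapshot()
{
    return StackSnapshot{0x2000, {
        0x00401800,  // 0x2000: g's local, a callback pointer, NOT a return address
        0x00000007,  // 0x2004: g's local, an int
        0x00002010,  // 0x2008: g's saved EBP -> f's frame
        0x00401100,  // 0x200C: return address into f
        0x00000000,  // 0x2010: f's saved EBP (0 = outermost frame)
        0x00401234,  // 0x2014: return address into f's caller
    }};
}
```

On this snapshot, `ScanForReturnAddresses` reports three candidates, including the bogus 0x00401800, while `WalkEbpChain(s, 0x2008)` returns only 0x00401100 and 0x00401234.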

It works. However, once an interrupt arrives, the processor will push the flags and the return address onto the stack (and the handler may save more registers there), overwriting your value. The stack is there for a reason; use it...

stdcall
  • Hmm, are you sure about that? I used this and didn't experience any errors; is that just a matter of luck? If so, I understand that I can use the stack but shouldn't just use [esp-X] values? Or should I instead do sub esp, 20 and then add esp, 20 afterwards? – user2214913 Mar 27 '13 at 09:40
    Are you sure about this overwriting by the system? Is there some text on it? – user2214913 Mar 27 '13 at 09:55
    @Iridium It may be an asynchronous signal and not a real interrupt, it doesn't matter which. Everything below `esp` is unprotected and can be corrupted by some events. – Alexey Frunze Mar 27 '13 at 09:56
  • 'mov [esp-4], eax' - awesomely dangerous, yes. – Martin James Mar 27 '13 at 13:19
  • Even if a stack switch occurs via an interrupt gate, I would not be at all confident about accessing locations below the stack pointer. I'm not sure what would happen if the access generated a page fault, for example. – Martin James Mar 27 '13 at 13:28
  • Still not sure whether this overwriting occurs; some people say that system interrupts have their own stack – user2214913 Mar 31 '13 at 09:38
  • @user2214913 this can be done - an interrupt can cause a stack switch. Nevertheless, I'm not convinced that such an approach is page-fault safe, and besides, why write code where you cannot put in a call to anything else, a debug dump, say, as development progresses? – Martin James Apr 01 '13 at 09:12