Advanced Question Heap corruption on exit

jedidia · Jul 8, 2012

Now, this isn't the first heap corruption on exit I've had, but it's the first I'm completely dumbfounded about and can't figure out where it comes from.

Here's the deal:
All vessels of the module causing the corruption save state without trouble. The crash happens after all of them have been executed.

The crash happens, however, before any of the vessels' destructors are called. What comes in between that could cause such a thing? I'm at a serious loss as to where to start looking.

The output window has this to say, but it only really states what I already know:

Code:

HEAP[orbiter.exe]: Heap block at 0AD67FC0 modified at 0AD8E9F8 past requested size of 26a30
This may also be due to the user pressing F12 while orbiter.exe has focus.

I don't quite get what the second line is about and why pressing F12 could cause such a thing, but I'm not pressing it anyways. It might give a hint to someone about what's going on, or it might not.

The crash started to appear after I made some changes. I tried to undo those changes to backtrack to when the trouble started, but even when none of the new functions were ever called during the simulation, it still persisted. Also, those changes didn't really do much that was memory-related. There was a new vector, but no pointers. I'm somewhat confused here. It might help already if I knew what else gets called between savestate and destructor, although I don't think there is anything vessel-related?

The crash is also not 100% consistent (only about 99%). In case it doesn't happen, the next scenario I load will crash on startup (if I don't close the launchpad, of course), which makes sense if there was a corruption on exit that didn't cause a crash.

It doesn't seem to matter in a release build, but it makes me somewhat nervous nonetheless.

orb · Jul 8, 2012

Is the memory at 0x0AD67FC0 allocated by your code with the `new` operator, or is it an address (a handle) returned by some oapi function, like for loading a mesh or something? Or maybe no function or `new` operator returns that address, and it's something internally allocated by Orbiter?
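
One way to answer that for your own allocations, as a rough sketch only: keep a small registry of every block you allocate and look the reported address up in it. Everything here (`AllocRegistry`, `BlockInfo`, `Hit`, the tag strings) is a hypothetical helper invented for illustration, not part of the Orbiter API, and it can only know about blocks you record yourself.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Assumption: size_t is as wide as a pointer (true for Win32 builds).
typedef std::size_t Address;

struct BlockInfo {
    std::size_t size;   // requested size in bytes
    std::string tag;    // free-form label chosen by the caller
};

struct Hit {
    bool found;          // false if no recorded block starts at or below the address
    Address base;        // start of the nearest recorded block
    std::size_t offset;  // address minus base
    bool pastEnd;        // true if the address lies at or beyond base + size
    std::string tag;
};

// Hypothetical helper: remembers blocks so a raw address can be traced back.
class AllocRegistry {
public:
    void record(Address base, std::size_t size, const std::string& tag) {
        assert(size > 0);
        BlockInfo info;
        info.size = size;
        info.tag = tag;
        blocks_[base] = info;
    }

    void forget(Address base) { blocks_.erase(base); }

    // Finds the recorded block with the greatest base address <= addr.
    Hit lookup(Address addr) const {
        Hit hit;
        hit.found = false;
        hit.base = 0;
        hit.offset = 0;
        hit.pastEnd = false;
        std::map<Address, BlockInfo>::const_iterator it = blocks_.upper_bound(addr);
        if (it == blocks_.begin()) return hit;  // every recorded block starts after addr
        --it;
        hit.found = true;
        hit.base = it->first;
        hit.offset = addr - it->first;
        hit.pastEnd = hit.offset >= it->second.size;
        hit.tag = it->second.tag;
        return hit;
    }

private:
    std::map<Address, BlockInfo> blocks_;
};
```

You would call `record(reinterpret_cast<Address>(p), size, "some label")` next to each allocation and `forget` next to each delete. Plugging in the numbers from the heap message in the first post (block at 0AD67FC0, requested size 26a30, modified at 0AD8E9F8) gives an offset of 0x26A38, so the write landed 8 bytes past the end of that block.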

jedidia · Jul 8, 2012

That is of course a very good question. Trouble is, how do I find that out? Checking every single pointer manually on allocation isn't really an option: too error-prone, and it could take days.
I do not get a breakpoint in the debugger, as the crash doesn't happen inside my code, but the corruption has to happen somewhere in my code, or I wouldn't be the only one having such problems.
So, if you can tell me how I can check whether that memory address has been allocated by me or not, I'd be much obliged...

The Autos tab does show an object called EDI at that address, but I can't make anything out of it. It's none of mine, for sure.

csanders · Jul 8, 2012

You might already know this, but heap corruption errors generally only occur with debug builds, because the compiler will set specific values at the end of a portion of memory, and look to see if those values change (I think this check only happens during a delete or free()). The release build won't set the values at the end of memory, or check for change - hence no error, but whatever is overwriting the memory is probably still happening.
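
To make the idea concrete, here is a minimal sketch of the guard-byte technique. It is an illustration only: the real debug runtime uses its own block layout and fill values, and `guardedAlloc`, `guardIntact`, `guardedFree`, `kGuard` and `kGuardLen` are names made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Fill value and guard length chosen for this sketch only.
static const unsigned char kGuard = 0xFD;
static const std::size_t kGuardLen = 4;

// Allocates `size` usable bytes followed by kGuardLen guard bytes.
// The requested size is stored in a small header in front of the user block.
// Note: the returned pointer is only aligned for size_t, which is enough
// for the char buffers used here.
void* guardedAlloc(std::size_t size) {
    unsigned char* raw = static_cast<unsigned char*>(
        std::malloc(sizeof(std::size_t) + size + kGuardLen));
    if (!raw) return 0;
    std::memcpy(raw, &size, sizeof(size));
    std::memset(raw + sizeof(std::size_t) + size, kGuard, kGuardLen);
    return raw + sizeof(std::size_t);
}

// Returns true if the guard bytes behind the user block are untouched.
bool guardIntact(const void* user) {
    assert(user != 0);
    const unsigned char* p = static_cast<const unsigned char*>(user);
    std::size_t size;
    std::memcpy(&size, p - sizeof(std::size_t), sizeof(size));
    for (std::size_t i = 0; i < kGuardLen; ++i) {
        if (p[size + i] != kGuard) return false;
    }
    return true;
}

// Frees the block and reports whether it was still intact.
// The overrun is only noticed here, not at the moment of the bad write.
bool guardedFree(void* user) {
    bool ok = guardIntact(user);
    std::free(static_cast<unsigned char*>(user) - sizeof(std::size_t));
    return ok;
}
```

Writing one byte past a 16-byte block from `guardedAlloc` changes the first guard byte, so `guardedFree` returns false, but nothing complains until the block is freed.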

I think you can set a break point to a specific address in memory, and the debugger will stop if something read/writes to it. If you can set that up, you might be able to do a stack trace to find the offending function.

EDIT: EDI is probably the extended destination index register. It is usually assigned the address of a destination for writing to memory.

orb · Jul 8, 2012

jedidia said:
The Autos tab does show an object called EDI at that address, but I can't make anything out of it. It's none of mine, for sure.

EDI is a 32-bit index register, usually used to point to the destination address when copying, comparing, or filling memory with some value (from the EAX register). It can also be used as a general-purpose register if no other registers are free and you don't want to push another register's value onto the stack (i.e. into memory, which is much slower than registers). The EDI register isn't at that address (it's in the processor); it only holds a pointer to that address.

storm · Jul 8, 2012

I don't think we can do much for debugging without having the actual binaries. If you don't feel up to releasing the source you can always just pack up the symbols.

jedidia · Jul 8, 2012

I don't worry about the source code, I worry about your sanity :lol:

The beast is big, and the core structure somewhat convoluted (the original programmer seemed to have good knowledge of C, but not much of C++). I wouldn't really want to force anyone to look through it, but if that's the only way, I'll upload it somewhere. But if you can give me a procedure for finding out what is at that address, I might still be able to figure it out by myself.

csanders said:
You might already know this, but heap corruption errors generally only occur with debug builds

No, didn't know that. Good that I do now.

---------- Post added at 09:01 PM ---------- Previous post was at 07:02 PM ----------

Tried a bit of digging with the memory window, without much success. The memory at that address contains the letters IMS, which is the name of my module, but it definitely is not the address of any of the allocated instances... I don't quite see yet how I can find out more.

Say, if I cast a VESSEL* into my subclass IMS*, and then at some point in time pass IMS* as a VESSEL* without casting... might that be a problem? the compiler's all happy with it, at least...

storm · Jul 8, 2012

jedidia said:
Say, if I cast a VESSEL* into my subclass IMS*, and then at some point in time pass IMS* as a VESSEL* without casting... might that be a problem? the compiler's all happy with it, at least...

That should be fine as long as IMS is a subclass of VESSEL. Errors like that are generally caught at compile time.
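
A small sketch of the two directions, using stand-in classes (`Vessel`, `ImsVessel`, `describe` and `roundTrip` are hypothetical names for illustration, not the real Orbiter or IMS types):

```cpp
#include <cassert>

// Stand-in for the API base class.
class Vessel {
public:
    virtual ~Vessel() {}
    virtual int kind() const { return 0; }
};

// Stand-in for the add-on's subclass.
class ImsVessel : public Vessel {
public:
    virtual int kind() const { return 1; }
};

// Takes the base type, like an API function that expects a base pointer.
int describe(const Vessel* v) { return v->kind(); }

int roundTrip() {
    ImsVessel ims;
    Vessel* base = &ims;  // derived-to-base: implicit, always safe

    // Base-to-derived needs an explicit cast and is only valid because
    // `base` really points to an ImsVessel.
    ImsVessel* again = static_cast<ImsVessel*>(base);
    assert(again == &ims);

    // Passing the derived pointer where a base pointer is expected
    // converts implicitly again; no cast required.
    return describe(again);
}
```

The risky direction is the downcast: if the base pointer does not actually point to an instance of the subclass, the compiler will not notice and using the result is undefined behaviour.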

dbeachy1 · Jul 8, 2012

To pin down which method is corrupting memory, add this block to the beginning of your ovcInit method:

Code:

#ifdef _DEBUG
    // Requires #include <crtdbg.h> at the top of the source file.
    // NOTE: _CRTDBG_CHECK_ALWAYS_DF is too slow
    _CrtSetDbgFlag(_CRTDBG_ALLOC_MEM_DF |
                   _CRTDBG_CHECK_CRT_DF |
                   _CRTDBG_LEAK_CHECK_DF);
#endif

Then rebuild in DEBUG mode, run Orbiter under the debugger, and reproduce the problem. The debugger should halt close to where the memory corruption occurs, at which point you can look around in the debugger and find the culprit. If the above flags still don't narrow the problem down enough, set the '_CRTDBG_CHECK_ALWAYS_DF' flag bit as well.
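
If the debugger still stops too late, the debug heap can also be asked to validate itself at points you choose, which lets you bisect your own code. A minimal sketch, assuming an MSVC debug build: `_CrtCheckMemory` is the CRT's own function, while `heapLooksIntact`, `loadModules`, `rebuildMeshes` and `runWithHeapChecks` are hypothetical names invented for this example.

```cpp
#include <cassert>
#if defined(_MSC_VER) && defined(_DEBUG)
#include <crtdbg.h>
#endif

// Returns true if the debug heap passes its consistency check.
// Outside an MSVC debug build there is nothing to check, so it returns true.
inline bool heapLooksIntact() {
#if defined(_MSC_VER) && defined(_DEBUG)
    return _CrtCheckMemory() != 0;
#else
    return true;
#endif
}

// Hypothetical stand-ins for the add-on's own routines.
static int stepsRun = 0;
void loadModules()   { ++stepsRun; }
void rebuildMeshes() { ++stepsRun; }

// Runs the steps in order and returns the index of the first step after
// which the heap check failed, or -1 if every check passed.
int runWithHeapChecks() {
    loadModules();
    if (!heapLooksIntact()) return 0;
    rebuildMeshes();
    if (!heapLooksIntact()) return 1;
    return -1;
}
```

Moving the checks closer together narrows down which of your functions did the damage, without paying the cost of checking on every allocation.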

jedidia · Jul 9, 2012

Thanks a lot, but alas, no more information is forthcoming. I added the flags to the ovcInit function of my module, including CHECK_ALWAYS, rebuilt and ran the whole thing again, but it's still exactly the same: the crash happens outside my code, and I get no Autos, no locals, no stack symbols. The memory address isn't constant, which makes it kinda difficult to find out what has been there if you don't get your variables displayed...

If there's anything I can say with certainty, it's that the corruption is more likely to occur the more instances of my class are present. Since I did most tests with one instance only, it is impossible to say when the problem started, as it has probably been around a long time before I noticed it...

EDIT: Say, are there any instrumentation libraries for VC 2008 Express? I can't find any...
Also, I did a thorough search, taking note of my pointers when they are created, and comparing them to the memory address that gets reported when the crash occurs. As far as I can tell, none of my allocated stuff is anywhere near that address, nor are any meshes, although there is of course a good chance that I missed something.
But of course none of my elements necessarily needs to be in that place; it's enough if any of them accidentally accesses the memory. Since the crash doesn't happen in my code, I don't know. Is it possible that the corruption only gets noticed a while after it actually occurred?

Also, about the meshes... they are a bit of a problem child of mine since we had that one weird occurrence where a mesh actually wrote into other memory by sheer force of existence. It had something to do with an unclean mesh, and fortunately it wrote into the address of another mesh, so we could actually notice the problem visually, track it down, and make the problem go away by fixing the mesh, no code fixes involved. Since that day I have been a bit suspicious about bugs in Orbiter's mesh management, and IMS definitely uses it more than any other add-on ever did (after all, it pieces a spacecraft together dynamically from different meshes).

dumbo2007 · Jul 9, 2012

Are you deleting/adding vessels by any chance?

I find that when faced with a tough heap corruption issue, it's best to start commenting out the code, starting outwards from the Orbiter callbacks, if you know what I mean. Simplifying the code by commenting out entire functions etc.

By the way, it's almost definitely something in your code change. I have been convinced that Orbiter has a bug many times and I have been proven woefully wrong every time.

And I'll ask anyway... are you absolutely sure that you undid everything in the change so that the code is exactly as it was in the working state?

csanders · Jul 9, 2012

jedidia said:
Is it possible that the corruption only gets noticed a while after it actually occurred?

Yes. Usually the "corruption check" happens when the memory block where the corruption occurred gets freed, and this can happen long after the corruption occurs.

FYI: I double checked this:

csanders said:
You might already know this, but heap corruption errors generally only occur with debug builds, because the compiler will set specific values at the end of a portion of memory, and look to see if those values change (I think this check only happens during a delete or free()). The release build won't set the values at the end of memory, or check for change - hence no error,

For a standalone application:
If you run the release build under the debugger, it will still "break" when it detects the heap corruption, so there is still some check being done in release builds.

Running the application normally (i.e. double-clicking on its icon) may not cause anything noticeable to happen, depending on the memory size.
For example, this causes no issues:

Code:

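{
        // Deliberate off-by-one: valid indices are 0 .. 25*9*3 - 1.
        char *buf;
        buf = (char *)malloc(25*9*3);
        buf[25*9*3] = 0;   // writes one byte past the end of the block
        free (buf);
}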

This causes the program to CTD:

Code:

{
        // Same off-by-one write, but on a much larger block.
        char *buf;
        buf = (char *)malloc(256*256*3);
        buf[256*256*3] = 0;   // writes one byte past the end of the block
        free (buf);
}

And running the debug build normally will cause a debug assertion window to appear, which doesn't appear with the release build.

jedidia · Jul 9, 2012

dumbo2007 said:
I find that when faced with a tough heap corruption issue, it's best to start commenting out the code, starting outwards from the Orbiter callbacks, if you know what I mean. Simplifying the code by commenting out entire functions etc.

That's what I do first. I had practically everything disconnected, except the stuff I can't disable without rewriting entire functions rather than just commenting them out (that is, the core architecture). My post- and prestep didn't really execute much of anything anymore. No panels, no physics calculations, no damage simulation, nuffin'. Still happened, but only in one particular scenario. The more I experiment, the more I find myself unable to produce it in any other scenario... might be a gremlin to do with an outdated scenario for my current version...

dumbo2007 said:
I have been convinced that Orbiter has a bug many times and I have been proven woefully wrong every time

Trouble is, I have been right a few times already. I'm not saying this must be an Orbiter bug yet; I simply can't reproduce it reliably enough to say anything definitive. But IMS has the habit of bringing out stuff in Orbiter that no other add-on ever does, because it recklessly creates, deletes and moves around docking ports, attachment points, thrusters, meshes, COG, you name it. No other add-on has yet used the Orbiter API so dynamically, and since that thing with the mesh happened, I'm really suspicious of the whole mesh-handling thing... Just kind of a nervous nagging at the back of my head.

csanders said:
Yes. Usually the "corruption check" happens when the memory block where the corruption occurred gets freed, and this can happen long after the corruption occurs.

Translates to: the problem could be anywhere. Great!
I think I'll put this aside for a while, at least until I'm able to produce the bugger in another scenario.
