Compiling with /LARGEADDRESSAWARE

martins · Feb 13, 2016

It seems that Orbiter may finally be coming up against the 32-bit address space limit. During a test run I got an allocation failure when setting the surface tile resolution to 128 nodes and the tile LOD bias to max. At this point Orbiter had allocated about 1.7GB.

It's not a problem likely to come up in practice for now (setting both parameters to max leads to unplayable frame rates for me, although I suspect that on the D3D9 client it may be a more realistic proposition).

So maybe ultimately I will need to look into porting Orbiter to 64-bit, but that's not an immediate target (certainly not for the next release). So I was looking at a potential quick fix by setting the /LARGEADDRESSAWARE linker flag.

On a 64-bit OS, this flag allows the application to address the full 32-bit (4GB) address space (actually more like 3GB, once the operating system takes its share), rather than 2GB for standard 32-bit executables.

However, the web is full of dire warnings about introducing sporadic, hard-to-pinpoint bugs with this flag. The problem is that pointer arithmetic is more dangerous. You can't use signed integers for representing pointers, so address differences must not be negative, etc. I did a quick test, and didn't encounter any immediate adverse effects after using the flag, but I couldn't guarantee that the Orbiter code is entirely 4GB-safe.
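To make the hazard concrete, here is a minimal sketch rather than anything taken from the Orbiter code base. It assumes 32-bit addresses are modeled as `std::uint32_t` values so that it behaves the same on any host, and the function names are purely illustrative. It shows how a midpoint computed through signed integers goes wrong as soon as one of the two addresses lies at or above the 2 GB mark:

```cpp
#include <cassert>
#include <cstdint>

// 32-bit addresses are modeled as plain integers so the sketch behaves the
// same on 32-bit and 64-bit hosts. All names here are hypothetical.

// Buggy: averages two addresses after reinterpreting them as signed 32-bit
// values. Addresses at or above 0x80000000 (2 GB) turn negative.
// (The unsigned-to-signed cast wraps on all mainstream compilers and is
// guaranteed to wrap since C++20.)
std::uint32_t midpoint_signed(std::uint32_t lo, std::uint32_t hi) {
    std::int32_t s_lo = static_cast<std::int32_t>(lo);
    std::int32_t s_hi = static_cast<std::int32_t>(hi);
    // The 64-bit intermediate avoids signed overflow; the result is still
    // wrong because s_hi may already be negative.
    std::int64_t mid = (static_cast<std::int64_t>(s_lo) + s_hi) / 2;
    return static_cast<std::uint32_t>(mid);
}

// Safe: works with an unsigned offset from the lower address.
std::uint32_t midpoint_unsigned(std::uint32_t lo, std::uint32_t hi) {
    return lo + (hi - lo) / 2;
}
```

With a buffer that runs from 0x7FFF0000 to 0x80010000, the signed version yields address 0 while the unsigned version yields 0x80000000. Below 2 GB both versions agree, which is why this kind of bug stays hidden until the flag is set.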

Does anybody have experience with using that flag? Are there any tools to test if code is safe in that respect? I guess the first step is to look into compiler signed/unsigned mismatch warnings (of which there are plenty in Orbiter).

For now, I don't anticipate setting the flag for the Orbiter executable in the next release, but I'll keep playing around with it to get a feeling for how stable Orbiter is with it.

Face · Feb 13, 2016

What would that mean for addons? Would you have to recompile them with the flag, too? The same question goes for the 64-bit Orbiter version.

dbeachy1 · Feb 13, 2016

A 64-bit Orbiter core could only load 64-bit DLLs, so any add-ons would need to be recompiled as 64-bit. That should be straightforward (I've created 64-bit and 32-bit versions of the same exe + DLLs before), but it would mean that old 32-bit add-ons could no longer be loaded. I suppose we could have both 32-bit and 64-bit Orbiter core versions for a time, but I assume that's down the road farther.

As for existing 32-bit DLLs with a Large-Address-Aware Orbiter core, from what I have read, the LAA flag is just a flag set by the linker in the EXE file and is not necessary for DLLs (see here and here.). That being said, if an add-on is (incorrectly!) doing pointer arithmetic using signed integers instead of pointer variables, then, as Martin mentioned, problems would ensue. However, I would argue the Orbiter core shouldn't avoid using the LAA flag just because some buggy add-ons could crash.
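Since the flag is just a bit in the executable's header, it can be inspected directly. The sketch below is an illustration under stated assumptions, not a tool anyone in this thread used: `is_large_address_aware` is a hypothetical helper that does only minimal validation of a PE image held in memory and tests the `IMAGE_FILE_LARGE_ADDRESS_AWARE` bit (0x0020) in the COFF header's Characteristics field.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bit in the COFF file header's Characteristics field that marks an image
// as large-address-aware (IMAGE_FILE_LARGE_ADDRESS_AWARE in winnt.h).
constexpr std::uint16_t kLargeAddressAware = 0x0020;

// Little-endian readers; callers guarantee that the offsets are in bounds.
static std::uint16_t read_u16(const std::vector<std::uint8_t>& b, std::size_t off) {
    return static_cast<std::uint16_t>(b[off] | (b[off + 1] << 8));
}
static std::uint32_t read_u32(const std::vector<std::uint8_t>& b, std::size_t off) {
    return static_cast<std::uint32_t>(read_u16(b, off)) |
           (static_cast<std::uint32_t>(read_u16(b, off + 2)) << 16);
}

// Hypothetical helper: returns true if the buffer looks like a PE image
// whose LAA bit is set. Validation is deliberately minimal.
bool is_large_address_aware(const std::vector<std::uint8_t>& image) {
    if (image.size() < 0x40) return false;                  // DOS header too short
    if (image[0] != 'M' || image[1] != 'Z') return false;   // DOS signature
    std::size_t pe = read_u32(image, 0x3C);                 // e_lfanew: offset of "PE\0\0"
    // Need the 4-byte signature plus the 20-byte COFF file header.
    if (pe > image.size() || image.size() - pe < 24) return false;
    if (image[pe] != 'P' || image[pe + 1] != 'E' ||
        image[pe + 2] != 0 || image[pe + 3] != 0) return false;
    // Characteristics occupies the last two bytes of the COFF file header.
    return (read_u16(image, pe + 22) & kLargeAddressAware) != 0;
}
```

Visual Studio's `dumpbin /headers` reports the same bit, and `editbin /LARGEADDRESSAWARE` can set it on an existing EXE without relinking, which is convenient for the kind of trial runs described above.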

I would love to see a 64-bit Orbiter core sooner rather than later, but I do understand that is a much bigger step.

SolarLiner · Feb 13, 2016

Facing the problem of the mismatching 32 vs. 64-bit DLLs, would it be possible to make some sort of bridge?

This method is used by my DAW to allow loading of 32-bit VSTs in the 64-bit version of the program. It does that by having a bridge executable that is fed the audio (or MIDI if it's an instrument) and parameters input, and then gets them out processed. The result only adds a few samples of latency (which corresponds to something under a millisecond at 44.1 kHz sample rate), and of course a few more CPU cycles being used.

Though I don't know how hard that would be to implement, I'd guess that would be doable for using 32-bit compiled modules the same way.

dbeachy1 · Feb 13, 2016

Unfortunately a bridge won't work in this case because a 32-bit DLL can't use 64-bit pointers, and the Orbiter core needs to send pointers to/from its add-on DLLs as well as invoke callback methods in the DLLs. A 64-bit process cannot load and run code in a 32-bit DLL -- details here: http://stackoverflow.com/questions/2265023/load-32bit-dll-library-in-64bit-application So that would just leave using (as you mentioned) a separate 32-bit bridge exe file and using inter-process communication to send data between the 64-bit and 32-bit processes -- but that would take a lot of work to implement for the Orbiter core if it could even be made to work at all (it would certainly cause a performance hit to keep marshaling the data each frame between the 32-bit and 64-bit processes). It would be a lot simpler just to have a 32-bit version of the core for older add-ons as well as a 64-bit version for newer add-ons.

But this is all moot for now since Martin has said the Orbiter core will remain 32-bit for a while longer, at least -- particularly if the LAA flag works out. :tiphat:

Urwumpe · Feb 13, 2016

dbeachy1 said:
Unfortunately a bridge won't work in this case because a 32-bit DLL can't use 64-bit pointers, and the Orbiter core needs to send pointers to/from its add-on DLLs as well as invoke callback methods in the DLLs. A 64-bit process cannot load and run code in a 32-bit DLL -- details here: http://stackoverflow.com/questions/2265023/load-32bit-dll-library-in-64bit-application

But a 32 bit executable could communicate with a 64 bit executable... like using a 32 bit orbiter proxy for a 64 bit orbiter application.

dbeachy1 · Feb 13, 2016

True, but that would a) be a lot of work to implement all that marshaling/unmarshaling code in the Orbiter core, b) require writing a new 32-bit bridge executable, and c) cause a performance hit. Every single frame would involve copying (marshaling) data between the 64-bit and 32-bit processes. I would argue it's not worth the effort and performance hit but instead just have a 32-bit version of the core as well for legacy add-ons.

Dambuster · Feb 13, 2016

Would it be feasible to make a program to recompile the DLLs for addons stored on OH into 64-bit ones, and do so automatically?

I have no idea about any potential licensing issues caused by this, or if you'd need access to the addons' source code(?), but I'm a bit worried by the idea of 99% of Orbiter addons becoming incompatible overnight. It would be a huge shame to effectively 'lose' so much incredible work, and I think such a change could be quite damaging to the community.

Just from my viewpoint, I use older addons all the time, and my Orbiter experience would be a lot duller without them!

martins · Feb 13, 2016

Well, given the current trend in Orbiter release dates, and the fact that the next release will not be 64-bit, I would estimate that by the time Orbiter has a 64-bit release, there will be a permanent base on Mars, and 64-bit computers will be exhibits in the science museum. :lol:

But in any case, if and when I do go for 64-bit code, I hope to keep the code base compiling for 32-bit targets as well, at least for a transitional period (say one release, which should translate to about 10 years). Enough time for addon developers to make the switch.

Linguofreak · Feb 14, 2016

martins said:
On a 64-bit OS, this flag allows the application to address the full 32-bit (4GB) address space (actually more like 3GB, once the operating system takes its share), rather than 2GB for standard 32-bit executables.

I'm not familiar with the specifics on Windows, but from how 32-bit applications are generally run on 64-bit OS's, you should have the whole 4 GiB available to the program, unless Windows does something really weird, since a 64-bit OS can and generally does place itself well above the 4 GiB mark. On 32-bit OS's, there always needs to be at least some space reserved for the kernel, with the exact amount depending on the OS (3 GiB by default on Linux, 2 GiB on Windows).

IIRC, the /LARGEADDRESSAWARE flag on Windows was originally introduced to allow a Linux-style 3 GiB user / 1 GiB kernel split for 32-bit applications on 32-bit kernels. I'm not sure why they needed to introduce such a flag, as far as I know applications for other OS's are able to make use of extra address space transparently if the kernel lets them have it.

dbeachy1 · Feb 14, 2016

Linguofreak said:
I'm not familiar with the specifics on Windows, but from how 32-bit applications are generally run on 64-bit OS's, you should have the whole 4 GiB available to the program, unless Windows does something really weird, since a 64-bit OS can and generally does place itself well above the 4 GiB mark.

Unfortunately it can't work like that because a given 32-bit program can only access (meaning, read from memory, write to memory, or execute code in) a 4 GB range of memory at once -- i.e., it can only "see" 4 GB of address space at a time no matter how much RAM is in the machine. That includes Windows kernel code and device memory that the program uses as well. Windows can and does map the same physical memory containing its kernel code and relevant data (which could be anywhere in physical memory) into each 32-bit process's virtual address space -- that happens somewhere above the 2 GB mark, or, if the process is Large-Address-Aware, above the 3 GB mark: that last 800 MB - 1 GB is where Windows has to map its kernel and device code and data into the process's address space. This is because each 32-bit process needs to pass data back and forth between the program's memory space, kernel code, and physical devices (video cards, hard drives, sound cards, etc.): each device requires some memory for the device driver code as well as a range of virtual address space that is mapped to memory and/or I/O ports on the device itself, which the device driver uses to communicate with it (for example, video memory on a video card). For details and some diagrams, see this Virtual Address Spaces MSDN article.

Back in 1998, when Windows 2000 was being developed, 4 GB of address space was huge, and so Windows was designed to reserve the upper 2 GB of the 4 GB virtual address space for itself and device drivers and the lower 2 GB for programs: at the time, a typical PC only had 64 megabytes (as in 0.064 GB) or less of RAM. And now, 18 years later, we are bumping up against the 2 GB limit -- which just goes to show how long-in-the-tooth 32-bit Windows is by now.

Linguofreak said:
IIRC, the /LARGEADDRESSAWARE flag on Windows was originally introduced to allow a Linux-style 3 GiB user / 1 GiB kernel split for 32-bit applications on 32-bit kernels. I'm not sure why they needed to introduce such a flag, as far as I know applications for other OS's are able to make use of extra address space transparently if the kernel lets them have it.

As I understand it, it was for code compatibility reasons: the LAA flag was added so that only a program that explicitly requested it would ever have more than 2 GB of RAM mapped into its address space. This was because, as Martin pointed out, 32-bit code that incorrectly stores pointer differences as signed integer variables (instead of unsigned) would crash. More information is here and here.

Linguofreak · Feb 14, 2016

dbeachy1 said:
Unfortunately it can't work like that because a given 32-bit program can only access (meaning, read from memory, write to memory, or execute code in) a 4 GB range of memory at once -- i.e., it can only "see" 4 GB of address space at a time no matter how much RAM is in the machine. That includes Windows kernel code and device memory that the program uses as well. Windows can and does map the same physical memory containing its kernel code (which could be anywhere in physical memory) into each 32-bit process -- that happens somewhere above the 2 GB mark, or, if the process is Large-Address-Aware, above the 3 GB mark: that last 800 MB - 1 GB is where Windows has to map its kernel and device code and data into the process's address space.

I know for a fact that it *does* work like that in 32-on-64 situations on Linux. Remember that user-mode code can't see kernel or driver memory even in 32-on-32 situations, so a 64-bit kernel can map itself (including any drivers) into the process's address space above 4 GiB without causing any problems. At any system call, the CPU automatically switches to 64-bit mode and starts running the kernel at the system call entry point, and the kernel zero extends any pointers passed to it to 64-bit. It verifies that any pointers it passes back are under 4 GiB, just as it would verify that they are under 3 GiB in a 32-on-32 situation, and then truncates them to 32-bit. I would assume that 64-bit Windows does the same thing with /LARGEADDRESSAWARE 32-bit programs, unless something really wonky is going on (even with non-/LARGEADDRESSAWARE 32-bit programs, Win64 kernels are probably still doing what I have described, and just not mapping anything in the 2-4 GiB region at all, so that they can use the same kernel mapping for every program, regardless of bitness or LAAness).
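The widening and narrowing of pointers described here can be modeled in a few lines. This is only a sketch of the arithmetic, assuming addresses are plain integers. It is not kernel code from Linux or Windows, and the function names are made up for illustration. It also shows why zero extension matters: sign extension would push any address at or above 2 GiB far outside the 32-bit range.

```cpp
#include <cassert>
#include <cstdint>

// Widening a 32-bit user-mode address to 64 bits. Names are illustrative.

// Zero extension: the upper 32 bits become 0, so the address stays
// below 4 GiB.
std::uint64_t zero_extend(std::uint32_t addr32) {
    return static_cast<std::uint64_t>(addr32);
}

// Sign extension: what happens if the address passes through a signed
// 32-bit type first. Addresses at or above 2 GiB end up with all upper
// bits set.
std::uint64_t sign_extend(std::uint32_t addr32) {
    return static_cast<std::uint64_t>(
        static_cast<std::int64_t>(static_cast<std::int32_t>(addr32)));
}

// Before handing an address back to a 32-bit process: check that it fits...
bool fits_in_32_bits(std::uint64_t addr64) {
    return addr64 <= 0xFFFFFFFFull;
}

// ...and then drop the upper half.
std::uint32_t truncate_to_32(std::uint64_t addr64) {
    return static_cast<std::uint32_t>(addr64);
}
```

For 0x80000000, `zero_extend` gives 0x0000000080000000, which still fits in 32 bits, while `sign_extend` gives 0xFFFFFFFF80000000, which does not.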

dbeachy1 said:
As I understand it, it was for code compatibility reasons: the LAA flag was added so that only a program that explicitly requested it would ever have more than 2 GB of RAM mapped into its address space. This was because, as Martin pointed out, 32-bit code that incorrectly stores pointer differences as signed integer variables (instead of unsigned) would crash. More information is here and here.

OK, so it's a class of bugs that is specific to OS's that have historically restricted user-mode code to the bottom half of the address space. Most non-Windows OS's these days are Unices, which have generally run on a broader range of CPU architectures, and several important architectures in the history of Unix have used separate page tables for the kernel, allowing processes to use their full address space, and I think 3 GiB has been the most popular split point on 32-bit for Unices on architectures that don't have that feature. So Unix application code has never been able to assume the invalidity of higher-half pointers.

---------- Post added at 15:08 ---------- Previous post was at 14:41 ----------

dbeachy1 said:
Back in 1998, when Windows 2000 was being developed, 4 GB of address space was huge, and so Windows was designed to reserve the upper 2 GB of the 4 GB virtual address space for itself and device drivers and the lower 2 GB for programs: at the time, a typical PC only had 64 megabytes (as in 0.064 GB) or less of RAM. And now, 18 years later, we are bumping up against the 2 GB limit -- which just goes to show how long-in-the-tooth 32-bit Windows is by now.

The 2 GiB split only goes back as far as Win2k? I'd have assumed it went back to NT 3.1 and the origins of the Win32 API.

Hielor · Feb 14, 2016

Linguofreak said:
It verifies that any pointers it passes back are under 4 GiB, just as it would verify that they are under 3 GiB in a 32-on-32 situation, and then truncates them to 32-bit.

How does that work given that (according to you) the kernel has granted the entire 4GB space to the process rather than reserving any for itself?

Obviously the kernel needs to have some of the sub-4GB space available, otherwise it won't be able to allocate/return anything below that for the 32-bit process...

dbeachy1 · Feb 15, 2016

In any case, the question was about 32-bit Windows here, so even if Linux does it differently by somehow thunking the data back and forth between 32-bit and 64-bit memory, that isn't applicable to the Orbiter question in this thread.

If it helps, let me phrase it this way: "Unfortunately it can't work like that on Windows because a given 32-bit program can only access (meaning, read from memory, write to memory, or execute code in) a 4 GB range of memory at once -- i.e., it can only "see" 4 GB of address space at a time no matter how much RAM is in the machine."

Linguofreak · Feb 15, 2016

Hielor said:
How does that work given that (according to you) the kernel has granted the entire 4GB space to the process rather than reserving any for itself?

Obviously the kernel needs to have some of the sub-4GB space available, otherwise it won't be able to allocate/return anything below that for the 32-bit process...

The kernel isn't going to pass the process pointers to memory that isn't userspace-accessible even if the kernel is mapped below 4 GiB, otherwise the process would crash when it tried to use the pointer.

Pointers returned from system calls will generally either be responses to allocation requests, responses to requests to memory map a file (if we don't consider that a subcategory of "allocation requests"), confirmation of pointers passed in, or, possibly, pointers to some component of a data structure to which a top-level pointer had already been passed in (although most examples of functions that would do the latter that I can think of would tend to be implemented as userspace library functions rather than as system calls proper that would actually call the kernel).

In the first two cases, the address returned is unmapped at the beginning of the system call (neither kernel nor user code can access it), is in the userspace-reserved portion of the address space, and is available for use by the process when the system call completes.

In the second two cases, the kernel is operating on a data structure provided by the process, and the returned pointer points to the whole or a part of that data structure. The address is in the userspace-reserved portion of the address space, and is mapped and assigned to the process at the beginning and end of the call.

---------- Post added at 22:19 ---------- Previous post was at 21:57 ----------

dbeachy1 said:
In any case, the question was about 32-bit Windows here, so even if Linux does it differently by somehow thunking the data back and forth between 32-bit and 64-bit memory, that isn't applicable to the Orbiter question in this thread.

Yes, but my purpose in bringing Linux up was to say "I know how Linux does it, so I'm pretty sure Windows does the same thing unless Microsoft is much stupider than I think they are".

There's no more thunking involved than there would be in any 32-on-64 implementation, and it would actually be more work to put the kernel below 4 GiB for 32-bit processes and in the higher half of the 64-bit space for 64-bit processes than to put it in the latter place for both.

Anyways, from the Windows-specific side rather than the general-principles side, a bit of Googling turned up this:

https://blogs.msdn.microsoft.com/oldnewthing/20050601-24/?p=35483

dbeachy1 · Feb 15, 2016

Linguofreak said:
Anyways, from the Windows-specific side rather than the general-principles side, a bit of Googling turned up this:

https://blogs.msdn.microsoft.com/oldnewthing/20050601-24/?p=35483

Linguofreak said:
so I'm pretty sure Windows does the same thing unless Microsoft is much stupider than I think they are.

Except that that blog article is wrong. Just because some random person writes a blog doesn't mean the information is accurate. I'm not going to argue with you. This official MSDN article spells out how 32-bit Windows works. Quoting from the article:

MSDN Documentation said:
In 32-bit Windows, the total available virtual address space is 2^32 bytes (4 gigabytes). Usually the lower 2 gigabytes are used for user space, and the upper 2 gigabytes are used for system space.

In 32-bit Windows, you have the option of specifying (at boot time) that more than 2 gigabytes are available for user space. The consequence is that fewer virtual addresses are available for system space. You can increase the size of user space to as much as 3 gigabytes, in which case only 1 gigabyte is available for system space. To increase the size of user space, use BCDEdit /set increaseuserva.

This Microsoft documentation article and this Microsoft documentation article say the same thing. You don't have to believe the official Microsoft documentation, but that's how 32-bit Windows apps work.

Linguofreak · Feb 15, 2016

dbeachy1 said:
Except that that blog article is wrong. Just because some random person writes a blog doesn't mean the information is accurate.

This isn't just some random person. This is a Microsoft employee with, from what I can see, considerable tenure as a developer.

dbeachy1 said:
This official MSDN article spells out how 32-bit Windows works.

Yes, it does. It does not, however, spell out how *64-bit Windows* works with 32-bit programs.

dbeachy1 said:
You don't have to believe the official Microsoft documentation, but that's how 32-bit Windows apps work.

If you insist on official Microsoft documentation, a bit more Googling turns this up:

MSDN said:
32-bit applications on 64-bit platforms can address up to 2 GB, or up to 4 GB with the /LARGEADDRESSAWARE:YES linker flag.

Thirty-two-bit applications that are large-address-aware can determine at run time how much total virtual address space is available to them with the current OS configuration by calling GlobalMemoryStatusEx. The ullTotalVirtual result will range from 2147352576 bytes (2 GB) to 4294836224 bytes (4 GB). Values that are larger than 3221094400 (3 GB) can only be obtained on 64-bit editions of Windows. For example, if IncreaseUserVa has a value of 2560, the result is ullTotalVirtual with a value of 2684223488 bytes.

Emphasis mine.
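The figures in that quote follow a simple pattern: each one sits exactly 128 KiB below the round size (2 GB, 2560 MB, 3 GB, 4 GB). The sketch below checks that arithmetic. The 128 KiB gap is inferred from the four quoted numbers only, not from a documented formula, and `expected_total_virtual` is a hypothetical helper. The Windows-only part shows how a program could read the real value with `GlobalMemoryStatusEx`.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kMiB = 1024ull * 1024ull;

// Gap between each round size and the quoted ullTotalVirtual value.
// Assumption: inferred from the quoted numbers, not a documented constant.
constexpr std::uint64_t kObservedGap = 128ull * 1024ull;

// Hypothetical helper: expected ullTotalVirtual for a user address space
// of `user_va_mib` MiB (2048 by default, up to 4096 on 64-bit Windows).
constexpr std::uint64_t expected_total_virtual(std::uint64_t user_va_mib) {
    return user_va_mib * kMiB - kObservedGap;
}

#ifdef _WIN32
#include <windows.h>
// Reads the actual value for the current process; returns 0 on failure.
std::uint64_t query_total_virtual() {
    MEMORYSTATUSEX status = {};
    status.dwLength = sizeof(status);
    return GlobalMemoryStatusEx(&status) ? status.ullTotalVirtual : 0;
}
#endif
```

A large-address-aware 32-bit build could log `query_total_virtual()` at startup to confirm whether the larger address space is actually in effect on the machine it runs on.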

birdmanmike · Feb 15, 2016

Linguofreak is quite right. With a 64-bit Windows OS, the flight simulator community has been using up to 4 GB of VAS for years, with FSX and now P3D (which has it built in). Before anybody says it: actually about 3.5 GB because of OS overhead.

Not in 32-bit Windows, though. As said above, this applies to 32-bit applications (the flight simulators, for example) running on 64-bit Windows. It's not just FS, of course: the same goes for other 32-bit applications that can use more VAS.

Search also "4Gb patch" . . .

Face · Feb 15, 2016

So that means with this flag we can use up to 4GB on 64-bit OS, and up to around 3GB on 32-bit OS, right? That is also without too much incompatibility with old addons, isn't it?

Sounds like a win.

As for the pointer arithmetic: what would be a real-world example of code that violates that rule?
Is running down a character array with a signed integer and then calculating character positions by means of adding a negative value to the array pointer a problem? I.e. having something like this:

Code:

int i=-1;
char a[5]="Test";
char b=*(&a[1]+i);
if (b!='T') fail();

Could that fail with LAA?
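One way to reason about this question is to model it. The sketch below assumes 32-bit addresses are represented as `std::uint32_t`, and all names are hypothetical. Adding a negative offset to a real pointer wraps modulo 2^32 and lands on the right byte, so the snippet above should be fine as far as this model goes. The pattern that does break is holding the addresses themselves in signed integers and comparing them.

```cpp
#include <cassert>
#include <cstdint>

// Models `ptr + offset` for a char pointer in a 32-bit process: the signed
// offset is added modulo 2^32, so negative offsets work at any address.
std::uint32_t add_offset(std::uint32_t addr, std::int32_t offset) {
    return addr + static_cast<std::uint32_t>(offset);
}

// Risky pattern: a bounds check done on addresses stored as signed ints.
// It fails when the buffer straddles or lies above the 2 GB mark.
bool in_range_signed(std::uint32_t p, std::uint32_t begin, std::uint32_t end) {
    auto s = [](std::uint32_t v) { return static_cast<std::int32_t>(v); };
    return s(begin) <= s(p) && s(p) < s(end);
}

// Safe pattern: the same check on unsigned addresses.
bool in_range_unsigned(std::uint32_t p, std::uint32_t begin, std::uint32_t end) {
    return begin <= p && p < end;
}
```

With the 5-byte array placed at 0x7FFFFFFE, `add_offset(0x7FFFFFFF, -1)` still gives 0x7FFFFFFE, the address of `a[0]`. The signed bounds check for that same element address 0x7FFFFFFF, however, reports it as out of range, because the end address 0x80000003 has become negative.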

Hielor · Feb 15, 2016

I personally have worked with the guy that wrote that blog. He probably knows more about how Windows works than pretty much any other single person on the planet.
