Software Thread-safety of basic Windows file operations

martins

Orbiter Founder
Joined
Mar 31, 2008
Messages
2,448
Reaction score
462
Points
83
Website
orbit.medphys.ucl.ac.uk
Background: I am currently refactoring the Orbiter core build process, mainly for a clean separation of source, build and deployment directories. Currently it's a bit of a muddle, requiring lots of SVN ignore directives and manual copying.

As part of the changed build system, I now have a number of custom build steps that copy files (data files, scenarios, etc.) from the source to the deployment tree, in the form

if not exist $(TargetDir) mkdir $(TargetDir)
xcopy /d /y $(SourceDir)\$(File) $(TargetDir)\$(File)

This seems to work fine *until* I enable multiple build threads in Visual Studio. Then the build regularly fails with access violations.

So here is my question: Are the above commands threadsafe? If two threads try to create the same directory simultaneously, is this handled gracefully? Does Windows make a difference between read and write access to a file, and are two threads allowed to simultaneously get read access to the same file?

Is there another way to perform this operation in a threadsafe manner?
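The core hazard in the build step is the check-then-create pair: `if not exist` and `mkdir` are two separate operations, so a second build thread can slip in between them. A minimal Python sketch (not the build scripts themselves, just an illustration of the pattern) shows that the race becomes harmless once "directory already exists" is treated as success:

```python
import os
import tempfile
import threading

def ensure_dir(path):
    """Create path, tolerating the race where another thread wins.

    Mirrors the 'if not exist ... mkdir' build step: the check and the
    create are separate operations, so another thread can create the
    directory in between. Treating 'already exists' as success makes
    the whole operation safe to run concurrently.
    """
    try:
        os.mkdir(path)
    except FileExistsError:
        pass  # another thread created it first; that is fine

target = os.path.join(tempfile.mkdtemp(), "Deploy")
threads = [threading.Thread(target=ensure_dir, args=(target,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert os.path.isdir(target)  # exactly one creator won; none crashed
```

In cmd terms, the analogous trick is to attempt the `mkdir` unconditionally and swallow the "already exists" error rather than testing first; the copy-into-an-existing-directory part is a separate question.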
 

Urwumpe

Not funny anymore
Addon Developer
Donator
Joined
Feb 6, 2008
Messages
37,605
Reaction score
2,327
Points
203
Location
Wolfsburg
Preferred Pronouns
Sire
So here is my question: Are the above commands threadsafe? If two threads try to create the same directory simultaneously, is this handled gracefully? Does Windows make a difference between read and write access to a file, and are two threads allowed to simultaneously get read access to the same file?

I have to solve the same Windows problem here at work: access violations during parallel file operations. As far as I can tell, the problem is that internally Windows has already created the folder at a point when its Access Control Lists are not yet in place. The same algorithm works fine on UNIX but fails on Windows, which is the annoying part. I am also not absolutely sure whether part of the problem really is two processes creating the same directory at the same time, since we "exploit" implicit behavior of a tool here to create the folder structure.

Depending on the creation flags you pass when opening a file, two threads can indeed read from the same file. You can even have one thread write while another reads from the same file if needed. But this does not apply to the creation of a new file.
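A small sketch of the shared-read case, in Python rather than raw Win32 (on Windows, Python's default `open` requests read sharing under the hood; on UNIX, sharing is unrestricted anyway). Two handles on the same file can read simultaneously without conflict:

```python
import os
import tempfile

# Write a file once, then open it twice for reading at the same time.
path = os.path.join(tempfile.mkdtemp(), "shared.txt")
with open(path, "w") as f:
    f.write("same bytes for everyone")

# Both read handles coexist; neither blocks or faults the other.
with open(path) as a, open(path) as b:
    assert a.read() == b.read()
```

The failure mode Urwumpe describes only bites when one party wants exclusive or write access while another holds the file, or when the file is still being created.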

Is there another way to perform this operation in a threadsafe manner?

The current, unbearable hotfix here is to retry the operation after a 2-second delay. Since most jobs complete on a UNIX machine in 0.1 seconds, 2 seconds is far too long for this software.

The attempted lasting solution: create all folders first, let the ACLs update, and only then create files in the folders. I'm not sure this works better, since the same algorithm must also run well on different UNIX versions and processor architectures. But at least it can't hurt.
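The "try again later" hotfix generalizes to retry-with-backoff: start with a short delay and grow it, so transient failures that clear quickly don't pay a fixed 2-second penalty. A generic sketch (the `retry` helper and the `flaky` demo are hypothetical, not from the software discussed above):

```python
import time

def retry(op, attempts=5, delay=0.1, backoff=2.0):
    """Run op(), retrying on OSError with exponentially growing delays.

    Sketch of the 'retry after a delay' hotfix: short initial waits
    handle fast-clearing races cheaply; the delay doubles each round
    up to `attempts` tries, then the last error is re-raised.
    """
    for i in range(attempts):
        try:
            return op()
        except OSError:
            if i == attempts - 1:
                raise
            time.sleep(delay)
            delay *= backoff

# Demo: an operation that fails twice with a transient error, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("transient access violation")
    return "ok"

assert retry(flaky) == "ok"
assert calls["n"] == 3  # two failures absorbed, third attempt succeeded
```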
 

Face

Well-known member
Orbiter Contributor
Addon Developer
Beta Tester
Joined
Mar 18, 2008
Messages
4,398
Reaction score
578
Points
153
Location
Vienna
Background: I am currently refactoring the Orbiter core build process, mainly for a clean separation of source, build and deployment directories. Currently it's a bit of a muddle, requiring lots of SVN ignore directives and manual copying.

As part of the changed build system, I now have a number of custom build steps that copy files (data files, scenarios, etc.) from the source to the deployment tree, in the form

if not exist $(TargetDir) mkdir $(TargetDir)
xcopy /d /y $(SourceDir)\$(File) $(TargetDir)\$(File)

This seems to work fine *until* I enable multiple build threads in Visual Studio. Then the build regularly fails with access violations.

So here is my question: Are the above commands threadsafe? If two threads try to create the same directory simultaneously, is this handled gracefully? Does Windows make a difference between read and write access to a file, and are two threads allowed to simultaneously get read access to the same file?

Is there another way to perform this operation in a threadsafe manner?

If I got that right, you have the following situation:
  • One VS solution with many projects in it.
  • Some of these projects have the above mentioned post build step.
  • Many of these projects don't reference each other or do not have explicit dependencies set.
  • You have the option "maximum number of parallel project builds" set to >1 (default is 8 I think).
If this is the case, VS starts multiple MSBuild processes (at least my 2015 does so here), each working through the projects along some dependency chain. Simple command-line post-build steps are then prone to throw the access violation you mentioned, but somehow I doubt it is really a race condition in the directory creation in your case.
I think it has something to do with another process trying to access the $(TargetDir), which is a standard macro that points to a directory that seems to be under exclusive control of the main VS process.

I'd suggest trying two things:

  1. Use a different deployment directory instead of $(TargetDir). For this you'd have to copy the VS-project artifacts (the result of the compile), too.
  2. For a controlled, centralized directory operation, create an empty dummy-project with only the copy operation, and add dependencies to this project to each of the other projects. This way, the dependency chain in each of the parallel processes will only start after that central project finished its work. You can use this trick for a final, centralized post-build as well (e.g. deploying all the things at once).
...or I'm totally wrong and you already have custom MSBuild scripting driving a custom build process. In that case, ignore me :lol: .
 

martins

Thanks for the input. I'll try the idea of a dummy project that creates the target directory structure, so that individual projects don't try to create the same directories. I hope it's possible for different threads to simultaneously copy files into the same directory. If not even that is possible, I'm stumped.

If possible, I still want the individual projects to copy the files they are responsible for (e.g. the DeltaGlider project copying its own config, mesh and texture files) into the deployment directory, rather than doing all copies by a central deployment dummy project. Ideally so that if one project is removed, all the corresponding copy operations are automatically dropped as well, rather than having to manually edit the deployment project.

If I got that right, you have the following situation:
  • One VS solution with many projects in it.
  • Some of these projects have the above mentioned post build step.
  • Many of these projects don't reference each other or do not have explicit dependencies set.
  • You have the option "maximum number of parallel project builds" set to >1 (default is 8 I think).
That's correct. To point 3, I have all dependencies set (correctly, I hope). Given the nature of the build, pretty much everything depends on the Orbiter core project, but few of the following projects depend on each other, so the core is built first, and then everything else can be built in any order or simultaneously.

I am doing all my build scripting inside the integrated VS GUI. I haven't really looked into using MSBuild with a handmade build script. I'll look into that.

I wonder if this is the point where I should start looking into a CMake solution. It's probably overkill for a project as simple as Orbiter (single target platform, hardly any external dependencies).

While I'm on the topic of building Orbiter - I am also looking into upgrading the compiler toolchain. Currently I am using VS2008, but it is getting a bit long in the tooth. Is there a compelling reason to upgrade to a newer compiler, given that Orbiter isn't using any bleeding-edge C++ features? For example, is a newer version likely to generate more efficient code?

I do have a professional version of VS2008, but for anything newer I would probably have to make do with a Community version. Would this have an impact, e.g. on the tools for managing dialog resources, etc.?

Another point is that I would prefer a compiler version where most users can be reasonably expected to have the corresponding runtimes already installed on their machines. Otherwise people who install Orbiter from zip will often run into problems with missing runtimes. So maybe something like VS2013 would be a reasonable compromise?
 

Urwumpe

Is there a compelling reason to upgrade to a newer compiler, given that Orbiter isn't using any bleeding-edge C++ features? For example, is a newer version likely to generate more efficient code?

Not that much, but of course there are also bugfixes and security patches in a newer runtime library. There are some new optimizations now, but I doubt they are so revolutionary that they alone justify the update.

I primarily recommend updating for the IDE: even the 2017 Community edition has become much more powerful and more pleasant to use than the 2008 Professional version. I have yet to find an IDE with better Git integration in its UI; you really notice that Microsoft switched over to Git themselves.

I do have a professional version of VS2008, but for anything newer I would probably have to do with a community version. Would this have an impact, e.g. the tools for managing dialog resources, etc?

Visual Studio IDEs since 2013 have a built-in resource editor, even in the Community edition. But I am still not really convinced by it. The classic Windows resources are just legacy now and don't get much attention from Microsoft anymore. The resource compiler (RC) is still installed with the SDK, though.

Since this really depends on your Orbiter project, I can only recommend testing it first and then deciding whether switching to a newer Visual Studio is worthwhile. The transition may bring some additional work that nobody can predict yet.
 

Face

I am doing all my build scripting inside the integrated VS GUI. I haven't really looked into using MSBuild with a handmade build script. I'll look into that.

MSBuild is like NAnt, which itself was inspired by Apache Ant. IMHO a very cumbersome language to script builds in, in particular because of its use of XML as the basic format.

While I'm on the topic of building Orbiter - I am also looking into upgrading the compiler toolchain. Currently I am using VS2008, but it is getting a bit long in the tooth. Is there a compelling reason to upgrade to a newer compiler, given that Orbiter isn't using any bleeding-edge C++ features? For example, is a newer version likely to generate more efficient code?

I use VS versions from 2003 up to 2015 on a daily basis for work, although mostly for .NET projects. IMHO, the versions from 2012 upwards have better toolsets (code maps, analysis, debugger options), but feel more sluggish in day-to-day work. 2008 was snappier, if you ask me.

On the subject of efficiency of the C++ compiler, I can't really comment much. Those few C++ projects I use newer versions with (mostly Orbiter stuff, some command-line tools for work) don't really show a difference, but of course I never profiled them for comparison.

I do have a professional version of VS2008, but for anything newer I would probably have to do with a community version. Would this have an impact, e.g. the tools for managing dialog resources, etc?

I think that the biggest impact would be on the Orbiter community itself, because then chances are that people can compile the samples right out of the box with a freely available tool-chain. I think newer versions of the free compilers don't suffer from missing resource editors anymore, but then again I don't have much experience with them.
 

martins

I just remembered one more compiler-related problem I wanted to solve for a long time. Maybe one of you guys has some suggestions?

A major headache for the Orbiter beta SVN repository is the fact that VC++ appears to create different binaries from the same sources at every compile run. So whenever I do a "rebuild all", SVN will commit everything, although most of the targets have been built from unchanged sources. I usually try to hand-pick the actually changed targets, but obviously this is tedious and error-prone.

As far as I can tell, the differences are only a couple of bytes in a single spot, so I suspect that this is a time-stamp or GUID built into the executable. Is there a way to suppress this, so that builds from identical sources are guaranteed to produce identical binaries? I didn't find anything relating to that for VS2008, but maybe newer versions would have an option for this?
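The "couple of bytes in a single spot" is consistent with the COFF header's TimeDateStamp, which the linker fills with the build time. The field's location is fixed by the PE/COFF spec: the DOS header stores the offset of the "PE\0\0" signature at 0x3C, and the 4-byte TimeDateStamp sits 8 bytes past that signature. A hedged Python sketch of a comparison that masks just this field (real binaries can differ in other places too, e.g. the debug directory's PDB GUID, which this sketch does not touch; the demo images below are synthetic, not real executables):

```python
import struct

def pe_equal_ignoring_timestamp(img_a, img_b):
    """Compare two PE images with the COFF TimeDateStamp zeroed in both.

    Per the PE/COFF layout: 4-byte PE-signature offset at 0x3C in the
    DOS header; TimeDateStamp is the 4 bytes at signature + 8.
    """
    def masked(img):
        buf = bytearray(img)
        pe = struct.unpack_from("<I", buf, 0x3C)[0]
        assert buf[pe:pe + 4] == b"PE\x00\x00"  # sanity-check signature
        buf[pe + 8:pe + 12] = b"\x00\x00\x00\x00"
        return bytes(buf)
    return masked(img_a) == masked(img_b)

# Synthetic demo images: identical except for the COFF timestamp.
def fake_pe(timestamp):
    img = bytearray(0x100)
    img[0:2] = b"MZ"
    struct.pack_into("<I", img, 0x3C, 0x80)   # e_lfanew -> signature at 0x80
    img[0x80:0x84] = b"PE\x00\x00"
    struct.pack_into("<I", img, 0x80 + 8, timestamp)
    return bytes(img)

assert pe_equal_ignoring_timestamp(fake_pe(1111), fake_pe(2222))

other = bytearray(fake_pe(1111))
other[0x90] = 0xFF  # a genuine content difference, not just the stamp
assert not pe_equal_ignoring_timestamp(fake_pe(1111), bytes(other))
```

Such a comparison could run as a post-build step to decide whether a rebuilt target actually changed before committing it.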
 

Face

A major headache for the Orbiter beta SVN repository is the fact that VC++ appears to create different binaries from the same sources at every compile run. So whenever I do a "rebuild all", SVN will commit everything, although most of the targets have been built from unchanged sources. I usually try to hand-pick the actually changed targets, but obviously this is tedious and error-prone.

As far as I can tell, the differences are only a couple of bytes in a single spot, so I suspect that this is a time-stamp or GUID built into the executable. Is there a way to suppress this, so that builds from identical sources are guaranteed to produce identical binaries? I didn't find anything relating to that for VS2008, but maybe newer versions would have an option for this?

Yeah, that's a super nasty problem. IIRC, Microsoft's suggestion is to "use the Microsoft Portable Executable and Common Object File Format Specification from the MSDN Library to alter the date/time stamps so that they won't be a factor in the comparison".
There is also a tool called dumpbin that should be able to factor out the stamps, but then you'd have to configure SVN to use this tool's result instead of hashing to compare two versions. I have experimented with such an external comparer in Mercurial, but it is complicated and just as error-prone as the manual picking method. In SVN, I wouldn't even know where to start.

I also think new VS versions don't make it easier.

Obligatory SO-link: https://stackoverflow.com/questions/1363217/binary-reproducibility-in-visual-c
 

Urwumpe

A much simpler solution there: Have separate source and build folders.

I know Visual Studio dumps everything into your source structure, but that is not best practice. It's better to keep them separated, as is common for UNIX builds, for example.

Of course, if you want to check your builds into SVN, the timestamps will still matter. But if you have separate source and build folders, you can at least choose when to commit a build artifact.

CMake, for example, does this very nicely: it puts the project files for MSBuild into a different folder than the sources.
 

martins

A much simpler solution there: Have separate source and build folders.

Quite. In fact this is my main motivation for restructuring my build process. I hope to greatly reduce the need for rebuilds. In particular I want two entirely separate build directories for release and debug builds, so that switching from one to the other doesn't require a rebuild (it does at the moment, because while I have separate directories for intermediate binaries, the final targets are the same).
 

Face

But if you have separate source and build folders, you can at least choose when to commit a build artifact.

I don't get that. Even with intermingled source and artifact structures, I can always choose what files I want to commit. It might be easier to specify all the build artifacts (because you can just say "exclude /bin/") but in Martin's case it wouldn't make a difference, because he would still have to choose which build artifacts he wants to commit on a file-by-file basis.
 

martins

Reading Face's link, this quote

The reason is that compiler writers are far more interested in generating correctly functioning code and generating it quickly than ensuring that whatever is generated is laid out identically on your hard drive. Due to the numerous and varied methods and implementations for optimizing code, it is always possible that one build ended up with a little more time to do something extra or different than another build did. Thus, the final result could be a different set of bits for what is the same functionality.

sounds like complete nonsense to me. Surely compiling a binary from a given set of source files is a deterministic process! Compiling from the same source files _should_ generate the same output unless you deliberately put in a random element. There may be "numerous and varied methods and implementations for optimizing code", but why would the compiler pick different ones in each run? It sounds like if the compiler has a bad day, it will do a sloppy job at optimizing my code. And what does it mean "a little more time to do something extra"?? Could it be that the compiler has better things to do than waste time on compiling my code? :idk:
 

Urwumpe


martins

Interesting, but even then I don't know why the compiler would seed its random number generator differently at each run, _unless_ it does code profiling directly during the optimisation _and_ remembers the results from the previous compilation run, so it can try additional MC sequences to improve on the last result. But that would be smarter than I would like to give Microsoft credit for :lol:
 

Face

sounds like complete nonsense to me. Surely compiling a binary from a given set of source files is a deterministic process!

Yes, I had a chuckle at that, too. :lol:
OTOH, knowing Microsoft code in e.g. .NET classes, I wouldn't be surprised if they'd built some randomness into the compiler just for the heck of it.
 

Linguofreak

Well-known member
Joined
May 10, 2008
Messages
5,031
Reaction score
1,271
Points
188
Location
Dallas, TX
Surely compiling a binary from a given set of source files is a deterministic process! Compiling from the same source files _should_ generate the same output unless you deliberately put in a random element.

Well, it may be that the quote is talking about binary reproducibility between different compilers or different builds of the same compiler. It also mentions differences in function or section order. I can imagine that if the compiler runs multithreaded, differences in thread scheduling by the OS could make one thread's results available for inclusion in the binary before another thread's.
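The scheduling point can be illustrated with a toy model: if a parallel backend emitted compilation units in completion order, the layout would depend on the OS scheduler and could differ between otherwise identical runs. A deterministic toolchain has to canonicalize that order (e.g. sort by a stable key) before emitting. This is only an illustration of the mechanism, not how MSVC actually works:

```python
import random
import threading

def compile_parallel(units):
    """Toy model: 'compile' units on worker threads and collect results
    in completion order, the way a naive parallel backend might emit
    sections into the output file.
    """
    results = []
    lock = threading.Lock()

    def work(name):
        # Simulated, variable compile time: which thread finishes first
        # is up to the scheduler, so the order of `results` can vary
        # from run to run even though the inputs are identical.
        for _ in range(random.randrange(1000)):
            pass
        with lock:
            results.append(name)

    threads = [threading.Thread(target=work, args=(u,)) for u in units]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

units = ["init", "main", "render", "physics"]
layout = compile_parallel(units)
# The emission order may vary between runs; sorting by a stable key is
# the kind of canonicalization a reproducible build would apply.
assert sorted(layout) == sorted(units)
```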
 

kuddel

Donator
Joined
Apr 1, 2008
Messages
2,064
Reaction score
507
Points
113
I haven't tested this yet, but https://github.com/smarttechnologies/peparser looks promising.
It does not make the Visual C++ compiler produce deterministic builds, but as a post-build step it could (SVN-)revert the binaries in case they are "equal".


Tested the above-mentioned peparser (release v0.9.1), and it works perfectly with two consecutive D3D9Client builds. :thumbup:
 