Optimizing Computation of Factorials in Java

Arrowstar · Nov 2, 2012

Hi everyone!

I have a quick programming question concerning the optimization of some code. I'm working in Java, I have the following loop (in pseudo-code):

p=1;
for(i=A to B) {
p = p*i;
}

I know what A and B are, and I'm looking to compute "p". Normally I'd just run the loop and be done with it, but the method this loop is in gets called over 1 million times in a typical run of the program. Needless to say, if A and B are far enough apart, it takes a while to run.

What I'd like to be able to do is eliminate the loop I have up there and replace it with a single expression, thus speeding up the calculations. Anyone know if what I'm after is possible?

Quick_Nick · Nov 2, 2012

If A and B don't vary too much, I suppose you could have a table of results for p given A and B. Time vs space trade-off.

martins · Nov 2, 2012

Maybe this helps:
http://en.wikipedia.org/wiki/Factorial#Computation

Arrowstar · Nov 2, 2012

Dr. Schweiger:

I've explored that option. However, it does not appear to remove the need for for() loops if A and B are only known at runtime. Perhaps I'm missing something?

Quick_Nick · Nov 2, 2012

That bit limit gives me the best impression for just using a table. Assuming your Java implementation is very close to the simple pseudo-code, you're clearly not dealing with arbitrary precision.

I don't know the speed of tables in Java (though I should know the big-Os involved...), but it would be simple to experiment with.

RisingFury · Nov 2, 2012

First thing that comes to mind is to divide away the numbers you're not using...

For example, if you know that B is always less or equal to a certain number. In this case let's say 10. Let's also say that A is always higher than or equal to 1.

Now, if you have a typical A = 3 and typical B = 8, instead of multiplying everything from A * (A + 1) * (A + 2) * ... * B, you should just divide away the number you're not multiplying this time. If you multiply 1 * 2 * ... * 10, it comes out as 3 628 800. Now, if you have A = 5 and B = 8, then you only need to divide 3 628 800 by 2 and 10 and you'll get the same result...

There's one other problem you might run into. Integers only hold so much... The biggest number a 32 bit unsigned integer can hold is 2^32 - 1 = 4294967295 (the -1, because it starts with 0 and not with 1). Unless you're working with 64 bit integers (or higher), multiplying those numbers will quite quickly exceed whatever you can write into memory. If A = 2 and B = 20, it already WAY exceeds that...

martins · Nov 2, 2012

Well, do you need exact results (i.e. are your variables integers/variable precision integers), or approximate results (i.e. floating point values)?

If the latter, what about using Stirling's approximation mentioned in the Wikipedia article,

n! \approx \sqrt{2 \pi n}\left(\frac{n}{e}\right)^n

Arrowstar · Nov 2, 2012

Well, I'm not sure how that would work yet. In my OP, the for loop comes out mathematically to be (for, say, A=2 and B=5):

p = p^4 * 2^4 * 3^3 * 4^2 * 5

But since p equals 1 initially, what I actually have is:

p = 2^4 * 3^3 * 4^2 * 5

I can write this as:

p = 5! * 4! * 3! * 2!

Problem is that the number of factorials I need to multiply together is based on the difference B - A. The greater that difference is, the more I need, and since I can't know the difference ahead of time, there is no way I am aware of to programmatically compute p without a loop, which I am trying to avoid in the interest of speed.

martins · Nov 2, 2012

Arrowstar said:
Well, I'm not sure how that would work yet. In my OP, the for loop comes out mathematically to be (for, say, A=2 and B=5):

p = p^4 * 2^4 * 3^3 * 4^2 * 5

But since p equals 1 initially, what I actually have is:

p = 2^4 * 3^3 * 4^2 * 5

I can write this as:

p = 5! * 4! * 3! * 2!

Problem is that the number of factorials I need to multiply together is based on the difference B - A. The greater that difference is, the more I need, and since I can't know the difference ahead of time, there is no way I am aware of to programmatically compute p without a loop, which I am trying to avoid in the interest of speed.

This is not what the loop in your original post does (unless in Java, "*" means something other than a product).

Quick_Nick · Nov 2, 2012

Arrowstar said:
Well, I'm not sure how that would work yet. In my OP, the for loop comes out mathematically to be (for, say, A=2 and B=5):

p = p^4 * 2^4 * 3^3 * 4^2 * 5

But since p equals 1 initially, what I actually have is:

p = 2^4 * 3^3 * 4^2 * 5

I can write this as:

p = 5! * 4! * 3! * 2!

Problem is that the number of factorials I need to multiply together is based on the difference B - A. The greater that difference is, the more I need, and since I can't know the difference ahead of time, there is no way I am aware of to programmatically compute p without a loop, which I am trying to avoid in the interest of speed.

What your pseudo-code does is, for example with A=5 and B=8, you get p = 5*6*7*8 = 8! - 4!

What you say it does, is something very very different. Something close to (but not?)

Code:

int p = 1;
for(int i=0; i <= B-A; i++)
{
     p *= Math.pow(A+i, B-i);
}

which does equate to the factorial product you mentioned.

I'm still all for a factorial look-up table, whether you create it or compute it at runtime.
Overall though, even multiplying several recursive factorials millions of times in Java couldn't possibly take very long.

martins · Nov 2, 2012

Quick_Nick said:
5*6*7*8 = 8! - 4!

Not quite ...

Quick_Nick · Nov 2, 2012

Without optimization, I get 1.4 seconds (?) to compute it a millions times with A=2 and B=5. :lol: I can't attest to the accuracy of that number, but it's obviously a very short period of time regardless.

---------- Post added at 11:10 PM ---------- Previous post was at 11:09 PM ----------

martins said:
Not quite ...

...duh :lol: That should be 8! / 4!

---------- Post added at 11:33 PM ---------- Previous post was at 11:10 PM ----------

For comparison, for one million runs I get 8 milliseconds with a simple lookup array containing the first 12 factorials as opposed to 1400 milliseconds to compute it using Math.pow (which is a slow function as is).

agentgonzo · Nov 2, 2012

As a different route, take a step back and think why your are performing this loop over a million times. I've seen many times people trying to optimise an algorithm by various means, including running it on a super-beefy computer, only to have someone come in with a different approach that doesn't need the massive loop. This may (I say 'may') be one of those times. Take a good look at what you want to achieve and see if there is another way to do it. If you're propagating a state etc, then you may be able to find a better model or do the for loop in larger steps without much loss in accuracy (ie instead of

Code:

for(int i = A; i <= B; i++)

you may find that you get the same accuracy with

Code:

for(int i = A; i <= B; i += 100)

Or if you need to call this method multiple times and you can re-use the input set, then cache the result and re-use.

Urwumpe · Nov 2, 2012

Well, I would recommend a "divide and conquer" strategy there. Since you are just interested in a chain of multiplications, you could as well try turning the chain of multiplications into a tree of multiplication pairs

ie:

2 * 3 * 4 * 5 * 6 * 7

Could also be solved as

2 * 3 = 6; 4 * 5 = 20; 6 * 7 = 42
6 * 20 = 120
120 * 42 = 5040

Such a calculation can also be calculated well on multiple cores or GPU shaders.

And on such a tree, you can also optimize things by having look-up tables of already calculated branches. Eg for the product of the numbers from 2 to 1024... and if you are thinking of binary trees, why not also input the number of the factorial as binary form and use the set bits of the binary big num (AKA bit set) as decision aid, which branches your calculation will have?

jthill · Nov 2, 2012

I'd go for partial precomputation on log-scale intervals, precompute (log scale 0) 1x...x10, 11x...x20 etc, and (log scale 1) 1x...x100 etc, and so forth

Then do the leading and trailing partial intervals at each log scale and look up the precomputed results for any included full intervals.

martins · Nov 2, 2012

Arrowstar said:
Well, I'm not sure how that would work yet. In my OP, the for loop comes out mathematically to be (for, say, A=2 and B=5):

p = p^4 * 2^4 * 3^3 * 4^2 * 5

But since p equals 1 initially, what I actually have is:

p = 2^4 * 3^3 * 4^2 * 5

If this is indeed what you are after (rather than the factorial calculated in the loop you showed), then I think the correct loop would be

Code:

p=1;
q=1;
for i=A:B
  q = q*i;
  p = p*q;
end

Quick_Nick · Nov 2, 2012

Can we just get some context? :lol:

Optimizing Computation of Factorials in Java

Arrowstar

Probenaut

Quick_Nick

Passed the Turing Test

martins

Orbiter Founder

Arrowstar

Probenaut

Quick_Nick

Passed the Turing Test

RisingFury

OBSP developer

martins

Orbiter Founder

Arrowstar

Probenaut

martins

Orbiter Founder

Quick_Nick

Passed the Turing Test

martins

Orbiter Founder

Quick_Nick

Passed the Turing Test

agentgonzo

Grounded since '09

Urwumpe

Not funny anymore

jthill

Member

martins

Orbiter Founder

Quick_Nick

Passed the Turing Test

Similar threads