First, use Yuval's advice: With p = 2^32 - 5, remove all N ≥ p (the result is 0), then sort the rest, so now you have up to 1,000 numbers in sorted order from 0 to p-1. All the factorials can then be computed in a single pass: multiply up to almost 2^32 numbers modulo p, one after the other, and record the running product each time you reach one of the N's. That's not much above a second's worth of work, but it is more than a second.
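A minimal sketch of that single pass, in Python for readability rather than speed. The function name `factorials_mod` and its arguments are made up for illustration; for p = 2^32 - 5 you would want the same loop in a compiled language.

```python
def factorials_mod(queries, p):
    """Return {N: N! mod p} for every N in queries, using one sorted pass."""
    # N >= p means p itself is one of the factors of N!, so the result is 0.
    results = {n: 0 for n in queries if n >= p}
    pending = sorted(set(n for n in queries if n < p))

    acc = 1  # running product i! mod p
    i = 0    # last factor multiplied into acc
    for n in pending:
        while i < n:
            i += 1
            acc = acc * i % p
        results[n] = acc  # record the result "at the right moment"
    return results


print(factorials_mod([0, 5, 12, 13, 20, 5], 13))
```

The total work is max(N) modular multiplications, no matter how many values N are asked for.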
Let's say you have r processors, and you want to use them simultaneously. The largest number is t < p. You make each processor responsible for multiplying s = ceil(t / r) numbers. For example if r = 8 and t = 2^32 - 6, then s = 2^29. Processor 0 multiplies numbers from 1 to s, processor 1 multiplies numbers from s+1 to 2s and so on. Let's say one of your N's is 3s + 15,900: Processor 3 multiplies 3s+1 to 4s. At some point it records the product of the numbers 3s+1 to 3s+15,900. When all the processors are done, you multiply that recorded product by the final results of processors 0, 1 and 2 to get N! modulo p.
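Here is a rough sketch of that split. The workers run one after the other here so the example stays simple; in a real program each call to the (hypothetical) `worker` function would go to its own thread or core. It assumes all queries satisfy 0 ≤ N < p.

```python
def worker(j, s, t, marks, p):
    """Multiply the factors j*s+1 .. min((j+1)*s, t) modulo p.

    Returns the full product of the chunk and the partial products
    recorded at every factor that is one of the requested N's.
    """
    acc = 1
    partial = {}
    for x in range(j * s + 1, min((j + 1) * s, t) + 1):
        acc = acc * x % p
        if x in marks:
            partial[x] = acc
    return acc, partial


def chunked_factorials(queries, p, r):
    """Return {N: N! mod p} for 0 <= N < p, splitting the work into r chunks."""
    marks = set(queries)
    t = max(marks)
    s = max(1, -(-t // r))  # ceil(t / r), at least 1
    outputs = [worker(j, s, t, marks, p) for j in range(r)]  # parallelisable

    # prefix[j] = product of the full results of workers 0 .. j-1
    prefix = [1]
    for full, _ in outputs:
        prefix.append(prefix[-1] * full % p)

    results = {0: 1} if 0 in marks else {}
    for j, (_, partial) in enumerate(outputs):
        for n, value in partial.items():
            results[n] = prefix[j] * value % p
    return results


print(chunked_factorials([0, 3, 7, 12], 13, 4))
```

The result does not depend on r; only the way the work is divided changes.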
You may also have vector units that can do, for example, 4 operations in parallel. In that case, you would multiply r by 4: with 8 processors each calculating 4 results simultaneously, you get r = 32.
Say you have calculated x (x+1) modulo p, the product of two numbers. How can you get (x+2)(x+3) modulo p, the next product? (x+2)(x+3)-x(x+1) = 4x + 6, so the product modulo p changes by (4x + 6) modulo p. Let d(x) = (4x + 6) modulo p. As you increase x by 2, d(x) changes by 8 (modulo p), so the products x(x+1) modulo p, (x+2)(x+3) modulo p, (x+4)(x+5) modulo p etc. can be calculated very quickly, using only two additions modulo p each. That leaves one multiplication modulo p for every two factors instead of one per factor. I think that should be enough to get you below one second.
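To make the difference trick concrete, here is a small sketch (the name `factorial_by_pairs` is mine). It starts with the pair 1·2 = 2 and d(1) = 10, and uses `%` for the additions; in C a compare-and-subtract would probably be used instead.

```python
def factorial_by_pairs(n, p):
    """n! mod p, taking the factors two at a time.

    pair holds x*(x+1) mod p and diff holds d(x) = 4x + 6 mod p for
    x = 1, 3, 5, ...; both are updated by additions only.
    """
    acc = 1
    pair = 2 % p   # 1 * 2
    diff = 10 % p  # d(1) = 4*1 + 6
    for _ in range(n // 2):
        acc = acc * pair % p      # the only multiplication per two factors
        pair = (pair + diff) % p  # (x+2)(x+3) = x(x+1) + d(x)
        diff = (diff + 8) % p     # d(x+2) = d(x) + 8
    if n % 2:
        acc = acc * (n % p) % p   # odd n: one factor is left over
    return acc


print([factorial_by_pairs(n, 13) for n in range(13)])
```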
Some additions: One, if p = $2^{32}-5$ is not prime, that only makes the problem faster to solve. If all prime factors are ≤ $p^{1/2}$ then the problem is trivial because N! = 0 (modulo p) if N ≥ $2^{17}$ (one must be careful in case p is the square of a prime). Otherwise let p = a·q, where a ≥ 2 and q is a prime with q > $p^{1/2}$; then we can calculate N! modulo p easily from N! modulo a and N! modulo q, using the Chinese Remainder Theorem. Since $a < 2^{16}$ and N! modulo a = 0 if N ≥ a, calculating N! modulo a is trivial (at most 65535 products needed). q would be at most half as large as p, making the problem at least twice as fast to solve. After applying this, we can assume p is prime.
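A sketch of the recombination step for the case p = a·q, assuming q is prime and 2 ≤ a < q (so a and q are coprime). The function name is made up, and `pow(q, -1, a)` for the modular inverse needs Python 3.8 or later.

```python
def factorial_mod_composite(n, a, q):
    """n! mod (a*q), assuming q is prime and 2 <= a < q."""

    def fact(m, mod):
        if m >= mod:
            return 0  # mod itself is one of the factors of m!
        acc = 1
        for x in range(2, m + 1):
            acc = acc * x % mod
        return acc

    ra = fact(n, a)  # cheap: at most a - 1 factors
    rq = fact(n, q)  # the expensive part, but q <= p / 2

    # Chinese Remainder Theorem: find x with x = rq (mod q) and x = ra (mod a).
    h = (ra - rq) * pow(q, -1, a) % a
    return rq + q * h


print(factorial_mod_composite(10, 6, 11))  # 10! mod 66
```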
Two, Wilson's Theorem lets you cut the time in half. We now assume p is prime (having cut down the time earlier if the original p wasn't). Wilson's Theorem says (p-1)! = -1 (modulo p). Now say we want to calculate (p-k)! modulo p. Obviously (p-k)! · ((p-k+1)·(p-k+2)· ... · (p-1)) = (p-1)!. Modulo p, this product is congruent to (p-k)! · ((-(k-1))·(-(k-2))· ... · (-1)) = (p-k)! · (k-1)! · (-1)^(k-1). The result modulo p is -1, so (p-k)!·(k-1)! = (-1)^k modulo p. So if p/2 ≤ N < p, then we let k = p - N, calculate (k-1)! modulo p, and solve N! · (k-1)! = (-1)^k modulo p for N!, which takes one modular inverse of (k-1)!. The problem is now mostly reduced to calculating N! modulo p for about 1,000 values 0 ≤ N ≤ p/2 ≤ $2^{31}$.
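As an illustration of the Wilson reflection, here is a sketch for a single N, assuming p is prime. The name `factorial_mod_prime` is hypothetical, and the inverse via `pow(f, -1, p)` needs Python 3.8 or later.

```python
def factorial_mod_prime(n, p):
    """n! mod p for prime p, multiplying at most about p/2 factors."""
    if n >= p:
        return 0

    def direct(m):
        acc = 1
        for x in range(2, m + 1):
            acc = acc * x % p
        return acc

    k = p - n
    if k - 1 >= n:
        return direct(n)  # n is in the lower half: just multiply

    # Upper half: n! * (k-1)! = (-1)^k (mod p), so divide by (k-1)!.
    f = direct(k - 1)
    sign = 1 if k % 2 == 0 else p - 1  # (-1)^k mod p
    return sign * pow(f, -1, p) % p


print([factorial_mod_prime(n, 13) for n in range(14)])
```

With many values N you would combine this with the sorted pass: map every N in the upper half to k-1 = p-N-1, compute all the small factorials in one sweep, and invert afterwards.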
Three, Yuval's idea of using a table with pre-calculated values N! modulo p will work if the task is to solve the problem for a fixed p and different sets of 1,000 values N. If there are 1,000 values N, you need 1,000k evenly spaced precalculated values to save a factor of k. It doesn't work if p is variable.
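A sketch of how such a table could be used, assuming evenly spaced checkpoints every `step` factors (the names `build_table`, `lookup` and `step` are mine). The table has about p/step entries and is built once for a fixed p; each lookup then needs fewer than `step` multiplications.

```python
def build_table(p, step):
    """table[i] = (i*step)! mod p for every i with i*step < p."""
    table = [1]
    acc = 1
    for x in range(1, p):
        acc = acc * x % p
        if x % step == 0:
            table.append(acc)
    return table


def lookup(n, p, step, table):
    """n! mod p, starting from the nearest checkpoint at or below n."""
    if n >= p:
        return 0
    acc = table[n // step]
    for x in range(n - n % step + 1, n + 1):
        acc = acc * x % p
    return acc


table = build_table(13, 4)
print(table, [lookup(n, 13, 4, table) for n in range(14)])
```

For 1,000 queries the total work is at most 1,000·step multiplications, so step = p/(1,000k) gives the factor k mentioned above with a table of 1,000k entries.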
And if it is possible to use multiple processors or vector units to do k calculations in parallel, then a factor k in time can be saved. Without these savings, it's 2 billion products modulo p in 1 second.
One idea is factoring N! into a product of powers of primes. There are about 105 million primes < $2^{31}$. However, since we have 1,000 values $N < 2^{31}$, I think many primes will have to be multiplied multiple times. Doing this in one second is tough.
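For reference, this is roughly what the prime-power approach looks like for a single N, using Legendre's formula for the exponent of each prime q in N! (the sum of floor(N/q^i) over i ≥ 1). The function name is made up, and the simple sieve here is only meant for small N.

```python
def factorial_via_primes(n, p):
    """n! mod p as a product of prime powers q^e, e from Legendre's formula."""
    sieve = bytearray([1]) * (n + 1)  # sieve[q] == 1 while q may be prime
    acc = 1
    for q in range(2, n + 1):
        if not sieve[q]:
            continue
        for m in range(q * q, n + 1, q):
            sieve[m] = 0  # cross out multiples of the prime q
        # Exponent of q in n!: floor(n/q) + floor(n/q^2) + ...
        e, power = 0, q
        while power <= n:
            e += n // power
            power *= q
        acc = acc * pow(q, e, p) % p
    return acc % p


print(factorial_via_primes(10, 1000003))  # 10! = 2^8 * 3^4 * 5^2 * 7
```

Every prime up to N costs at least one modular multiplication for each N, which is where the repeated work for 1,000 different values comes from.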