As other answers have pointed out, most PRNGs that use floating-point computation and floating-point storage for the PRNG state simply use the floating-point unit to perform integer arithmetic, storing the integers in the significand of IEEE-754 floating-point types. One (historical) motivation for this is that hardware may support wider floating-point types than integer types.
One PRNG of this kind that has good quality and is still in widespread use today is MRG32k3a, which uses a sextuple of double variables for its state. It first appeared in the literature in the late 1990s:
P. L'Ecuyer, "Good parameters and implementations for combined multiple recursive random number generators." Operations Research, Vol. 47, No. 1, Jan.-Feb. 1999, pp. 159-164
#define norm 2.328306549295728e-10   /* ~1/(m1+1) */
#define m1 4294967087.0              /* modulus of component 1: 2^32-209 */
#define m2 4294944443.0              /* modulus of component 2: 2^32-22853 */
#define a12 1403580.0                /* multiplier, component 1 */
#define a13n 810728.0                /* negated multiplier -a13, component 1 */
#define a21 527612.0                 /* multiplier, component 2 */
#define a23n 1370589.0               /* negated multiplier -a23, component 2 */
double s10, s11, s12, s20, s21, s22; /* generator state: one triple per component */
double MRG32k3a ()
{
long k;
double p1, p2;
/* Component 1 */
p1 = a12 * s11 - a13n * s10;
k = p1 / m1; p1 -= k * m1; if (p1 < 0.0) p1 += m1;
s10 = s11; s11 = s12; s12 = p1;
/* Component 2 */
p2 = a21 * s22 - a23n * s20;
k = p2 / m2; p2 -= k * m2; if (p2 < 0.0) p2 += m2;
s20 = s21; s21 = s22; s22 = p2;
/* Combination */
if (p1 <= p2) return ((p1 - p2 + m1) * norm);
else return ((p1 - p2) * norm);
}
The use of long k was motivated by the absence of a way to round a floating-point number to an integer using round-towards-zero, i.e. truncation. However, since ISO C99 there is the trunc() function, which can be used for this purpose and maps to a single instruction on many modern processor architectures. This results in a pure floating-point computation, and modern compilers like Clang are able to use 2-way SIMD vectorization to parallelize the computation of p1 and p2.
double MRG32k3a (void)
{
double k, p1, p2;
/* Component 1 */
p1 = a12 * s11 - a13n * s10;
k = trunc (p1 / m1); p1 -= k * m1; if (p1 < 0.0) p1 += m1;
s10 = s11; s11 = s12; s12 = p1;
/* Component 2 */
p2 = a21 * s22 - a23n * s20;
k = trunc (p2 / m2); p2 -= k * m2; if (p2 < 0.0) p2 += m2;
s20 = s21; s21 = s22; s22 = p2;
/* Combination */
if (p1 <= p2) return ((p1 - p2 + m1) * norm);
else return ((p1 - p2) * norm);
}
Floating-point division is often fast on the latest (post-2020) processor implementations. Given that the divisors here are constant, it may nonetheless be worthwhile to try to replace the division by a constant with an FMA-based computation utilizing a pre-computed reciprocal. FMA is a fused multiply-add operation that computes $ab + c$ by using the full, unrounded product $ab$ in the addition and applying a single rounding at the end. It was first introduced by IBM around 1990:
Erdem Hokenek, Robert K. Montoye, and Peter W. Cook. "Second-generation RISC floating point with multiply-add fused." IEEE Journal of Solid-State Circuits, Vol. 25, No. 5, October 1990, pp. 1207-1213
At this point in time the major processor architectures all support FMA in hardware, and it is conveniently accessible via the standard function fma() in C, C++, and various other programming languages. Using FMA, we can create a bit-accurate replacement for division by constant for many integer constants:
/* compute quotient q=x/y for finite x, y; rcp_y = correctly rounded 1.0/y */
double qdiv (double x, double y, double rcp_y)
{
double q;
q = x * rcp_y;
if (x != 0) q = fma (fma (-y, q, x), rcp_y, q);
return q;
}
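As a quick sanity check of this claim, qdiv() can be compared against plain division across integer-valued numerators of the kind MRG32k3a feeds into it. The small test harness below is a hypothetical sketch, not part of the original code; the tested range of about one million values just below a12*m1 is an arbitrary choice.
#include <math.h>
#include <stdio.h>
/* hypothetical test harness; assumes qdiv() and the m1 macro above are in scope */
int main (void)
{
    const double rcp = 1.0 / m1;  /* correctly rounded reciprocal of m1 */
    /* MRG32k3a produces integer-valued numerators of magnitude below a12*m1, about 6.03e15 */
    for (double x = 6.0e15; x > 6.0e15 - 1.0e6; x -= 1.0) {
        if (qdiv (x, m1, rcp) != x / m1) {
            printf ("mismatch at x = %.1f\n", x);
            return 1;
        }
    }
    printf ("qdiv matched plain division for all test values\n");
    return 0;
}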
Using qdiv() in place of the division results in the following implementation, which can likewise be SIMD vectorized by compilers like Clang:
#define rcp_m1 (1.0/m1)
#define rcp_m2 (1.0/m2)
double MRG32k3a (void)
{
double k, p1, p2;
/* Component 1 */
p1 = a12 * s11 - a13n * s10;
k = trunc (qdiv (p1, m1, rcp_m1)); p1 -= k * m1; if (p1 < 0.0) p1 += m1;
s10 = s11; s11 = s12; s12 = p1;
/* Component 2 */
p2 = a21 * s22 - a23n * s20;
k = trunc (qdiv (p2, m2, rcp_m2)); p2 -= k * m2; if (p2 < 0.0) p2 += m2;
s20 = s21; s21 = s22; s22 = p2;
/* Combination */
if (p1 <= p2) return ((p1 - p2 + m1) * norm);
else return ((p1 - p2) * norm);
}
It should be noted that MRG32k3a can be efficiently implemented via pure integer computation on modern 64-bit processors. Even on 32-bit processors that provide an integer multiplier which can deliver the most significant 32 bits of the full product of two 32-bit integers, an integer-based implementation can be reasonably efficient.
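For illustration, a minimal 64-bit integer sketch might look as follows; the function name MRG32k3a_int, the state variables t10 through t22, and the seed value 12345 are illustrative choices, not taken from a reference implementation:
#include <stdint.h>
/* state of the two component recurrences; seeds are arbitrary nonzero values below the moduli */
static int64_t t10 = 12345, t11 = 12345, t12 = 12345;
static int64_t t20 = 12345, t21 = 12345, t22 = 12345;
double MRG32k3a_int (void)
{
    int64_t p1, p2;
    /* Component 1: intermediate products stay below 2^53, far from int64_t overflow */
    p1 = (1403580 * t11 - 810728 * t10) % INT64_C(4294967087);
    if (p1 < 0) p1 += INT64_C(4294967087);
    t10 = t11; t11 = t12; t12 = p1;
    /* Component 2 */
    p2 = (527612 * t22 - 1370589 * t20) % INT64_C(4294944443);
    if (p2 < 0) p2 += INT64_C(4294944443);
    t20 = t21; t21 = t22; t22 = p2;
    /* Combination, mirroring the floating-point version above */
    if (p1 <= p2) return (double)(p1 - p2 + INT64_C(4294967087)) * 2.328306549295728e-10;
    else return (double)(p1 - p2) * 2.328306549295728e-10;
}
Since all intermediate quantities remain below 2^53, they are also exactly representable as doubles, which is precisely what makes the floating-point variants above work.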
Part of programmer folklore in the world of 3D graphics is the following PRNG that uses only float computation and a two-vector of float variables for state storage.
float rand(vec2 n) {
return fract(sin(dot(n.xy, vec2(12.9898, 78.233))) * 43758.5453);
}
This generates random numbers in $(-1,1)$ via fract(), which extracts the fractional part of a float. With the simple application of the absolute value, this could be used to generate random numbers in $[0, 1)$. For performance, this PRNG relies heavily on the availability of a blazingly fast sin() approximation, which is a standard feature of GPUs.
I have not used this PRNG and know nothing about its origin, design criteria, and properties. The small state and the non-linearity of the sine function suggest that it has a short period and provides a distribution that is not entirely uniform. Also, the granularity of the generated numbers is limited, as the significand of a float mapped to the IEEE-754 binary32 format comprises only 23 stored bits. There is a Q&A for this PRNG on Stack Overflow which may be useful for follow-up.