It is known that the symmetric Sinkhorn algorithm in fact minimizes the KL divergence (2), (3). In (1), the authors present a method that instead minimizes the Euclidean distance, called BBS (Bregmanian Bi-Stochastication). They report that BBS with the Euclidean distance tends to produce noticeably better clustering results than the Sinkhorn–Knopp (SK) algorithm, while Sinkhorn comes closer to exact double stochasticity. (A minimal sketch of the symmetric Sinkhorn iteration follows the references below.)
(1) Fei Wang, Ping Li, Arnd Christian König. Learning a Bi-Stochastic Data Similarity Matrix. ICDM, 2010.
(2) J. N. Darroch and D. Ratcliff. Generalized iterative scaling for log-linear models. The Annals of Mathematical Statistics, 43(5):1470–1480, 1972.
(3) G. W. Soules. The rate of convergence of Sinkhorn balancing. Linear Algebra and its Applications, 150:3–40, 1991.
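For concreteness, here is a minimal NumPy sketch of symmetric Sinkhorn scaling: it looks for a positive vector d such that diag(d) K diag(d) is doubly stochastic. The function name is my own, and the square-root damping is one common way to stabilize the symmetric fixed-point iteration d = 1/(Kd); this assumes a symmetric K with strictly positive entries.

```python
import numpy as np

def symmetric_sinkhorn(K, n_iter=500, tol=1e-9):
    # Find a positive vector d so that diag(d) @ K @ diag(d) is
    # doubly stochastic. The sqrt is a geometric-mean damping of the
    # raw update d <- 1 / (K d), which on its own can oscillate.
    d = np.ones(K.shape[0])
    P = K
    for _ in range(n_iter):
        d = np.sqrt(d / (K @ d))          # damped symmetric update
        P = d[:, None] * K * d[None, :]   # current scaled matrix
        if np.max(np.abs(P.sum(axis=1) - 1.0)) < tol:
            break
    return P

# Example on a symmetric, strictly positive similarity matrix:
rng = np.random.default_rng(0)
X = rng.random((5, 5))
K = X @ X.T + 1e-3
P = symmetric_sinkhorn(K)
print(P.sum(axis=0), P.sum(axis=1))   # both should be ~1
```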
Regarding the projection of an arbitrary (possibly non-positive) matrix:
First, projecting a matrix with negative entries directly is quite hard. I would first project it onto the orthogonal matrices and from there round it to the Birkhoff polytope; a sketch of this two-step idea follows the references below. It is not a great procedure, but a reasonable approach is given in:
(4) Alexander Barvinok. Approximating Orthogonal Matrices by Permutation Matrices. 2005.
(5) Scott W. Linderman, Gonzalo E. Mena, Hal Cooper, Liam Paninski, John P. Cunningham. Reparameterizing the Birkhoff Polytope for Variational Permutation Inference. 2018. See also "Doubly Stochastic Matrices" in Stan.
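Here is a rough sketch of that two-step idea, under my own simplifications (this is not the exact construction of (4) or (5)): the nearest orthogonal matrix in Frobenius norm is the polar factor from the SVD, and the entrywise exponential is just one convenient way to get back into the positive orthant before Sinkhorn-balancing into the Birkhoff polytope.

```python
import numpy as np

def orthogonal_then_birkhoff(A, tau=0.1, n_iter=200):
    # Step 1: nearest orthogonal matrix in Frobenius norm (polar factor).
    U, _, Vt = np.linalg.svd(A)
    Q = U @ Vt
    # Step 2: map to positive entries (entrywise exponential is my own
    # choice here; tau controls how sharply the result concentrates),
    # then alternate row/column normalization (Sinkhorn) to land in
    # the Birkhoff polytope.
    P = np.exp(Q / tau)
    for _ in range(n_iter):
        P = P / P.sum(axis=1, keepdims=True)   # row-normalize
        P = P / P.sum(axis=0, keepdims=True)   # column-normalize
    return P

A = np.random.default_rng(1).normal(size=(4, 4))   # arbitrary, non-positive
P = orthogonal_then_birkhoff(A)
print(np.round(P, 3))   # rows and columns each sum to ~1
```

As tau decreases, the balanced matrix concentrates toward a permutation matrix (a vertex of the Birkhoff polytope), which is the regime (4) is concerned with; larger tau gives a smoother doubly stochastic approximation.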