So I got curious about how fast these operations actually are. You can run $ openssl speed to benchmark individual cryptographic operations.
I am not sure, however, how those numbers translate into actual
performance when connecting to an SSH server.
As part of the Diffie-Hellman key exchange, the SSH server computes a
hash (the exchange hash) and signs it with its private host key. The client then computes the same hash and verifies the server's signature. So it is useful to look at how slow both of these operations are. Signing is perhaps the more important one,
as servers tend to handle many more SSH connections than clients.
The signing/verifying operation will typically involve another hashing operation, which is part of the host key signature algorithm. E.g. ssh-rsa uses SHA-1
and ecdsa-sha2-nistp521 uses SHA-512. I am not sure whether the numbers below include hashing time. In either case, it
seems that for inputs larger than 16 bytes SHA-256 and SHA-512 perform comparably.
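The hash comparison can be checked roughly with a few lines of Python. This is only a sketch: it times Python's hashlib (which usually wraps OpenSSL, but that is an assumption about the local build), so the absolute numbers include interpreter overhead and will not match $ openssl speed. The helper name hashes_per_second and the chosen input sizes are mine, not something from the benchmark above.

```python
import hashlib
import timeit


def hashes_per_second(name: str, size: int, repeats: int = 20000) -> float:
    """Return how many digests of a `size`-byte input hashlib computes per second."""
    data = b"\x00" * size
    # Time `repeats` complete hash computations of the same input.
    seconds = timeit.timeit(lambda: hashlib.new(name, data).digest(), number=repeats)
    return repeats / seconds


if __name__ == "__main__":
    # Input sizes are arbitrary; an SSH exchange hash input is larger than 16 bytes.
    for size in (16, 64, 256, 1024):
        rates = {n: hashes_per_second(n, size) for n in ("sha1", "sha256", "sha512")}
        print(size, {n: round(r) for n, r in rates.items()})
```

Even if the hash is included in the signing numbers, a single hash of a short input is typically far cheaper than the signature itself, so it should not change the picture much.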
I tested this on three devices:
Thinkpad X220, Intel Core i5-2520M
                                sign     verify    sign/s  verify/s
rsa 2048 bits              0.001350s  0.000048s     740.8   20913.2
rsa 3072 bits              0.006107s  0.000094s     163.7   10639.3
rsa 4096 bits              0.010134s  0.000158s      98.7    6316.4
rsa 7680 bits              0.089906s  0.000525s      11.1    1903.0
rsa 15360 bits             0.468636s  0.002004s       2.1     499.0
dsa 2048 bits              0.000600s  0.000519s    1667.5    1927.0
256 bits ecdsa (nistp256)    0.0000s    0.0001s   23594.7    7348.1
384 bits ecdsa (nistp384)    0.0016s    0.0011s     620.4     890.8
521 bits ecdsa (nistp521)    0.0005s    0.0009s    1866.4    1080.1
253 bits EdDSA (Ed25519)     0.0001s    0.0002s   15737.2    6078.1
Xiaomi Mi A2, Qualcomm SDM660 Snapdragon 660
                                sign     verify    sign/s  verify/s
rsa 2048 bits              0.004257s  0.000111s     234.9    9030.1
rsa 3072 bits              0.012975s  0.000243s      77.1    4116.7
rsa 4096 bits              0.029138s  0.000425s      34.3    2353.9
rsa 7680 bits              0.220952s  0.001460s       4.5     684.9
rsa 15360 bits             1.362500s  0.005801s       0.7     172.4
dsa 2048 bits              0.001530s  0.001434s     653.8     697.3
256 bits ecdsa (nistp256)    0.0001s    0.0003s   12472.4    3907.9
384 bits ecdsa (nistp384)    0.0032s    0.0025s     311.4     396.6
521 bits ecdsa (nistp521)    0.0081s    0.0062s     123.1     161.9
253 bits EdDSA (Ed25519)     0.0002s    0.0004s    6284.5    2412.9
Raspberry Pi 3 Model B Rev 1.2, Cortex-A53
                                sign     verify    sign/s  verify/s
rsa 2048 bits              0.011919s  0.000268s      83.9    3735.0
rsa 3072 bits              0.032787s  0.000550s      30.5    1819.5
rsa 4096 bits              0.069583s  0.000934s      14.4    1070.1
rsa 7680 bits              0.381111s  0.003097s       2.6     322.9
rsa 15360 bits             2.725000s  0.012002s       0.4      83.3
dsa 2048 bits              0.003586s  0.003021s     278.9     331.0
256 bits ecdsa (nistp256)    0.0004s    0.0013s    2249.3     743.5
384 bits ecdsa (nistp384)    0.0181s    0.0127s      55.1      78.5
521 bits ecdsa (nistp521)    0.0421s    0.0287s      23.7      34.8
253 bits EdDSA (Ed25519)     0.0005s    0.0012s    2156.7     800.2
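To compare devices without copying numbers by hand, rows like the ones above can be parsed into records. The sketch below assumes the row layout shown in these tables (a label, two timings ending in "s", two rates); the SigRow type, the ROW pattern, and the parse_rows helper are illustrative names of mine, and other OpenSSL versions may print the summary differently.

```python
import re
from typing import NamedTuple


class SigRow(NamedTuple):
    algorithm: str
    sign_seconds: float
    verify_seconds: float
    signs_per_second: float
    verifies_per_second: float


# label, sign time, verify time, sign/s, verify/s (layout assumed from the tables above)
ROW = re.compile(r"^\s*(.+?)\s+([\d.]+)s\s+([\d.]+)s\s+([\d.]+)\s+([\d.]+)\s*$")


def parse_rows(text: str) -> list[SigRow]:
    """Extract signature benchmark rows; header and unrelated lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append(SigRow(m.group(1), *map(float, m.groups()[1:])))
    return rows


# A few Raspberry Pi rows copied from this answer.
SAMPLE = """\
                                sign     verify    sign/s  verify/s
rsa 2048 bits              0.011919s  0.000268s      83.9    3735.0
256 bits ecdsa (nistp256)    0.0004s    0.0013s    2249.3     743.5
253 bits EdDSA (Ed25519)     0.0005s    0.0012s    2156.7     800.2
"""

for row in parse_rows(SAMPLE):
    print(row.algorithm, row.signs_per_second, row.verifies_per_second)
```

Note that the timing columns for the elliptic-curve rows are rounded to four decimals (hence 0.0000s for nistp256 on the X220), so the sign/s and verify/s columns are the ones worth comparing.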
Here are the charts, linear and logarithmic.
Y axis is sign/s (solid, circles) and verify/s (dotted, triangles).
Blue is the X220, Orange is Xiaomi, and Red is Raspberry Pi.

My takeaway here is that ECDSA nistp256 is much faster at signing than the other ECDSA curves. On the Raspberry Pi, nistp384 and nistp521
top out at 55.1 and 23.7 signing operations per second respectively, so these look like bad defaults to me.
Ed25519 is somewhat slower than ECDSA nistp256, especially at signing, but not
by much.
Also, surprisingly, nistp521 performs better than nistp384 on the Intel CPU.
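To put the Raspberry Pi numbers in perspective, here is a small calculation using the sign/s values from the table. It is a sketch under a strong assumption: it treats the host-key signature as the only per-connection cost, so the rates are an upper bound on new connections per second, not a prediction. The helper names are mine.

```python
# sign/s values copied from the Raspberry Pi 3 table in this answer.
PI_SIGNS_PER_SECOND = {
    "rsa 2048": 83.9,
    "nistp256": 2249.3,
    "nistp384": 55.1,
    "nistp521": 23.7,
    "ed25519": 2156.7,
}


def slowdown_vs(baseline: str, other: str, rates: dict[str, float]) -> float:
    """How many times fewer signatures per second `other` manages than `baseline`."""
    return rates[baseline] / rates[other]


def ms_per_signature(rate: float) -> float:
    """Average milliseconds spent on one signature at `rate` signatures per second."""
    return 1000.0 / rate


for name, rate in PI_SIGNS_PER_SECOND.items():
    factor = slowdown_vs("nistp256", name, PI_SIGNS_PER_SECOND)
    print(f"{name:9s} {ms_per_signature(rate):6.1f} ms/sign, {factor:5.1f}x the cost of nistp256")
```

On these numbers a nistp521 host key costs the Pi about 42 ms per signature, roughly 95 times more than nistp256, while Ed25519 is within a few percent of nistp256.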