Back in the Windows XP era, when setting up Windows' built-in speech recognition/dictation, I had to read a series of preset text samples aloud so the speech-to-text engine could build a personalized voice profile.
Today, with networked speech-to-text engines like Siri or Cortana, I can just start dictating.
The quality of the speech-to-text conversion seems equivalent, though my memory may be faulty on that point.
Have speech models advanced past the need for any personalization of the training data? Or do they just do the personalization under the covers now, without an explicit training wizard? Or do they skip training entirely, even though it would still be beneficial (e.g. because it's inconvenient for users)?