I have just tried using LoRA on Llama 3 8B, and I found that even without doing any fine-tuning it performed pretty well on my dataset. But then I realized that surely the LoRA parameters are randomly initialized, right? If that's the case, shouldn't the model's outputs initially be degraded by the LoRA parameters, since they're just adding random values on top of the regular parameters?
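To make my concern concrete, here's roughly how I picture a LoRA-wrapped linear layer. This is just a sketch of my mental model (names like LoRALinear and rank are made up, and both factors are drawn randomly, which is exactly what makes me expect the outputs to be perturbed) and not any library's actual implementation:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base                          # frozen pretrained weight W
        self.base.weight.requires_grad_(False)
        # My assumption: both low-rank factors start out random,
        # so the update (B A) is non-zero before any training.
        self.A = nn.Parameter(torch.randn(rank, base.in_features))
        self.B = nn.Parameter(torch.randn(base.out_features, rank))

    def forward(self, x):
        # y = Wx + (B A)x  -- the second term is what I'd expect to
        # hurt the outputs at initialization if it's random.
        return self.base(x) + x @ self.A.T @ self.B.T
```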
I also have a somewhat related question, if you don't mind answering it as well. I keep reading that the alpha parameter in LoRA is the scaling factor on the low-rank update, e.g. y = Wx + alpha * (L1 L2)x, but I often see alpha values of 256, for example, which seems way too large, because that would set a 1 : 256 ratio between the influence of the regular parameters and the LoRA parameters on the output.
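Concretely, this is the scaling I think that formula implies. The dimensions and init scales here are made up purely to illustrate the magnitudes I'm worried about, not taken from any real checkpoint:

```python
import torch

torch.manual_seed(0)
d, r, alpha = 1024, 8, 256

W = torch.randn(d, d) / d**0.5      # stand-in for a pretrained weight
A = torch.randn(r, d) * 0.01        # stand-in LoRA factors
B = torch.randn(d, r) * 0.01
x = torch.randn(d)

base = W @ x
lora = alpha * (B @ (A @ x))        # my reading: alpha multiplies the whole LoRA branch

# With alpha = 256 the LoRA term can easily dwarf the base term,
# which is why the large alpha values I see seem off to me.
print(base.norm(), lora.norm())
```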