I am trying to understand the top_p parameter in LangChain (nucleus sampling), but I can't seem to grasp it.
As I understand it, we sort the token probabilities and select the subset whose total probability exceeds p while having the fewest members possible.
For example, for:
t1 = 0.05
t2 = 0.5
t3 = 0.3
t4 = 0.15
and top_p=0.75, we would select t2 and t3 (0.5 + 0.3 = 0.8, which exceeds 0.75), right?
If this is the case, what happens if top_p=0.001?
Then we need just one token, and any one of these has a probability above the threshold.
Do we select the biggest one (t2)? Based on my experience this makes sense: I tested top_p=0.001 on an LLM and the output was coherent. If the single selected token could be any random token with probability > 0.001, the output should be garbage.
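
To make the question concrete, here is a minimal Python sketch of the procedure as I understand it. It assumes the subset is always built from the most probable token downward, stopping as soon as the running total reaches top_p. The function name `nucleus` is made up for illustration; this is not LangChain's or any model provider's actual code, and real implementations may treat the exact threshold boundary differently.

```python
def nucleus(probs, top_p):
    """Return the tokens kept by nucleus (top-p) filtering.

    Assumption: tokens are taken in descending probability order until
    their cumulative probability reaches top_p.
    """
    # Rank tokens from most to least probable.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)

    kept = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append(token)
        cumulative += p
        # Stop as soon as the kept tokens cover at least top_p.
        if cumulative >= top_p:
            break
    return kept


probs = {"t1": 0.05, "t2": 0.5, "t3": 0.3, "t4": 0.15}

print(nucleus(probs, 0.75))   # ['t2', 't3']
print(nucleus(probs, 0.001))  # ['t2']
```

Under that assumption, top_p=0.001 always keeps only the single most probable token, which would match the coherent output I observed.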