I'm following the formula on http://www.shannonentropy.netmark.pl/calculate to calculate the Shannon entropy of a string of nucleotides (nt). Since there are 4 nucleotides, I assigned each an equal probability, P(nt) = 0.25. The equation I'm using is -sum([Pr(x)*log2(Pr(x)) for all x in X]), where X is the DNA sequence (e.g. ATCG).
So my question is this: in Shannon entropy, MUST the probabilities be based solely on the sequence itself, or can they be predetermined (i.e. nt_set = {A, T, C, G} and each P(nt) = 0.25)?
If I used predetermined probabilities, would that still be entropy, and if not, what would I be calculating?
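
To make the question concrete, here is a minimal Python sketch of the two calculations I'm comparing. It assumes the sum runs over the distinct symbols rather than over every position in the sequence, and the names `empirical_entropy`, `fixed_entropy`, and `uniform` are just made up for illustration.

```python
from collections import Counter
from math import log2

def empirical_entropy(seq):
    """Entropy in bits, with probabilities taken from symbol frequencies in seq."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def fixed_entropy(probs):
    """Entropy in bits of a predetermined distribution (dict: symbol -> probability)."""
    # Zero-probability symbols contribute nothing, so skip them to avoid log2(0).
    return -sum(p * log2(p) for p in probs.values() if p > 0)

# Predetermined case (my assumption): each nucleotide gets P(nt) = 0.25.
uniform = {nt: 0.25 for nt in "ATCG"}

print(empirical_entropy("ATCG"))  # 2.0
print(empirical_entropy("AATT"))  # 1.0
print(fixed_entropy(uniform))     # 2.0
```

Note that the first version changes with the sequence, while the second gives the same value no matter which sequence I feed in, which is what prompted the question.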