Inspired by antkam's answer, here's another idea to investigate.
Let's pick some binary error-correcting code $(n,k)$, not necessarily linear, with $n$ not too small.
Proposal 1: pick $2^k$ random tuples as codewords, with $n/k \approx 4.5 $. For example, $n=41$, $k=9$.
Proposal 2: pick some BCH code with $ k \approx t$. For example, let us take a BCH $(255,45)$ code, which has $t=43$.
The strategy: the sequence is divided into blocks of length $n$. In each block we mark the $m$ 'miss bits' (those that were not correctly guessed). If $m\ge k$, we label the last $k$ of them as 'information bits'; if $m<k$, we additionally label the last $k-m$ hit bits as information bits.
$A$ looks ahead, finds the codeword that is nearest (in Hamming distance) to the next block, and uses the $k$ information bits in this block to encode it. The remaining bits are copied from $C$.
$B$ simply picks that codeword (and, after learning the results, deduces the codeword for the next block).
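As an illustration, here is a minimal Octave sketch of the information-bit labeling described above (the helper name info_positions and its interface are my own, not from the original answer):

% Hedged sketch: pick the k information-bit positions of a block,
% given its 0/1 miss mask (1 = bit was not guessed correctly).
function pos = info_positions(missmask, k)
  miss = find(missmask);                     % indices of miss bits
  if (numel(miss) >= k)
    pos = miss(end-k+1:end);                 % last k miss bits
  else
    hit = find(!missmask);
    extra = hit(end-(k-numel(miss))+1:end);  % fill with the last hit bits
    pos = sort([miss(:); extra(:)])';
  end
endfunction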
The analysis seems easier with the random code (Proposal 1), though the BCH code (or something similar) would probably perform better.
The Hamming distance between the nearest codeword and the block of $C$ corresponds to the minimum of $2^k$ i.i.d. $\mathrm{Binom}(n,1/2)$ variables. This concentrates around
$$ t^*= \frac{n}{2} - \sqrt{n k \log(2) /2} \tag 1$$
with $t^* \approx k \iff n/k \approx 4.5$. Granting this, in each block we'll have $m \approx k$, i.e., approximately as many miss bits as information bits are needed (which is what we want). If that is so, we'd attain a score of $1-k/n \approx 0.777$.
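For completeness, here is a heuristic derivation of $(1)$ via the Gaussian tail approximation (my sketch, not part of the original argument). Writing $t = n/2 - x\sqrt{n}/2$,
$$ P\big(\mathrm{Binom}(n,1/2) \le t\big) \approx \Phi(-x) \approx e^{-x^2/2}, $$
and the minimum of $2^k$ independent copies concentrates where $2^k\,P \approx 1$, i.e. $x^2/2 = k\log 2$; solving for $t$ gives $t^* = n/2 - \sqrt{n k \log(2)/2}$.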
For the case of the BCH code, I suggested taking $t\approx k$, in the hope that the distance from a random tuple to the nearest codeword would concentrate at (or below) the value $t$. But this needs more elaboration (or at least some simulation).
Update: some simulations partially support the above (slightly too optimistic) conjecture, though $n/k \approx 4$ seems to perform better. A random code with $n=57$, $k=14$ attains a hit rate $r=0.753$. For smaller sizes, a punctured/truncated BCH code performs a little better; for example, $n=23$, $k=6$ ($BCH(31,6)$ punctured) gives $r=0.740$ (random: $0.731$). It seems that random codes perform roughly the same as (or even better than!) BCH codes for large sizes.
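For reference, the punctured $BCH(31,6)$ experiment should correspond to the following settings in the script below (my reading of the code, worth double-checking):

NC = 23;  KC = 6;     % punctured/truncated (n,k)
NCNP = 31; KCNP = 6;  % non-punctured BCH parameters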
Some Octave code (it uses Octave-specific syntax such as endfor, ++ and /=; randint and bchenco come from the communications package):
NC = 45; KC = 11;          % (n,k) code parameters
N = 1000;                  % total tentative number of coins
NB = floor(N/NC + 1/2);    % number of blocks in the message (rounded)
N = NB * NC;               % total number of coins, adjusted
NT = 100;                  % number of independent tries
mindist = zeros(1, 3*KC);  % histogram of minimal distances
for t = 1:NT
  CW = randint(2^KC, NC);  % random codewords
  %% For BCH, comment the previous line and uncomment the following two
  %NCNP = 63; KCNP = 16;   % BCH (n,k) non-punctured parameters (>= NC, KC)
  %CW = bchenco(dec2bin(0:2^KCNP-1) - '0', NCNP, KCNP)(1:2^KC, 1:NC); % 2^KC punctured codewords
  C = randint(NB, NC);     % random coin sequence, one block per row
  for b = 1:NB
    % nearest codeword: index in nci, Hamming distance in ncd
    [ncd, nci] = min(sum(mod(bsxfun(@plus, C(b,:), CW), 2), 2));
    mindist(ncd+1)++;
  endfor
endfor
mindist /= sum(mindist);   % normalize to a probability distribution
d = 0:size(mindist,2)-1;   % possible minimal distances (= miss bits m)
% expected misses per block: the m miss bits, plus (KC-m)/2 expected extra
% misses when hit bits must carry information (they coincide w.p. 1/2)
hitrate = 1 - (d + max((KC-d)/2, 0)) * mindist' / NC
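As a quick sanity check of $(1)$ against the simulation (my addition; it reuses NC, KC and mindist from the script above):

d = 0:numel(mindist)-1;                % distance values
tstar = NC/2 - sqrt(NC*KC*log(2)/2)    % predicted concentration point from (1)
meandist = d * mindist'                % simulated mean minimal distance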
Edit: fixed the hit-rate calculation (slightly upwards): when $A$ has to use "good" bits ($m<k$) to send the message, the probability of coincidence for those bits is $1/2$ (not $1/4$ as I initially assumed).
Added: these values seem consistent with the bound I conjectured in a comment, namely:
The goal of $A$ is to use the "missed rounds" (those not correctly guessed) to pass information to $B$ about the other coins. Let $p$ be the miss probability. Then $A$ would like to pass to $B$ an average of $p$ bits of information per round: $I(A;B)=p$ bits. Since each fair coin has $H(B)=1$ bit, applying Fano's inequality gives the critical value
$$ h(p) = H(B|A) = H(B) - I(A;B) = 1 - p \tag 2$$
with $h(p)=- p \log_2(p)- (1-p) \log_2(1-p)$. The root occurs at
$p =0.2271\cdots$, which corresponds to a hit rate around $0.773$.
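A short snippet to locate that root numerically (my addition; fzero is standard Octave):

h = @(p) -p.*log2(p) - (1-p).*log2(1-p);     % binary entropy
p_crit = fzero(@(p) h(p) - (1-p), [0.1 0.4]) % ~ 0.2271
hitrate_bound = 1 - p_crit                   % ~ 0.7729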
Added (2019-03-23): In this answer I show that the distribution of the minimum of $2^{\beta n}$ Binomial$(n,1/2)$ variables asymptotically concentrates around the root of $h(d/n)=1 - \beta$. This proves that the random-coding strategy is asymptotically optimal, attaining the bound given by Fano's inequality above.