Regarding the characterization of special soundness, Stacking Sigmas gives the following definition:
''A $\Sigma$-protocol $\Pi=(A,Z,\phi)$ is said to have ${\it special\ soundness}$ if there exists a PPT extractor $\mathcal{E}$, such that given any two transcripts $(x,a,c,z)$ and $(x,a,c',z')$, where $c\ne c'$ and $\phi(x,a,c,z)=\phi(x,a,c',z')=1$, it holds that \begin{align*} \Pr[\mathcal{R}(x,w)=1|w\leftarrow\mathcal{E}(1^\lambda,x,a,c,z,c',z')]=1, \end{align*}''
which is essentially the same as what is stated in On $\Sigma$-protocols, just with different notation:
''From any $x$ and any pair of accepting conversations on input $x$, $(a, e, z)$, $(a,e',z')$ where $e\ne e'$, one can efficiently compute $w$ such that $(x,w)\in R$. This is sometimes called the ${\it special\ soundness}$ property.''
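To make the two definitions concrete, here is a minimal sketch of what such an extractor looks like for Schnorr's protocol. Schnorr is my own choice of example and is not taken from either paper; the group parameters, the witness, and the function names `verify` and `extract` are toy assumptions for illustration only.

```python
# Toy parameters (hypothetical, far too small for real use):
# G = 2 generates a subgroup of prime order Q = 11 in Z_23^*.
P, Q, G = 23, 11, 2


def verify(x, a, c, z):
    """Schnorr check phi(x, a, c, z): accept iff G^z == a * x^c (mod P)."""
    return pow(G, z, P) == (a * pow(x, c, P)) % P


def extract(x, a, c, z, c2, z2):
    """Recover w from two accepting transcripts (x, a, c, z) and
    (x, a, c2, z2) that share the first message a but have c != c2."""
    assert verify(x, a, c, z) and verify(x, a, c2, z2)
    assert (c - c2) % Q != 0
    # z - z2 = (c - c2) * w (mod Q), so divide by (c - c2) modulo Q.
    return ((z - z2) * pow((c - c2) % Q, -1, Q)) % Q


# Build two transcripts with the same commitment a = G^r.
w = 7                    # witness (discrete log of x)
x = pow(G, w, P)         # statement
r = 5                    # prover's randomness, reused for both transcripts
a = pow(G, r, P)         # first message
c, c2 = 3, 9             # two distinct challenges
z = (r + c * w) % Q      # response to c
z2 = (r + c2 * w) % Q    # response to c2

recovered = extract(x, a, c, z, c2, z2)
```

Here `recovered` equals `w`: the extractor only does modular arithmetic on the two responses, which is the "efficiently compute $w$" step in the quoted definitions. Note that `pow` with a negative exponent requires Python 3.8 or later.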
So my question is: why is this a desirable property? Shouldn't we want to avoid it? That is, shouldn't it be infeasible for an extractor to retrieve the witness, even if it holds two accepting transcripts?