https://arxiv.org/abs/2310.17652 finds the smallest (non-additive/non-stabilizer) code with transversal T to be an $((11,2,3))$ code. Would be cool to see if there are more compact encodings for distance 2.
For stabiliser codes instead, https://arxiv.org/abs/1910.09333 introduces the notion of a CSS-T code, which is a code for which your first condition holds. However, in general the logical action can be identity or T, so condition 2 is not guaranteed.
https://arxiv.org/abs/2210.14066 arrives at minimality by requiring that each weight-1 Pauli error has a unique syndrome, so that it is correctable and the code has distance 3. For distance 2 the requirement is weaker: every weight-1 Pauli must trigger some syndrome, but the syndromes need not be unique. I think this comes down to having an even number of all-ones columns in Theorem 1 (so the minimum is 0), and the number of rows being >= in Lemma 1. This means that at least 2 Z errors will have the same syndrome, and hence the code will be distance 2. The rest should follow similarly to the paper.
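To make the "detected but not unique" distinction concrete, here is a minimal sketch that tabulates the syndromes of all weight-1 Paulis for a small CSS code. The code used is an assumption for illustration, not something taken from the paper: the $[[8,3,2]]$ cube code, with one X check on all qubits and Z checks on three faces plus the all-qubits Z check. The helper name `syndrome` and the vertex labelling are likewise hypothetical.

```python
import itertools
import numpy as np

# Assumed toy example: the [[8,3,2]] cube code, qubits on the vertices (x, y, z).
vertices = list(itertools.product([0, 1], repeat=3))
n = len(vertices)

# One X-type check on all qubits (it detects Z errors).
H_X = np.ones((1, n), dtype=int)
# Z-type checks on the faces x=0, y=0, z=0, plus Z on all qubits (they detect X errors).
H_Z = np.array([
    [int(v[0] == 0) for v in vertices],
    [int(v[1] == 0) for v in vertices],
    [int(v[2] == 0) for v in vertices],
    [1] * n,
])

# Sanity check: X and Z checks commute (every overlap is even).
assert not (H_X @ H_Z.T % 2).any()

def syndrome(x_part, z_part):
    """Syndrome bits of a Pauli with the given X and Z supports."""
    s_z_checks = H_Z @ x_part % 2  # Z checks fire on the X component
    s_x_checks = H_X @ z_part % 2  # X checks fire on the Z component
    return tuple(int(b) for b in np.concatenate([s_z_checks, s_x_checks]))

syndromes = {}
zero = np.zeros(n, dtype=int)
for q in range(n):
    e = np.zeros(n, dtype=int)
    e[q] = 1
    syndromes[("X", q)] = syndrome(e, zero)
    syndromes[("Z", q)] = syndrome(zero, e)
    syndromes[("Y", q)] = syndrome(e, e)

# Distance >= 2: every weight-1 Pauli triggers at least one check.
all_detected = all(any(s) for s in syndromes.values())
# Correcting every weight-1 Pauli would additionally need distinct syndromes.
all_unique = len(set(syndromes.values())) == len(syndromes)
# Number of distinct syndromes among the single-qubit Z errors.
z_syndromes = {syndromes[("Z", q)] for q in range(n)}

print(all_detected, all_unique, len(z_syndromes))  # True False 1
```

In this example every single-qubit Z error gives the same syndrome, so the errors are detectable but not distinguishable, which is the distance-2 situation described above.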
In https://arxiv.org/abs/2303.15615, Table 2 (p. 23), they build codes with the required logical action and distance from toric codes, and the construction of the minimal ones for each 3d level gate is in this notebook: https://github.com/m-webster/CSSLO/blob/main/notebooks/04.2_code_search.ipynb
I think these have to be the smallest examples, since there can't be smaller kernels.
Interestingly, if you relax your condition 2 you get the $[[8,3,2]]$ code, while if you relax condition 1 to 'depth 1 and fault tolerant' instead of transversal, allowing physical $CCZ$, you get the $[[10,1,2]]$ morphed code from https://arxiv.org/abs/2112.01446.
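As a quick check of why the $[[8,3,2]]$ code only fits once condition 2 is relaxed, the sketch below computes the logical action of a transversal $T/T^\dagger$ pattern on it. Several things here are assumptions of mine rather than statements from the papers above: qubits sit on cube vertices, $T$ acts on even-parity vertices and $T^\dagger$ on odd-parity ones, logical $X$ operators are $X$ on the faces $x=0$, $y=0$, $z=0$, and $|000\rangle_L \propto |0^8\rangle + |1^8\rangle$. Phases are tracked as integers in units of $\pi/4$.

```python
import itertools

# Assumed layout: qubits on the vertices (x, y, z) of a cube.
vertices = list(itertools.product([0, 1], repeat=3))
# Assumed pattern: T on even-parity vertices (+1), T^dagger on odd-parity ones (-1).
signs = [1 if sum(v) % 2 == 0 else -1 for v in vertices]

def phase_exponent(bits):
    """Phase picked up by a computational basis state, in units of pi/4 (mod 8)."""
    return sum(s * b for s, b in zip(signs, bits)) % 8

def logical_phases():
    """Phase (units of pi/4) acquired by each logical basis state |abc>_L."""
    phases = {}
    for a, b, c in itertools.product([0, 1], repeat=3):
        # Apply the assumed logical X's (faces x=0, y=0, z=0) to the all-zeros string.
        v = [(a * (x == 0) + b * (y == 0) + c * (z == 0)) % 2 for x, y, z in vertices]
        # The codeword is |v> + |v + 11111111>; both terms must get the same phase.
        v_flipped = [1 - bit for bit in v]
        k1, k2 = phase_exponent(v), phase_exponent(v_flipped)
        assert k1 == k2, "operator would not preserve the codeword"
        phases[(a, b, c)] = k1
    return phases

phases = logical_phases()
for label, k in phases.items():
    print(label, k)  # k = 4 means a phase of -1, k = 0 means no phase
```

Under these assumptions only $|111\rangle_L$ picks up a phase of $-1$, i.e. the transversal operator acts as a logical $CCZ$ rather than a logical $T$.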