The derivative of a map from $ \mathbb R ^ m $ to $ \mathbb R ^ n $ is an $ n $-by-$ m $ matrix, but if the map goes between spaces of matrices, then yes, you need higher-rank tensors. It can be helpful to work entry by entry, or to use abstract index notation (which looks much the same). But we can also stick to matrices if we work with differentials instead of derivatives, since taking a differential doesn't increase the rank. (Formally, you can think of the differential as the partial derivative with respect to some scalar component of $ x $ without specifying which one, although there are other ways to think of it.) Also, to avoid having to use the rule for differentiating an inverse matrix (which, as noted in the answer that @underflow linked to in a comment, is $ \mathrm d ( K ^ { - 1 } ) = - K ^ { - 1 } \, \mathrm d K \, K ^ { - 1 } $ rather than anything involving $ K ^ { - 2 } $), I'll rewrite your formula for $ f $ as the equation $ \big ( I - J ( x ) \big ) \, f ( x ) = \big ( I + J ( x ) \big ) \, x _ 0 $.
So $$ \eqalign { \mathrm d \Big ( \big ( I - J ( x ) \big ) \, f ( x ) \Big ) & = \mathrm d \Big ( \big ( I + J ( x ) \big ) \, x _ 0 \Big ) \\ \mathrm d \big ( I - J ( x ) \big ) \, f ( x ) + \big ( I - J ( x ) \big ) \, \mathrm d \big ( f ( x ) \big ) & = \mathrm d \big ( I + J ( x ) \big ) \, x _ 0 + \big ( I + J ( x ) \big ) \, \mathrm d ( x _ 0 ) \\ \Big ( \mathrm d ( I ) - \mathrm d \big ( J ( x ) \big ) \Big ) \, f ( x ) + \big ( I - J ( x ) \big ) \, \mathrm d \big ( f ( x ) \big ) & = \Big ( \mathrm d ( I ) + \mathrm d \big ( J ( x ) \big ) \Big ) \, x _ 0 + \big ( I + J ( x ) \big ) \, 0 \\ \Big ( 0 - \mathrm d \big ( J ( x ) \big ) \Big ) \, f ( x ) + \big ( I - J ( x ) \big ) \, \mathrm d \big ( f ( x ) \big ) & = \Big ( 0 + \mathrm d \big ( J ( x ) \big ) \Big ) \, x _ 0 \\ \big ( I - J ( x ) \big ) \, \mathrm d \big ( f ( x ) \big ) & = \mathrm d \big ( J ( x ) \big ) \, f ( x ) + \mathrm d \big ( J ( x ) \big ) \, x _ 0 \\ \mathrm d \big ( f ( x ) \big ) & = \big ( I - J ( x ) \big ) ^ { - 1 } \, \mathrm d \big ( J ( x ) \big ) \, \big ( { f ( x ) + x _ 0 } \big ) \text . } $$ You could expand $ f ( x ) $ on the right-hand side into its definition, but it's easier to read if you don't.
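As a sanity check, the identity $ \mathrm d \big ( f ( x ) \big ) = \big ( I - J ( x ) \big ) ^ { - 1 } \, \mathrm d \big ( J ( x ) \big ) \, \big ( f ( x ) + x _ 0 \big ) $ can be verified numerically along an arbitrary direction. The particular $ J ( x ) $ below is a made-up smooth matrix-valued map, chosen small enough that $ I - J ( x ) $ stays invertible; any such choice works.

```python
import numpy as np

# Illustrative smooth matrix-valued map J(x); the factor 0.1 keeps
# I - J(x) safely invertible. This specific J is just an example.
def J(x):
    return 0.1 * np.outer(np.sin(x), np.cos(x))

# f(x) defined by (I - J(x)) f(x) = (I + J(x)) x0
def f(x, x0):
    I = np.eye(len(x))
    return np.linalg.solve(I - J(x), (I + J(x)) @ x0)

rng = np.random.default_rng(0)
n = 4
x = rng.standard_normal(n)
x0 = rng.standard_normal(n)
v = rng.standard_normal(n)   # direction of the perturbation dx
I = np.eye(n)
h = 1e-6

# differentials of f and J along v, by central differences
df = (f(x + h * v, x0) - f(x - h * v, x0)) / (2 * h)
dJ = (J(x + h * v) - J(x - h * v)) / (2 * h)

# the derived identity: df = (I - J)^{-1} dJ (f + x0)
df_formula = np.linalg.solve(I - J(x), dJ @ (f(x, x0) + x0))
print(np.allclose(df, df_formula, atol=1e-6))  # True
```

The check is direction-by-direction because the differential is linear in $ \mathrm d x $; agreement for random $ v $ is strong evidence the matrix identity holds.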
If you want the partial derivative of $ f ( x ) $ with respect to the $ i $th component of $ x $, then replace $ \mathrm d $ above with $ \partial _ i $, which is short for $ \partial / \partial x _ i $ (or $ \partial / \partial x ^ i $ to distinguish upper and lower indices). If we write $ J ^ i _ j $ for the entry in row $ i $ and column $ j $ of the matrix $ J ( x ) $, $ K ^ i _ j $ for the corresponding entry of $ \big ( I - J ( x ) \big ) ^ { - 1 } $, $ f ^ i $ for the $ i $th entry of $ f ( x ) $, and $ { x _ 0 } ^ i $ for the $ i $th entry of $ x _ 0 $, then we get $$ \partial _ i f ^ j = \sum _ { k = 1 } ^ n \, \sum _ { l = 1 } ^ n \, K ^ j _ k \, \partial _ i J ^ k _ l \, ( f ^ l + { x _ 0 } ^ l ) \text , $$ which you can abbreviate as $ \partial _ i f ^ j = K ^ j _ k \, \partial _ i J ^ k _ l \, ( f ^ l + { x _ 0 } ^ l ) $ using the Einstein summation convention; or you can interpret this expression as abstract index notation. Note that $ \partial _ i J ^ k _ l $ is (a component of) a rank-$ 3 $ tensor, as you suspected, with contravariant rank $ 1 $ and covariant rank $ 2 $. If you write things like this, then there's no direct indication of what $ K $ means, so you have to keep track of the fact that $ ( \delta ^ i _ j - J ^ i _ j ) \, K ^ j _ k = K ^ i _ j \, ( \delta ^ j _ k - J ^ j _ k ) = \delta ^ i _ k $, where $ \delta $ is the Kronecker delta (the components of the identity matrix, or the identity matrix itself in abstract index notation). Similarly, $ f ^ i = K ^ i _ j \, ( \delta ^ j _ k + J ^ j _ k ) \, { x _ 0 } ^ k $. You could also start with this and do the whole derivation in this notation.
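The indexed formula translates almost verbatim into an Einstein-summation computation: build the rank-$ 3 $ array $ \partial _ i J ^ k _ l $ and contract it against $ K ^ j _ k $ and $ f ^ l + { x _ 0 } ^ l $. The $ J ( x ) $ below is again just an illustrative choice.

```python
import numpy as np

# Illustrative J(x), small enough that I - J(x) is invertible.
def J(x):
    return 0.1 * np.outer(np.sin(x), np.cos(x))

def f(x, x0):
    I = np.eye(len(x))
    return np.linalg.solve(I - J(x), (I + J(x)) @ x0)

rng = np.random.default_rng(2)
n = 3
x = rng.standard_normal(n)
x0 = rng.standard_normal(n)
I = np.eye(n)
h = 1e-6

# dJ[i, k, l] = partial_i J^k_l, a rank-3 array, by central differences
dJ = np.stack([(J(x + h * I[i]) - J(x - h * I[i])) / (2 * h)
               for i in range(n)])
K = np.linalg.inv(I - J(x))   # K^j_k
v = f(x, x0) + x0             # f^l + x0^l

# df[i, j] = partial_i f^j = K^j_k  partial_i J^k_l  (f^l + x0^l)
df = np.einsum('jk,ikl,l->ij', K, dJ, v)

# the full Jacobian of f by central differences, for comparison
df_num = np.stack([(f(x + h * I[i], x0) - f(x - h * I[i], x0)) / (2 * h)
                   for i in range(n)])
print(np.allclose(df, df_num, atol=1e-6))  # True
```

The subscript string `'jk,ikl,l->ij'` is exactly the repeated-index contraction in $ K ^ j _ k \, \partial _ i J ^ k _ l \, ( f ^ l + { x _ 0 } ^ l ) $, summing over $ k $ and $ l $.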
Now, while checking my work, I noticed something that I probably wouldn't have thought of otherwise: we can simplify the expression $ f ( x ) + x _ 0 $ (or $ f ^ l + { x _ 0 } ^ l $) that appears above. If you expand $ f ( x ) $ as $ \big ( I - J ( x ) \big ) ^ { - 1 } \, \big ( I + J ( x ) \big ) \, x _ 0 $ and also write $ x _ 0 $ as $ \big ( I - J ( x ) \big ) ^ { - 1 } \, \big ( I - J ( x ) \big ) \, x _ 0 $ (this is the non-obvious part), then $ f ( x ) + x _ 0 $ factors as $ \big ( I - J ( x ) \big ) ^ { - 1 } \, \Big ( \big ( I + J ( x ) \big ) + \big ( I - J ( x ) \big ) \Big ) \, x _ 0 $, which simplifies to $ 2 \, \big ( I - J ( x ) \big ) ^ { - 1 } \, x _ 0 $. So we get $$ \mathrm d \big ( f ( x ) \big ) = 2 \, \big ( I - J ( x ) \big ) ^ { - 1 } \, \mathrm d \big ( J ( x ) \big ) \, \big ( I - J ( x ) \big ) ^ { - 1 } \, x _ 0 \text , $$ or $$ \partial _ i f ^ j = 2 \, K ^ j _ k \, \partial _ i J ^ k _ l \, K ^ l _ m \, { x _ 0 } ^ m \text . $$ This might be nicer to work with. (And it looks a lot more like your guess, although I don't know how you got that.)
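Since the factorization is a pure matrix-algebra identity, it can be confirmed with arbitrary sample values: any matrix $ J $ with $ I - J $ invertible, any $ \mathrm d J $, and any $ x _ 0 $ should make the two forms of the differential agree.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
Jm = 0.2 * rng.standard_normal((n, n))   # a sample value of J(x)
dJ = rng.standard_normal((n, n))         # a sample value of dJ(x)
x0 = rng.standard_normal(n)
I = np.eye(n)

K = np.linalg.inv(I - Jm)                # K = (I - J)^{-1}
fv = K @ (I + Jm) @ x0                   # f = K (I + J) x0

# original form: K dJ (f + x0)   vs.   factored form: 2 K dJ K x0
lhs = K @ dJ @ (fv + x0)
rhs = 2 * K @ dJ @ K @ x0
print(np.allclose(lhs, rhs))  # True
```

The agreement here doesn't depend on $ \mathrm d J $ actually being a differential of anything; the simplification is purely the identity $ f + x _ 0 = 2 \, ( I - J ) ^ { - 1 } x _ 0 $.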