Why quaternion slerp? A precise derivation.

2024-05


1. 3D skew-symmetric matrix

$\mX$ is skew-symmetric if $\mX^T = -\mX$. The common way to parameterize such a matrix in $\R^{3 \times 3}$ is with a vector $\vu = [x, y, z]^T$ s.t.
\begin{align} \skewm{\vu} = \begin{bmatrix*}[r] 0 & -z & y \\ z & 0 & -x \\ -y & x & 0 \\ \end{bmatrix*}.\end{align}
The variable placement and the signs are very deliberate. $\skewm{\vu}$ is rank-2, with null space spanned by $\vu$ itself.
Useful identities of a skew-symmetric matrix:
  • Cross product matrix: $\skewm{\vx} \vy = \vx \times \vy \qquad \skewm{\vx} \vy = -\skewm{\vy} \vx \qquad \skewm{\vx} \vx = \bm{0}$.
  • $\skewm{\vu}^2 = \vu \vu^T - (\vu^T\vu) \mI$. $\tr{\skewm{\vu}^2} = -2\vu^T\vu$. If $\vu$ has norm 1, $\skewm{\vu}^2 = \vu \vu^T - \mI$.
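These identities are easy to sanity-check numerically. A minimal NumPy sketch (the helper name `skew` is ours, not from any particular library):

```python
import numpy as np

def skew(u):
    """Return the 3x3 skew-symmetric matrix [u]_x such that [u]_x y = u x y."""
    x, y, z = u
    return np.array([[0., -z,  y],
                     [ z, 0., -x],
                     [-y,  x, 0.]])
```

The tests below exercise the cross-product identity and $\skewm{\vu}^2 = \vu\vu^T - (\vu^T\vu)\mI$.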

2. Matrix Exponential

Define the matrix exponential of a square matrix $\mX \in \R^{n \times n}$ to be the series
\begin{align} \exp{(\mX)} := \sum_{k=0}^\infty \frac{\mX^k}{k!}.\end{align}
Fact 1: The series always converges [proof]. $\exp{(\mA)}\exp{(\mB)} \neq \exp{(\mB)}\exp{(\mA)}$ in general. $\exp{(\mX)}$ results in an invertible matrix [proof]: $\exp{(-\mX)} \exp{(\mX)} = \exp{(\mX)} \exp{(-\mX)} = \mI$.
Fact 2: The matrix exponential is generally not a 1-to-1 mapping (it is in 1D). It's generally many-to-1 [example]. See the detailed case breakdown for $\R^{3 \times 3}$ in the axis-angle mapping section below.
Fact 3: If $\mX$ is skew-symmetric, then $\exp{(\mX)}$ is an orthogonal matrix, i.e. its transpose is its inverse: $\exp{(\mX)}^T = \exp{(\mX^T)} = \exp{(-\mX)} = \exp{(\mX)}^{-1}$. Furthermore, its determinant $\text{det}(\exp(\mX)) = \exp{\tr{\mX}} = \exp(0) = 1$, by Jacobi's formula [proof]. Thus $\exp(\mX)$ produces special orthogonal matrices $SO(n)$.
We also claim that every special orthogonal matrix $\mR$ can be generated by some skew-symmetric matrix $\mX$; the mapping is surjective [missing proof].
Fact 4: If $(\lambda,\vv)$ is an eigenvalue/eigenvector pair of $\mX$, then $\exp(\mX) \vv = \sum_{k=0}^\infty \frac{\lambda^k}{k!} \vv$, i.e. $(e^\lambda, \vv)$ is an eigen pair of $\exp(\mX)$.
Fact 5: $\exp{(\mB \mA \mB^{-1})} \mycolor{gray}{ = \mI + \mB\mA\mB^{-1} + \frac{(\mB \mA \mB^{-1})^2}{2!} + \dots = \mB \mI \mB^{-1} + \mB\mA\mB^{-1} + \mB\frac{\mA^2}{2!}\mB^{-1} + \dots } = \mB \exp{(\mA)} \mB^{-1}$.
Fact 6: $\partial_t \exp({t\mA}) \mycolor{gray}{ = \partial_t(\mI + t\mA + \frac{t^2}{2!}\mA^2 + \dots ) = \mA + t\mA^2 + \frac{t^2}{2!} \mA^3 + \dots } = \mA \exp{(t\mA)} = \exp{(t\mA)} \mA$. Let $\vy(t) = \exp({t\mA})\vx$. Then $\dot \vy(t) \mycolor{gray}{ = \mA \exp{(t\mA)} \vx } = \mA \vy(t)$.
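Facts 1 and 3 can be verified numerically straight from the series definition. A sketch (the helper `expm_series` and the 30-term truncation are our choices; accurate enough for small matrices):

```python
import numpy as np

def expm_series(X, terms=30):
    """Matrix exponential via truncated power series sum_k X^k / k!."""
    out = np.eye(X.shape[0])
    term = np.eye(X.shape[0])
    for k in range(1, terms):
        term = term @ X / k       # accumulates X^k / k!
        out = out + term
    return out
```

Feeding it a skew-symmetric matrix should yield an orthogonal matrix with determinant 1, per fact 3.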
We are therefore able to come up with guesses when encountering ODEs of the form
\begin{align} \dot \mB = \mA \mB \text{ or } \mB \mA &~\Longrightarrow~ \mB(t) = \exp(t\mA) \nonumber \\ \dot \vy = \mA \vy &~\Longrightarrow~ \vy(t) = \exp({t\mA}) \vx \quad \text{for some initial $\vx$}.\end{align}
These are less useful for rotations, but central in modeling probability distributions that appear in the context of Markov chains.

3. Axis-Angle Mapping

[4-number, constrained parameterization] In 3D, given $(\vu \in \R^3, \theta \in \R)$, with $\|\vu\| = 1$ and $\theta$ without range restriction, $\skewm{\vu}$ is skew-symmetric, and thus by fact 3
\begin{align} \exp{(\theta \skewm{\vu})} = \sum_k \frac{\theta^k}{k!}\skewm{\vu}^k\end{align}
is a spatial rotation matrix in $SO(3)$.
Using the constraint $\| \vu \| = 1$, i.e. $\skewm{\vu}^2 = \vu \vu^T - \mI$, the powers cycle:
\begin{align} \skewm{\vu}^0 = \mI \quad \skewm{\vu}^1 = \skewm{\vu} \quad \skewm{\vu}^2 = \vu \vu^T - \mI \quad \skewm{\vu}^3 = -\skewm{\vu} \quad \skewm{\vu}^4 = - \skewm{\vu}^2 \dots\end{align}
Therefore
\begin{align} \exp{(\theta \skewm{\vu})} &= \mI + \skewm{\vu}(\theta - \frac{\theta^3}{3!} + \frac{\theta^5}{5!} - \dots) + \skewm{\vu}^2(\frac{\theta^2}{2!} - \frac{\theta^4}{4!} + \frac{\theta^6}{6!} - \dots) \nonumber \\ &= \boxed{ \mI + \sin\theta \skewm{\vu} + (1 - \cos\theta) \skewm{\vu}^2 \quad \text{Rodrigues formula} } \label{eq:rod}\end{align}
produces a rotation matrix. Furthermore, the action
\begin{align} \exp{(\theta \skewm{\vu})} (s\vu) = s\mI \vu + s \cdot \sin\theta \cancel{\skewm{\vu} \vu} + s\cdot (1 - \cos\theta) \cancel{\skewm{\vu}^2 \vu} = s\vu\end{align}
preserves vectors along the axis $\vu$. For a unit-length vector $\vx$ orthogonal to $\vu$, $\skewm{\vu}\vx = \vu \times \vx$ provides the 3rd orthogonal vector $\vy$; $[\vu, \vx, \vy]$ forms a right-hand oriented basis.
\begin{align} \exp{(\theta \skewm{\vu})} \, (t\vx) = \mI (t\vx) + \sin\theta \cdot t \cdot \underbrace{(\vu \times \vx)}_{\vy} + (1 - \cos\theta) \cdot t \cdot \underbrace{ (\vu \times (\vu \times \vx)) }_{-\vx}\end{align}
rotates $t\vx$ in the $\vx, \vy$ plane, which is orthogonal to $\vu$ and right-handed with it, from $\vx$ toward $\vy$ by angle $\theta$. The same conclusion can be reached by a spectral analysis of the complex eigenvalues. The action on any vector can be decomposed into the action on some $s\vu + t \vx$.
Therefore it is appropriate to name $(\theta, \vu)$ the axis-angle parameterization of a rotation matrix.
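The boxed formula transcribes directly to code. A sketch (the function name `rodrigues` is ours), which we can check against the two actions just derived: the axis is fixed, and a vector orthogonal to the axis turns by $\theta$:

```python
import numpy as np

def rodrigues(theta, u):
    """R = I + sin(theta) [u]_x + (1 - cos(theta)) [u]_x^2, with ||u|| = 1."""
    x, y, z = u
    S = np.array([[0., -z, y], [z, 0., -x], [-y, x, 0.]])
    return np.eye(3) + np.sin(theta) * S + (1 - np.cos(theta)) * (S @ S)
```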
This mapping is not injective. To produce the same rotation matrix, for some integer $k$, one could have
  1. $(\theta, \vu)$ and $(\hphantom{-} \theta + k \cdot 2 \pi, \hphantom{-} \vu)$, e.g. $k=-1$: $(-(2\pi - \theta), \vu)$ rotates through the opposite turn.
  2. $(\theta, \vu)$ and $(-\theta + k \cdot 2 \pi , -\vu)$, e.g. $k=1$: $(2\pi - \theta, -\vu)$ keeps the rotation angle under $\pi$.
  3. At $\theta = k \cdot 2\pi$, any $\vu$ maps to the identity matrix.
See the cases with quaternion for contrast.
[3-number, unconstrained parameterization] Given an unnormalized $\vh \in \R^3$,
\begin{align} \exp{(\skewm{\vh})} &= \sum_k \frac{\skewm{\vh}^k}{k!} \nonumber \\ &= \exp \left( \|\vh\| \frac{ \skewm{\vh} }{\|\vh\|} \right) = \mI + \frac{ \sin \|\vh\|}{\|\vh\|} \skewm{\vh} + \frac{(1 - \cos \|\vh\|)}{\|\vh\|^2} \skewm{\vh}^2 \quad \text{if}~ \| \vh \| > 0 \label{eq:rod_h} .\end{align}
With unconstrained $\vh$, its magnitude is small for rotations close to $\mI$, and it is numerically stabler to use the series
\begin{align*} \frac{\sin x}{x} = 1 - \frac{x^2}{6} + \frac{x^4}{120} + O(x^6), \quad \frac{1 - \cos x}{x^2} = \frac{1}{2} - \frac{x^2}{24} + \frac{x^4}{720} + O(x^6),\end{align*}
since numerical software is unaware of the interaction between $\sin x$ and $1/x$ when the two parts are evaluated separately. To look up series like these, use Wolfram Alpha.
Some authors do not make a distinction between eq. 6 and eq. 9. Eq. 6 covers many common use cases; axis and angle are often specified separately. A major advantage of eq. 9 is that it allows unconstrained optimization of rotations through $\vh$. Optimization using the $(\theta, \vu)$ parameterization in eq. 6 needs extra care to maintain the norm-1 constraint on $\vu$.
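A sketch of eq. 9 with the Taylor fallback described above (the function name `exp_so3` and the switch-over threshold `eps` are our choices):

```python
import numpy as np

def exp_so3(h, eps=1e-8):
    """exp([h]_x), with Taylor fallbacks for sin(t)/t and (1 - cos t)/t^2 near t = 0."""
    t = np.linalg.norm(h)
    x, y, z = h
    S = np.array([[0., -z, y], [z, 0., -x], [-y, x, 0.]])
    if t < eps:
        a = 1.0 - t**2 / 6.0      # leading terms of sin(t)/t
        b = 0.5 - t**2 / 24.0     # leading terms of (1 - cos t)/t^2
    else:
        a = np.sin(t) / t
        b = (1.0 - np.cos(t)) / t**2
    return np.eye(3) + a * S + b * (S @ S)
```

At $\vh = \bm{0}$ the fallback returns exactly $\mI$, with no division by zero anywhere.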

3.1. Log Map: Matrix \rightarrow Axis-Angle, and inversion in general

Every rotation matrix can be produced by some axis-angle $(\theta, \vu)$ s.t. $\theta \in [0, \pi]$, because if $\theta > \pi$, $(2\pi - \theta, -\vu)$ produces the same rotation.
Thus knowing $\cos \theta$ (or $\tan \theta$) is sufficient to determine the rotation angle $\theta$, since $\arccos$ (or $\arctan$) is unambiguous on $[0, \pi]$. On the contrary, $\sin \theta$ alone is insufficient to determine $\theta$. It is no coincidence that both the rotation matrix and the quaternion provide unambiguous ways to get $\cos \theta$ (for quat, $\cos \theta = 2 w^2 - 1$), while $\sin \theta = \pm \sqrt{1 - \cos^2 \theta}$ is only decidable up to sign, with $+$ chosen to keep $\theta$ in the range $[0, \pi]$.
In the case of a rotation matrix,
  1. $\tr{\mR} = 3 + 0 + (1 - \cos\theta)\cdot(-2) = 1 + 2\cos\theta \Longrightarrow \cos\theta = \frac{\tr{\mR} - 1}{2}, \quad \sin \theta = + \sqrt{1 - \cos^2\theta}$.
  2. $\mR - \mR^T = 2 \skewm{\vu} \sin \theta$, so $\vu = \frac{(\mR - \mR^T)^{\vee}}{2\sin\theta}$ if $\sin \theta \neq 0$. By deciding the sign of $\sin \theta$ to be $+$, we have also locked the sign of $\vu$. If $\tr{\mR}=-1$: $\cos\theta=-1$, $\theta = \pi$, $\mR$ is symmetric, and $\sin\theta = 0$; we obtain pairwise products of the entries of $\vu$ by $\mR = \mI + 2\skewm{\vu}^2 \Rightarrow \vu \vu^T = \frac{\mR + \mI}{2}$. With $\sin\theta=0$, the sign of $\vu$ is free to flip. Set one of the non-zero elements to be positive, and get the rest from the pairwise products.
  3. $\mR + \mR^T = 2(\mI + (1 - \underbrace{\cos \theta}_{ (\tr{\mR} - 1)/2 } ) \skewm{\vu}^2 ) \Longrightarrow \vu \vu^T = \frac{\mR + \mR^T + (1 - \tr{\mR})\mI}{3 - \tr{\mR}}$ if $\tr{\mR} \neq 3$. The rotation axis at $\mI$ is undefined. This identity provides the pairwise products of $\vu$ at all $\theta > 0$, and we can compute $\vu$ up to a sign flip as described above. But make sure that the sign of $\vu$ is compatible with the positive sign of $\sin \theta$, as their product is locked by $\skewm{\vu} \sin \theta = \frac{\mR - \mR^T}{2}$. The need for the compatibility check makes this route less useful than item 2.
The alternative is rotation matrix $\rightarrow$ quaternion $\rightarrow$ axis-angle. The baseline approach for rotation matrix $\rightarrow$ quaternion is similar to what is outlined in this section.
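Items 1 and 2 above give the generic branch of the log map. A sketch (the name `log_so3` is ours; the $\theta = \pi$ special case from item 2 is omitted for brevity):

```python
import numpy as np

def log_so3(R, eps=1e-7):
    """Recover (theta, u) with theta in [0, pi]; generic branch, theta not near 0 or pi."""
    cos_t = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_t)
    # (R - R^T)/2 = sin(theta) [u]_x ; read off the vector ("vee") part
    w = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]]) / 2.0
    u = w / max(np.sin(theta), eps)
    return theta, u
```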

4. A Historical Note on Quaternions

We discuss an easier and more historically faithful way to motivate the use of quaternions for rotations.
Quaternions are often introduced as hypercomplex numbers. The problem with this presentation is that it is difficult to intuit a priori why the conjugation $\mR(\vq) \vx = \vq \vx \vq^*$ should perform a rotation acting on $\vx$, while the product $\vq_3 = \vq_2 \vq_1$ implements rotation composition. The formulae can be algebraically verified post hoc, but their origins are puzzling. Why not use $\vx' = \vq \vx$ to rotate vectors? Hamilton himself seems to have struggled with this exact question, even after Cayley's article came to his attention.
It turns out that historically Rodrigues first developed the use of a 3- or 4-number tuple as a rational representation of rotations without trigonometric functions. The concept of a matrix was not yet prevalent, but his proposed formulae in effect 1) convert the 3/4 numbers to a rotation matrix, and 2) compose successive rotations into a new 3/4-number tuple.
Arthur Cayley studied [#6, 1843] Rodrigues' results, and realized [#20, 1845] that quaternions can be manipulated to reproduce Rodrigues' formulae. His background in Rodrigues' prior work made the discovery much more plausible, and he went on to extend quaternions in several directions, one of which is the representation of 4D rotations [#137].
Hypercomplex numbers have their uses in algebra, but for the purpose of spatial rotations it is perhaps helpful to downweight the emphasis a little and focus instead on Rodrigues' original perspective, which is purely geometric and arguably more inventable.
We do not dispute the importance of vanilla complex numbers $a + bi$ in understanding rotations. The role of the eigenvalues of a $3 \times 3$ rotation matrix is easy to describe. Hypercomplex numbers are just more unwieldy to mentally visualize.

5. Absorbing ambiguities with intermediary: a simple 5D representation

The spirit is to capture intermediate variables in the Rodrigues mapping of eq. 6. We also want to avoid invoking $\sin, \cos$ so as to make derivative operations more numerically stable.
One source of ambiguity in axis-angle is the angle. We could over-parameterize the angle $\theta$ to absorb its ambiguities.
Let $(\alpha, \beta) := (\sin \theta, \cos \theta)$, subject to the unit-norm constraint $\alpha^2 + \beta^2 = 1$.
Now the forward mapping becomes
\begin{align} \mR(\alpha, \beta, \vu) = \mI + \alpha \skewm{\vu} + (1 - \beta) \skewm{\vu}^2 .\end{align}
The ambiguities have been reduced. To map to the same rotation matrix,
  1. $(\alpha, \beta, \vu)$ and $(-\alpha, \beta, -\vu)$, which corresponds to the flipped axis-angle $(-\theta, -\vu)$; picking the representative with $\alpha \ge 0$ keeps the rotation angle in $[0, \pi]$.
  2. When $(\alpha, \beta) = (0, 1)$, any $\vu$ maps to the identity matrix.
One way to motivate the quaternion is that it resolves case 2 by merging the sine and the axis $\vu$.

6. Quaternion, a 4D representation

Using the fact that $\sin \theta = 2 \sin\frac{\theta}{2} \cos\frac{\theta}{2}$ and $\cos \theta = 2 \cos^2 \frac{\theta}{2} - 1$ [derive with Euler's identity],
\begin{align} \exp(\theta \skewm{\vu} ) &= \mI + 2 \cos\frac{\theta}{2} \sin\frac{\theta}{2} \skewm{\vu} + (1 - 2\cos^2 \frac{\theta}{2} + 1) \skewm{\vu}^2 \nonumber \\ &= \mI + 2 \cos\frac{\theta}{2} \sin\frac{\theta}{2} \skewm{\vu} + 2 \left( \sin \frac{\theta}{2} \skewm{\vu} \right)^2.\end{align}
We capture the intermediary by letting $\boxed{ (w, \vv) := ( \cos \frac{\theta}{2}, \sin \frac{\theta}{2} \vu )}$. $(w, \vv)$ is referred to as a quaternion. This 4D vector is naturally of norm 1.
[Reduced ambiguities] We have $(\theta, \vu) \underbrace{\Longrightarrow}_{\text{many-to-1}} (w, \vv) \underbrace{\Longrightarrow}_{\text{2-to-1}} \mR$. The intermediary $(w, \vv)$ absorbs and consolidates the many-to-1 exponential map of $(\theta, \vu) \Rightarrow \mR$ into a 2-to-1 mapping through $(w, \vv) := (\cos \frac{\theta}{2}, \sin \frac{\theta}{2} \vu )$. $(w, \vv)$ and $(-w, -\vv)$ map to the same rotation, but this 2-to-1 mapping does not correspond to simply flipping the axis-angle: flipping the signs of $(\theta, \vu)$ in $(\cos \frac{\theta}{2}, \sin \frac{\theta}{2} \vu )$ doesn't change its value. To change $(w, \vv)$ to $(-w, -\vv)$, the options are
  • $(\frac{\theta}{2}, \vu) \rightarrow (\frac{ \hphantom{-} \theta + k \cdot 2\pi}{2}, \hphantom{-}\vu)$ for odd integer $k$, e.g. $k=-1$: $(\frac{-(2\pi - \theta)}{2}, \vu)$ rotates through the opposite turn.
  • $(\frac{\theta}{2}, \vu) \rightarrow (\frac{ -\theta + k \cdot 2\pi}{2}, -\vu)$ for odd integer $k$, e.g. $k=1$: $(\frac{2\pi - \theta}{2}, -\vu)$ keeps the rotation angle under $\pi$.
To keep the same $(w, \vv)$, the options are
  • $(\frac{\theta}{2}, \vu) \rightarrow (\frac{\hphantom{-}\theta + k \cdot 2 \pi}{2}, \hphantom{-}\vu)$ for some even integer $k$, e.g. $k=0$ (no change).
  • $(\frac{\theta}{2}, \vu) \rightarrow (\frac{-\theta + k \cdot 2\pi }{2}, -\vu)$ for some even integer $k$, e.g. $k=0$: $(-\frac{\theta}{2}, -\vu)$, the flipped axis-angle.
In this sense, the 2-to-1 ambiguity of the quaternion is the result of grouping the many-to-1 ambiguities that axis-angle possesses into "even'' and "odd'' sides, because all four axis-angle configurations, with $k$ even or odd, map to the same rotation matrix.
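The even/odd grouping can be observed numerically. A sketch (the helper name `axis_angle_to_quat` is ours):

```python
import numpy as np

def axis_angle_to_quat(theta, u):
    """(w, v) = (cos(theta/2), sin(theta/2) u), with ||u|| = 1."""
    u = np.asarray(u, dtype=float)
    return np.concatenate([[np.cos(theta / 2.0)], np.sin(theta / 2.0) * u])
```

Adding an odd multiple of $2\pi$ negates the quaternion; flipping the axis-angle ($k=0$, even side) leaves it unchanged.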

6.1. Quaternion \rightarrow Rotation Matrix

The forward mapping starting from the intermediary $(w, \vv)$ is
\begin{align} \mR(w, \vv) &= \mI + 2 w \cdot \skewm{\vv} + 2\skewm{\vv}^2 \\ &= \mI + 2 w \cdot \skewm{\vv} + 2 (\vv \vv^T - \vv^T \vv\mI) \nonumber \\ &= \left(1 - 2(v_1^2 + v_2^2 + v_3^2) \right) \mI + 2(\vv \vv^T +w \cdot \skewm{\vv}) \nonumber \\ &= \underbrace{ \left(1 - 2(v_1^2 + v_2^2 + v_3^2) \right) \mI + 2 \begin{bmatrix} v_1^2 & 0 & 0 \\ 0 & v_2^2 & 0 \\ 0 & 0 & v_3^2 \end{bmatrix} }_{\text{diagonal entries}} + \underbrace{ 2 \begin{bmatrix} 0 & v_1 v_2 - v_3 w & v_1 v_3 + v_2 w \\ v_2 v_1 + v_3 w & 0 & v_2 v_3 - v_1 w \\ v_3 v_1 - v_2 w & v_3 v_2 + v_1 w & 0 \end{bmatrix} }_{\text{off-diagonal entries}} . \label{eq:quat2mat}\end{align}
The off-diagonal entries in eq. 14 are unambiguous. For the diagonal entries, using the constraint $w^2 + v_1^2 + v_2^2 + v_3^2 = 1$, there are three different ways to write them in the literature and various codebases. WLOG consider the 2nd diagonal entry, i.e. $\mR_{11}$ with 0-based indexing:
\begin{align} \mR_{11} = 1 - 2(v_1^2 + v_2^2 + v_3^2) + 2v_2^2 &= 1 - 2v_1^2 - 2 v_3^2 \label{eq:quatm_v1} \\ &= 1 - 2(1 - w^2) + 2v_2^2 = 2 w^2 + 2v_2^2 - 1 \label{eq:quatm_v2} \\ &= (w^2 + v_1^2 + v_2^2 + v_3^2) - 2v_1^2 - 2 v_3^2 = w^2 - v_1^2 + v_2^2 - v_3^2 . \label{eq:quatm_v3}\end{align}
Two remarks:
  1. Version 3 / eq. 17 provides the critical insight for the (parallelizable) inverse mapping from rotation matrix to quaternion. Version 1 / eq. 15 is for example used in Gaussian splatting. A few other papers use version 2.
  2. These different diagonal parameterizations result in different gradients when backprop-ing from $\mR(w, \vv)$ to $[w, \vv]$. But after projecting the gradient (w.r.t. the norm-1 constraint / to be orthogonal to $[w, \vv]$), the gradients become the same [missing proof].
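The first line of eq. 14 needs no expansion at all to be implemented. A sketch (the function name `quat_to_matrix` is ours), assuming a unit quaternion input:

```python
import numpy as np

def quat_to_matrix(q):
    """R = I + 2 w [v]_x + 2 [v]_x^2, for a unit quaternion q = (w, v1, v2, v3)."""
    w, x, y, z = q
    V = np.array([[0., -z, y], [z, 0., -x], [-y, x, 0.]])
    return np.eye(3) + 2.0 * w * V + 2.0 * (V @ V)
```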

6.2. Rotation Matrix \rightarrow Quaternion

The problem with the inversion step is always "what happens if one of them is $0$''. [Baseline] We first present a baseline method that involves explicit if-else branching. This is not parallel-friendly, and also numerically not ideal because small input perturbations could trigger different logic pathways.
\begin{align} \tr{\mR} = \mycolor{lightgray}{3 + 0 + 2 \cdot -2 \vv^T \vv =} 3 - 4 \| \vv \|^2 &~\Longrightarrow~ \| \vv \|^2 = \frac{3 - \tr{\mR}}{4}, ~ w^2 = \mycolor{lightgray}{ 1 - \| \vv \|^2 = } \frac{1 + \tr{\mR}}{4} . \\ \mR - \mR^T = 4w \cdot \skewm{\vv} &, \qquad \mR + \mR^T = \mycolor{lightgray}{ 2\mI + 4\skewm{\vv}^2 } = 4 \vv \vv^T + (2 - 4\|\vv\|^2)\mI \label{eq:quatm_offdiag} .\end{align}
If $\tr{\mR} > -1$, then we set $w = + \sqrt{\frac{1 + \tr{\mR}}{4}}$ and $\vv = \frac{(\mR - \mR^T)^{\vee}}{4w}$. Note that when $\mR = \mI$: $\tr{\mR} = 3$, $w=1$, and $\vv = \frac{\bm{0}}{4}$ is correct.
If $\tr{\mR} = -1$, then $w=0$, $\mR = \mR^T$, $\| \vv \| = 1$, and $\vv \vv^T = \frac{\mR + \mI}{2}$, from which we read off $v_1^2, v_2^2, v_3^2, v_1v_2, v_1v_3, v_2v_3$. At most two of $v_1, v_2, v_3$ could be $0$. Fix a non-zero one to be positive, and obtain the other two.
Overall this routine uses a significant amount of conditional branching.
[Better method] Using the diagonal parameterization in eq. 17, we have
\begin{align} \underbrace{ \begin{bmatrix*}[r] 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & -1 & -1 & 1 \\ \end{bmatrix*} }_{\mL} \begin{bmatrix} w^2 \\ v_1^2 \\ v_2^2 \\ v_3^2 \end{bmatrix} = \begin{bmatrix} 1 \\ \mR_{00} \\ \mR_{11} \\ \mR_{22} \end{bmatrix} \quad \Longrightarrow \quad \begin{bmatrix} w^2 \\ v_1^2 \\ v_2^2 \\ v_3^2 \end{bmatrix} = \frac{1}{4} \mL \begin{bmatrix} 1 \\ \mR_{00} \\ \mR_{11} \\ \mR_{22} \end{bmatrix} .\end{align}
The first row of $\mL$ comes from the norm-1 constraint; we supply it so that the linear system is solvable. Interestingly, $\mL^{-1}$ is $\frac{1}{4}\mL$, which makes $\frac{1}{2}\mL$ an involutory, symmetric, orthogonal matrix.
Now that we have the squares of each number, it remains to determine the signs. Using eq. 19, we can use the off-diagonal entries to obtain the pairwise products
\begin{align} w v_1 = \frac{\mR_{21} - \mR_{12}}{4},& \quad w v_2 = \frac{\mR_{02} - \mR_{20}}{4}, \quad w v_3 = \frac{\mR_{10} - \mR_{01}}{4} , \nonumber \\ v_1 v_2 = \frac{\mR_{01} + \mR_{10}}{4},& \quad v_1 v_3 = \frac{\mR_{02} + \mR_{20}}{4}, \quad v_2 v_3 = \frac{\mR_{12} + \mR_{21}}{4} ,\end{align}
and finally
\begin{align} \myscalebox{1.15}{ [w, v_1, v_2, v_3] = \frac{[w^2, wv_1, w v_2, w v_3]}{+\sqrt{w^2}} = \frac{[wv_1, v_1^2, v_1v_2, v_1v_3]}{+\sqrt{v_1^2}} = \frac{[wv_2, v_1v_2, v_2^2, v_2v_3]}{+\sqrt{v_2^2}} = \frac{[wv_3, v_1v_3, v_2v_3, v_3^2]}{+\sqrt{v_3^2}} . }\end{align}
At any time, at least one of $[w, v_1, v_2, v_3]$ is non-zero due to the norm-1 constraint, and thus a common strategy is to select the option with the largest denominator out of the four.
A batch-parallel implementation can be found in PyTorch3d.
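A single-matrix sketch of the better method (helper name `matrix_to_quat` is ours; PyTorch3d's version is batched, this one is not):

```python
import numpy as np

def matrix_to_quat(R):
    """Branch-light R -> (w, v1, v2, v3): recover the squares, then divide the row
    of pairwise products by the largest square root (nonzero for unit quaternions)."""
    L = np.array([[1,  1,  1,  1],
                  [1,  1, -1, -1],
                  [1, -1,  1, -1],
                  [1, -1, -1,  1]], dtype=float)
    sq = 0.25 * L @ np.array([1.0, R[0, 0], R[1, 1], R[2, 2]])  # [w^2, v1^2, v2^2, v3^2]
    # pairwise products q_i q_j from the off-diagonal entries, eq. 21
    P = np.empty((4, 4))
    P[np.diag_indices(4)] = sq
    P[0, 1] = P[1, 0] = (R[2, 1] - R[1, 2]) / 4.0   # w v1
    P[0, 2] = P[2, 0] = (R[0, 2] - R[2, 0]) / 4.0   # w v2
    P[0, 3] = P[3, 0] = (R[1, 0] - R[0, 1]) / 4.0   # w v3
    P[1, 2] = P[2, 1] = (R[0, 1] + R[1, 0]) / 4.0   # v1 v2
    P[1, 3] = P[3, 1] = (R[0, 2] + R[2, 0]) / 4.0   # v1 v3
    P[2, 3] = P[3, 2] = (R[1, 2] + R[2, 1]) / 4.0   # v2 v3
    i = int(np.argmax(sq))                          # largest denominator
    q = P[i] / np.sqrt(sq[i])
    return q if q[0] >= 0 else -q                   # w >= 0 convention
```

The `argmax` replaces all the trace-based branching of the baseline; the sign-fix at the end is the usual $w \ge 0$ normalization.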

6.3. Quaternion \rightarrow Axis-Angle

[to 4 numbers $(\theta, \vu)$] Over the batch, set $w = \cos \frac{\theta}{2}$ non-negative so that $\theta$ is within $[0, \pi]$. Recall and contrast that for axis-angle, no restriction is placed on the value of $\cos \theta$ for $\theta$ to be in $[0, \pi]$.
\begin{align} \vq &= [\vq.w \ge 0] \cdot \vq \nonumber \\ \theta &= 2 \cdot \text{atan2}(\|\vv\|, w) \nonumber \\ \vu &= \vv / \| \vv \| \text{~~if~~} \theta > 0 \text{~~else~~} \vu = \bm{0} .\end{align}
[to 3 numbers $\vh \in \R^3$] If the goal is instead to directly invert to the unnormalized axis, then it is numerically stabler to use the small-angle approximation $\arctan(x) = x - \frac{1}{3}x^3 + \dots$
\begin{align} \vq &= [\vq.w \ge 0] \cdot \vq \nonumber \\ \vh &= 2\left( \frac{\|\vv\|}{w} - \frac{1}{3} \left(\frac{\|\vv\|}{w} \right)^3 \right) \frac{\vv}{\|\vv\|} = \left( \frac{2}{w} - \frac{2}{3} \cdot \frac{\|\vv\|^2}{w^3} \right) \vv \text{~~if~~} \| \vv \| < \epsilon \nonumber \\ &= 2 \cdot \text{atan2}(\|\vv\|, w) \cdot \frac{\vv}{\|\vv\|} \text{~~else} .\end{align}
See the implementation in LieTorch, which is itself based on Sophus. Moreover, this stackexchange post pointed out that we can use the half-angle formula of the tangent to implement atan2.
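The 3-number inversion above, as a sketch (the name `quat_to_axis_angle` and the threshold `eps` are our choices, not LieTorch's API):

```python
import numpy as np

def quat_to_axis_angle(q, eps=1e-6):
    """Unit quaternion (w, v) -> unnormalized axis-angle h = theta * u."""
    q = np.asarray(q, dtype=float)
    if q[0] < 0:                  # pick the representative with w >= 0
        q = -q
    w, v = q[0], q[1:]
    n = np.linalg.norm(v)
    if n < eps:
        # small angle: 2 * atan2(n, w) / n ~ 2/w - (2/3) n^2 / w^3
        scale = 2.0 / w - 2.0 * n**2 / (3.0 * w**3)
    else:
        scale = 2.0 * np.arctan2(n, w) / n
    return scale * v
```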

6.4. Gradient of Matrix w.r.t. Quaternion

\begin{align} \partial_{v_1} \underbrace{ \skewm{\vv}^2 }_{\vv \vv^T - \vv^T \vv \mI} = \begin{bmatrix} 0 & v_2 & v_3 \\ v_2 & -2v_1 & 0 \\ v_3 & 0 & -2v_1 \\ \end{bmatrix} ~ \partial_{v_2} \skewm{\vv}^2 = \begin{bmatrix} -2v_2 & v_1 & 0 \\ v_1 & 0 & v_3 \\ 0 & v_3 & -2v_2 \\ \end{bmatrix} ~ \partial_{v_3} \skewm{\vv}^2 = \begin{bmatrix} -2v_3 & 0 & v_1 \\ 0 & -2v_3 & v_2 \\ v_1 & v_2 & 0 \\ \end{bmatrix} .\end{align}
Using the diagonal parameterization of eq. 15, $\partial_w \mR = 2 \skewm{\vv}$, and
\begin{align} \myscalebox{0.65}{ \partial_{v_1} \mR = 2\left( \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & -w \\ 0 & w & 0 \\ \end{bmatrix} + \begin{bmatrix} 0 & v_2 & v_3 \\ v_2 & -2v_1 & 0 \\ v_3 & 0 & -2v_1 \\ \end{bmatrix} \right) } ~ \myscalebox{0.65}{ \partial_{v_2} \mR = 2\left( \begin{bmatrix} 0 & 0 & w \\ 0 & 0 & 0 \\ -w & 0 & 0 \\ \end{bmatrix} + \begin{bmatrix} -2v_2 & v_1 & 0 \\ v_1 & 0 & v_3 \\ 0 & v_3 & -2v_2 \\ \end{bmatrix} \right) } ~ \myscalebox{0.65}{ \partial_{v_3} \mR = 2\left( \begin{bmatrix} 0 & -w & 0 \\ w & 0 & 0 \\ 0 & 0 & 0 \\ \end{bmatrix} + \begin{bmatrix} -2v_3 & 0 & v_1 \\ 0 & -2v_3 & v_2 \\ v_1 & v_2 & 0 \\ \end{bmatrix} \right) } .\end{align}
Again, the gradient differs depending on the diagonal parameterization used, but the gradients become the same after projection.
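These partials are straightforward to confirm by finite differences. A sketch for $\partial_w \mR = 2\skewm{\vv}$ (helper names ours; the map is applied to raw, unnormalized $(w, \vv)$, matching the formulas above):

```python
import numpy as np

def R_of_quat(w, v):
    """R = I + 2 w [v]_x + 2 [v]_x^2, without normalizing (w, v)."""
    x, y, z = v
    V = np.array([[0., -z, y], [z, 0., -x], [-y, x, 0.]])
    return np.eye(3) + 2.0 * w * V + 2.0 * (V @ V)
```

Since $\mR$ is linear in $w$, a central difference recovers $2\skewm{\vv}$ essentially exactly.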

7. Continuity of Rotation Representation

The mapping from a matrix in $\R^{3 \times 3}$ to a lower-dimensional representation needs to be continuous, and so does the inverse mapping.
Zhou et al. give a definition of this continuity, under which both axis-angle and quaternion fail to qualify.
$(\pi - \epsilon, \vu)$ and $(\pi + \epsilon, \vu)$ are very similar, but different, rotations. Think of rotating $179$ vs $181$ degrees around the axis $\vu = \frac{1}{\sqrt{3}} [1, 1, 1]$. The rotation matrices are very close.
The canonical axis-angle representations (angle in $[0, \pi]$) are $(179°, \vu)$ and $(179°, -\vu)$: the axis flips discontinuously.
The quaternion representations are $(\cos\frac{179°}{2}, \sin \frac{179°}{2} \vu )$ and $(\cos\frac{181°}{2}, \sin\frac{181°}{2}\vu) \rightarrow (\cos\frac{179°}{2}, -\sin\frac{179°}{2}\vu)$ after enforcing $w \ge 0$: nearly antipodal representations of nearly identical rotations.

8. Interpolation

It is often stated that the quaternion is the ideal representation for interpolating rotations. Its advantage is not obvious, because the behavior of quaternions is hard to intuit. This section attempts to establish some precise characterizations.

8.1. Axis-Angle Interpolation

Rotations are actions / groups, and an intuitive way to interpolate from action $\mR_1$ to $\mR_2$ by ratio $t$ is to gradually apply the connecting action by
\begin{align} \text{interp}(\mR_1 \rightarrow \mR_2, t) = \mR_1 (\mR_1^T\mR_2)^t \mycolor{lightgray}{= (\mR_2 \mR_1^T)^t \mR_1}.\end{align}
It satisfies the endpoints: $\text{interp}(\mR_1 \rightarrow \mR_2, 0) = \mR_1$ and $\text{interp}(\mR_1 \rightarrow \mR_2, 1) = \mR_2$.
[Ambiguity of $(\mR_1^T\mR_2)^t$, $t \in (0, 1)$] For diagonalizable $\mA = \mX \bm{\Lambda} \mX^{-1}$ and $t \in \R$, the matrix power is defined as $\mA^t := \mX \bm{\Lambda}^t \mX^{-1}$. In particular, using fact 5, $\exp(\mA)^t \mycolor{lightgray}{ = (\mX\exp(\bm{\Lambda})\mX^{-1})^t := \mX\exp(\bm{\Lambda})^t\mX^{-1} = \mX\exp(t\bm{\Lambda})\mX^{-1} = \exp(t\mX\bm{\Lambda}\mX^{-1}) } = \exp(t\mA)$.
Rotation matrices are unitarily diagonalizable, and in 3D $\bm{\Lambda}$ is some $\textbf{diag}[1, a+bi, a-bi]$ with $a^2 + b^2=1$. The problem is that for a matrix with complex eigenvalues, the non-integer power of a complex number $(a + bi)^t$ is not uniquely defined [details]. We need to first take the complex log and convert $a + bi$ to polar form $e^{i\theta}$, before raising it to $e^{it \cdot \theta}$. The complex logarithm is not unique: $e^{i\theta} = e^{i(\theta + k \cdot 2\pi)}$ for any integer $k$. When raised to power $t$, however,
\begin{align} e^{it \cdot (\theta + k \cdot 2\pi)} = e^{it \cdot \theta} \cdot \underbrace{e^{it\cdot k \cdot 2\pi}}_{ \neq 1 \text{ unless $t$ is integer} } .\end{align}
Hence in general $( e^{i\theta} )^t \neq ( e^{i(\theta + k \cdot 2\pi)} )^t$ for $t \in (0, 1)$.
The situation manifests as the ambiguity of the log-map from rotation matrix to axis-angle, $\mR_1^T\mR_2 = \exp \theta \skewm{\vw}$. We need to ensure $\theta \in [0, \pi]$, so that the interpolation takes place over the small arc,
\begin{align} \text{interp}(\mR_1 \rightarrow \mR_2, t) &= \mR_1 (\exp \theta \skewm{\vw})^t = \mR_1 \exp (t\theta \skewm{\vw}) \nonumber \\ &= \mR_1 (\mI + \sin t\theta \skewm{\vw} + (1 - \cos t\theta) \skewm{\vw}^2).\end{align}
Quaternions need to deal with this too.
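Eq. 28 combines the small-arc log map with the Rodrigues formula. A sketch (helper names ours; the $\theta \approx 0$ endpoint is guarded by `eps`, and the $\theta = \pi$ corner case is not handled):

```python
import numpy as np

def _skew(u):
    x, y, z = u
    return np.array([[0., -z, y], [z, 0., -x], [-y, x, 0.]])

def interp_so3(R1, R2, t, eps=1e-7):
    """interp(R1 -> R2, t) = R1 exp(t * log(R1^T R2)), small-arc branch."""
    D = R1.T @ R2
    cos_t = np.clip((np.trace(D) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_t)                  # in [0, pi]
    w = np.array([D[2, 1] - D[1, 2], D[0, 2] - D[2, 0], D[1, 0] - D[0, 1]]) / 2.0
    S = _skew(w / max(np.sin(theta), eps))
    return R1 @ (np.eye(3) + np.sin(t * theta) * S + (1 - np.cos(t * theta)) * (S @ S))
```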

8.2. Quaternion power?

Quaternion product:
\begin{align} (w_2, \vv_2) (w_1, \vv_1) = (w_2w_1 - \vv_2 \cdot \vv_1, w_2 \vv_1 + w_1 \vv_2 + \vv_2 \times \vv_1)\end{align}
This formula is, at the moment, not easily inventable without thinking about hypercomplex numbers.
Rodrigues's derivation in 1840 uses the geometry of the spherical triangle formed by the 3 rotation axes. That picture is intuitive enough and clearly explains the critical role of half angles and bisecting lines. See Altmann, Rotations, Quaternions, and Double Groups, p. 159.
But working out the actual formula from the picture requires spherical trigonometry (not too bad) in addition to extensive, non-obvious algebraic steps. It seems improbable that one could naturally arrive at the quaternion product without knowing the final goal beforehand as a guide. Detecting the similarity between Rodrigues' rotation composition rule and the quaternion product is easier in hindsight.
I tried finding it from the matrix product by identifying the new $(w_3, \vv_3)$. Also tried Mathematica. Gave up.
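Whatever its origin, the product formula is one line of code. A sketch (the name `quat_mul` is ours), mirroring the $(w_2, \vv_2)(w_1, \vv_1)$ convention above:

```python
import numpy as np

def quat_mul(q2, q1):
    """(w2, v2)(w1, v1) = (w2 w1 - v2 . v1, w2 v1 + w1 v2 + v2 x v1)."""
    w2, v2 = q2[0], np.asarray(q2[1:], dtype=float)
    w1, v1 = q1[0], np.asarray(q1[1:], dtype=float)
    return np.concatenate([[w2 * w1 - v2 @ v1],
                           w2 * v1 + w1 * v2 + np.cross(v2, v1)])
```

On a shared axis the half-angles simply add, and the conjugate inverts a unit quaternion.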

8.3. Slerp: spherical linear interpolation

The slerping routine is generally applicable to vectors. Besides rotation, it is most often used to interpolate Gaussian noise vectors that act as initiating latents of deep generative models.
The gist is to establish a basis in the plane spanned by {u,v}\{\vu, \vv\}, and we need 22 orthogonal axis vectors. The angle θ\theta between u,v\vu, \vv is dependent on the choice of basis. The ordering of the basis vectors may change θ\theta to θ-\theta or 2πθ2\pi - \theta. What is invariant is cosθ=cos(θ)=cos(2πθ)\cos \theta = \cos (-\theta) = \cos(2\pi - \theta) . Assuming u,v\vu, \vv are of unit norm, cosθ=uv\cos\theta = \vu \cdot \vv, and again dot product is invariant to change of basis.
Since the interpolation starts at u\vu, we let it be one of the axis so that the algebra is simpler. To find the other orthogonal axis w\vw, we need the precise phase, for which we let sinθ=+1cos2θ\sin \theta = + \sqrt{1 - \cos^2 \theta} in order that the angle θ[0,π]\theta \in [0, \pi] and the interpolation travels along the small arc.
v=ucosθ+wsinθ  w=v(vu)u+1(vu)2.\begin{align} \vv = \vu \cos \theta + \vw \sin \theta ~\Longrightarrow ~ \vw = \frac{\vv - (\vv \cdot \vu) \vu }{+\sqrt{1 - (\vv \cdot \vu)^2}} .\end{align}
To compute the interpolated vector with ratio t[0,1]t \in [0, 1],
slerp(uv,t)=ucostθ+wsintθ=ucostθ+vucosθsinθsintθ=u(sinθcostθcosθsintθ)+vsintθsinθ=u(sinθcos(tθ)+cosθsin(tθ))+vsintθsinθ=usin(1t)θ+vsintθsinθeqnlabel-eq:slerp.\begin{align} \text{slerp}(\vu \rightarrow \vv, t) &= \vu \cos t\theta + \vw \sin t\theta = \vu \cos t\theta + \frac{\vv - \vu \cos \theta }{\sin \theta} \cdot \sin t\theta \nonumber \\ &= \frac{\vu(\sin\theta \cos t\theta - \cos \theta \sin t\theta ) + \vv \sin t\theta }{\sin \theta} \nonumber \\ &= \frac{\vu(\sin\theta \cos (-t\theta) + \cos \theta \sin (-t\theta) ) + \vv \sin t\theta }{\sin \theta} \nonumber \\ &= \frac{\vu\sin(1 - t)\theta + \vv \sin t\theta }{\sin \theta} \label{eq:slerp} .\end{align}
The slerp formula in eq. 32 is symmetric in that $\text{slerp}(\vu \rightarrow \vv, t) = \text{slerp}(\vv \rightarrow \vu, 1- t)$, as it should be.
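As a sketch, eq. 32 translates directly to NumPy; the clipping of the dot product and the handling of the degenerate $\theta \approx 0$ case are my additions, not part of the derivation:

```python
import numpy as np

def slerp(u, v, t):
    """Spherical linear interpolation between unit vectors u and v (eq. 32)."""
    cos_th = np.clip(np.dot(u, v), -1.0, 1.0)
    theta = np.arccos(cos_th)           # theta in [0, pi]: the small arc
    if np.isclose(theta, 0.0):
        return u                        # degenerate: u and v coincide
    return (u * np.sin((1 - t) * theta) + v * np.sin(t * theta)) / np.sin(theta)

rng = np.random.default_rng(0)
u = rng.normal(size=3); u /= np.linalg.norm(u)
v = rng.normal(size=3); v /= np.linalg.norm(v)

m = slerp(u, v, 0.3)
assert np.isclose(np.linalg.norm(m), 1.0)        # stays on the unit sphere
assert np.allclose(m, slerp(v, u, 0.7))          # symmetry under reversal
assert np.allclose(slerp(u, v, 0.0), u) and np.allclose(slerp(u, v, 1.0), v)
```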

8.4. Quaternion Slerp

There are two questions: why slerp quaternions at all, and what does it produce? The computation below shows that $\vq_1 (\vq_1^* \vq_2)^t$, the geodesic interpolation of the underlying rotations, coincides with slerping $\vq_1, \vq_2$ as unit 4-vectors.
\begin{align} \vq_1^* \vq_2 = (w_1, -\vv_1) (w_2, \vv_2) = (\underbrace{w_1w_2 + \vv_1 \cdot \vv_2}_{\cos (\theta/2)}, \underbrace{w_1 \vv_2 - w_2 \vv_1 - \vv_1 \times \vv_2}_{\sin (\theta/2) \, \vu_{1 \rightarrow 2}} )\end{align}
\begin{align} \cos \Omega = \vq_1 \cdot \vq_2 = w_1 w_2 + \vv_1 \cdot \vv_2 = (\vq_1^*\vq_2).w = \cos (\theta / 2) \\ \vu_{1 \rightarrow 2} = \frac{w_1 \vv_2 - w_2 \vv_1 - \vv_1 \times \vv_2}{\sin \Omega}\end{align}
Taking the quaternion product as given, one can verify directly:
  • show that $(w, \vv) (w, -\vv) = (1, \bm{0})$ if $|\vq| = 1$
  • show that $(w, \vv)^2 = (w^2 - \vv^T \vv,\; 2w\vv) = (\cos^2 \frac{\theta}{2} - \sin^2 \frac{\theta}{2},\; 2\cos \frac{\theta}{2} \sin \frac{\theta}{2} \, \vu ) = (\cos \theta, \sin \theta \, \vu)$, and by induction $(w, \vv)^t = (\cos \frac{t\theta}{2}, \sin \frac{t\theta}{2} \, \vu)$ for every positive integer $t$. Negative and real powers are defined analogously; the notion of a quaternion exponential is not strictly needed.
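These two facts are easy to spot check numerically. A minimal sketch using NumPy, with quaternions stored as `[w, x, y, z]` arrays (the storage order is my choice, not the text's):

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of quaternions stored as [w, x, y, z]."""
    wp, vp = p[0], p[1:]
    wq, vq = q[0], q[1:]
    return np.concatenate([[wp * wq - vp @ vq],
                           wp * vq + wq * vp + np.cross(vp, vq)])

def qconj(q):
    """Conjugate (w, v) -> (w, -v)."""
    return np.concatenate([[q[0]], -q[1:]])

theta = 0.7
u = np.array([1.0, 2.0, 2.0]) / 3.0                  # unit rotation axis
q = np.concatenate([[np.cos(theta / 2)], np.sin(theta / 2) * u])

# (w, v)(w, -v) = (1, 0) for a unit quaternion
assert np.allclose(qmul(q, qconj(q)), [1, 0, 0, 0])
# (w, v)^2 = (cos theta, sin theta * u)
assert np.allclose(qmul(q, q),
                   np.concatenate([[np.cos(theta)], np.sin(theta) * u]))
```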
\begin{align} \vq_1 (\vq_1^* \vq_2)^t &= \vq_1 ( \cos t \frac{\theta}{2} ,\: \sin t \frac{\theta}{2} \, \vu_{1 \rightarrow 2}) = ( w_1, \vv_1) ( \cos t \Omega ,\; \sin t \Omega \, \vu_{1 \rightarrow 2} ) \\ &= ( w_1 \cos t \Omega - \sin t \Omega \, \vv_1 \cdot \vu_{1 \rightarrow 2} ,~ w_1 \sin t \Omega \, \vu_{1 \rightarrow 2} + \cos t \Omega \vv_1 + \sin t \Omega \, \vv_1 \times \vu_{1 \rightarrow 2} ) \\ &= ( w_1 \cos t \Omega - \sin t \Omega \, \vv_1 \cdot \vu_{1 \rightarrow 2} ,\; \cos t \Omega \, \vv_1 + \sin t \Omega (w_1 \vu_{1 \rightarrow 2} + \vv_1 \times \vu_{1 \rightarrow 2} ) )\end{align}
\begin{align} - \sin t \Omega \, \vv_1 \cdot \vu_{1 \rightarrow 2} &= - \sin t \Omega \, \frac{\vv_1 \cdot (w_1 \vv_2 - w_2 \vv_1 - \vv_1 \times \vv_2)}{\sin \Omega} \\ &= \sin t \Omega \, \frac{w_2 \vv_1 \cdot \vv_1 -w_1\vv_1 \cdot \vv_2 }{\sin \Omega} ~~\text{where}~~ \vv_1 \cdot \vv_1 = 1 - w_1^2 \\ &= \sin t \Omega \, \frac{w_2 - w_1( w_1 w_2 + \vv_1 \cdot \vv_2) }{\sin \Omega} = \sin t \Omega \, \frac{w_2 - w_1 \cos \Omega }{\sin \Omega}\end{align}
\begin{align} & \sin t \Omega (w_1 \vu_{1 \rightarrow 2} + \vv_1 \times \vu_{1 \rightarrow 2} ) \\ =& \sin t \Omega \left( w_1 \frac{w_1 \vv_2 - w_2 \vv_1 - \vv_1 \times \vv_2}{\sin \Omega} + \vv_1 \times \frac{w_1 \vv_2 - w_2 \vv_1 - \vv_1 \times \vv_2}{\sin \Omega} \right) \\ =& \sin t \Omega \frac{ w_1 w_1 \vv_2 - w_1 w_2 \vv_1 - \vv_1 \times (\vv_1 \times \vv_2) }{\sin \Omega} \\ =& \sin t \Omega \frac{ w_1 w_1 \vv_2 - w_1 w_2 \vv_1 + (\vv_1^T \vv_1) \vv_2 - (\vv_1^T \vv_2) \vv_1 }{\sin \Omega} \\ =& \sin t \Omega \frac{ (w_1 w_1 + \vv_1^T \vv_1) \vv_2 - (w_1 w_2 + \vv_1^T \vv_2) \vv_1 }{\sin \Omega} = \sin t \Omega \frac{ \vv_2 - \vv_1\cos \Omega }{\sin \Omega}\end{align}
\begin{align} \vq_1 (\vq_1^* \vq_2)^t &= ( w_1 \cos t \Omega - \sin t \Omega \, \vv_1 \cdot \vu_{1 \rightarrow 2} ,\; \cos t \Omega \, \vv_1 + \sin t \Omega (w_1 \vu_{1 \rightarrow 2} + \vv_1 \times \vu_{1 \rightarrow 2} ) ) \nonumber \\ &= \left( w_1 \cos t \Omega + \sin t \Omega \, \frac{w_2 - w_1 \cos \Omega }{\sin \Omega},~ \vv_1 \cos t \Omega + \sin t \Omega \frac{ \vv_2 - \vv_1\cos \Omega }{\sin \Omega} \right) \nonumber \\ &= \vq_1 \cos t \Omega + \sin t \Omega \, \frac{\vq_2 - \vq_1 \cos \Omega }{\sin \Omega} \nonumber \\ &= \frac{\vq_1 \sin (1-t)\Omega + \vq_2 \sin t \Omega}{\sin \Omega} = \text{slerp}(\vq_1 \rightarrow \vq_2, t)\end{align}
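Putting the pieces together, a numeric sketch (NumPy, with the `[w, x, y, z]` storage order an assumption of mine) that checks $\vq_1 (\vq_1^* \vq_2)^t$ against the closed-form slerp of the 4-vectors:

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of quaternions stored as [w, x, y, z]."""
    wp, vp = p[0], p[1:]
    wq, vq = q[0], q[1:]
    return np.concatenate([[wp * wq - vp @ vq],
                           wp * vq + wq * vp + np.cross(vp, vq)])

def qconj(q):
    return np.concatenate([[q[0]], -q[1:]])

def qpow(q, t):
    """Real power of a unit quaternion: (cos O, sin O u)^t = (cos tO, sin tO u)."""
    O = np.arccos(np.clip(q[0], -1.0, 1.0))
    if np.isclose(O, 0.0):
        return np.array([1.0, 0.0, 0.0, 0.0])
    u = q[1:] / np.sin(O)                       # unit axis, sin O > 0
    return np.concatenate([[np.cos(t * O)], np.sin(t * O) * u])

rng = np.random.default_rng(1)
q1 = rng.normal(size=4); q1 /= np.linalg.norm(q1)
q2 = rng.normal(size=4); q2 /= np.linalg.norm(q2)
t = 0.4

lhs = qmul(q1, qpow(qmul(qconj(q1), q2), t))    # q1 (q1* q2)^t
O = np.arccos(np.clip(q1 @ q2, -1.0, 1.0))      # cos O = q1 . q2
rhs = (q1 * np.sin((1 - t) * O) + q2 * np.sin(t * O)) / np.sin(O)
assert np.allclose(lhs, rhs)                    # geodesic power = 4D slerp
```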

9. Cayley Map / Real Projective Patch

One way to think of it is as a further parameterization of the quaternion. The other is the Cayley map $(\mI+\mA)(\mI-\mA)^{-1}$ for skew-symmetric $\mA$. One can prove that the two always agree.
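A numeric spot check (not a proof) that the Cayley map lands in $\mathrm{SO}(3)$ and agrees with the quaternion obtained by normalizing $(1, \vu)$, using the standard unit-quaternion rotation matrix $\mR = \mI + 2w\skewm{\vv} + 2\skewm{\vv}^2$:

```python
import numpy as np

def skew(u):
    x, y, z = u
    return np.array([[0, -z, y], [z, 0, -x], [-y, x, 0.0]])

def cayley(u):
    """Cayley map (I + [u^])(I - [u^])^{-1}; I - [u^] is always invertible."""
    A = skew(u)
    return (np.eye(3) + A) @ np.linalg.inv(np.eye(3) - A)

u = np.array([0.3, -0.5, 0.2])
R = cayley(u)
assert np.allclose(R.T @ R, np.eye(3))          # orthogonal
assert np.isclose(np.linalg.det(R), 1.0)        # and proper: R in SO(3)

# agreement with the quaternion patch: normalize (1, u) to a unit quaternion
n = np.sqrt(1.0 + u @ u)
w, v = 1.0 / n, u / n
R_quat = np.eye(3) + 2.0 * w * skew(v) + 2.0 * skew(v) @ skew(v)
assert np.allclose(R, R_quat)
```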
Online patch switching: the problem with the Cayley map is its zero denominator, so one must adaptively switch patches, which does not play nicely with running statistics such as optimizer momentum (paper). Sitao Xiang's paper proposed a patch ensemble...
In some sense, tangential projection of the gradient amounts to "continuous patch switching"?

10. Modified Cayley Map

A more judicious parameterization?

11. Concerns

When dynamics are involved, the exponential map might be more natural, since angular velocities are naturally expressed in axis-angle form.

12. Tangent Space Backpropagation

The Jacobian of interest is
\begin{align} \frac{\partial}{\partial \vtau} \left( \log \left[ f(\mR)^{-1} f\left( \mR \exp\skewm{\vtau} \right) \right] \right)^{\vee}\end{align}
evaluated at $\vtau = \bm{0}$.
When $f(\mR) = \mR^{-1}$,
\begin{align} \frac{\partial}{\partial \vtau} \left( \log \left[ (\mR^{-1})^{-1} \left( \mR \exp\skewm{\vtau} \right)^{-1} \right]\right)^{\vee} &= \frac{\partial}{\partial \vtau} \left( \log \left[ \mR (\exp\skewm{\vtau})^{-1} \mR^{-1} \right]\right)^{\vee} \nonumber \\ &= \frac{\partial}{\partial \vtau} \left( \log \exp( \mR \skewm{-\vtau} \mR^{-1} ) \right)^{\vee} \nonumber \\ &= \frac{\partial}{\partial \vtau} \left( \mR \skewm{-\vtau} \mR^{-1} \right)^{\vee} \nonumber \\ &= -\text{Adj}_{\mR} .\end{align}
When $f(\mR) = \mR \mB$,
\begin{align} \frac{\partial}{\partial \vtau} \left( \log \left[ (\mR\mB)^{-1} \left( \mR \exp\skewm{\vtau} \mB \right) \right] \right)^{\vee} &= \frac{\partial}{\partial \vtau} \left( \log \left[ \mB^{-1} \exp\skewm{\vtau} \mB \right] \right)^{\vee} \nonumber \\ &= \frac{\partial}{\partial \vtau} \left( \log \exp \left[ \mB^{-1} \skewm{\vtau} \mB \right] \right)^{\vee} = \text{Adj}_{\mB^{-1}}\end{align}
When $f(\mR) = \mB \mR$,
\begin{align} \frac{\partial}{\partial \vtau} \left( \log \left[ (\mB \mR)^{-1} \left( \mB \mR \exp\skewm{\vtau} \right) \right] \right)^{\vee} = \frac{\partial}{\partial \vtau} \vtau = \mI\end{align}
When $f(\mR) = \mR \vx$, the output is a vector rather than a rotation, so the $\log$ and $\vee$ are dropped and we differentiate the map directly:
\begin{align} \frac{\partial}{\partial \vtau} \left( \mR \exp\skewm{\vtau} \right) \vx &= \frac{\partial}{\partial \vtau} \mR (\mI + \skewm{\vtau} + \dots) \vx \nonumber \\ &= \frac{\partial}{\partial \vtau} \mR \skewm{\vtau} \vx \nonumber \\ &= -\mR \skewm{\vx}\end{align}
Approximating with $\mI + \skewm{\vtau} + \dots$ is fine here since there is no $\log$ outside; in the other cases, we rely on $\log$ and $\exp$ canceling each other.
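The derived Jacobians can be verified by finite differences. A sketch for the $f(\mR) = \mR\vx$ and $f(\mR) = \mR\mB$ cases, with the hat/exp/log machinery implemented inline via Rodrigues' formula:

```python
import numpy as np

def skew(u):
    x, y, z = u
    return np.array([[0, -z, y], [z, 0, -x], [-y, x, 0.0]])

def so3_exp(tau):
    """Rodrigues' formula: exp of the skew-symmetric matrix [tau^]."""
    th = np.linalg.norm(tau)
    if th < 1e-12:
        return np.eye(3) + skew(tau)
    K = skew(tau / th)
    return np.eye(3) + np.sin(th) * K + (1.0 - np.cos(th)) * K @ K

def so3_log(R):
    """Axis-angle vector (log R)^vee of a rotation matrix."""
    th = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if th < 1e-12:
        return np.zeros(3)
    w = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return th * w / (2.0 * np.sin(th))

R = so3_exp(np.array([0.4, -0.2, 0.9]))
B = so3_exp(np.array([-0.1, 0.7, 0.3]))
x = np.array([1.0, -2.0, 0.5])

eps = 1e-6
J_fx = np.zeros((3, 3))     # case f(R) = R x
J_fB = np.zeros((3, 3))     # case f(R) = R B
for i in range(3):
    d = np.zeros(3); d[i] = eps
    J_fx[:, i] = (R @ so3_exp(d) @ x - R @ x) / eps
    J_fB[:, i] = so3_log((R @ B).T @ (R @ so3_exp(d) @ B)) / eps

assert np.allclose(J_fx, -R @ skew(x), atol=1e-5)   # derived: -R [x^]
assert np.allclose(J_fB, B.T, atol=1e-5)            # derived: Adj_{B^{-1}} = B^T
```

For a rotation matrix $\mB$, the adjoint acts as $\text{Adj}_{\mB^{-1}} \vtau = \mB^{-1}\vtau$, hence the comparison against `B.T`.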

13. Some Cross Product Identities

\begin{align*} \skewm{\vx} \vx &= \bm{0} \\ \skewm{\vx} \vy &= -\skewm{\vy} \vx \\ \skewm{\vx} \skewm{\vy} &= \vy \vx^T - (\vx \cdot \vy)\mI \\ \vx \times (\vy \times \vz) &= (\vx \cdot \vz)\vy - (\vx \cdot \vy)\vz \\ [\widehat{ \vx \times \vy}] &= \skewm{\vx}\skewm{\vy} - \skewm{\vy}\skewm{\vx} = \vy \vx^T - \vx \vy^T \\ [\widehat{ (\mA\vx) \times (\mA\vy)}] &= \mA [\widehat{ \vx \times \vy}] \mA^T \\ (\mA\vx) \times (\mA\vy) &= \det(\mA) \mA^{-T} (\vx \times \vy) \\ \skewm{\vx}\mA + \mA^T\skewm{\vx} &= \tr{\mA} \skewm{\vx} - \skewm{\mA \vx} \\ \skewm{\mA \vx} &= \det(\mA) \mA^{-T} \skewm{\vx} \mA^{-1}\end{align*}
The last few are less standard. The final one, valid for invertible $\mA$, follows from the line above it: $\skewm{\mA\vx}\mA\vy = (\mA\vx) \times (\mA\vy) = \det(\mA)\mA^{-T}\skewm{\vx}\vy$ for all $\vy$.
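A random spot check of the identities (a sketch; a generic random $\mA$ is invertible almost surely):

```python
import numpy as np

def skew(u):
    x, y, z = u
    return np.array([[0, -z, y], [z, 0, -x], [-y, x, 0.0]])

rng = np.random.default_rng(2)
x, y = rng.normal(size=3), rng.normal(size=3)
A = rng.normal(size=(3, 3))                     # generic, hence invertible

# identities in x and y alone
assert np.allclose(skew(x) @ skew(y), np.outer(y, x) - (x @ y) * np.eye(3))
assert np.allclose(skew(np.cross(x, y)), np.outer(y, x) - np.outer(x, y))

# identities involving a general matrix A
Ainv = np.linalg.inv(A)
detA = np.linalg.det(A)
assert np.allclose(np.cross(A @ x, A @ y), detA * Ainv.T @ np.cross(x, y))
assert np.allclose(skew(x) @ A + A.T @ skew(x),
                   np.trace(A) * skew(x) - skew(A @ x))
assert np.allclose(skew(A @ x), detA * Ainv.T @ skew(x) @ Ainv)
```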


Last updated on 2024-04-03. Design inspired by distill.