Hypercomplex numbers have their uses in algebra, but for the purpose of spatial rotations it is perhaps helpful to downweigh the emphasis a little bit and focus instead on Rodrigues' original perspective, which is purely geometric and arguably more inventable. It turns out that historically Rodrigues first developed the use of a 3- or 4-number tuple as a rational representation of rotations without trigonometric functions. The concept of a matrix was not yet prevalent, but his proposed formulae in effect 1) converts the 3/4 numbers to a rotation matrix, and 2) composes successive rotations into a new 3/4-number tuple.
[u^]2=uuT−(uTu)I. tr([u^]2)=−2uTu. If u has norm 1, [u^]2=uuT−I.
2. Matrix Exponential
Define matrix exponential of a square matrix X∈Rn×n to be the series
exp(X):=k=0∑∞k!Xk.
Fact 1: The series always converges [proof]. exp(A)exp(B)=exp(B)exp(A) in general. exp(X) results in an invertible matrix [proof]: exp(−X)exp(X)=exp(X)exp(−X)=I.
Fact 2: Matrix exponential is generally not a 1-to-1 mapping (it is in 1D). It's generally many-to-1 [example]. See the detailed case breakdown for R3×3 in the axis-angle mapping section below.
Fact 3: If X is skew-symmetric, then exp(X) is an orthogonal matrix, i.e. its transpose is its inverse:
exp(X)T=exp(XT)=exp(−X)=exp(X)−1.
Furthermore, its determinant det(exp(X))=exptr(X)=exp(0)=1, by Jacobi's identity [proof]. Thus, exp(X) produces special orthogonal matrices SO(n).
We also claim that every special orthogonal matrix R can be generated by some skew-symmetric matrix X. The mapping is surjective [missing proof].
Fact 4: If (λ,v) is an eigen val/vec pair of X, then exp(X)v=∑k=0∞k!λkv. (eλ,v) is an eigen pair of exp(X).
Fact 6: ∂texp(tA)=∂t(I+tA+2!t2A2+…)=A+tA2+2!t2A3+…=Aexp(tA)=exp(tA)A. Let y(t)=exp(tA)x. Then y˙(t)=Aexp(tA)x=Ay(t).
We are therefore able to come up with guesses when encountering ODEs of the form
B˙=AB or BAy˙=Ay⟹B(t)=exp(tA)⟹y(t)=exp(tA)xfor some initial x.
These are less useful for rotations, but central in modeling probability distributions that appear in the context of markov chains.
3. Axis-Angle Mapping
[4-number, constrained parameterization]
In 3D, given (u∈R3,θ∈R), with ∥u∥=1 and θ without range restriction, [u^] is skew-symmetric, and thus by fact 3
preserves vectors along axis u. For a unit-length vector x orthogonal to u, [u^]x=u×x provides the 3rd orthogonal vector y. [u,x,y] forms a right-hand oriented basis.
rotates tx on the x,y plane, which is orthogonal and right-handed to u, from x to y by angle θ. The same conclusion can be reached by a spectral analysis on the complex eigenvalues. Action on any vector can be decomposed into action on some su+tx.
Therefore it is appropriate to name (θ,u) the axis-angle parameterization of a rotation matrix.
This mapping is not injective. To produce the same rotation matrix, for some integer k, one could have
(θ,u) and (−θ+k⋅2π,−u), e.g. k=1,(−(2π−θ),u) rotates through the opposite turn.
(θ,u) and (−θ+k⋅2π,−u), e.g. k=1,(2π−θ,−u) keeps the rotation angle under π.
At θ=k⋅2π, any u maps to identity matrix.
See the cases with quaternion for contrast.
[3-number, unconstrained parameterization] Given an unnormalized h∈R3,
since numerical software is unaware of the interaction bewtween sinx and 1/x when the two parts are evaluated separately. To look up series like these, use wolfram alpha.
Some authors do not make a distinction between eq. 6 and 9.
eq. 6 cover many common use cases; axis and angle are often specified separately.
A major advantage of eq. 9 is that it allows unconstrained optimization of rotations through h. Optimization using (θ,u) parameterization in eq. 6 needs extra care to maintain the norm 1 constraint on u.
3.1. Log Map: Matrix → Axis-Angle, and inversion in general
Every rotation matrix can be produced by some axis-angle (θ,u) s.t. θ∈[0,π], because if θ>π, (2π−θ,−u) produces the same rotation.
Thus knowing cosθ (or tanθ) is sufficient to determine rotation angle θ, since arccos (or arctan) is unambiguous for [0,π]. On the contrary sinθ alone is insufficient to determine θ.
It is no coincidence that both rotation matrix and quaternion provide unambiguous ways to get cosθ (for quat, cosθ=2w2−1), and sinθ=±1−cos2θ is only decidable up to sign, with + chosen to have θ in range [0,π].
R−RT=2[u^]sinθ. u=2sinθ(R−RT)∨ if sinθ=0. By deciding the sign of sinθ to be +, we have also locked the sign of u. If tr(R)=−1,cosθ=−1,θ=π, R is symmetric, and sinθ=0; we obtain pairwise products of entries of u by R=I+2[u^]2⇒uuT=2R+I. With sinθ=0, the sign of u is free to flip. Set one of the non-zero element to positive, and get the rest from pairwise products.
R+RT=2(I+(1−(tr(R)−1)/2)cosθ)[u^]2)⟹uuT=3−tr(R)R+RT+(1−tr(R))I if tr(R)=3. Rotation axis at I is undefined. This identity provides pairwise products of u at all θ>0, and we can compute u up to sign flip as described above. But make sure that the sign of u is compatible with the +ve sign of sinθ as their products are locked by [u^]sinθ=2R−RT. The need for the compatibility check makes this route not that useful vs item 2.
The alternative is rotation matrix → quaternion → axis-angle. The baseline approach in rotation matrix → quaternion is similar to what is outlined in this section.
4. A Historical Note on Quaternions
We discuss an easier and more historically faithful way to motivate the use of quaternion for rotations.
Quaternions are often introduced as hypercomplex numbers.
The problem with this presentation is that it is difficult to intuit a priori why conjugation R(q)x=qxq∗ should perform rotation acting on x while product q3=q2q1 implements rotation composition. The formulae can be algebraically verified posthoc, but their origins are puzzling. Why not use x′=qx to rotate vectors?
Hamilton himself seemed to have struggled with this exact question, even after Cayley's article came to his attention.
It turns out that historically Rodrigues first developed the use of a 3- or 4-number tuple as a rational representation of rotations without trigonometric functions. The concept of a matrix was not yet prevalent, but his proposed formulae in effect 1) converts the 3/4 numbers to a rotation matrix, and 2) composes successive rotations into a new 3/4-number tuple.
Arthur Cayley studied [#6, 1843] Rodrigues' results,
and realized [#20, 1845] that quaternions can be manipulated to reproduce Rodrigues' formulae.
His background in Rodrigues' prior work made the discovery much more plausible, and he went on to extend quaternions in several directions, one of which is the representation of 4D rotations [#137].
Hypercomplex numbers have their uses in algebra, but for the purpose of spatial rotations it is perhaps helpful to downweigh the emphasis a little bit and focus instead on Rodrigues' original perspective, which is purely geometric and arguably more inventable.
We do not dispute the importance of vanilla complex numbers a+bi in understanding rotations. The role of the eigenvalues of a 3×3 rotation matrix are easy to describe. Hypercomplex numbers are just more unwieldy to mentally visualize.
5. Absorbing ambiguities with intermediary:
a simple 5D representation
The spirit is to capture intermediate variables in Rodrigues mapping of eq. 6. Also, we want to avoid invoking sin,cos so as to make derivative operations more numerically stable?
One source of ambiguity in axis-angle is the angle. We attempt to over-parameterize it in order to reduce the ambiguities.
We could over-parameterize the angle θ to absorb its ambiguities.
Let (α,β):=(sinθ,cosθ).
Let (x,y):=(cosθ,sinθ).
R(x,y,u)=I+y[u^]+(1−x)[u^]2.
Unit norm constraint α2+β2=1.
Now the forward mapping becomes
R(α,β,u)=I+α[u^]+(1−β)[u^]2.
The ambiguities have been reduced. To map to the same rotation matrix,
(α,β,u) and (−α,β,−u). This keeps the rotation angle under π (it's not flipped axis-angle).
When (α,β)=(0,1), any u maps to identity matrix.
One way to motivate quaternion is that it resolves case 2 by merging sine and axis u.
6. Quaternion, a 4D representation
Using the fact that sinθ=2sin2θcos2θ and cosθ=2cos22θ−1 [derive with Euler's identity]
We capture the intermediary by letting (w,v):=(cos2θ,sin2θu). (w,v) is referred to as quaternion. This 4d vector is naturally of norm 1.
[Reduced ambiguities]
Since (θ,u)many-to-1⟹(w,v)2-to-1⟹R. The intermediary (w,v) absorbs and consolidates the many-to-one exponential map of (θ,u)⇒R into a 2-to-1 mapping through
(w,v):=(cos2θ,sin2θu). (w,v) and (−w,−v) map to the same rotation, but this 2-to-1 mapping does not correspond to simply flipping the axis-angle. Flipping the signs of (θ,u) in (cos2θ,sin2θu) doesn't change its value.
To change (w,v) to (−w,−v), the options are
(2θ,u)→(2−θ+k⋅2π,−u) for odd int k, e.g. k=1,(2−(2π−θ),u) rotates through the opposite turn.
(2θ,u)→(2−θ+k⋅2π,−u) for odd int k, e.g. k=1,(22π−θ,−u) keeps rotation angle under π.
To keep at the same (w,v), the options are
(2θ,u)→(2−θ+k⋅2π,−u) for some even int k, e.g. k=0
(2θ,u)→(2−θ+k⋅2π,−u) for some even int k, e.g. k=0,(−2θ,−u) for flipped axis-angle.
In this sense, the 2-to-1 ambiguities of quaternion is the result of grouping the many-to-1 ambiguities that axis-angle possesses into "even'' and "odd'' sides, because all four axis-angle configurations, with k being even or odd, map to the same rotation matrix.
6.1. Quaternion → Rotation Matrix
The forward mapping starting from the intermediary (w,v) is
The off-diagonal entries in eq. 14 are unambiguous. For the diagonal entries, using the constraint w2+v12+v22+v32=1, there are three different ways to write it in the literature and various codebases. WLOG consider the 2nd diagonal entry i.e. R11 with 0-based indexing,
Version 3 / eq. 17 provides critical insight for (parallelizable) inverse mapping from rotation matrix to quaternion. Version 1 / eq. 15 is for example used in Gaussian splatting. A few other papers use version 2.
These different diagonal parameterizations will result in different gradients when backprop-ing from R(w,v) to [w,v]. But after projecting the gradient (w.r.t. norm 1 constraint / to be orthogonal to [w,v]), the gradients become the same [missing proof].
6.2. Rotation Matrix → Quaternion
The problem with the inversion step is always "what happens if one of them is 0''.
[Baseline]
We first present a baseline method that involves explicit if-else branching. This is not parallel-friendly and also numerically not ideal because small input perturbations could trigger different logic pathways.
If tr(R)>−1, then we set w=+41+tr(R) and v=4w(R−RT)∨. Note that when R=I,tr(R)=3,w=1,v=40 is correct.
If tr(R)=−1, then w=0,R=RT,∥v∥=1 and vvT=2R+I, from which we read off v12,v22,v32,v1v2,v1v3,v2v3. At most two of v1,v2,v3 could be 0. Fix the non-zero one to be positive, and obtain the other two.
Overall this routine uses a significant number of conditional branching.
[Better method]
Using the diagonal parameterization in eq. 17, we have
The first row of L comes from the norm 1 constraint. We supply it so that the linear system is solvable. Interestingly L−1 is 41L. It makes 21L an involutary, symmetric, orthogonal matrix.
Now that we have the squares of each number, it remains to determine the signs. Using eq. 19, we can use the off-diagonal entries to obtain pairwise products
At any time, at least one of [w,v1,v2,v3] is non-zero due to norm-1 constraint, and thus a common strategy is to select the one with the largest denominator out of the 4 options.
A batch-parallel implementation can be found in PyTorch3d.
6.3. Quaternion → Axis-Angle
[to 4 numbers (θ,u)]
Over the batch set w=cos2θ non-negative so that θ is within [0,π]. Recall and contrast that for axis-angle, no restriction is placed on the value of cosθ for θ to be in [0,π].
[to 3 numbers h∈R3]
If the goal is instead to directly invert to unnormalized axis,
then it is numerically stabler to use small angle approximation arctan(x)=x−31x3+….
See implementation in LieTorch, which is itself based on Sophus.
Moreover, this stackexchange post pointed out that we can use half-angle formula of tangent to implement atan2.
Again the gradient is different depending on the diag parameterization used. But they become the same after projection.
7. Continuity of Rotation Representation
Mapping from matrix R3×3 to lower-dimensional representation needs to be continuous. Inversion map needs to be continuous.
Zhou's definition of continuity and why axis-angle, quat fail to qualify.
Here are three rotation matrices which correspond to rotation angles 175,180,185 around axis 31[1,1,1]
(π−ϵ,u) and (π+ϵ,u) are very similar, but different rotations. Think of rotating 179 vs 181 degrees around axis u. The rotation matrices are very close.
The axis-angle representations are (179,u),(179,−u).
The quaternion representations are (cos2179,sin2179u),(cos2181,sin2181u)→(cos2179,−sin2179u)
8. Interpolation
It is often stated that quaternion is the ideal representation to interpolate rotations with.
Its advantage is not obvious because the behaviors of quaternions are hard to intuit. This section attempts to establish some precise characterizations.
8.1. Axis-Angle Interpolation
Rotations are actions / groups,
and an intuitive way to interpolate from action R1 to R2 by ratio t is to gradually apply the connecting action by
interp(R1→R2,t)=R1(R1TR2)t=(R2R1T)tR1.
It would satisfy the endpoints: interp(R1→R2,0)=R1 and interp(R1→R2,1)=R2.
[Ambiguity of (R1TR2)t,t∈(0,1)]
For diagonalizable A=XΛX−1,t∈R, matrix power is defined as At:=XΛtX−1. In particular, using fact 5, exp(A)t=(Xexp(Λ)X−1)t:=Xexp(Λ)tX−1=Xexp(tΛ)X−1=exp(tXΛX−1)=exp(tA) .
Rotation matrices are unitarily diagonalizable, and in 3D Λ is some diag[1,a+bi,a−bi] with a2+b2=1.
The problem is that for matrix with complex eigenvalues, non-integer power of a complex number (a+bi)t is not uniquely defined [details].
We need to first take complex log and convert a+bi to polar form eiθ, before raising to eit⋅θ. Complex logarithm is not unique. eiθ=ei(θ+k⋅2π) for any integer k. When raised to power t, however,
eit⋅(θ+k⋅2π)=eit⋅θ⋅=1 unless t is integereit⋅k⋅2π.
Hence in general (eiθ)t=(ei(θ+k⋅2π))t for t∈(0,1).
The situation manifests as the ambiguity of log-map from rotation matrix to axis-angle R1TR2=expθ[w^]. We need to ensure θ∈[0,π], so that interpolation takes place over the small arc,
This formula is, at the moment, not easily inventable without thinking about hypercomplex numbers.
Rodrigues's derivation in 1840 uses the geometry of the spherical triangle formed by the 3 rotation axes. That picture is intuitive enough and clearly explains the critical role of half angles and bisecting lines.
See Altmann Rotations Quaternion and Double Groups p159.
But working out the actual formula from the picture requires spherical trigonometry (not too bad) in addition to extensive, non-obvious algebraic steps. It seems improbable that one could naturally arrive at the quaternion product without knowing the final goal beforehand as a guide. Detecting the similarity between Rodrigues' rotation composition rule and quaternion product is easier in hindsight.
I tried finding it from the matrix product by identifying new (w3,v3). Also tried mathematica. Gave up.
8.3. Slerp: spherical linear interpolation
The slerping routine is generally applicable to vectors. Besides rotation, it is most often used to interpolate Gaussian noise vectors that act as initiating latents of deep generative models.
The gist is to establish a basis in the plane spanned by {u,v}, and we need 2 orthogonal axis vectors.
The angle θ between u,v is dependent on the choice of basis. The ordering of the basis vectors may change θ to −θ or 2π−θ. What is invariant is cosθ=cos(−θ)=cos(2π−θ). Assuming u,v are of unit norm, cosθ=u⋅v, and again dot product is invariant to change of basis.
Since the interpolation starts at u, we let it be one of the axis so that the algebra is simpler.
To find the other orthogonal axis w, we need the precise phase, for which we let sinθ=+1−cos2θ in order that the angle θ∈[0,π] and the interpolation travels along the small arc.
v=ucosθ+wsinθ⟹w=+1−(v⋅u)2v−(v⋅u)u.
To compute the interpolated vector with ratio t∈[0,1],
show that (w,v)2=(ww−vTv,2wv)=(cos22θ−sin22θ,2cos2θsin2θu)=(cosθ,sinθu) and by induction for all positive integer t. Negative and real defined analogously. The notion of quat exponential not strictly needed.
One way to think of it is it further parameterizes quaternion. The other way is cayley map (I+A)(I−A)−1. Prove that they always agree.
[Online Patch Switching] problem with cayley map is zero denominator. Adaptively switch patches. Doesn't play nice with running statistics like momentum. paper. Sitao Xiang's paper proposed patch ensemble...
In some sense gradient tangential projection is doing "continuous patch switching''?
10. Modified Cayley Map
A more judicious parameterization?
11. Concerns
When dynamics is involved, exp map might be more natural since angular velocities are naturally expressed in axis-angle form.