Why slerp the quaternions?

2024-12

We use quaternion slerping to interpolate between rotations. But I've seldom seen a precise discussion on what it does and why it's correct. This tutorial/review article provides a walkthrough of some core identities for 3D rotations, and ends with a discussion that establishes the equivalence between axis-angle interpolation and quaternion slerping.

1. 3D skew-symmetric matrix
2. Matrix Exponential
3. Rotation Exponential Map
3.1. axis-angle map: 4-number, constrained.
3.2. exponential map: 3-number, unconstrained
3.3. Log Map, Matrix $\rightarrow$ Axis-Angle, and inversion in general
4. A Historical Note on Quaternions
5. Absorbing ambiguities with intermediary: a simple 5D representation
6. Quaternion, a 4D representation
6.1. Quaternion $\rightarrow$ Rotation Matrix
6.2. Rotation Matrix $\rightarrow$ Quaternion
6.3. Quaternion $\rightarrow$ Axis-Angle
6.4. Gradient of Matrix w.r.t. Quaternion
7. Interpolation
7.1. Axis-Angle Interpolation
7.2. Quaternion product
7.3. Slerp: spherical linear interpolation
7.4. Quaternion Slerp

1. 3D skew-symmetric matrix

\mX

is skew symmetric if

\mX^T = -\mX

. The common way to parameterize such a matrix in

\R^{3 \times 3}

is with a vector

\vu = [x, y, z]^T

s.t.

\begin{align} \skewm{\vu} = \begin{bmatrix*}[r] 0 & -z & y \\ z & 0 & -x \\ -y & x & 0 \\ \end{bmatrix*}.\end{align}

The variable placement and the signs are very deliberate.

\skewm{\vu}

is rank-2 with null-space being

\vu

itself.

Useful identities of a skew-symmetric matrix:

Cross product matrix: $\skewm{\vx} \vy = \vx \times \vy \qquad \skewm{\vx} \vy = -\skewm{\vy} \vx \qquad \skewm{\vx} \vx = \bm{0}$ .
$\skewm{\vu}^2 = \vu \vu^T - (\vu^T\vu) \mI$ . $\tr{\skewm{\vu}^2} = -2\vu^T\vu$ . If $\vu$ has norm 1, $\skewm{\vu}^2 = \vu \vu^T - \mI$ .
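The identities above are easy to spot-check numerically. Below is a minimal sketch in plain Python; the helper names (`skew`, `matvec`, `matmul`, `cross`) and the test vectors are our own choices for illustration, not part of any library.

```python
def skew(u):
    """Cross-product matrix [u]_x of a 3-vector, as nested lists."""
    x, y, z = u
    return [[0.0, -z, y],
            [z, 0.0, -x],
            [-y, x, 0.0]]


def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


# Arbitrary (hypothetical) test vectors; u is deliberately not unit norm.
u = [1.0, -2.0, 0.5]
v = [0.3, 0.7, -1.1]

# Identity 1: [u]_x v equals u x v, and [u]_x u vanishes.
skew_times_v = matvec(skew(u), v)
cross_uv = cross(u, v)
skew_times_u = matvec(skew(u), u)

# Identity 2: [u]_x^2 equals u u^T - (u^T u) I, with trace -2 u^T u.
skew_sq = matmul(skew(u), skew(u))
uu = sum(c * c for c in u)
outer_minus = [[u[i] * u[j] - (uu if i == j else 0.0) for j in range(3)]
               for i in range(3)]
trace_skew_sq = sum(skew_sq[i][i] for i in range(3))
```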

2. Matrix Exponential

Define matrix exponential of a square matrix

\mX \in \R^{n \times n}

to be the series

\begin{align} \exp{(\mX)} := \sum_{k=0}^\infty \frac{\mX^k}{k!}. \label{eq:expm}\end{align}

Fact 1: The series converges (absolutely; allows reordering) [proof].

\exp{(\mA)}\exp{(\mB)} \neq \exp{(\mB)}\exp{(\mA)}

in general.

\exp{(\mX)}

results in an invertible matrix [proof]:

\exp{(-\mX)} \exp{(\mX)} = \exp{(\mX)} \exp{(-\mX)} = \mI

Fact 2: Matrix exponential is generally not a 1-to-1 mapping (it is for

\R^1

). It's generally many-to-1 [example]. See the detailed case breakdown for

\R^{3 \times 3}

in the axis-angle mapping section below.

Fact 3: If

\mX

is skew-symmetric, then

\exp{(\mX)}

is an orthogonal matrix, i.e. its transpose is its inverse:

\exp{(\mX)}^T = \exp{(\mX^T)} = \exp{(-\mX)} = \exp{(\mX)}^{-1}

. Furthermore, its determinant

\text{det}(\exp(\mX)) = \exp{\tr{\mX}} = \exp(0) =1

, by Jacobi's identity [proof]. Thus,

\exp(\mX)

produces special orthogonal matrices

SO(n)

. We also claim that every special orthogonal matrix

\mR

can be generated by some skew-symmetric matrix

\mX

. The mapping is surjective [missing proof].
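Fact 3 can be checked numerically by summing the series of eq. 2 directly. The sketch below truncates the series at 30 terms, which is an arbitrary choice that is ample for the small matrix used here; `mat_exp` and `det3` are hypothetical helper names.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(X, terms=30):
    """Truncated power series sum_{k < terms} X^k / k!."""
    n = len(X)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]          # holds X^k / k!
    for k in range(1, terms):
        term = [[c / k for c in row] for row in matmul(term, X)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


x, y, z = 0.3, -0.2, 0.5
X = [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]    # skew-symmetric input
R = mat_exp(X)
Rt = [[R[j][i] for j in range(3)] for i in range(3)]
RRt = matmul(R, Rt)                                # should be close to I
det_R = det3(R)                                    # should be close to 1
```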

Fact 4: If

(\lambda,\vv)

is an eigenvalue/eigenvector pair of

\mX

, then

\exp(\mX) \vv = \sum_{k=0}^\infty \frac{\lambda^k}{k!} \vv = e^\lambda \vv

, so

(e^\lambda, \vv)

is an eigenpair of

\exp(\mX)

. Fact 5:

\exp{(\mB \mA \mB^{-1})} \mycolor{lightgray}{ = \mI + \frac{(\mB \mA \mB^{-1})^2}{2!} + \dots = \mB \mI \mB^{-1} + \mB\frac{\mA^2}{2!}\mB^{-1} + \dots }= \mB \exp{(\mA)} \mB^{-1}

Fact 6:

\partial_t \exp({t\mA}) \mycolor{lightgray}{ = \partial_t(\mI + t\mA + \frac{t^2}{2!}\mA^2 + \dots ) = \mA + t\mA^2 + \frac{t^2}{2!} \mA^3 + \dots }= \mA \exp{(t\mA)} = \exp{(t\mA)} \mA

. Let

\vy(t) = \exp({t\mA})\vx

. Then

\dot \vy(t) \mycolor{lightgray}{ = \mA \exp{(t\mA)} \vx } = \mA \vy(t)

We are therefore able to come up with guesses when encountering ODEs of the form

\begin{align} \dot \mB = \mA \mB \text{ or } \mB \mA &~\Longrightarrow~ \mB(t) = \exp(t\mA) \nonumber \\ \dot \vy = \mA \vy &~\Longrightarrow~ \vy(t) = \exp({t\mA}) \vx \quad \text{for some initial $\vx$}.\end{align}

These are less useful for rotations, but central in modeling probability distributions of Markov chains.

3. Rotation Exponential Map

3.1. axis-angle map: 4-number, constrained.

In 3D, given

(\theta \in \R, \vu \in \R^3)

, with

\|\vu\| = 1

and

\theta

without

range restriction,

\skewm{\vu}

is skew-symmetric, and by fact 3

\begin{align} \exp{(\theta \skewm{\vu})} = \sum_k \frac{\theta^k}{k!}\skewm{\vu}^k\end{align}

is a spatial rotation matrix in

SO(3)

. Using the constraint

\| \vu \| = 1, \skewm{\vu}^2 = \vu \vu^T - \mI

, the powers of the skew-symmetric matrix cycle:

\begin{align} \skewm{\vu}^0 = \mI \quad \skewm{\vu}^1 = \skewm{\vu} \quad \skewm{\vu}^2 = \vu \vu^T - \mI \quad \skewm{\vu}^3 = -\skewm{\vu} \quad \skewm{\vu}^4 = - \skewm{\vu}^2 \dots\end{align}

and thus

\begin{align} \exp{(\theta \skewm{\vu})} &= \mI + \skewm{\vu}(\theta - \frac{\theta^3}{3!} + \frac{\theta^5}{5!} - \dots) + \skewm{\vu}^2(\frac{\theta^2}{2!} - \frac{\theta^4}{4!} + \frac{\theta^6}{6!} - \dots) \nonumber \\ &= \boxed{ \mI + \sin\theta \skewm{\vu} + (1 - \cos\theta) \skewm{\vu}^2 \quad \text{Rodrigues formula} } \label{eq:rod}\end{align}

produces a rotation matrix. Furthermore, the action preserves vectors along axis

\vu

\begin{align} \exp{(\theta \skewm{\vu})} (s\vu) = s\mI \vu + s \cdot \sin\theta \cancel{\skewm{\vu} \vu} + s\cdot (1 - \cos\theta) \cancel{\skewm{\vu}^2 \vu} = s\vu .\end{align}

For a unit-length vector

\vx

orthogonal to

\vu

\skewm{\vu}\vx = \vu \times \vx

provides the 3rd orthogonal vector

\vy

[\vu, \vx, \vy]

forms a right-hand oriented basis.

\begin{align} \exp{(\theta \skewm{\vu})} \, (t\vx) = \mI (t\vx) + \sin\theta \cdot t \cdot \underbrace{(\vu \times \vx)}_{\vy} + (1 - \cos\theta) \cdot t \cdot \underbrace{ (\vu \times (\vu \times \vx)) }_{-\vx}\end{align}

rotates

t\vx

on the

\vx , \vy

plane, which is orthogonal and right-handed to

\vu

, from

\vx

toward

\vy

by angle

\theta

. The same conclusion can be reached by a spectral analysis on the complex eigenvalues. Action on any vector can be decomposed into action on some

s\vu + t \vx

Therefore it is appropriate to name

(\theta, \vu)

the axis-angle parameterization of a rotation matrix.
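A minimal sketch of the Rodrigues formula of eq. 6 in plain Python follows. It assumes the caller passes a unit-norm axis (no normalization is done), and the function names are our own. The last three matrices illustrate the non-injectivity discussed next: the same rotation is produced by shifting the angle by a full turn, or by flipping the axis and using the complementary angle.

```python
import math


def skew(u):
    x, y, z = u
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def rodrigues(theta, u):
    """I + sin(theta) [u]_x + (1 - cos(theta)) [u]_x^2, for unit-norm u."""
    K = skew(u)
    K2 = matmul(K, K)
    s, c = math.sin(theta), math.cos(theta)
    return [[(1.0 if i == j else 0.0) + s * K[i][j] + (1.0 - c) * K2[i][j]
             for j in range(3)] for i in range(3)]


# Quarter turn about the z-axis.
R = rodrigues(math.pi / 2, [0.0, 0.0, 1.0])
along_axis = matvec(R, [0.0, 0.0, 2.0])   # vectors along the axis are preserved
rotated_x = matvec(R, [1.0, 0.0, 0.0])    # x is carried to y = u x x

# Three axis-angle pairs that should give the same matrix.
theta = 0.8
u = [2.0 / 3.0, -1.0 / 3.0, 2.0 / 3.0]    # unit norm
R_a = rodrigues(theta, u)
R_b = rodrigues(theta + 2.0 * math.pi, u)
R_c = rodrigues(2.0 * math.pi - theta, [-c for c in u])
```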

This mapping is not injective. To produce the same rotation matrix, for some integer

k

, one could have

$(\theta, \vu)$ and $(\hphantom{-} \theta + k \cdot 2 \pi, \hphantom{-} \vu)$ , e.g. $k=-1, (-(2\pi - \theta), \vu)$ rotates through the opposite turn.
$(\theta, \vu)$ and $(-\theta + k \cdot 2 \pi , -\vu)$ , e.g. $k=1, (2\pi - \theta, -\vu)$ keeps the rotation angle under $\pi$ .
At $\theta = k \cdot 2\pi$ , any $\vu$ maps to identity matrix.

See the cases with quaternion for contrast.

3.2. exponential map: 3-number, unconstrained

Sometimes the terms axis-angle map and exponential map are used interchangeably. We make the distinction more explicit. Axis-angle map takes 4 numbers

(\theta, \vu )

to rotation matrix via matrix exponential.

\|\vu\|=1

simplifies the infinite series into the concise Rodrigues formula.

Given 3 unconstrained numbers

\vh \in \R^3, \vh \neq \bm{0}

, the axis-angle form is

(\|\vh\|, \frac{\vh}{\|\vh\|})

, and

\begin{align} \exp{(\skewm{\vh})} &= \exp \left( \|\vh\| \frac{ \skewm{\vh} }{\|\vh\|} \right) = \mI + \frac{ \sin \|\vh\|}{\|\vh\|} \skewm{\vh} + \frac{(1 - \cos \|\vh\|)}{\|\vh\|^2} \skewm{\vh}^2 \quad \label{eq:rod\_h} .\end{align}

For

\vh = \bm{0}

\exp{(\skewm{\vh})} = \mI

by the series in eq 2. For

\|\vh\|

small and rotation close to

\mI

, it is numerically stabler to use

\begin{align*} \frac{\sin x}{x} = 1 - \frac{x^2}{6} + \frac{x^4}{120} + O(x^6), \quad \frac{1 - \cos x}{x^2} = \frac{1}{2} - \frac{x^2}{24} + \frac{x^4}{720} + O(x^6),\end{align*}

because numerical software is unaware of the interaction between trig terms and

1/x

when the two parts are evaluated separately. To look up series like these, use Wolfram Alpha.
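The small-angle handling described above might be sketched as follows. The threshold `eps=1e-4` is an assumption chosen for illustration, not a recommended value, and `exp_map` is a hypothetical name.

```python
import math


def exp_map(h, eps=1e-4):
    """Rotation matrix exp([h]_x) for an unconstrained 3-vector h."""
    x, y, z = h
    t2 = x * x + y * y + z * z                     # squared angle ||h||^2
    if t2 < eps * eps:
        # Taylor series of sin(t)/t and (1 - cos(t))/t^2 around t = 0.
        a = 1.0 - t2 / 6.0 + t2 * t2 / 120.0
        b = 0.5 - t2 / 24.0 + t2 * t2 / 720.0
    else:
        t = math.sqrt(t2)
        a = math.sin(t) / t
        b = (1.0 - math.cos(t)) / t2
    K = [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]
    K2 = [[sum(K[i][k] * K[k][j] for k in range(3)) for j in range(3)]
          for i in range(3)]
    return [[(1.0 if i == j else 0.0) + a * K[i][j] + b * K2[i][j]
             for j in range(3)] for i in range(3)]


R_zero = exp_map([0.0, 0.0, 0.0])                  # series branch, no 0/0
h_small = [1e-5, -2e-5, 3e-5]
R_small_series = exp_map(h_small)                  # series branch
R_small_trig = exp_map(h_small, eps=0.0)           # forces the trig branch
```

Passing `eps=0.0` disables the series branch, which is convenient for comparing the two code paths on the same input.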

Eq. 9 bijectively identifies

\underset{\vh \neq \bm{0}}{\vh} \longleftrightarrow (\underset{\mycolor{teal}{\theta > 0}}{\theta}, \vu)

. At

\vh = 0

, we stipulate

\vh \longleftrightarrow (0, \bm{0})

Due to their 1-to-1 correspondence, statements about

\vh \in \R^3

can be made using

(\underset{\theta \ge 0}{\theta}, \vu)

as its proxy. To put it differently,

\theta \ge 0

For example,

\exp(\skewm{\vh}) \neq \exp([\widehat{-\vh}])

, and their proxies

(\theta, \vu)

(\theta, -\vu)

share the same angle but flipped axis, (not opposite angle, same axis).

[Invertibility of exponential map]

exponential map

\exp(\skewm{\vh})

is injective and thus invertible when the proxy angle

(\theta := \|\vh\|, \vu)

is in range

0 \le \theta < \pi

. The inverse is called the logarithm map.

Proof. Earlier we have listed the three types for ambiguous cases of axis-angle mapping.

\exp(\skewm{\vh}) = \mI \Rightarrow \vh = 0

. For

\theta := \|\vh\| \in [0, \pi)

, the most important case

(2\pi - \theta, -\vu)

is the same rotation but its angle

2\pi - \theta > \pi

violates the range. Other cases are outside the range too. Within

[0, \pi)

the mapping is 1-to-1.

Some authors do not make a distinction between axis-angle map and exponential map; we emphasize it at the cost of verbosity. An advantage of exponential map is that it allows unconstrained optimization of rotations via

\vh

. Optimizing axis-angle

(\theta, \vu)

requires care to maintain the norm 1 constraint on

\vu

3.3. Log Map, Matrix $\rightarrow$ Axis-Angle, and inversion in general

Log map inverts rotation matrix

\mR

to

\vh

such that

\exp(\skewm{\vh}) = \mR

. Again, inverting to

\vh

is 1-to-1 identified as inverting to some

(\underset{\theta \ge 0}{\theta}, \vu)

. So here we consider the general problem of inverting to axis-angle mapping.

Every rotation matrix can be produced by some axis-angle

(\theta, \vu)

s.t.

\theta \in [0, \pi]

, because if

\theta > \pi

(2\pi - \theta, -\vu)

produces the same rotation.

\cos \theta

, since

\arccos

(or

\arctan

) is unambiguous for

[0, \pi]

. On the contrary

\sin \theta

alone is

insufficient

to determine

\theta

. It is no coincidence that both rotation matrix and quaternion provide

\cos \theta

(for quat,

\cos \theta = 2 w^2 - 1

), and

\sin \theta = \pm \sqrt{1 - \cos^2 \theta}

is only decidable up to sign, with

+

chosen to have

\theta

in range

[0, \pi]

In the case of rotation matrix,

$\tr{\mR} = 3 + 0 + (1 - \cos\theta)\cdot(-2) = 1 + 2\cos\theta \Longrightarrow \cos\theta = \frac{\tr{\mR} - 1}{2}, \sin \theta = + \sqrt{1 - \cos^2\theta}$ .
$\mR - \mR^T = 2 \skewm{\vu} \sin \theta$ . $\vu = \frac{(\mR - \mR^T)^{\vee}}{2\sin\theta}$ if $\sin \theta \neq 0$ . By deciding the sign of $\sin \theta$ to be $+$ , we have also locked the sign of $\vu$ . If $\tr{\mR}=-1,\cos\theta=-1, \theta = \pi$ , $\mR$ is symmetric, and $\sin\theta = 0$ ; we obtain pairwise products of entries of $\vu$ by $\mR = \mI + 2\skewm{\vu}^2 \Rightarrow \vu \vu^T = \frac{\mR + \mI}{2}$ . With $\sin\theta=0$ , the sign of $\vu$ is free to flip. Set one of the non-zero element to positive, and get the rest from pairwise products.
$\mR + \mR^T = 2(\mI + (1 - \underbrace{\cos \theta}_{ (\tr{\mR} - 1)/2 } ) \skewm{\vu}^2 ) \Longrightarrow \vu \vu^T = \frac{\mR + \mR^T + (1 - \tr{\mR})\mI}{3 - \tr{\mR}}$ if $\tr{\mR} \neq 3$ . Rotation axis at $\mI$ is undefined. This identity provides pairwise products of $\vu$ at all $\theta > 0$ , and we can compute $\vu$ up to sign flip as described above. But make sure that the sign of $\vu$ is compatible with the positive sign of $\sin \theta$ as their products are locked by $\skewm{\vu} \sin \theta = \frac{\mR - \mR^T}{2}$ . The need for the compatibility check makes this route not that useful vs item 2.
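The first two items can be sketched in code as follows: the trace gives the angle, the antisymmetric part gives the axis, and the symmetric case at angle pi falls back to the outer product. The tolerance `eps` is an assumption, and precision near (but not at) angle pi is not treated carefully here; `log_map` is a hypothetical name.

```python
import math


def log_map(R, eps=1e-8):
    """Return (theta, u) with theta in [0, pi] for a 3x3 rotation matrix R."""
    trace = R[0][0] + R[1][1] + R[2][2]
    c = max(-1.0, min(1.0, (trace - 1.0) / 2.0))   # clamp against round-off
    theta = math.acos(c)
    s = math.sqrt(max(0.0, 1.0 - c * c))           # + sign keeps theta in [0, pi]
    if s > eps:
        # vee of (R - R^T) / (2 sin(theta))
        u = [(R[2][1] - R[1][2]) / (2.0 * s),
             (R[0][2] - R[2][0]) / (2.0 * s),
             (R[1][0] - R[0][1]) / (2.0 * s)]
        return theta, u
    if c > 0.0:
        return 0.0, [0.0, 0.0, 0.0]                # identity: axis undefined
    # theta = pi: u u^T = (R + I) / 2, and the sign of u is free.
    M = [[(R[i][j] + (1.0 if i == j else 0.0)) / 2.0 for j in range(3)]
         for i in range(3)]
    k = max(range(3), key=lambda i: M[i][i])       # largest diagonal entry
    uk = math.sqrt(M[k][k])                        # fix this entry positive
    return math.pi, [M[k][j] / uk for j in range(3)]


# Quarter turn about z, half turn about x, and 120 degrees about (1, 1, 1).
theta_z, u_z = log_map([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
theta_x, u_x = log_map([[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]])
theta_d, u_d = log_map([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
```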

The alternative is rotation matrix

\rightarrow

quaternion

\rightarrow

axis-angle. The baseline approach in rotation matrix

\rightarrow

quaternion is similar to what is outlined in this section.

4. A Historical Note on Quaternions

We discuss an easier and more historically faithful way to motivate the use of quaternion for rotations.

Quaternions are often introduced as hypercomplex numbers. The problem with this presentation is that it is difficult to intuit

a priori

why conjugation

\mR(\vq) \vx = \vq \vx \vq^*

should perform rotation acting on

\vx

while product

\vq_3 = \vq_2 \vq_1

implements rotation composition. The formulae can be algebraically verified posthoc, but their origins are puzzling. Why not use

\vx' = \vq \vx

to rotate vectors? Hamilton himself seemed to have struggled with this exact question, even after Cayley's article came to his attention.

It turns out that historically Rodrigues first developed the use of a 3- or 4-number tuple as a rational representation of rotations without trigonometric functions. The concept of a matrix was not yet prevalent, but his proposed formulae in effect 1) converts the 3/4 numbers to a rotation matrix, and 2) composes successive rotations into a new 3/4-number tuple.

Arthur Cayley studied [#6, 1843] Rodrigues' results, and noticed [#20, 1845] that quaternion conjugation coincides with Rodrigues' formulae. His background in Rodrigues' prior work made the discovery much more plausible, and he went on to extend quaternions in several directions, one of which is the representation of 4D rotations [#137].

5. Absorbing ambiguities with intermediary: a simple 5D representation

The spirit is to capture intermediate variables in Rodrigues mapping of eq. 6.

One source of ambiguity in axis-angle is the angle. We over-parameterize

\theta

to absorb its ambiguities.

Let

(x, y) := (\cos \theta, \sin \theta)

. The naming is consistent with

\cos\theta

measuring the

x

-axis of a point on unit circle.

The forward mapping becomes

\begin{align} \mR(x, y, \vu) = \mI + y \skewm{\vu} + (1 - x) \skewm{\vu}^2 .\end{align}

The ambiguities have been reduced. To map to the same rotation matrix,

$(x, y, \vu)$ and $(x, -y, -\vu)$ . This keeps the rotation angle under $\pi$ .
When $(x, y) = (1, 0)$ , any $\vu$ maps to identity matrix.

One way to motivate quaternion is that it resolves case 2 by

\vu

6. Quaternion, a 4D representation

Using the fact that

\sin \theta = 2 \sin\tfrac{\theta}{2} \cos\tfrac{\theta}{2}

and

\cos \theta = 2 \cos^2 \frac{\theta}{2} - 1

[derive with Euler's identity]

\begin{align} \exp(\theta \skewm{\vu} ) &= \mI + 2 \cos\frac{\theta}{2} \sin\frac{\theta}{2} \skewm{\vu} + (1 - 2\cos^2 \frac{\theta}{2} + 1) \skewm{\vu}^2 \nonumber \\ &= \mI + 2 \cos\frac{\theta}{2} \sin\frac{\theta}{2} \skewm{\vu} + 2 \left( \sin \frac{\theta}{2} \skewm{\vu} \right)^2.\end{align}

We capture the intermediary by letting

\boxed{ (w, \vv) := ( \cos \tfrac{\theta}{2}, \sin \tfrac{\theta}{2} \vu )}

(w, \vv)

is referred to as quaternion. This 4d vector is naturally of norm 1.

[Reduced ambiguities]

Since

(\theta, \vu) \underbrace{\Longrightarrow}_{\text{many-to-1}} (w, \vv) \underbrace{\Longrightarrow}_{\text{2-to-1}} \mR

. The intermediary

(w, \vv)

absorbs and consolidates the many-to-one exponential map of

(\theta, \vu) \Rightarrow \mR

into a 2-to-1 mapping through

(w, \vv) := (\cos \frac{\theta}{2}, \sin \frac{\theta}{2} \vu )

(w, \vv)

and

(-w, -\vv)

map to the same rotation, but this 2-to-1 mapping

does not

correspond to simply flipping the axis-angle. Flipping the signs of

(\theta, \vu)

(\cos \frac{\theta}{2}, \sin \frac{\theta}{2} \vu )

doesn't change its value. To change

(w, \vv)

(-w, -\vv)

, the options are

$(\frac{\theta}{2}, \vu) \rightarrow (\frac{ \hphantom{-} \theta + k \cdot 2\pi}{2}, \hphantom{-}\vu)$ for odd int $k$ , e.g. $k=-1, (\frac{-(2\pi - \theta)}{2}, \vu)$ rotates through the opposite turn.
$(\frac{\theta}{2}, \vu) \rightarrow (\frac{ -\theta + k \cdot 2\pi}{2}, -\vu)$ for odd int $k$ , e.g. $k=1, (\frac{2\pi - \theta}{2}, -\vu)$ keeps rotation angle under $\pi$ .

To keep at the same

(w, \vv)

, the options are

$(\frac{\theta}{2}, \vu) \rightarrow (\frac{\hphantom{-}\theta + k \cdot 2 \pi}{2}, \hphantom{-}\vu)$ for some even int $k$ , e.g. $k=0$
$(\frac{\theta}{2}, \vu) \rightarrow (\frac{-\theta + k \cdot 2\pi }{2}, -\vu)$ for some even int $k$ , e.g. $k=0, (-\frac{\theta}{2}, -\vu)$ for flipped axis-angle.

In this sense, the 2-to-1 ambiguity of the quaternion is the result of grouping the many-to-1 ambiguities that axis-angle possesses into "even" and "odd" sides, because all four axis-angle configurations, with

k

being even or odd, map to the same rotation matrix.

6.1. Quaternion $\rightarrow$ Rotation Matrix

The forward mapping starting from the intermediary

(w, \vv)

\begin{align} \mR(w, \vv) &= \mI + 2 w \cdot \skewm{\vv} + 2\skewm{\vv}^2 \\ &= \mI + 2 w \cdot \skewm{\vv} + 2 (\vv \vv^T - \vv^T \vv\mI) \nonumber \\ &= \left(1 - 2(v_1^2 + v_2^2 + v_3^2) \right) \mI + 2(\vv \vv^T +w \cdot \skewm{\vv}) \nonumber \\ &= \underbrace{ \left(1 - 2(v_1^2 + v_2^2 + v_3^2) \right) \mI + 2 \begin{bmatrix} v_1^2 & 0 & 0 \\ 0 & v_2^2 & 0 \\ 0 & 0 & v_3^2 \end{bmatrix} }_{\text{diagonal entries}} + \underbrace{ 2 \begin{bmatrix} 0 & v_1 v_2 - v_3 w & v_1 v_3 + v_2 w \\ v_2 v_1 + v_3 w & 0 & v_2 v_3 - v_1 w \\ v_3 v_1 - v_2 w & v_3 v_2 + v_1 w & 0 \end{bmatrix} }_{\text{off-diagonal entries}} . \label{eq:quat2mat}\end{align}

The off-diagonal entries in eq. 13 are unambiguous. For the diagonal entries, using the constraint

w^2 + v_1^2 + v_2^2 + v_3^2 = 1

, there are three different ways to write it in the literature and various codebases. WLOG consider the 2nd diagonal entry i.e.

\mR_{11}

with 0-based indexing,

\begin{align} \mR_{11} = 1 - 2(v_1^2 + v_2^2 + v_3^2) + 2v_2^2 &= 1 - 2v_1^2 - 2 v_3^2 \label{eq:quatm\_v1} \\ &= 1 - 2(1 - w^2) + 2v_2^2 = 2 w^2 + 2v_2^2 - 1 \label{eq:quatm\_v2} \\ &= (w^2 + v_1^2 + v_2^2 + v_3^2) - 2v_1^2 - 2 v_3^2 = w^2 - v_1^2 + v_2^2 - v_3^2 . \label{eq:quatm\_v3}\end{align}

Two remarks

Version 3 / eq. 16 provides critical insight for (parallelizable) inverse mapping from rotation matrix to quaternion. Version 1 / eq. 14 is for example used in Gaussian splatting. A few other papers use version 2.
These different diagonal parameterizations will result in different gradients when backpropagating from $\mR(w, \vv)$ to $[w, \vv]$ . But after projecting the gradient (w.r.t. norm $1$ constraint / to be orthogonal to $[w, \vv]$ ), the gradients become the same [missing proof].
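As a sanity check on eqs. 13 to 16, the sketch below builds the matrix with each of the three diagonal variants and compares them on a unit quaternion. The `diag` switch and its labels `"v1"`, `"v2"`, `"v3"` are hypothetical names for the three versions; they only agree when the input has norm 1.

```python
import math


def quat_to_matrix(q, diag="v1"):
    """Rotation matrix from a unit quaternion q = [w, v1, v2, v3]."""
    w, x, y, z = q
    if diag == "v1":      # eq. 14 style: 1 - 2(other two squares)
        d = [1 - 2 * (y * y + z * z),
             1 - 2 * (x * x + z * z),
             1 - 2 * (x * x + y * y)]
    elif diag == "v2":    # eq. 15 style: 2 w^2 + 2 v_i^2 - 1
        d = [2 * (w * w + x * x) - 1,
             2 * (w * w + y * y) - 1,
             2 * (w * w + z * z) - 1]
    else:                 # eq. 16 style: signed sum of squares
        d = [w * w + x * x - y * y - z * z,
             w * w - x * x + y * y - z * z,
             w * w - x * x - y * y + z * z]
    return [[d[0], 2 * (x * y - z * w), 2 * (x * z + y * w)],
            [2 * (x * y + z * w), d[1], 2 * (y * z - x * w)],
            [2 * (x * z - y * w), 2 * (y * z + x * w), d[2]]]


raw = [0.5, -0.1, 0.7, 0.2]                       # arbitrary test values
norm = math.sqrt(sum(c * c for c in raw))
q = [c / norm for c in raw]

R1 = quat_to_matrix(q, "v1")
R2 = quat_to_matrix(q, "v2")
R3 = quat_to_matrix(q, "v3")

# Quarter turn about z: q = (cos 45deg, 0, 0, sin 45deg).
Rz = quat_to_matrix([math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4)])
```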

6.2. Rotation Matrix $\rightarrow$ Quaternion

The problem with the inversion step is always "what happens if one of them is $0$?"

[Baseline]

We first present a baseline method that involves explicit if-else branching. This is not parallel-friendly and also numerically not ideal because small input perturbations could execute a different code branch.

\begin{align} \tr{\mR} = \mycolor{lightgray}{3 + 0 + 2 \cdot -2 \vv^T \vv =} 3 - 4 \| \vv \|^2 &~\Longrightarrow~ \| \vv \|^2 = \frac{3 - \tr{\mR}}{4}, ~ w^2 = \mycolor{lightgray}{ 1 - \| \vv \|^2 = } \frac{1 + \tr{\mR}}{4} . \\ \mR - \mR^T = 4w \cdot \skewm{\vv} &, \qquad \mR + \mR^T = \mycolor{lightgray}{ 2\mI + 4\skewm{\vv}^2 } = 4 \vv \vv^T + (2 - 4\|\vv\|^2)\mI \label{eq:quatm\_offdiag} .\end{align}

If

\tr{\mR} > -1

, then we set

w = + \sqrt{\frac{1 + \tr{\mR}}{4}}

and

\vv = \frac{(\mR - \mR^T)^{\vee}}{4w}

. Note that when

\mR = \mI, \tr{\mR} = 3, w=1, \vv = \frac{\bm{0}}{4}

is correct.

If

\tr{\mR} = -1

, then

w=0, \mR = \mR^T, \| \vv \| = 1

and

\vv \vv^T = \tfrac{\mR + \mI}{2}

, from which we read off

v_1^2, v_2^2, v_3^2, v_1v_2, v_1v_3, v_2v_3

. At most two of

v_1, v_2, v_3

could be

0

. Fix the non-zero one to be positive, and obtain the other two.

Overall this routine uses a significant amount of conditional branching.

[Better method]

Using the diagonal parameterization in eq. 16, we have

\begin{align} \underbrace{ \begin{bmatrix*}[r] 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & -1 & -1 & 1 \\ \end{bmatrix*} }_{\mL} \begin{bmatrix} w^2 \\ v_1^2 \\ v_2^2 \\ v_3^2 \end{bmatrix} = \begin{bmatrix} 1 \\ \mR_{00} \\ \mR_{11} \\ \mR_{22} \end{bmatrix} \quad \Longrightarrow \quad \begin{bmatrix} w^2 \\ v_1^2 \\ v_2^2 \\ v_3^2 \end{bmatrix} = \frac{1}{4} \mL \begin{bmatrix} 1 \\ \mR_{00} \\ \mR_{11} \\ \mR_{22} \end{bmatrix} .\end{align}

The first row of

\mL

comes from the norm 1 constraint. We supply it so that the linear system is solvable. Interestingly

\mL^{-1} = \frac{1}{4}\mL

. It makes

\frac{1}{2}\mL

an involutory, symmetric, orthogonal matrix.

Now that we have the squares of each number, it remains to determine the signs. Using eq. 18, we can use the off-diagonal entries to obtain pairwise products

\begin{align} w v_1 = \frac{\mR_{21} - \mR_{12}}{4},& \quad w v_2 = \frac{\mR_{02} - \mR_{20}}{4}, \quad w v_3 = \frac{\mR_{10} - \mR_{01}}{4} , \nonumber \\ v_1 v_2 = \frac{\mR_{01} + \mR_{10}}{4},& \quad v_1 v_3 = \frac{\mR_{02} + \mR_{20}}{4}, \quad v_2 v_3 = \frac{\mR_{12} + \mR_{21}}{4} ,\end{align}

and finally

\begin{align} \myscalebox{1.15}{ [w, v_1, v_2, v_3] = \frac{[w^2, wv_1, w v_2, w v_3]}{+\sqrt{w^2}} = \frac{[wv_1, v_1^2, v_1v_2, v_1v_3]}{+\sqrt{v_1^2}} = \frac{[wv_2, v_1v_2, v_2^2, v_2v_3]}{+\sqrt{v_2^2}} = \frac{[wv_3, v_1v_3, v_2v_3, v_3^2]}{+\sqrt{v_3^2}} . }\end{align}

At any time, at least one of

[w, v_1, v_2, v_3]

is non-zero due to norm-1 constraint, and thus a common strategy is to select the one with the largest denominator out of the 4 options.

A batch-parallel implementation can be found in PyTorch3d.
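For a single matrix, the steps of eqs. 19 to 21 can be sketched as below. This is our own scalar illustration, not the PyTorch3D code; the name `matrix_to_quat` and the table `P` are hypothetical. The returned quaternion is determined only up to an overall sign.

```python
import math


def matrix_to_quat(R):
    """Unit quaternion [w, v1, v2, v3] (up to sign) from a rotation matrix."""
    r00, r11, r22 = R[0][0], R[1][1], R[2][2]
    # Squares from (1/4) L [1, R00, R11, R22]^T, eq. 19.
    sq = [(1.0 + r00 + r11 + r22) / 4.0,
          (1.0 + r00 - r11 - r22) / 4.0,
          (1.0 - r00 + r11 - r22) / 4.0,
          (1.0 - r00 - r11 + r22) / 4.0]
    # Pairwise products from the off-diagonal entries, eq. 20.
    wv1 = (R[2][1] - R[1][2]) / 4.0
    wv2 = (R[0][2] - R[2][0]) / 4.0
    wv3 = (R[1][0] - R[0][1]) / 4.0
    v1v2 = (R[0][1] + R[1][0]) / 4.0
    v1v3 = (R[0][2] + R[2][0]) / 4.0
    v2v3 = (R[1][2] + R[2][1]) / 4.0
    # Row k holds [w, v1, v2, v3] multiplied by the k-th component, eq. 21.
    P = [[sq[0], wv1, wv2, wv3],
         [wv1, sq[1], v1v2, v1v3],
         [wv2, v1v2, sq[2], v2v3],
         [wv3, v1v3, v2v3, sq[3]]]
    k = max(range(4), key=lambda i: sq[i])         # largest denominator
    d = math.sqrt(sq[k])
    return [P[k][j] / d for j in range(4)]


q_z = matrix_to_quat([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
q_x = matrix_to_quat([[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]])
q_d = matrix_to_quat([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
```

In a batched setting the `max` would become an argmax over the four candidates followed by a gather, which avoids data-dependent branching.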

6.3. Quaternion $\rightarrow$ Axis-Angle

(\theta, \vu)

Over the batch set

w = \cos \frac{\theta}{2}

non-negative so that

\theta

is within

[0, \pi]

. Recall and contrast that for axis-angle, no restriction is placed on the value of

\cos \theta

for

\theta

to be in

[0, \pi]

\begin{align} \vq &= [\vq.w \ge 0] \cdot \vq \nonumber \\ \theta &= 2 \cdot \text{atan2}(\|\vv\|, w) \nonumber \\ \vu &= \vv / \| \vv \| \text{~~if~~} \theta > 0 \text{~~else~~} \vu = \bm{0} .\end{align}

\vh \in \R^3

If the goal is instead to directly invert to unnormalized axis, then it is numerically stabler to use small angle approximation

\arctan(x) = x - \tfrac{1}{3}x^3 + \dots

\begin{align} \vq &= [\vq.w \ge 0] \cdot \vq \nonumber \\ \vh &= 2\left( \frac{\|\vv\|}{w} - \frac{1}{3} \left(\frac{\|\vv\|}{w} \right)^3 \right) \frac{\vv}{\|\vv\|} = \left( \frac{2}{w} - \frac{2}{3} \cdot \frac{\|\vv\|^2}{w^3} \right) \vv \text{~~if~~} \| \vv \| < \epsilon \nonumber \\ &= 2 \cdot \text{atan2}(\|\vv\|, w) \cdot \frac{\vv}{\|\vv\|} \text{~~else} .\end{align}

See implementation in LieTorch, which is itself based on Sophus. Moreover, this stackexchange post pointed out that we can use half-angle formula of tangent to implement atan2.
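A scalar sketch of eqs. 22 and 23 follows. It is not the LieTorch or Sophus code. We read the first line of each equation as "negate the quaternion when its scalar part is negative", which is an assumption about the intended notation; the threshold `eps` is likewise an assumed value.

```python
import math


def quat_to_axis_angle(q):
    """(theta, u) with theta in [0, pi]; u is the zero vector at identity."""
    w, x, y, z = q
    if w < 0.0:                       # q and -q are the same rotation
        w, x, y, z = -w, -x, -y, -z
    n = math.sqrt(x * x + y * y + z * z)
    theta = 2.0 * math.atan2(n, w)
    if n == 0.0:
        return 0.0, [0.0, 0.0, 0.0]
    return theta, [x / n, y / n, z / n]


def quat_to_rotvec(q, eps=1e-6):
    """Unnormalized axis h = theta * u, with a small-angle branch."""
    w, x, y, z = q
    if w < 0.0:
        w, x, y, z = -w, -x, -y, -z
    n2 = x * x + y * y + z * z
    n = math.sqrt(n2)
    if n < eps:
        # 2 * atan(n / w) / n, expanded to third order in n / w.
        scale = 2.0 / w - (2.0 / 3.0) * n2 / (w ** 3)
    else:
        scale = 2.0 * math.atan2(n, w) / n
    return [scale * x, scale * y, scale * z]


q = [math.cos(0.3), 0.0, math.sin(0.3), 0.0]      # angle 0.6 about the y-axis
theta, u = quat_to_axis_angle(q)
theta_neg, u_neg = quat_to_axis_angle([-c for c in q])
h = quat_to_rotvec(q)
h_tiny = quat_to_rotvec([math.cos(1e-8), math.sin(1e-8), 0.0, 0.0])
```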

6.4. Gradient of Matrix w.r.t. Quaternion

\begin{align} \partial_{v_1} \underbrace{ \skewm{\vv}^2 }_{\vv \vv^T - \vv^T \vv \mI} = \begin{bmatrix} 0 & v_2 & v_3 \\ v_2 & -2v_1 & 0 \\ v_3 & 0 & -2v_1 \\ \end{bmatrix} ~ \partial_{v_2} \skewm{\vv}^2 = \begin{bmatrix} -2v_2 & v_1 & 0 \\ v_1 & 0 & v_3 \\ 0 & v_3 & -2v_2 \\ \end{bmatrix} ~ \partial_{v_3} \skewm{\vv}^2 = \begin{bmatrix} -2v_3 & 0 & v_1 \\ 0 & -2v_3 & v_2 \\ v_1 & v_2 & 0 \\ \end{bmatrix} .\end{align}

Using the diagonal param of eq. 14,

\partial_w \mR = 2 \skewm{\vv}

, and

\begin{align} \myscalebox{0.65}{ \partial_{v_1} \mR = 2\left( \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & -w \\ 0 & w & 0 \\ \end{bmatrix} + \begin{bmatrix} 0 & v_2 & v_3 \\ v_2 & -2v_1 & 0 \\ v_3 & 0 & -2v_1 \\ \end{bmatrix} \right) } ~ \myscalebox{0.65}{ \partial_{v_2} \mR = 2\left( \begin{bmatrix} 0 & 0 & w \\ 0 & 0 & 0 \\ -w & 0 & 0 \\ \end{bmatrix} + \begin{bmatrix} -2v_2 & v_1 & 0 \\ v_1 & 0 & v_3 \\ 0 & v_3 & -2v_2 \\ \end{bmatrix} \right) } ~ \myscalebox{0.65}{ \partial_{v_3} \mR = 2\left( \begin{bmatrix} 0 & -w & 0 \\ w & 0 & 0 \\ 0 & 0 & 0 \\ \end{bmatrix} + \begin{bmatrix} -2v_3 & 0 & v_1 \\ 0 & -2v_3 & v_2 \\ v_1 & v_2 & 0 \\ \end{bmatrix} \right) } .\end{align}

Again the gradient is different depending on the diag parameterization used. But they become the same after projection.

\begin{align*} \def\arraystretch{1.5} \begin{array}{|c|c|c|} \hline \vu = \frac{1}{\sqrt{3}} [1, 1, 1] & 179^\circ \text{ around } \vu & 182^\circ \text{ around } \vu \\ \hline \textbf{Matrix} & \begin{bmatrix*}[r] -0.3332 & 0.6565 & 0.6767 \\ 0.6767 & -0.3332 & 0.6565 \\ 0.6565 & 0.6767 & -0.3332 \end{bmatrix*} & \begin{bmatrix*}[r] -0.3329 & 0.6866 & 0.6463 \\ 0.6463 & -0.3329 & 0.6866 \\ 0.6866 & 0.6463 & -0.3329 \end{bmatrix*} \\ \hline \textbf{Axis-Angle} & (179^{\circ}, \vu) & \mycolor{lightgray}{ (360^{\circ}-182^{\circ}, -\vu) = } (178^{\circ}, -\vu) \\ \hline \textbf{Quaternion} & \left(\cos\frac{179^\circ}{2}, \vu \sin \frac{179^\circ}{2}\right) & \mycolor{lightgray}{\left(\cos\frac{360^\circ-182^\circ}{2}, -\vu \sin \frac{360^\circ-182^\circ}{2}\right) }= \left(\cos\frac{178^\circ}{2}, -\vu\sin\frac{178^\circ}{2}\right) \\ \hline \end{array}\end{align*}

7. Interpolation

It is often stated that quaternion is the ideal representation to interpolate rotations with. Its advantage is not obvious because the behaviors of quaternions are hard to intuit. This section attempts to establish some precise characterizations.

7.1. Axis-Angle Interpolation

Rotations are actions / groups, and an intuitive way to interpolate from action

\mR_1

to

\mR_2

by ratio

t

is to gradually apply the connecting action by

\begin{align} \text{interp}(\mR_1 \rightarrow \mR_2, t) = \mR_1 (\mR_1^T\mR_2)^t \mycolor{lightgray}{= (\mR_2 \mR_1^T)^t \mR_1}.\end{align}

It would satisfy the endpoints:

\text{interp}(\mR_1 \rightarrow \mR_2, 0) = \mR_1

and

\text{interp}(\mR_1 \rightarrow \mR_2, 1) = \mR_2

For diagonalizable

\mA = \mX \bm{\Lambda} \mX^{-1}, t \in \R

, matrix power is defined as

\mA^t := \mX \bm{\Lambda}^t \mX^{-1}

. In particular, using fact 5,

\exp(\mA)^t = \mycolor{lightgray}{ (\mX\exp(\bm{\Lambda})\mX^{-1})^t := \mX\exp(\bm{\Lambda})^t\mX^{-1} = \mX\exp(t\bm{\Lambda})\mX^{-1} = \exp(t\mX\bm{\Lambda}\mX^{-1}) } = \exp(t\mA)

. The natural way to implement this interpolation is to invert

\mR_1^T\mR_2

to its axis-angle mapping and then do

\mR_1 (\exp \theta \skewm{\vw})^t

(\mR_1^T\mR_2)^t, t \in (0, 1)

Just as the axis-angle form of a rotation is not unique, raising matrix to a power

t

is not unique either. Rotation matrices are unitarily diagonalizable, and in 3D

\bm{\Lambda}

is some

\textbf{diag}[1, a+bi, a-bi]

with

a^2 + b^2=1

. The problem is that for matrix with complex eigenvalues,

(a + bi)^t

[details]. We need to first take complex log and convert

a + bi

to polar form

e^{i\theta}

, before raising to

e^{it \cdot \theta}

. Complex logarithm is not unique (just like axis-angle of a rotation is not).

e^{i\theta} = e^{i(\theta + k \cdot 2\pi)}

for any integer

k

. When raised to power

t

, however,

\begin{align} e^{it \cdot (\theta + k \cdot 2\pi)} = e^{it \cdot \theta} \cdot \underbrace{e^{it\cdot k \cdot 2\pi}}_{ \neq 1 \text{ unless t is integer} } .\end{align}

Hence in general

( e^{i\theta} )^t \neq ( e^{i(\theta + k \cdot 2\pi)} )^t

for

t \in (0, 1)
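A two-line numerical illustration of eq. 27, using Python's `cmath`: the same unit complex number, written with two different logarithms, gives two different halfway points. The angle of 60 degrees and the choice of one extra full turn are arbitrary.

```python
import cmath
import math

theta = math.radians(60.0)
t = 0.5

same_point_a = cmath.exp(1j * theta)
same_point_b = cmath.exp(1j * (theta + 2.0 * math.pi))   # one extra full turn

half_a = cmath.exp(1j * t * theta)                       # 30 degrees
half_b = cmath.exp(1j * t * (theta + 2.0 * math.pi))     # 210 degrees

gap_endpoints = abs(same_point_a - same_point_b)         # close to 0
gap_halfway = abs(half_a - half_b)                       # close to 2
```

With `t = 0.5` the extra factor is exp(i pi) = -1, so the two halfway points are antipodal on the unit circle.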

The situation manifests as the ambiguity of log-map from rotation matrix to axis-angle

\mR_1^T\mR_2 = \exp \theta \skewm{\vw}

. We need to ensure

\theta \in [0, \pi]

, so that interpolation takes place over the small arc,

\begin{align} \text{interp}(\mR_1 \rightarrow \mR_2, t) &= \mR_1 (\exp \theta \skewm{\vw})^t = \mR_1 \exp (t\theta \skewm{\vw}) \nonumber \\ &= \mR_1 (\mI + \sin t\theta \skewm{\vw} + (1 - \cos t\theta) \skewm{\vw}^2).\end{align}

The overall goal of the section is to show that quaternion slerping is exactly equivalent to this formula.

7.2. Quaternion product

The Quaternion product is

\begin{align} (w_2, \vv_2) (w_1, \vv_1) &= (w_2w_1 - \vv_2 \cdot \vv_1, w_2 \vv_1 + w_1 \vv_2 + \vv_2 \times \vv_1) \\ &= \left( w_2 \mI + \begin{bmatrix} 0 & -\vv_2^T \\ \vv_2 & \skewm{\vv_2} \end{bmatrix} \right) \begin{bmatrix}w_1 \\ \vv_1\end{bmatrix}. \label{eq:quat\_prod\_mat}\end{align}

This formula is not easily inventable without thinking about hypercomplex numbers. The matrix version of quaternion product in eq. 30 is used in rigid body dynamics to write a linear ODE.
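Both forms of the product in eqs. 29 and 30 can be written out directly. In this sketch `quat_mul` implements the scalar/vector form and `left_matrix` the 4x4 matrix form; both names are hypothetical. The hypercomplex units give a quick check that the product is not commutative.

```python
def quat_mul(q2, q1):
    """Product (w2, v2)(w1, v1) in scalar/vector form, eq. 29."""
    w2, x2, y2, z2 = q2
    w1, x1, y1, z1 = q1
    return [w2 * w1 - (x2 * x1 + y2 * y1 + z2 * z1),
            w2 * x1 + w1 * x2 + (y2 * z1 - z2 * y1),
            w2 * y1 + w1 * y2 + (z2 * x1 - x2 * z1),
            w2 * z1 + w1 * z2 + (x2 * y1 - y2 * x1)]


def left_matrix(q):
    """4x4 matrix w I + [[0, -v^T], [v, [v]_x]] of eq. 30."""
    w, x, y, z = q
    return [[w, -x, -y, -z],
            [x, w, -z, y],
            [y, z, w, -x],
            [z, -y, x, w]]


def matvec4(m, v):
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]


i_unit = [0.0, 1.0, 0.0, 0.0]
j_unit = [0.0, 0.0, 1.0, 0.0]
ij = quat_mul(i_unit, j_unit)     # expected to be  k = [0, 0, 0, 1]
ji = quat_mul(j_unit, i_unit)     # expected to be -k

a = [0.2, -0.4, 0.1, 0.9]         # arbitrary values, not normalized
b = [-0.7, 0.3, 0.5, 0.2]
via_formula = quat_mul(a, b)
via_matrix = matvec4(left_matrix(a), b)
```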

Rodrigues's derivation in 1840 uses the geometry of the spherical triangle formed by the 3 rotation axes. That picture is intuitive enough and clearly explains the critical role of half angles and bisecting lines. See Altmann, Rotations, Quaternions, and Double Groups, p. 159.

But working out the actual formula from the picture requires spherical trigonometry (not too bad) in addition to extensive, non-obvious algebraic steps. It seems improbable that one could naturally arrive at the quaternion product without knowing the final goal beforehand as a guide. Detecting the similarity between Rodrigues' rotation composition rule and quaternion product seems only possible in hindsight.

7.3. Slerp: spherical linear interpolation

The slerping routine is generally applicable to vectors. Besides rotation, it is most often used to interpolate Gaussian noise vectors that act as initiating latents of deep generative models.

To interpolate from

\vu

to

\vv

, the gist is to establish a basis in the plane spanned by

\{\vu, \vv\}

, and we need

2

orthogonal axis vectors. The angle

\theta

between

\vu, \vv

is dependent

on the choice of basis. The ordering of the basis vectors may change

\theta

to

-\theta

or

2\pi - \theta

. What is invariant is

\cos \theta = \cos (-\theta) = \cos(2\pi - \theta)

. Assuming

\vu, \vv

are of unit norm,

\cos\theta = \vu \cdot \vv

, and again dot product is invariant to change of basis.

Since the interpolation starts at

\vu

, we let it be one of the axes for simpler algebra. To find the other orthogonal axis

\vw

, we need the precise phase, for which we let

\sin \theta = + \sqrt{1 - \cos^2 \theta}

in order that the angle

\theta \in [0, \pi]

and the interpolation travels along the small arc.

\begin{align} \vv = \vu \cos \theta + \vw \sin \theta ~\Longrightarrow ~ \vw = \frac{\vv - (\vv \cdot \vu) \vu }{+\sqrt{1 - (\vv \cdot \vu)^2}} .\end{align}

To compute the interpolated vector with ratio

t \in [0, 1]

\begin{align} \text{slerp}(\vu \rightarrow \vv, t) &= \vu \cos t\theta + \vw \sin t\theta = \vu \cos t\theta + \frac{\vv - \vu \cos \theta }{\sin \theta} \cdot \sin t\theta \nonumber \\ &= \frac{\vu(\sin\theta \cos t\theta - \cos \theta \sin t\theta ) + \vv \sin t\theta }{\sin \theta} \nonumber \\ &= \frac{\vu(\sin\theta \cos (-t\theta) + \cos \theta \sin (-t\theta) ) + \vv \sin t\theta }{\sin \theta} \nonumber \\ &= \frac{\vu\sin(1 - t)\theta + \vv \sin t\theta }{\sin \theta} \label{eq:slerp} .\end{align}

The slerping formula in eq. 32 is symmetrical i.e.

\text{slerp}(\vu \rightarrow \vv, t) = \text{slerp}(\vv \rightarrow \vu, 1- t)
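Eq. 32 translates to a few lines of code. The sketch below assumes unit-norm inputs of any dimension. The linear fallback when the sine is near zero is our own addition for nearly parallel inputs (it is not part of the derivation above, and it is not meaningful for antipodal inputs); `eps` is an assumed threshold.

```python
import math


def slerp(u, v, t, eps=1e-8):
    """Spherical linear interpolation between unit vectors u and v."""
    c = sum(a * b for a, b in zip(u, v))
    c = max(-1.0, min(1.0, c))            # clamp against round-off
    theta = math.acos(c)                  # angle in [0, pi]: the small arc
    s = math.sin(theta)
    if s < eps:
        # Assumed fallback for nearly parallel inputs: plain lerp.
        return [(1.0 - t) * a + t * b for a, b in zip(u, v)]
    wu = math.sin((1.0 - t) * theta) / s
    wv = math.sin(t * theta) / s
    return [wu * a + wv * b for a, b in zip(u, v)]


e1 = [1.0, 0.0]
e2 = [0.0, 1.0]
midpoint = slerp(e1, e2, 0.5)             # 45 degrees between the two axes

# Symmetry: slerp(u -> v, t) equals slerp(v -> u, 1 - t).
forward = slerp(e1, e2, 0.3)
backward = slerp(e2, e1, 0.7)
```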

7.4. Quaternion Slerp

First of all,

\mR_1 (\mR_1^T\mR_2)^t

directly corresponds to

\vq_1 (\vq_1^* \vq_2)^t

, and we define the connecting

\begin{align} \vq_1^* \vq_2 = (w_1, -\vv_1) (w_2, \vv_2) = (\underbrace{w_1w_2 + \vv_1 \cdot \vv_2}_{\cos (\theta/2)}, \underbrace{w_1 \vv_2 - w_2 \vv_1 - \vv_1 \times \vv_2}_{\sin (\theta/2) \, \vu_{1 \rightarrow 2}} ) .\end{align}

The axis-angle interpolation when written in quaternion form becomes

\begin{align} \text{interp}(\mR_1 \rightarrow \mR_2, t) &= \mR_1 (\mR_1^T\mR_2)^t \nonumber \\ &= \mR_1 (\mI + \sin t\theta \skewm{\vw} + (1 - \cos t\theta) \skewm{\vw}^2) \nonumber \\ &= \vq_1 (\vq_1^* \vq_2)^t \nonumber \\ &= \vq_1 ( \cos t \frac{\theta}{2} ,\: \sin t \frac{\theta}{2} \, \vu_{1 \rightarrow 2}) .\end{align}

But this is using quaternion product, not slerping. To slerp between

\vq_1, \vq_2

, and show that it is equivalent to the quaternion product, we let their geometric angle be

\Omega

\begin{align} \cos \Omega = \vq_1 \cdot \vq_2 &= \mycolor{lightgray}{ w_1 w_2 + \vv_1 \cdot \vv_2 = (\vq_1^*\vq_2).w =} \cos (\theta / 2) \\ \vu_{1 \rightarrow 2} &= \frac{w_1 \vv_2 - w_2 \vv_1 - \vv_1 \times \vv_2}{\sin \Omega}.\end{align}

and the power interpolation when expressed using

\cos \Omega = \cos (\theta / 2)

and

\vu_{1 \rightarrow 2} = \frac{w_1 \vv_2 - w_2 \vv_1 - \vv_1 \times \vv_2}{\sin \Omega}

\begin{align} \vq_1 (\vq_1^* \vq_2)^t &= \vq_1 ( \cos t \frac{\theta}{2} ,\: \sin t \frac{\theta}{2} \, \vu_{1 \rightarrow 2}) = ( w_1, \vv_1) ( \cos t \Omega ,\; \sin t \Omega \, \vu_{1 \rightarrow 2} ) \nonumber \\ &= ( w_1 \cos t \Omega - \sin t \Omega \, \vv_1 \cdot \vu_{1 \rightarrow 2} ,~ w_1 \sin t \Omega \, \vu_{1 \rightarrow 2} + \cos t \Omega \vv_1 + \sin t \Omega \, \vv_1 \times \vu_{1 \rightarrow 2} ) \nonumber \\ &= ( w_1 \cos t \Omega - \sin t \Omega \, \vv_1 \cdot \vu_{1 \rightarrow 2} ,\; \cos t \Omega \, \vv_1 + \sin t \Omega (w_1 \vu_{1 \rightarrow 2} + \vv_1 \times \vu_{1 \rightarrow 2} ) ) .\end{align}

First look at the scalar part of this quaternion,

\begin{align} - \sin t \Omega \, \vv_1 \cdot \vu_{1 \rightarrow 2} &= - \sin t \Omega \, \frac{\vv_1 \cdot (w_1 \vv_2 - w_2 \vv_1 - \vv_1 \times \vv_2)}{\sin \Omega} \nonumber \\ &= \sin t \Omega \, \frac{w_2 \vv_1 \cdot \vv_1 -w_1\vv_1 \cdot \vv_2 }{\sin \Omega} ~~\text{where}~~ \vv_1 \cdot \vv_1 = 1 - w_1^2 \nonumber \\ &= \sin t \Omega \, \frac{w_2 - w_1( w_1 w_2 + \vv_1 \cdot \vv_2) }{\sin \Omega} = \sin t \Omega \, \frac{w_2 - w_1 \cos \Omega }{\sin \Omega}.\end{align}

The scalar part of the quaternion is already very close to the form of the slerping formula. We now do the same for the vector part of the quaternion,

\begin{align} & \sin t \Omega (w_1 \vu_{1 \rightarrow 2} + \vv_1 \times \vu_{1 \rightarrow 2} ) \nonumber \\ =& \sin t \Omega \left( w_1 \frac{w_1 \vv_2 - w_2 \vv_1 - \vv_1 \times \vv_2}{\sin \Omega} + \vv_1 \times \frac{w_1 \vv_2 - w_2 \vv_1 - \vv_1 \times \vv_2}{\sin \Omega} \right) \nonumber \\ =& \sin t \Omega \frac{ w_1 w_1 \vv_2 - w_1 w_2 \vv_1 - \vv_1 \times (\vv_1 \times \vv_2) }{\sin \Omega} \nonumber \\ =& \sin t \Omega \frac{ w_1 w_1 \vv_2 - w_1 w_2 \vv_1 + (\vv_1^T \vv_1) \vv_2 - (\vv_1^T \vv_2) \vv_1 }{\sin \Omega} \nonumber \\ =& \sin t \Omega \frac{ (w_1 w_1 + \vv_1^T \vv_1) \vv_2 - (w_1 w_2 + \vv_1^T \vv_2) \vv_1 }{\sin \Omega} = \sin t \Omega \frac{ \vv_2 - \vv_1\cos \Omega }{\sin \Omega} .\end{align}

The last step is to put the scalar and the vector part together,

\begin{align} \vq_1 (\vq_1^* \vq_2)^t &= ( w_1 \cos t \Omega - \sin t \Omega \, \vv_1 \cdot \vu_{1 \rightarrow 2} ,\; \cos t \Omega \, \vv_1 + \sin t \Omega (w_1 \vu_{1 \rightarrow 2} + \vv_1 \times \vu_{1 \rightarrow 2} ) ) \nonumber \\ &= \left( w_1 \cos t \Omega + \sin t \Omega \, \frac{w_2 - w_1 \cos \Omega }{\sin \Omega},~ \vv_1 \cos t \Omega + \sin t \Omega \frac{ \vv_2 - \vv_1\cos \Omega }{\sin \Omega} \right) \nonumber \\ &= \vq_1 \cos t \Omega + \sin t \Omega \, \frac{\vq_2 - \vq_1 \cos \Omega }{\sin \Omega} = \text{slerp}(\vq_1 \rightarrow \vq_2, t).\end{align}

We have established that

\begin{align} \text{interp}(\mR_1 \rightarrow \mR_2, t) &= \mR_1 (\mR_1^T\mR_2)^t \nonumber \\ &= \mR_1 (\mI + \sin t\theta \skewm{\vw} + (1 - \cos t\theta) \skewm{\vw}^2) \nonumber \\ &= \vq_1 (\vq_1^* \vq_2)^t \nonumber \\ &= \vq_1 ( \cos t \frac{\theta}{2} ,\: \sin t \frac{\theta}{2} \, \vu_{1 \rightarrow 2}) \nonumber \\ &= \text{slerp}(\vq_1 \rightarrow \vq_2, t).\end{align}
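The chain of equalities can also be checked numerically. The sketch below compares the power form, built from the quaternion product, against the 4D slerp of the two quaternions at several ratios. All function names are hypothetical, the two test rotations are arbitrary, and the degenerate cases (equal or antipodal quaternions) are deliberately not handled.

```python
import math


def quat_mul(a, b):
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return [aw * bw - (ax * bx + ay * by + az * bz),
            aw * bx + bw * ax + (ay * bz - az * by),
            aw * by + bw * ay + (az * bx - ax * bz),
            aw * bz + bw * az + (ax * by - ay * bx)]


def quat_pow(q, t):
    """(cos O, sin O u)^t = (cos tO, sin tO u), with O = atan2(|v|, w)."""
    n = math.sqrt(q[1] ** 2 + q[2] ** 2 + q[3] ** 2)
    if n == 0.0:
        return [1.0, 0.0, 0.0, 0.0]       # identity only; q = -1 not handled
    omega = math.atan2(n, q[0])           # half angle, in [0, pi]
    s = math.sin(t * omega) / n
    return [math.cos(t * omega), s * q[1], s * q[2], s * q[3]]


def slerp(p, q, t):
    """Slerp on the unit 3-sphere; assumes p, q are neither equal nor antipodal."""
    c = max(-1.0, min(1.0, sum(a * b for a, b in zip(p, q))))
    omega = math.acos(c)
    s = math.sin(omega)
    wp = math.sin((1.0 - t) * omega) / s
    wq = math.sin(t * omega) / s
    return [wp * a + wq * b for a, b in zip(p, q)]


def from_axis_angle(theta, u):
    half = 0.5 * theta
    s = math.sin(half)
    return [math.cos(half), s * u[0], s * u[1], s * u[2]]


r6 = math.sqrt(6.0)
q1 = from_axis_angle(0.7, [1.0, 0.0, 0.0])
q2 = from_axis_angle(1.9, [1.0 / r6, 2.0 / r6, -1.0 / r6])
q1_conj = [q1[0], -q1[1], -q1[2], -q1[3]]

max_gap = 0.0
for t in [0.0, 0.25, 0.5, 0.75, 1.0]:
    power_form = quat_mul(q1, quat_pow(quat_mul(q1_conj, q2), t))
    slerp_form = slerp(q1, q2, t)
    gap = max(abs(a - b) for a, b in zip(power_form, slerp_form))
    max_gap = max(max_gap, gap)
```

Note that the agreement is between the two quaternion-level formulas. Whether the path is the short arc in rotation space additionally depends on the sign of the dot product between the two quaternions.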

Last updated on 2025-11-05. Design inspired by distill.