Gradient estimates from REINFORCE and from sampling reparameterization are equal in expectation. We discuss the contributing factor that causes the former to generally have higher variance.
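The variance gap can be seen numerically in a few lines. This is a minimal sketch (not from the write-up) comparing the two estimators of $\partial_\mu\,\mathbb{E}_{x\sim\mathcal{N}(\mu,\sigma^2)}[f(x)]$ for the toy objective $f(x)=x^2$, whose true gradient is $2\mu$:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 1.0, 1.0, 100_000

eps = rng.standard_normal(n)
x = mu + sigma * eps          # reparameterized samples x ~ N(mu, sigma^2)

def f(x):
    return x ** 2             # toy objective; d/dmu E[f(x)] = 2*mu

# REINFORCE (score-function) estimator: f(x) * d log p(x; mu) / dmu
g_reinforce = f(x) * (x - mu) / sigma**2

# Reparameterization (pathwise) estimator: f'(x) * dx/dmu, with dx/dmu = 1
g_reparam = 2.0 * x

print(g_reinforce.mean(), g_reparam.mean())   # both estimate 2*mu
print(g_reinforce.var(), g_reparam.var())     # REINFORCE variance is much larger
```

Both means converge to the same value, but the score-function estimator's per-sample variance is an order of magnitude higher here, since it multiplies the full objective value by the score rather than differentiating through the sample.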
It's nice that we can compute the policy gradient using the "reward-to-go" instead of the sum of rewards over the whole trajectory. The write-up expands on this part, which many lecture slides do not cover in detail. This result is sometimes referred to as the "policy gradient theorem".
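For concreteness, the reward-to-go $G_t = \sum_{t' \ge t} \gamma^{t'-t} r_{t'}$, which weights each step's log-probability gradient in place of the full-trajectory return, can be computed with one backward pass. A small sketch (my own, not from the write-up):

```python
import numpy as np

def reward_to_go(rewards, gamma=0.99):
    """Reward-to-go G_t = sum_{t' >= t} gamma^(t'-t) * r_{t'} for every step t."""
    g = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        g[t] = running
    return g

print(reward_to_go(np.array([1.0, 0.0, 2.0]), gamma=0.5))  # [1.5, 1.0, 2.0]
```

Each `g[t]` then multiplies $\nabla_\theta \log \pi_\theta(a_t \mid s_t)$ in the gradient estimate, instead of the constant total return.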
We use quaternion slerping to interpolate between rotations, but I've seldom seen a precise discussion of what it does and why it's correct. This tutorial/review article walks through some core identities for 3D rotations, and ends with a discussion establishing the equivalence between axis-angle interpolation and quaternion slerping.
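As a quick reference, here is a standard slerp implementation for unit quaternions in `(w, x, y, z)` order (a sketch for illustration, not code from the article):

```python
import numpy as np

def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions q0 and q1 (w, x, y, z)."""
    q0 = q0 / np.linalg.norm(q0)
    q1 = q1 / np.linalg.norm(q1)
    dot = np.dot(q0, q1)
    if dot < 0.0:              # q and -q are the same rotation: take the shorter arc
        q1, dot = -q1, -dot
    if dot > 0.9995:           # nearly parallel: linear interpolation is stable
        q = q0 + t * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(dot)     # angle between the two quaternions
    return (np.sin((1 - t) * theta) * q0 + np.sin(t * theta) * q1) / np.sin(theta)

# Halfway between the identity and a 90-degree rotation about z
q0 = np.array([1.0, 0.0, 0.0, 0.0])
q1 = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
print(slerp(q0, q1, 0.5))  # a 45-degree rotation about z
```

The halfway result is exactly the 45-degree rotation, which is the axis-angle-interpolation property the article's equivalence argument is about.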
We derive the Fokker-Planck equation (forward equation) and the backward equation. The FPE is central in diffusion models to guarantee the correctness of the image generation algorithm. Despite its extensive use in the diffusion literature, the FPE background is not often presented in detail.
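For reference, for a one-dimensional Itô SDE $dX_t = f(X_t, t)\,dt + g(t)\,dW_t$, the two equations take the form:

```latex
% Forward (Fokker-Planck) equation for the density p(x, t):
\partial_t\, p(x,t)
  = -\partial_x \big[ f(x,t)\, p(x,t) \big]
    + \tfrac{1}{2}\, g(t)^2\, \partial_x^2\, p(x,t)

% Backward (Kolmogorov) equation for u(x,s) = E[\phi(X_T) \mid X_s = x]:
\partial_s u + f(x,s)\, \partial_x u + \tfrac{1}{2}\, g(s)^2\, \partial_x^2 u = 0
```

The forward equation evolves the density of samples; the backward equation evolves conditional expectations, and the pair is what the write-up derives.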
Direct Linear Transform (DLT) problems in computer vision involve messy coefficients in the linear system of equations. Appropriate use of an orthogonalization routine and einsum simplifies the implementation, and makes the code more standardized and readable.
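To illustrate the pattern (a sketch of one common DLT instance, homography estimation, not necessarily the write-up's example): the constraint $[p_2]_\times H p_1 = 0$ has coefficient $S_{ij}\,(p_1)_k$ for entry $h_{jk}$, so a single einsum builds the whole system and the SVD's last right-singular vector gives the null vector:

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrices [v]_x for a batch of 3-vectors, shape (N, 3, 3)."""
    x, y, z = v[:, 0], v[:, 1], v[:, 2]
    o = np.zeros_like(x)
    return np.stack([np.stack([o, -z,  y], -1),
                     np.stack([z,  o, -x], -1),
                     np.stack([-y, x,  o], -1)], axis=1)

def dlt_homography(p1, p2):
    """Estimate H such that p2 ~ H p1, from homogeneous points of shape (N, 3), N >= 4."""
    # Coefficient of h_{jk} in equation (n, i) is skew(p2)[n, i, j] * p1[n, k]:
    A = np.einsum('nij,nk->nijk', skew(p2), p1).reshape(-1, 9)
    _, _, Vt = np.linalg.svd(A)       # null vector = last right-singular vector
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]                # fix the projective scale
```

The einsum line replaces the usual hand-written table of per-row coefficients, which is where most DLT implementations get messy.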
Finding the intersection of two lines in 3D in a less ad-hoc manner using surface constraints and QR / SVD factorization.
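One way this approach can be sketched (my own minimal version, assuming the lines are given in point-direction form): represent each line as the intersection of two planes whose normals come from a QR factorization of the direction vector, then stack the four plane constraints and solve in the least-squares sense:

```python
import numpy as np

def line_planes(p, d):
    """Two planes whose intersection is the line x = p + t*d.

    QR of d (as a 3x1 matrix, complete mode) gives an orthonormal basis;
    the two columns orthogonal to d serve as plane normals.
    """
    Q, _ = np.linalg.qr(d.reshape(3, 1), mode='complete')
    n1, n2 = Q[:, 1], Q[:, 2]
    return np.stack([n1, n2]), np.array([n1 @ p, n2 @ p])

def intersect_lines(p1, d1, p2, d2):
    """Least-squares 'intersection' of two 3D lines from stacked plane constraints."""
    A1, b1 = line_planes(p1, d1)
    A2, b2 = line_planes(p2, d2)
    A = np.vstack([A1, A2])
    b = np.concatenate([b1, b2])
    x, *_ = np.linalg.lstsq(A, b, rcond=None)  # SVD-based least squares
    return x
```

For truly intersecting lines this recovers the intersection point; for skew lines it returns the point minimizing the stacked plane residuals, which is what makes the formulation less ad hoc than pairwise parameter solving.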
Last updated on 2025-05-07.