Gradient estimates from REINFORCE and from sampling reparameterization are equal in expectation. We discuss the contributing factor that causes the former to generally have higher variance.
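The variance gap can be seen numerically in a few lines. This is a minimal sketch (not from the write-up) comparing the two estimators of $\partial_\mu\,\mathbb{E}_{x\sim\mathcal{N}(\mu,\sigma^2)}[f(x)]$ for the toy objective $f(x)=x^2$, whose true gradient is $2\mu$:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 1.0, 1.0, 100_000

eps = rng.standard_normal(n)
x = mu + sigma * eps          # reparameterized samples x ~ N(mu, sigma^2)

def f(x):
    return x ** 2             # toy objective; d/dmu E[f(x)] = 2*mu

# REINFORCE (score-function) estimator: f(x) * d log p(x; mu) / dmu
g_reinforce = f(x) * (x - mu) / sigma**2

# Reparameterization (pathwise) estimator: f'(x) * dx/dmu, with dx/dmu = 1
g_reparam = 2.0 * x

print(g_reinforce.mean(), g_reparam.mean())   # both estimate 2*mu
print(g_reinforce.var(), g_reparam.var())     # REINFORCE variance is much larger
```

Both means converge to the same value, but the score-function estimator's per-sample variance is an order of magnitude higher here, since it multiplies the full objective value by the score rather than differentiating through the sample.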
It's nice that we can compute the policy gradient using the "reward-to-go" instead of the sum of rewards over the whole trajectory. The write-up expands on this part, which many lecture slides do not cover in detail. This result is sometimes referred to as the "policy gradient theorem".
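For concreteness, the reward-to-go $G_t = \sum_{t' \ge t} \gamma^{t'-t} r_{t'}$, which weights each step's log-probability gradient in place of the full-trajectory return, can be computed with one backward pass. A small sketch (my own, not from the write-up):

```python
import numpy as np

def reward_to_go(rewards, gamma=0.99):
    """Reward-to-go G_t = sum_{t' >= t} gamma^(t'-t) * r_{t'} for every step t."""
    g = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        g[t] = running
    return g

print(reward_to_go(np.array([1.0, 0.0, 2.0]), gamma=0.5))  # [1.5, 1.0, 2.0]
```

Each `g[t]` then multiplies $\nabla_\theta \log \pi_\theta(a_t \mid s_t)$ in the gradient estimate, instead of the constant total return.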
We use quaternion slerping to interpolate between rotations, but I've seldom seen a precise discussion of what it does and why it's correct. This tutorial/review article walks through some core identities for 3D rotations, and ends with a discussion establishing the equivalence between axis-angle interpolation and quaternion slerping.
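As a quick reference, here is a standard slerp implementation for unit quaternions in `(w, x, y, z)` order (a sketch for illustration, not code from the article):

```python
import numpy as np

def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions q0 and q1 (w, x, y, z)."""
    q0 = q0 / np.linalg.norm(q0)
    q1 = q1 / np.linalg.norm(q1)
    dot = np.dot(q0, q1)
    if dot < 0.0:              # q and -q are the same rotation: take the shorter arc
        q1, dot = -q1, -dot
    if dot > 0.9995:           # nearly parallel: linear interpolation is stable
        q = q0 + t * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(dot)     # angle between the two quaternions
    return (np.sin((1 - t) * theta) * q0 + np.sin(t * theta) * q1) / np.sin(theta)

# Halfway between the identity and a 90-degree rotation about z
q0 = np.array([1.0, 0.0, 0.0, 0.0])
q1 = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
print(slerp(q0, q1, 0.5))  # a 45-degree rotation about z
```

The halfway result is exactly the 45-degree rotation, which is the axis-angle-interpolation property the article's equivalence argument is about.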
We derive the Fokker-Planck equation (forward equation) and the backward equation. The FPE is central in diffusion models to guarantee the correctness of the image generation algorithm. Despite its extensive use in the diffusion literature, the FPE background is not often presented in detail.
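For reference, for a one-dimensional Itô SDE $dX_t = f(X_t, t)\,dt + g(t)\,dW_t$, the two equations take the form:

```latex
% Forward (Fokker-Planck) equation for the density p(x, t):
\partial_t\, p(x,t)
  = -\partial_x \big[ f(x,t)\, p(x,t) \big]
    + \tfrac{1}{2}\, g(t)^2\, \partial_x^2\, p(x,t)

% Backward (Kolmogorov) equation for u(x,s) = E[\phi(X_T) \mid X_s = x]:
\partial_s u + f(x,s)\, \partial_x u + \tfrac{1}{2}\, g(s)^2\, \partial_x^2 u = 0
```

The forward equation evolves the density of samples; the backward equation evolves conditional expectations, and the pair is what the write-up derives.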
Direct Linear Transform (DLT) problems in computer vision involve messy coefficients in the linear system of equations. Appropriate use of an orthogonalization routine and einsum simplifies the implementation, and makes the code more standardized and readable.
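To illustrate the pattern (a sketch of one common DLT instance, homography estimation, not necessarily the write-up's example): the constraint $[p_2]_\times H p_1 = 0$ has coefficient $S_{ij}\,(p_1)_k$ for entry $h_{jk}$, so a single einsum builds the whole system and the SVD's last right-singular vector gives the null vector:

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrices [v]_x for a batch of 3-vectors, shape (N, 3, 3)."""
    x, y, z = v[:, 0], v[:, 1], v[:, 2]
    o = np.zeros_like(x)
    return np.stack([np.stack([o, -z,  y], -1),
                     np.stack([z,  o, -x], -1),
                     np.stack([-y, x,  o], -1)], axis=1)

def dlt_homography(p1, p2):
    """Estimate H such that p2 ~ H p1, from homogeneous points of shape (N, 3), N >= 4."""
    # Coefficient of h_{jk} in equation (n, i) is skew(p2)[n, i, j] * p1[n, k]:
    A = np.einsum('nij,nk->nijk', skew(p2), p1).reshape(-1, 9)
    _, _, Vt = np.linalg.svd(A)       # null vector = last right-singular vector
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]                # fix the projective scale
```

The einsum line replaces the usual hand-written table of per-row coefficients, which is where most DLT implementations get messy.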
Finding the intersection of two lines in 3D in a less ad-hoc manner using surface constraints and QR / SVD factorization.
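One way this approach can be sketched (my own minimal version, assuming the lines are given in point-direction form): represent each line as the intersection of two planes whose normals come from a QR factorization of the direction vector, then stack the four plane constraints and solve in the least-squares sense:

```python
import numpy as np

def line_planes(p, d):
    """Two planes whose intersection is the line x = p + t*d.

    QR of d (as a 3x1 matrix, complete mode) gives an orthonormal basis;
    the two columns orthogonal to d serve as plane normals.
    """
    Q, _ = np.linalg.qr(d.reshape(3, 1), mode='complete')
    n1, n2 = Q[:, 1], Q[:, 2]
    return np.stack([n1, n2]), np.array([n1 @ p, n2 @ p])

def intersect_lines(p1, d1, p2, d2):
    """Least-squares 'intersection' of two 3D lines from stacked plane constraints."""
    A1, b1 = line_planes(p1, d1)
    A2, b2 = line_planes(p2, d2)
    A = np.vstack([A1, A2])
    b = np.concatenate([b1, b2])
    x, *_ = np.linalg.lstsq(A, b, rcond=None)  # SVD-based least squares
    return x
```

For truly intersecting lines this recovers the intersection point; for skew lines it returns the point minimizing the stacked plane residuals, which is what makes the formulation less ad hoc than pairwise parameter solving.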
Last updated on 2025-05-07.