Problem Definition

MultiPath

Let \(t\) denote a discrete time step, and let \(s_t\) denote the state of an agent at time \(t\). The future trajectory \(\mathbf{s} = [s_1, \dots, s_T]\) is then the sequence of states from \(t = 1\) to a fixed time horizon \(T\). The authors refer to each state in a trajectory as a waypoint.

Suppose we wish to model a set of \(K\) anchor trajectories \(\mathcal{A} = \{ \mathbf{a}^k \}_{k=1}^K\), where each anchor trajectory is a sequence of states: \(\mathbf{a}^k = [ a_1^k, \dots, a_T^k]\). Each anchor represents one discrete mode of agent intent.

In MultiPath (Chai et al., 2019), the authors make the simplifying assumption that uncertainty is unimodal given intent, and model control uncertainty as a Gaussian distribution conditioned on each waypoint state of an anchor trajectory:

\[\phi(s_t^k \mid \mathbf{a}^k, \mathbf{x}) = \mathcal{N}\Big(s_t^k \mid a_t^k + \mu_t^k(\mathbf{x}), \Sigma_t^k(\mathbf{x})\Big)\]

The Gaussian parameters \(\mu_t^k\) and \(\Sigma_t^k\) are predicted directly by the model as a function of the input observation \(\mathbf{x}\), for each time step of each anchor trajectory \(\mathbf{a}^k\). Note that in the Gaussian mean \(a_t^k + \mu_t^k(\mathbf{x})\), the term \(\mu_t^k(\mathbf{x})\) represents a scene-specific offset from the anchor state \(a_t^k\); it can be thought of as a scene-specific residual or error term on top of the prior anchor distribution.
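As a minimal sketch of the per-waypoint density above, the snippet below evaluates \(\log \mathcal{N}(s_t \mid a_t^k + \mu_t^k, \Sigma_t^k)\) in plain Python. It assumes 2D \((x, y)\) waypoints and a full \(2 \times 2\) covariance; the function name `waypoint_log_prob` and the example numbers are hypothetical and do not come from the paper.

```python
import math


def waypoint_log_prob(s, anchor, offset, cov):
    """Log-density log N(s | anchor + offset, cov) for one 2D waypoint.

    s, anchor, offset: (x, y) pairs.
    cov: 2x2 covariance given as ((sxx, sxy), (sxy, syy)).
    Assumption: states are 2D positions (illustrative choice).
    """
    # The mean is the anchor waypoint plus the predicted scene-specific offset.
    mean_x = anchor[0] + offset[0]
    mean_y = anchor[1] + offset[1]
    dx, dy = s[0] - mean_x, s[1] - mean_y

    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy
    if sxx <= 0.0 or det <= 0.0:
        raise ValueError("covariance must be positive definite")

    # Mahalanobis term d^T Sigma^{-1} d, using the closed-form 2x2 inverse.
    maha = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det

    # In 2D the normalizer is -(d/2) log(2 pi) - 0.5 log det with d = 2.
    return -0.5 * maha - math.log(2.0 * math.pi) - 0.5 * math.log(det)


# Illustrative values: the offset shifts the anchor waypoint by (0.5, 0.5).
identity = ((1.0, 0.0), (0.0, 1.0))
log_p = waypoint_log_prob((1.0, 2.0), (0.5, 1.5), (0.5, 0.5), identity)
```

In a trained model the offset and covariance would come from the network output for anchor \(k\) and time step \(t\); here they are simply passed in as arguments.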

To obtain a distribution over the entire state space, we marginalize over agent intent:

\[p(\mathbf{s} \mid \mathbf{x}) = \sum\limits_{k=1}^K \pi(\mathbf{a}^k \mid \mathbf{x}) \prod\limits_{t=1}^T \phi(s_t \mid \mathbf{a}^k, \mathbf{x})\]

Note that this yields a Gaussian Mixture Model distribution, with mixture weights fixed over all time steps.
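The marginalization over anchors can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: waypoints are taken to be 2D, the helper names `gaussian_log_prob` and `trajectory_log_likelihood` are hypothetical, and the sum over modes is computed with a log-sum-exp for numerical stability.

```python
import math


def gaussian_log_prob(s, mean, cov):
    """Log N(s | mean, cov) for a 2D point and a 2x2 covariance."""
    dx, dy = s[0] - mean[0], s[1] - mean[1]
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy
    maha = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * maha - math.log(2.0 * math.pi) - 0.5 * math.log(det)


def trajectory_log_likelihood(traj, anchors, offsets, covs, log_pi):
    """log p(s | x) for one trajectory under the anchor mixture.

    traj:    list of T waypoints (x, y).
    anchors: K anchor trajectories, each a list of T waypoints.
    offsets: K lists of T predicted offsets mu_t^k.
    covs:    K lists of T predicted 2x2 covariances Sigma_t^k.
    log_pi:  K log mixture weights, one per anchor (shared across time).
    """
    per_mode = []
    for a_k, mu_k, cov_k, lp_k in zip(anchors, offsets, covs, log_pi):
        total = lp_k
        # Product over time steps becomes a sum of log-densities.
        for s_t, a_t, mu_t, cov_t in zip(traj, a_k, mu_k, cov_k):
            mean = (a_t[0] + mu_t[0], a_t[1] + mu_t[1])
            total += gaussian_log_prob(s_t, mean, cov_t)
        per_mode.append(total)

    # Sum over anchors in log space (log-sum-exp).
    m = max(per_mode)
    return m + math.log(sum(math.exp(v - m) for v in per_mode))
```

Because each mixture weight multiplies the whole product over \(t\), a single weight per anchor is shared by all time steps, which matches the remark above.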

Given data of the form \(\{(\mathbf{x}^m, \mathbf{\hat{s}}^m)\}_{m=1}^M\), the authors learn to predict the distribution parameters \(\pi(\mathbf{a}^k \mid \mathbf{x})\), \(\mu_t^k(\mathbf{x})\) and \(\Sigma_t^k(\mathbf{x})\) as outputs of a deep neural network parameterized by weights \(\theta\), using the following negative log-likelihood loss built on the mixture distribution above:

\[\ell(\theta) = - \sum_{m=1}^M \sum_{k=1}^K \mathbb{1}(k = \hat{k}^m) \Bigg[ \log \pi(\mathbf{a}^k \mid \mathbf{x}^m; \theta) + \sum\limits_{t=1}^T \log \mathcal{N} (s_t^k \mid a_t^k + \mu_t^k, \Sigma_t^k; \mathbf{x}^m; \theta)\Bigg]\]

This loss is a time-sequence extension of standard GMM likelihood fitting. Here \(\mathbb{1}(\cdot)\) is the indicator function, and \(\hat{k}^m\) is the index of the anchor that most closely matches the ground-truth trajectory \(\mathbf{\hat{s}}^m\), measured by \(\ell_2\)-norm distance in state-sequence space. This hard assignment of ground-truth trajectories to anchors sidesteps the intractability of direct GMM likelihood fitting, avoids an expectation-maximization procedure, and leaves practitioners free to design the anchors as they see fit. One could just as easily employ a soft assignment to anchors (e.g., weighted according to the distance between each anchor and the ground-truth trajectory).
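To make the hard assignment concrete, here is a rough sketch of the loss for a single training example. It assumes 2D waypoints and takes the network outputs as plain Python lists; the names `closest_anchor_index` and `hard_assignment_nll` are hypothetical, and the full loss would sum this quantity over all \(M\) examples.

```python
import math


def gaussian_log_prob(s, mean, cov):
    """Log N(s | mean, cov) for a 2D point and a 2x2 covariance."""
    dx, dy = s[0] - mean[0], s[1] - mean[1]
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy
    maha = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * maha - math.log(2.0 * math.pi) - 0.5 * math.log(det)


def closest_anchor_index(gt, anchors):
    """Index of the anchor nearest to the ground truth in state-sequence space.

    Squared l2 distance is used; it has the same argmin as the l2 distance.
    """
    def sq_dist(anchor):
        return sum((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
                   for p, q in zip(gt, anchor))
    return min(range(len(anchors)), key=lambda k: sq_dist(anchors[k]))


def hard_assignment_nll(gt, anchors, offsets, covs, log_pi):
    """Negative log-likelihood of one ground-truth trajectory.

    Only the closest anchor contributes, mirroring the indicator in the loss.
    """
    k = closest_anchor_index(gt, anchors)
    nll = -log_pi[k]  # classification term for the matched anchor
    for s_t, a_t, mu_t, cov_t in zip(gt, anchors[k], offsets[k], covs[k]):
        mean = (a_t[0] + mu_t[0], a_t[1] + mu_t[1])
        nll -= gaussian_log_prob(s_t, mean, cov_t)  # regression term
    return nll


# Illustrative example with two anchors over T = 2 steps.
identity = ((1.0, 0.0), (0.0, 1.0))
anchors = [[(1.0, 0.0), (2.0, 0.0)], [(1.0, 1.0), (2.0, 2.0)]]
offsets = [[(0.0, 0.0)] * 2, [(0.0, 0.0)] * 2]
covs = [[identity] * 2, [identity] * 2]
log_pi = [math.log(0.7), math.log(0.3)]
gt = [(1.1, 0.1), (2.0, -0.1)]
loss = hard_assignment_nll(gt, anchors, offsets, covs, log_pi)
```

With the indicator fixing a single anchor per example, the loss splits into a classification term on the mixture weights and a Gaussian regression term on the matched anchor, so no sum over modes appears inside the logarithm.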

EWTA

“Evolving WTA” (EWTA) was introduced by Makansi et al. (2019).

References

[1] Yuning Chai, Benjamin Sapp, Mayank Bansal, and Dragomir Anguelov. MultiPath: Multiple Probabilistic Anchor Trajectory Hypotheses for Behavior Prediction. CoRL, 2019.

[2] Osama Makansi, Eddy Ilg, Ozgun Cicek, and Thomas Brox. Overcoming Limitations of Mixture Density Networks: A Sampling and Fitting Framework for Multimodal Future Prediction. CVPR, 2019.