The model learning for DPADModel is done in 4 steps as follows (Methods):
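- In the first optimization step, we learn the parameters $A'^{(1)}(\cdot)$, $K^{(1)}(\cdot)$, and $C^{(1)}_z(\cdot)$ of the following RNN:

  $$x^{(1)}_{k+1} = A'^{(1)}(x^{(1)}_k) + K^{(1)}(y_k), \qquad \hat{z}^{(1)}_k = C^{(1)}_z(x^{(1)}_k)$$

  and estimate its latent state $x^{(1)}_k$, while minimizing the negative log-likelihood (NLL) of predicting the behavior $z_k$ as $\hat{z}^{(1)}_k$. This RNN is implemented as an RNNModel object with the neural activity $y_k$ as its input. RNNModel implements each of the RNN parameters via the RegressionModel class.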
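- The second optimization step uses the extracted latent state $x^{(1)}_k$ from the RNN and fits the parameter $C_y^{(1)}$ in

  $$\hat{y}_k = C_y^{(1)}(x^{(1)}_k)$$

  while minimizing the NLL of predicting the neural activity $y_k$ as $\hat{y}_k$. The parameter $C_y^{(1)}$ is implemented via the RegressionModel class.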
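- In the third optimization step, we learn the parameters $A'^{(2)}(\cdot)$, $K^{(2)}(\cdot)$, and $C^{(2)}_y(\cdot)$ of the following RNN:

  $$x^{(2)}_{k+1} = A'^{(2)}(x^{(2)}_k) + K^{(2)}(y_k, x^{(1)}_{k+1}), \qquad \hat{y}_k = C^{(1)}_y(x^{(1)}_k) + C^{(2)}_y(x^{(2)}_k)$$

  and estimate its latent state $x^{(2)}_k$, while minimizing the NLL of predicting the neural activity $y_k$ as $\hat{y}_k$, with $C^{(1)}_y$ kept as learned in step 2. This RNN is implemented as an RNNModel object with the concatenation of $y_k$ and $x^{(1)}_{k+1}$ as its input. RNNModel again implements each of the RNN parameters via the RegressionModel class.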
- The fourth optimization step uses the latent states extracted in optimization steps 1 and 3 (i.e., $x^{(1)}_k$ and $x^{(2)}_k$) and fits $C_z$ in:

  $$\hat{z}_k = C_z(x^{(1)}_k, x^{(2)}_k)$$

  while minimizing the behavior prediction negative log-likelihood. This step again implements $C_z$ via the RegressionModel class.
For additional options and generalizations to these steps, please read Methods in the DPAD paper.
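To make the roles of the two latent states concrete, here is a minimal NumPy sketch of how the states from steps 1 and 3 could be rolled out once the parameters are known. It assumes the additive recursion form used in the DPAD paper (state update equals a recursion term plus an input term) and zero initial states. The function name `run_two_stage_rnn` and the toy parameter maps are hypothetical and are not part of the DPAD API; in DPAD itself each map is a learned RegressionModel.

```python
import numpy as np


def run_two_stage_rnn(y, A1, K1, A2, K2, nx1, nx2):
    """Roll out both latent-state recursions over neural activity y (T x ny).

    Assumed (hypothetical) additive form:
        x1[k+1] = A1(x1[k]) + K1(y[k])
        x2[k+1] = A2(x2[k]) + K2(concat(y[k], x1[k+1]))
    Both states start at zero. Returns x1 (T x nx1) and x2 (T x nx2).
    """
    T = y.shape[0]
    x1 = np.zeros((T, nx1))
    x2 = np.zeros((T, nx2))
    for k in range(T - 1):
        # Stage 1: driven by the neural activity only
        x1[k + 1] = A1(x1[k]) + K1(y[k])
        # Stage 2: driven by neural activity concatenated with the stage-1 state
        x2[k + 1] = A2(x2[k]) + K2(np.concatenate([y[k], x1[k + 1]]))
    return x1, x2


# Toy usage with random nonlinear maps standing in for learned parameters
rng = np.random.default_rng(0)
ny, nx1, nx2, T = 4, 2, 3, 50
W_A1 = 0.5 * rng.standard_normal((nx1, nx1))
W_K1 = rng.standard_normal((nx1, ny))
W_A2 = 0.5 * rng.standard_normal((nx2, nx2))
W_K2 = rng.standard_normal((nx2, ny + nx1))
y = rng.standard_normal((T, ny))

x1, x2 = run_two_stage_rnn(
    y,
    A1=lambda x: np.tanh(W_A1 @ x),
    K1=lambda yk: W_K1 @ yk,
    A2=lambda x: np.tanh(W_A2 @ x),
    K2=lambda u: W_K2 @ u,
    nx1=nx1,
    nx2=nx2,
)
x_all = np.concatenate([x1, x2], axis=1)  # input to the final behavior readout
```

The concatenated `x_all` plays the role of the pair of latent states that the fourth step feeds into the behavior readout.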
The objective function of each optimization step is the negative log-likelihood (NLL) of the time series predicted in that step, i.e., behavior in steps 1 and 4 and neural activity in steps 2 and 3.
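As an illustration of what an NLL objective looks like in the simplest case, the sketch below computes the NLL of a Gaussian with a fixed isotropic variance. Under that assumption the NLL equals half the mean squared error divided by the variance, plus a constant, so minimizing it is equivalent to minimizing the MSE. The helper name `gaussian_nll` is hypothetical, and whether DPAD's Gaussian case uses exactly this parameterization is an assumption here.

```python
import numpy as np


def gaussian_nll(target, pred, var=1.0):
    """Mean per-element NLL of `target` under N(pred, var) with a fixed variance."""
    target = np.asarray(target, dtype=float)
    pred = np.asarray(pred, dtype=float)
    sq_err = (target - pred) ** 2
    # Constant term + scaled squared error, averaged over all elements
    return float(np.mean(0.5 * np.log(2 * np.pi * var) + 0.5 * sq_err / var))
```

With `var=1.0`, the difference in NLL between two predictors is exactly half the difference in their MSE, which is why the Gaussian case reduces to a least-squares fit.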
To support non-Gaussian data modalities, e.g., categorical behavior, DPAD adjusts the objectives of the four optimization steps and the architecture of the readout parameters based on the NLL of the relevant distribution. For example, for categorical behavior:
- we change the behavior readout parameter $C_z$ to have an output dimension of $n_z \times n_c$ instead of $n_z$, where $n_c$ denotes the number of behavior categories or classes, and
- we apply a Softmax normalization to the output of the behavior readout parameter $C_z$ to ensure that for each of the $n_z$ behavior dimensions, the predicted probabilities for all the $n_c$ classes add up to 1, so that they represent valid probability mass functions.
For details, see Methods.
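The two changes for categorical behavior can be sketched in NumPy as follows. The readout output of size $n_z \times n_c$ is reshaped so that each behavior dimension has its own set of $n_c$ logits, a Softmax is applied over the class axis, and the NLL is the cross-entropy of the true class labels. The function names and the flat `(T, n_z * n_c)` layout are assumptions made for illustration and do not reflect the actual DPAD code.

```python
import numpy as np


def categorical_readout_probs(logits_flat, n_z, n_c):
    """Reshape readout outputs (T, n_z * n_c) to (T, n_z, n_c) and softmax over classes."""
    logits = np.asarray(logits_flat, dtype=float).reshape(-1, n_z, n_c)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=-1, keepdims=True)


def categorical_nll(probs, labels):
    """Mean NLL of integer class labels (T, n_z) under probs (T, n_z, n_c)."""
    labels = np.asarray(labels, dtype=int)
    # Pick the predicted probability of the true class for every sample and dimension
    picked = np.take_along_axis(probs, labels[..., None], axis=-1)[..., 0]
    return float(-np.mean(np.log(picked)))
```

For every time step and each of the $n_z$ behavior dimensions, the returned probabilities sum to 1 over the $n_c$ classes, so each slice is a valid probability mass function.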
We also extend DPAD to modeling intermittently measured behavior time series. To do so, when forming the behavior loss, we only compute the NLL loss on samples where the behavior is measured (i.e., mask the other samples) and solve the optimization with this loss. Doing so, the modeling approach becomes applicable to intermittently measured behavior signals (ED Figs. 8-9, S Fig. 8 in the DPAD paper).
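A minimal sketch of such a masked loss is shown below for the Gaussian case, assuming unit variance with constants dropped and a boolean vector that marks the time steps at which behavior was measured. The name `masked_gaussian_nll` and the `measured` argument are hypothetical and are used only to illustrate the masking idea.

```python
import numpy as np


def masked_gaussian_nll(z, z_pred, measured):
    """Gaussian NLL (unit variance, constants dropped) over measured samples only.

    z, z_pred: arrays of shape (T, n_z); entries of z at unmeasured steps may be NaN.
    measured:  boolean array of shape (T,), True where behavior was measured.
    """
    z = np.asarray(z, dtype=float)
    z_pred = np.asarray(z_pred, dtype=float)
    measured = np.asarray(measured, dtype=bool)
    if not measured.any():
        raise ValueError("no measured behavior samples to compute the loss on")
    # Unmeasured rows are excluded entirely, so they contribute nothing to the loss
    sq_err = (z[measured] - z_pred[measured]) ** 2
    return float(0.5 * np.mean(sq_err))
```

Because unmeasured rows are dropped before averaging, placeholder values such as NaN at those time steps never enter the loss.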