Merge pull request #116 from lsst/tickets/SP-2880

MelissaGraham · web-flow · commit c5d97ea571ce · 2026-02-05T10:05:34.000-08:00
SP-2880: incorporate review comments into NBs 207 1, 2
diff --git a/DP1/200_Data_Products/207_Timeseries_features/207_1_Timeseries_values.ipynb b/DP1/200_Data_Products/207_Timeseries_features/207_1_Timeseries_values.ipynb
@@ -22,7 +22,7 @@
     "Data Release: [Data Preview 1](https://dp1.lsst.io)\\\n",
     "Container Size: Large\\\n",
     "LSST Science Pipelines version: r29.2.0\\\n",
-    "Last verified to run: 2025-12-30\\\n",
+    "Last verified to run: 2026-02-03\\\n",
     "Repository: [github.com/lsst/tutorial-notebooks](https://github.com/lsst/tutorial-notebooks)\\\n",
     "DOI: [10.11578/rubin/dc.20250909.20](https://doi.org/10.11578/rubin/dc.20250909.20)"
    ]
@@ -85,7 +85,7 @@
     "* `<f>_psfFluxNdata` : The number of associated `diaSources`.\n",
     "* `<f>_psfFluxErrMean` : Mean of the `diaSource` PSF flux errors.\n",
     "* `<f>_psfFluxMin`, `Max` : Minimum and maximum `diaSource` PSF flux.\n",
-    "* `<f>_psfFluxMean`, `MeanErr` : Weighted mean of `diaSource` PSF flux, and its standard error.\n",
+    "* `<f>_psfFluxMean`, `MeanErr` : Inverse variance weighted mean of `diaSource` PSF flux, and its standard error.\n",
     "* `<f>_psfFluxSigma` : Standard deviation of the distribution of `<f>_psfFlux`.\n",
     "* `<f>_psfFluxMAD` : Median absolute deviation (MAD) of `diaSource` PSF flux. Does not include scale factor for comparison to sigma.\n",
     "* `<f>_psfFluxChi2` : $\\chi^2$ statistic for the scatter of `<f>_psfFlux` around `<f>_psfFluxMean`.\n",
@@ -339,7 +339,7 @@
     "However, because the timeseries features in the `DiaObject` table are calculated using the detection (unforced) flux measurements from the `DiaSource` table, this tutorial also uses them.\n",
     "The key difference to be aware of is that the `DiaSource` table only contains flux measurements for visits in which the `diaObject` was detected with SNR$\\geq5$ in the difference image, whereas the `ForcedSourceOnDiaObject` table contains forced flux measurements for *all* visits.\n",
     "\n",
-    "Define a query to retrieve the $g$ and $r$-band difference-image (`psfFlux`) and direct-image (`scienceFlux`) measurements, the flux errors, and the exposure time midpoint Modified Julian Date (`midpointMjdTai`) from the `DiaSource` table."
+    "Define a query to retrieve the $g$- and $r$-band difference-image (`psfFlux`) and direct-image (`scienceFlux`) measurements, the flux errors, and the exposure time midpoint Modified Julian Date (`midpointMjdTai`) from the `DiaSource` table."
    ]
   },
   {
@@ -592,7 +592,7 @@
     "Note that this feature uses the *unweighted* average flux ($\\bar{f}$) in its calculation:\n",
     "$\\sigma_f = \\sqrt{\\frac{\\sum{(f - \\bar{f})^2}}{N-1}}$.\n",
     "\n",
-    "The `<f>_psfFluxMAD`, the median absolute deviation (MAD), is the median value of an array composed of the absolute values of the differences between the `psfFlux` and the median value of the `psfFlux`, for a given filter.\n",
+    "The `<f>_psfFluxMAD`, the median absolute deviation (MAD), is the median value of an array composed of the absolute values of the differences between the `psfFlux` and the median value of the `psfFlux`, for a given filter, where the median flux is `<f>_psfFluxPercentile50` (see Section 3.8).\n",
     "\n",
     "Recalculate the `<f>_psfFluxSigma` and `MAD` columns using the `DiaSource` table, and confirm they match the `DiaObject` table."
    ]
@@ -625,42 +625,59 @@
    "id": "8ba9a97d-2df3-48c4-92af-ebf95b8444ea",
    "metadata": {},
    "source": [
-    "Show the minimum, maximum, mean, sigma, and MAD values as lines on the lightcurve and the flux distribution.\n",
-    "\n",
-    "Define the labels, linewidths, and linestyles of the lines to represent the statistical features, then create the plot."
+    "Show the minimum and maximum, mean and sigma, and median and MAD values as lines on the lightcurve and the flux distribution."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "e7dc1ab3-1bf5-4acb-a962-2b18735734f8",
+   "id": "3484a8a8-bb89-4b0d-b19d-dd97997f12c4",
    "metadata": {},
    "outputs": [],
    "source": [
-    "s_lb = ['min/max', None, 'mean', 'sigma', None, 'MAD', None]\n",
-    "s_lw = [1, 1, 1, 2, 2, 1, 1]\n",
-    "s_ls = ['dashed', 'dashed', 'solid', 'dotted', 'dotted', '-.', '-.']\n",
     "fig, ax = plt.subplots(2, 2, figsize=(8, 5), sharex='col')\n",
-    "for f, filt in enumerate(use_filters):\n",
-    "    tx = np.where(star_diaSources['band'] == filt)[0]\n",
-    "    ax[f, 0].errorbar(star_diaSources['midpointMjdTai'][tx], star_diaSources['psfFlux'][tx],\n",
-    "                      yerr=star_diaSources['psfFluxErr'][tx], fmt=f_sym[filt],\n",
-    "                      ms=7, alpha=0.5, mew=0, color=f_col[filt], label=filt + ', star')\n",
-    "    ax[f, 1].hist(star_diaSources['psfFlux'][tx], bins=30,\n",
-    "                  alpha=0.5, color=f_col[filt], label=filt)\n",
-    "    s_vl = [star_diaObject[filt + '_psfFluxMin'], star_diaObject[filt + '_psfFluxMax'],\n",
-    "            star_diaObject[filt + '_psfFluxMean'],\n",
-    "            star_diaObject[filt + '_psfFluxMean'] - star_diaObject[filt + '_psfFluxSigma'],\n",
-    "            star_diaObject[filt + '_psfFluxMean'] + star_diaObject[filt + '_psfFluxSigma'],\n",
-    "            star_diaObject[filt + '_psfFluxMean'] - star_diaObject[filt + '_psfFluxMAD'],\n",
-    "            star_diaObject[filt + '_psfFluxMean'] + star_diaObject[filt + '_psfFluxMAD']]\n",
-    "    for s in range(len(s_vl)):\n",
-    "        ax[f, 0].axhline(s_vl[s], lw=s_lw[s], ls=s_ls[s], color='grey', label=s_lb[s])\n",
-    "        ax[f, 1].axvline(s_vl[s], lw=s_lw[s], ls=s_ls[s], color='grey', label=s_lb[s])\n",
-    "    ax[f, 0].set_ylabel('psfFlux [nJy]')\n",
-    "    ax[f, 1].set_ylabel('N(diaSources)')\n",
-    "    ax[f, 1].legend(bbox_to_anchor=(1.7, 1), loc='upper right')\n",
-    "    del tx, s_vl\n",
+    "\n",
+    "f = 0\n",
+    "filt = 'g'\n",
+    "s_lw = [1, 1, 1, 1, 1]\n",
+    "s_ls = ['solid', 'solid', 'dashed', 'dotted', 'dotted']\n",
+    "\n",
+    "s_lb = ['min/max', None, 'mean', 'sigma', None]\n",
+    "tx = np.where(star_diaSources['band'] == filt)[0]\n",
+    "ax[0, 0].errorbar(star_diaSources['midpointMjdTai'][tx], star_diaSources['psfFlux'][tx],\n",
+    "                  yerr=star_diaSources['psfFluxErr'][tx], fmt=f_sym[filt],\n",
+    "                  ms=7, alpha=0.5, mew=0, color=f_col[filt], label=filt + ', star')\n",
+    "ax[0, 1].hist(star_diaSources['psfFlux'][tx], bins=30,\n",
+    "              alpha=0.5, color=f_col[filt], label=filt)\n",
+    "s_vl = [star_diaObject[filt + '_psfFluxMin'], star_diaObject[filt + '_psfFluxMax'],\n",
+    "        star_diaObject[filt + '_psfFluxMean'],\n",
+    "        star_diaObject[filt + '_psfFluxMean'] - star_diaObject[filt + '_psfFluxSigma'],\n",
+    "        star_diaObject[filt + '_psfFluxMean'] + star_diaObject[filt + '_psfFluxSigma']]\n",
+    "for s in range(len(s_vl)):\n",
+    "    ax[0, 0].axhline(s_vl[s], lw=s_lw[s], ls=s_ls[s], color='grey', label=s_lb[s])\n",
+    "    ax[0, 1].axvline(s_vl[s], lw=s_lw[s], ls=s_ls[s], color='grey', label=s_lb[s])\n",
+    "ax[0, 1].legend(bbox_to_anchor=(1.7, 1), loc='upper right')\n",
+    "ax[0, 0].set_ylabel('psfFlux [nJy]')\n",
+    "del tx, s_vl, s_lb\n",
+    "\n",
+    "s_lb = ['min/max', None, 'median', 'MAD', None]\n",
+    "tx = np.where(star_diaSources['band'] == filt)[0]\n",
+    "ax[1, 0].errorbar(star_diaSources['midpointMjdTai'][tx], star_diaSources['psfFlux'][tx],\n",
+    "                  yerr=star_diaSources['psfFluxErr'][tx], fmt=f_sym[filt],\n",
+    "                  ms=7, alpha=0.5, mew=0, color=f_col[filt], label=filt + ', star')\n",
+    "ax[1, 1].hist(star_diaSources['psfFlux'][tx], bins=30,\n",
+    "              alpha=0.5, color=f_col[filt], label=filt)\n",
+    "s_vl = [star_diaObject[filt + '_psfFluxMin'], star_diaObject[filt + '_psfFluxMax'],\n",
+    "        star_diaObject[filt + '_psfFluxPercentile50'],\n",
+    "        star_diaObject[filt + '_psfFluxPercentile50'] - star_diaObject[filt + '_psfFluxMAD'],\n",
+    "        star_diaObject[filt + '_psfFluxPercentile50'] + star_diaObject[filt + '_psfFluxMAD']]\n",
+    "for s in range(len(s_vl)):\n",
+    "    ax[1, 0].axhline(s_vl[s], lw=s_lw[s], ls=s_ls[s], color='grey', label=s_lb[s])\n",
+    "    ax[1, 1].axvline(s_vl[s], lw=s_lw[s], ls=s_ls[s], color='grey', label=s_lb[s])\n",
+    "ax[1, 0].set_ylabel('psfFlux [nJy]')\n",
+    "ax[1, 1].set_ylabel('N(diaSources)')\n",
+    "ax[1, 1].legend(bbox_to_anchor=(1.7, 1), loc='upper right')\n",
+    "del tx, s_vl\n",
     "ax[1, 0].set_xlabel('MJD [d]')\n",
     "ax[1, 1].set_xlabel('psfFlux [nJy]')\n",
     "plt.suptitle('Statistical features on the lightcurve and flux distribution of a variable star')\n",
@@ -674,7 +691,7 @@
    "id": "42e723a1-be50-4aa2-978b-1b76a4985486",
    "metadata": {},
    "source": [
-    "> **Figure 2:** The $g$- and $r$-band lightcurves (left) and flux distributions (right) for the variable star, with the minimum and maximum (dashed), mean (solid), and standard deviation (sigma; dotted) of the flux values overplotted as grey lines."
+    "> **Figure 2:** The $g$-band lightcurves (left) and flux distributions (right) for the variable star, with the minimum and maximum (solid), mean and standard deviation (top row; dashed and dotted), and median and MAD (bottom row; dashed and dotted) of the flux values overplotted as grey lines. "
    ]
   },
   {
@@ -687,7 +704,7 @@
     "The `<f>_psfFluxChi2` for a given filter is $\\chi^2 = \\sum\\left(\\frac{f - \\bar{f}}{e_f}\\right)^2$,\n",
     "where $f$ is the `psfFlux`, $\\bar{f}$ is the weighted mean of the `psfFlux`, and $e_f$ is the `psfFluxErr`.\n",
     "\n",
-    "Notice that the equation for $\\chi^2$ is quite similar to the one for $\\sigma_f$, except in $\\chi^2$ the flux differences from the mean are weighted by the flux error.\n",
+    "Notice that the equation for $\\chi^2$ is quite similar to the one for $\\sigma_f$, except in $\\chi^2$ the flux differences from the mean are weighted by the flux error, whereas $\\sigma_f$ uses a denominator of $N-1$.\n",
     "\n",
     "In a sense, the $\\chi^2$ is a measure of whether the flux variability is more than that expected from random variations due to measurement uncertainty.\n",
     "\n",
@@ -966,28 +983,31 @@
     "\n",
     "The `<f>_psfFluxLinearSlope` and `<f>_psfFluxLinearIntercept` are the result of fitting a linear regression using the `scipy.optimize.lsq_linear` function.\n",
     "\n",
-    "The `psfFluxMaxSlope` is the maximum \"instantaneous\" slope, or the maximum ration of the series of time-ordered values of $\\Delta f / \\Delta t$.\n",
+    "The `psfFluxMaxSlope` is the maximum \"instantaneous\" slope, or the maximum ratio of the series of time-ordered values of $\\Delta f / \\Delta t$.\n",
     "\n",
     "Demonstrate how to derive the linear fit parameters."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "9b8e89d5-a363-4095-a70b-9d670621946f",
+   "id": "b3ea4d9a-5816-4dc9-bd64-31494f790e7b",
    "metadata": {},
    "outputs": [],
    "source": [
     "for f, filt in enumerate(use_filters):\n",
     "    diao_b = float(star_diaObject[filt + '_psfFluxLinearIntercept'])\n",
     "    diao_m = float(star_diaObject[filt + '_psfFluxLinearSlope'])\n",
     "    diao_mm = float(star_diaObject[filt + '_psfFluxMaxSlope'])\n",
-    "    tx = np.where(star_diaSources['band'] == filt)[0]\n",
-    "    A = np.array([star_diaSources['midpointMjdTai'][tx]\n",
-    "                  / star_diaSources['psfFluxErr'][tx],\n",
-    "                  1.0 / star_diaSources['psfFluxErr'][tx]]).transpose()\n",
-    "    dias_m, dias_b = lsq_linear(A, star_diaSources['psfFlux'][tx]\n",
-    "                                / star_diaSources['psfFluxErr'][tx]).x\n",
+    "    tx = np.where((star_diaSources['band'] == filt)\n",
+    "                  & (~np.isnan(star_diaSources['psfFlux']))\n",
+    "                  & (~np.isnan(star_diaSources['psfFluxErr']))\n",
+    "                  & (~np.isnan(star_diaSources['midpointMjdTai'])))[0]\n",
+    "    fluxes = star_diaSources['psfFlux'][tx]\n",
+    "    errors = star_diaSources['psfFluxErr'][tx]\n",
+    "    times = star_diaSources['midpointMjdTai'][tx]\n",
+    "    A = np.array([times / errors, 1 / errors]).transpose()\n",
+    "    dias_m, dias_b = lsq_linear(A, fluxes / errors).x\n",
     "    sx = np.argsort(star_diaSources['midpointMjdTai'][tx])\n",
     "    dias_mm = (np.diff(star_diaSources['psfFlux'][tx[sx]])\n",
     "               / np.diff(star_diaSources['midpointMjdTai'][tx[sx]])).max()\n",
@@ -997,15 +1017,15 @@
     "    print('Max Slope  %10.0f  %10.0f ' % (diao_mm, dias_mm))\n",
     "    if f == 0:\n",
     "        print(' ')\n",
-    "    del diao_b, diao_m, diao_mm, tx, A, dias_b, dias_m, sx, dias_mm"
+    "    del diao_b, diao_m, diao_mm, tx, fluxes, errors, times, A, dias_b, dias_m, sx, dias_mm"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "515aaa7f-6566-4a71-b85c-a73509522900",
    "metadata": {},
    "source": [
-    "**> Notice:** For the linear fits, the re-derived parameters from the `DiaSource` table are not an exact match to the parameters stored in the `DiaObject` table.\n",
+    ">**Notice:** For the linear fits, the re-derived parameters from the `DiaSource` table are not an exact match to the parameters stored in the `DiaObject` table. It is unclear why as the code above uses the same calculation as in the [diaCalculationPlugin](https://github.com/lsst/meas_base/blob/main/python/lsst/meas/base/diaCalculationPlugins.py) for class `LinearFitDiaPsfFlux`.\n",
     "\n",
     "The known variable star is not expected to exhibit a linear slope.\n",
     "Plot the lightcurve and show the best fit line."
@@ -1099,6 +1119,14 @@
    "metadata": {},
    "outputs": [],
    "source": []
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "22f49979-af1e-4884-9436-e4b44d143e8a",
+   "metadata": {},
+   "outputs": [],
+   "source": []
   }
  ],
  "metadata": {
diff --git a/DP1/200_Data_Products/207_Timeseries_features/207_2_Timeseries_distributions.ipynb b/DP1/200_Data_Products/207_Timeseries_features/207_2_Timeseries_distributions.ipynb
@@ -22,7 +22,7 @@
     "Data Release: [Data Preview 1](https://dp1.lsst.io)\\\n",
     "Container Size: Large\\\n",
     "LSST Science Pipelines version: r29.2.0\\\n",
-    "Last verified to run: 2025-12-30\\\n",
+    "Last verified to run: 2026-02-03\\\n",
     "Repository: [github.com/lsst/tutorial-notebooks](https://github.com/lsst/tutorial-notebooks)\\\n",
     "DOI: [10.11578/rubin/dc.20250909.20](https://doi.org/10.11578/rubin/dc.20250909.20)"
    ]
@@ -326,7 +326,11 @@
    "id": "16eeb034-6728-406e-bfc8-7b875c3a9416",
    "metadata": {},
    "source": [
-    "Retrieve the $r$-band difference-image (`psfFlux`) and direct-image (`scienceFlux`) measurements from the `DiaSource` table, their errors, and the midpoint of the exposure time as a Modified Julian Date (MJD)."
+    "Retrieve the $r$-band difference-image (`psfFlux`) and direct-image (`scienceFlux`) measurements from the `DiaSource` table, their errors, and the midpoint of the exposure time as a Modified Julian Date (MJD).\n",
+    "\n",
+    ">**Notice:** In general, for scientific analyses of lightcurves it is recommended to use the *forced* flux measurements in the `ForcedSourceOnDiaObject` table.\n",
+    "However, because the timeseries features in the `DiaObject` table are calculated using the detection (unforced) flux measurements from the `DiaSource` table, this tutorial also uses them.\n",
+    "The key difference to be aware of is that the `DiaSource` table only contains flux measurements for visits in which the `diaObject` was detected with SNR$\\geq5$ in the difference image, whereas the `ForcedSourceOnDiaObject` table contains forced flux measurements for *all* visits."
    ]
   },
   {
@@ -527,7 +531,7 @@
    "source": [
     "> **Figure 2**: The `psfFlux` (difference image flux; top) and `scienceFlux` (direct image flux; bottom) for all visits in which the star was detected with SNR>5 in the difference image, versus the MJD of the visit, for the $r$-band. The errors bars on the flux are, for most data points, relatively too small to be seen.\n",
     "\n",
-    "Notice that visits for which this object's brightness was similar to that of its brightness in the template image are missing from the lightcurve above (i.e., visits around MJD $\\sim 606404$ days, when `psfFlux` was near 0 nJy).\n",
+    "Notice that visits for which this object's brightness was similar to that of its brightness in the template image are missing from the lightcurve above (i.e., visits around MJD $\\sim 60640$ days, when `psfFlux` was near 0 nJy).\n",
     "\n",
     "Notice also that because this object is an astrophysical transient embedded in its host galaxy, the `scienceFlux` measurements are contaminated by host galaxy light.\n",
     "This contamination is not simply a \"pedestal\" (i.e., are not a constant amount added to the `psfFlux`) because the forced photometry uses the PSF of the image.\n",
@@ -542,7 +546,7 @@
    "source": [
     "### 2.3. Samples of diaObjects\n",
     "\n",
-    "Retrieve two samples of `diaObjects`: one from the same DP1 field as the star (the \"Low-$b$\" field), and one from the same DP1 field as the transient (the ECDFS field).\n",
+    "Retrieve two samples of `diaObjects`: one from the same DP1 field as the star (the low Galactic latitude or \"Low-$b$\" field), and one from the same DP1 field as the transient (the ECDFS field).\n",
     "\n",
     "To define the cone search, use the coordinates of the star and transient, and a 2.0 deg radius, which will encompass the relevant DP1 field.\n",
     "Print the coordinates of the star and transient."
@@ -589,7 +593,7 @@
    "metadata": {},
    "source": [
     "Apply the constraints in the ADQL query statement, and add a requirement that only `diaObjects` which were detected at least 10 times in the $r$-band be retrieved.\n",
-    "This will constraint will mean that the timeseries features used in this tutorial are all measured from at least 10 detections.\n",
+    "This constraint will mean that the timeseries features used in this tutorial are all measured from at least 10 detections.\n",
     "\n",
     "Create the query for the star's field (the \"Low-$b$\" field) and submit the query job."
    ]
@@ -716,7 +720,7 @@
    "id": "8c66e3e1-2aec-464a-9e7d-36a8782cc36e",
    "metadata": {},
    "source": [
-    "> **Figure 3:** The number of `diaObjects` with a given number of `diaSources` (number of difference image detections) in the $r$-band. While the star is in the tail of its population's distribution, the transient is not."
+    "> **Figure 3:** The number of `diaObjects` with a given number of `diaSources` (number of difference image detections) in the $r$-band. The vertical lines mark the values for the star and transient respectively (as in the legend). While the star is in the tail of its population's distribution, the transient is not."
    ]
   },
   {