|
521 | 521 | " error = y_pred - y_train\n", |
522 | 522 | "\n", |
523 | 523 | " # Ajuste de pesos con gradientes\n", |
524 | | - " #dz = error # BCE\n", |
525 | | - " dz = 2 * error * self.sigmoid_derivative(y_pred) # MSE\n", |
| 524 | + " dz = error # BCE\n", |
| 525 | + " #dz = 2 * error * self.sigmoid_derivative(y_pred) # MSE\n", |
526 | 526 | " dw = X.T @ dz / n\n", |
527 | 527 | " db = np.sum(dz) / n\n", |
528 | 528 | " self.weights -= lr * dw\n", |
|
534 | 534 | " return history\n" |
535 | 535 | ] |
536 | 536 | }, |
| 537 | + { |
| 538 | + "cell_type": "markdown", |
| 539 | + "source": [ |
| 540 | + "## ¿Por qué se calcula así?\n", |
| 541 | + "\n", |
| 542 | + "Para la función de binary **binary cross-entropy**, la derivada con respecto a la predicción es:\n", |
| 543 | + "```\n", |
| 544 | + "∂L/∂y_pred = (y_pred - y_train) / (y_pred * (1 - y_pred))\n", |
| 545 | + "```\n", |
| 546 | + "\n", |
| 547 | + "Pero cuando combinamos esto con la derivada de la sigmoid (que ya está en la neurona), el término `y_pred * (1 - y_pred)` se cancela, quedando simplemente:\n", |
| 548 | + "```\n", |
| 549 | + "gradient = y_pred - y_train\n", |
| 550 | + "````\n", |
| 551 | + "\n", |
| 552 | + "Para **MSE**, la función de pérdida es:\n", |
| 553 | + "```\n", |
| 554 | + "L = (1/n) * Σ(y_pred - y_train)²\n", |
| 555 | + "```\n", |
| 556 | + "La derivada con respecto a y_pred es:\n", |
| 557 | + "```\n", |
| 558 | + "∂L/∂y_pred = 2 * (y_pred - y_train)\n", |
| 559 | + "```\n", |
| 560 | + "Cuando aplicas la regla de la cadena con la sigmoid, obtenemos:\n", |
| 561 | + "```\n", |
| 562 | + "gradient = 2 * (y_pred - y_train) * sigmoid'(y_pred)\n", |
| 563 | + "gradient = 2 * (y_pred - y_train) * y_pred * (1 - y_pred)\n", |
| 564 | + "```" |
| 565 | + ], |
| 566 | + "metadata": { |
| 567 | + "id": "-wj05Ow1Oavy" |
| 568 | + } |
| 569 | + }, |
537 | 570 | { |
538 | 571 | "cell_type": "markdown", |
539 | 572 | "metadata": { |
|
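As a quick sanity check of the derivation in the new markdown cell, the sketch below (an illustrative standalone snippet, not part of the notebook diff; the `sigmoid` helper and the sample values are assumptions) verifies numerically that chaining the BCE derivative with the sigmoid derivative reduces to `y_pred - y_train`, while the MSE gradient keeps the extra `y_pred * (1 - y_pred)` factor:

```python
import numpy as np

# Standalone check of the two gradient expressions discussed above.
# sigmoid, z, and the sample values are illustrative, not taken from the notebook.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, -0.5, 0.3, 1.7])        # example pre-activations
y_train = np.array([0.0, 1.0, 1.0, 0.0])    # binary targets
y_pred = sigmoid(z)

# BCE: dL/dy_pred * dy_pred/dz -- the y_pred*(1 - y_pred) factors cancel
dL_dy = (y_pred - y_train) / (y_pred * (1 - y_pred))
dy_dz = y_pred * (1 - y_pred)               # sigmoid'(z)
dz_bce_chain = dL_dy * dy_dz
dz_bce_simple = y_pred - y_train            # simplified form used in the code

print(np.allclose(dz_bce_chain, dz_bce_simple))  # True

# MSE: the sigmoid derivative does not cancel, so it stays in the gradient
dz_mse = 2 * (y_pred - y_train) * y_pred * (1 - y_pred)
print(dz_mse)
```

This cancellation is why the code change above replaces the MSE gradient with the plain `dz = error` term when training against binary cross-entropy.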