{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [
"# Example: Sleep Stage Classifier Evaluation using Accel & Heart Rate\n",
"\n",
"This notebook demonstrates the evaluation of a classifier for detecting sleep stages: Wakefulness, REM (Rapid Eye Movement), and NREM (Non-Rapid Eye Movement).\n",
"\n",
"- We'll generate synthetic data based on accelerometer and heart rate features.\n",
"- We'll then train a simple classifier and evaluate its performance using a confusion matrix, precision, recall, and F1 score.\n"
] }, { "cell_type": "code", "execution_count": 1, "metadata": { "scrolled": true }, "outputs": [], "source": [
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"from scipy.signal import welch\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.tree import DecisionTreeClassifier\n",
"from sklearn.metrics import confusion_matrix, precision_recall_fscore_support, accuracy_score, balanced_accuracy_score\n",
"\n",
"# Set random seed for reproducibility\n",
"np.random.seed(42)"
] }, { "cell_type": "markdown", "metadata": {}, "source": [
"## Synthetic Data Generation\n",
"\n",
"We'll create synthetic data to simulate accelerometer, heart rate, and heart rate variability (HRV) measurements for different sleep stages. The function below is meant to mimic the following characteristics:\n",
"\n",
"- Wakefulness: Higher accelerometer activity, higher heart rate, lower HRV\n",
"- REM: Low accelerometer activity, slightly elevated heart rate, moderate HRV\n",
"- NREM: Very low accelerometer activity, lower heart rate, higher HRV\n",
"\n",
"The accelerometer and mean heart rate patterns are reproduced by the simulation. The HRV pattern is not: the HRV-style features are computed from the heart rate samples themselves, and the stage-specific heart rate noise is largest for Wakefulness and smallest for NREM, so the simulated HRV values are highest during Wakefulness.\n",
"\n",
"We then extract a set of summary features from the accelerometer and heart rate signals (min, max, mean, standard deviation, and so on), some HRV-style features from the heart rate signal (RMSSD, SDNN, pNN50), and the dominant frequency of the accelerometer magnitude using the `welch` function from SciPy."
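, "\n", "\n",
"As a rough sketch of what the HRV-style features in the code below compute, write $h_1, \\dots, h_N$ for the heart rate samples in one window. These symbols are introduced here only for illustration, and the code uses heart rate samples as a stand-in for the NN intervals on which these metrics are normally defined:\n",
"\n",
"$$\\mathrm{RMSSD} = \\sqrt{\\frac{1}{N-1} \\sum_{i=1}^{N-1} (h_{i+1} - h_i)^2}, \\qquad \\mathrm{SDNN} = \\sqrt{\\frac{1}{N} \\sum_{i=1}^{N} (h_i - \\bar{h})^2}$$\n",
"\n",
"$$\\mathrm{pNN50} = \\frac{100}{N-1} \\, \\bigl| \\lbrace i : |h_{i+1} - h_i| > 50 \\rbrace \\bigr|$$\n",
"\n",
"Here $\\bar{h}$ is the window mean. Because `np.std` is called with its default settings, SDNN takes the population form shown above and coincides with the `hr_std` feature."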
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>accel_mean</th><th>accel_std</th><th>accel_max</th><th>accel_min</th><th>accel_median</th><th>accel_dominant_freq</th><th>hr_mean</th><th>hr_std</th><th>hr_max</th><th>hr_min</th><th>hrv_rmssd</th><th>hrv_sdnn</th><th>hrv_pnn50</th><th>Stage</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>3.160171</td><td>1.369994</td><td>5.736376</td><td>0.770211</td><td>3.218683</td><td>33.333333</td><td>71.754275</td><td>12.820519</td><td>103.847061</td><td>54.711142</td><td>18.543974</td><td>12.820519</td><td>0.000000</td><td>Wakefulness</td></tr>\n",
"    <tr><th>1</th><td>2.757489</td><td>1.187661</td><td>5.774216</td><td>0.810158</td><td>2.668205</td><td>6.666667</td><td>70.222542</td><td>11.530130</td><td>93.633252</td><td>43.950718</td><td>14.828998</td><td>11.530130</td><td>0.000000</td><td>Wakefulness</td></tr>\n",
"    <tr><th>2</th><td>2.583652</td><td>0.960001</td><td>4.712651</td><td>0.857776</td><td>2.600193</td><td>30.000000</td><td>57.935638</td><td>7.986526</td><td>77.874840</td><td>36.572344</td><td>10.991775</td><td>7.986526</td><td>0.000000</td><td>NREM</td></tr>\n",
"    <tr><th>3</th><td>2.832752</td><td>0.983541</td><td>4.633715</td><td>0.836868</td><td>2.717458</td><td>23.333333</td><td>66.557528</td><td>14.364074</td><td>91.209212</td><td>36.122140</td><td>20.304962</td><td>14.364074</td><td>0.000000</td><td>REM</td></tr>\n",
"    <tr><th>4</th><td>2.812697</td><td>1.054019</td><td>5.032076</td><td>0.949556</td><td>2.848692</td><td>36.666667</td><td>70.133988</td><td>12.177066</td><td>113.358585</td><td>43.797419</td><td>19.262584</td><td>12.177066</td><td>6.896552</td><td>Wakefulness</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [
"   accel_mean  accel_std  accel_max  accel_min  accel_median  \\\n",
"0    3.160171   1.369994   5.736376   0.770211      3.218683   \n",
"1    2.757489   1.187661   5.774216   0.810158      2.668205   \n",
"2    2.583652   0.960001   4.712651   0.857776      2.600193   \n",
"3    2.832752   0.983541   4.633715   0.836868      2.717458   \n",
"4    2.812697   1.054019   5.032076   0.949556      2.848692   \n",
"\n",
"   accel_dominant_freq    hr_mean     hr_std      hr_max     hr_min  \\\n",
"0            33.333333  71.754275  12.820519  103.847061  54.711142   \n",
"1             6.666667  70.222542  11.530130   93.633252  43.950718   \n",
"2            30.000000  57.935638   7.986526   77.874840  36.572344   \n",
"3            23.333333  66.557528  14.364074   91.209212  36.122140   \n",
"4            36.666667  70.133988  12.177066  113.358585  43.797419   \n",
"\n",
"   hrv_rmssd   hrv_sdnn  hrv_pnn50        Stage  \n",
"0  18.543974  12.820519   0.000000  Wakefulness  \n",
"1  14.828998  11.530130   0.000000  Wakefulness  \n",
"2  10.991775   7.986526   0.000000         NREM  \n",
"3  20.304962  14.364074   0.000000          REM  \n",
"4  19.262584  12.177066   6.896552  Wakefulness  "
] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [
"def generate_synthetic_data(n_samples=1000, window_size=30, class_distribution=(0.5, 0.3, 0.2)):\n",
"    # Generate base signals\n",
"    time = np.arange(n_samples * window_size) / 100  # Time axis assuming 100 Hz sampling (not used below)\n",
"    accel_x = np.random.randn(n_samples * window_size)\n",
"    accel_y = np.random.randn(n_samples * window_size)\n",
"    accel_z = np.random.randn(n_samples * window_size)\n",
"    heart_rate = np.random.normal(60, 10, n_samples * window_size)\n",
"\n",
"    # Generate one label per window, then repeat it for every sample in the window\n",
"    labels = np.random.choice(['Wakefulness', 'REM', 'NREM'], n_samples, p=class_distribution)\n",
"    labels = np.repeat(labels, window_size)\n",
"\n",
"    # Adjust signals based on sleep stage (with more overlap and noise)\n",
"    for i, label in enumerate(labels):\n",
"        if label == 'Wakefulness':\n",
"            accel_x[i] += np.random.normal(0, 1.2)\n",
"            accel_y[i] += np.random.normal(0, 1.2)\n",
"            accel_z[i] += np.random.normal(0, 1.2)\n",
"            heart_rate[i] += np.random.normal(10, 5)\n",
"        elif label == 'REM':\n",
"            accel_x[i] += np.random.normal(0, 0.9)\n",
"            accel_y[i] += np.random.normal(0, 0.9)\n",
"            accel_z[i] += np.random.normal(0, 0.9)\n",
"            heart_rate[i] += np.random.normal(5, 3)\n",
"        else:  # NREM\n",
"            accel_x[i] += np.random.normal(0, 0.8)\n",
"            accel_y[i] += np.random.normal(0, 0.8)\n",
"            accel_z[i] += np.random.normal(0, 0.8)\n",
"            heart_rate[i] += np.random.normal(1, 2)\n",
"\n",
"    # Add random noise to all signals\n",
"    accel_x += np.random.normal(0, 1, len(accel_x))\n",
"    accel_y += np.random.normal(0, 1, len(accel_y))\n",
"    accel_z += np.random.normal(0, 1, len(accel_z))\n",
"    heart_rate += np.random.normal(0, 5, len(heart_rate))\n",
"\n",
"    # Calculate features for each window\n",
"    features = []\n",
"    for i in range(0, len(accel_x), window_size):\n",
"        window_x = accel_x[i:i+window_size]\n",
"        window_y = accel_y[i:i+window_size]\n",
"        window_z = accel_z[i:i+window_size]\n",
"        window_hr = heart_rate[i:i+window_size]\n",
"\n",
"        # Accelerometer features, computed on the magnitude of the 3-axis signal\n",
"        accel_magnitude = np.sqrt(window_x**2 + window_y**2 + window_z**2)\n",
"        f, Pxx = welch(accel_magnitude, fs=100, nperseg=window_size)\n",
"        dominant_freq = f[np.argmax(Pxx)]\n",
"\n",
"        accel_features = {\n",
"            'accel_mean': np.mean(accel_magnitude),\n",
"            'accel_std': np.std(accel_magnitude),\n",
"            'accel_max': np.max(accel_magnitude),\n",
"            'accel_min': np.min(accel_magnitude),\n",
"            'accel_median': np.median(accel_magnitude),\n",
"            'accel_dominant_freq': dominant_freq\n",
"        }\n",
"\n",
"        # Heart rate features. The HRV-style metrics are computed on heart rate\n",
"        # samples as a stand-in for NN intervals, so they are only rough proxies.\n",
"        hr_diff = np.diff(window_hr)\n",
"        hrv_features = {\n",
"            'hr_mean': np.mean(window_hr),\n",
"            'hr_std': np.std(window_hr),\n",
"            'hr_max': np.max(window_hr),\n",
"            'hr_min': np.min(window_hr),\n",
"            'hrv_rmssd': np.sqrt(np.mean(hr_diff**2)),\n",
"            'hrv_sdnn': np.std(window_hr),  # identical to hr_std in this simplified setup\n",
"            'hrv_pnn50': np.sum(np.abs(hr_diff) > 50) / len(hr_diff) * 100\n",
"        }\n",
"\n",
"        features.append({**accel_features, **hrv_features, 'Stage': labels[i]})\n",
"\n",
"    return pd.DataFrame(features)\n",
"\n",
"# Generate data\n",
"data = generate_synthetic_data(2000)\n",
"data.head()"
] }, { "cell_type": "markdown", "metadata": {}, "source": [
"## Model Training and Evaluation\n",
"\n",
"After generating our synthetic sleep stage data and extracting features, we now train a model and evaluate its performance. This section covers the following steps:\n",
"\n",
"1. **Data Splitting**: We separate our features (X) and labels (y), then split them into training and testing sets.\n",
"\n",
"2. **Model Training**: We use a Decision Tree classifier to train on our data.\n",
"\n",
"3. **Prediction**: We use our trained model to make predictions on the test set.\n",
"\n",
"4. **Evaluation**: We compute the overall accuracy, the confusion matrix, and per-class precision, recall, and F1 score.\n",
"\n",
"5. **Visualization**: We create a heatmap to visualize the confusion matrix, providing an intuitive representation of our model's performance.\n",
"\n",
"These steps allow us to assess how well our model distinguishes between Wakefulness, REM, and NREM sleep stages based on the synthetic accelerometer and heart rate data we generated.\n",
"\n",
"The confusion matrix will show us:\n",
"- How many instances of each sleep stage were correctly classified (diagonal elements)\n",
"- Where misclassifications occurred (off-diagonal elements)\n"
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [
"\n",
"Overall Accuracy: 0.773\n",
"Confusion Matrix:\n",
"[[280  37   0]\n",
" [ 33  98  29]\n",
" [  2  35  86]]\n"
] }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [
"# Split data into features (X) and labels (y)\n",
"X = data.drop('Stage', axis=1)\n",
"y = data['Stage']\n",
"\n",
"# Split into training and testing sets\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)\n",
"\n",
"# Train a Decision Tree classifier\n",
"clf = DecisionTreeClassifier(random_state=42)\n",
"clf.fit(X_train, y_train)\n",
"\n",
"# Make predictions\n",
"y_pred = clf.predict(X_test)\n",
"\n",
"# Generate confusion matrix (rows are actual stages, columns are predicted stages)\n",
"cm = confusion_matrix(y_test, y_pred, labels=['Wakefulness', 'REM', 'NREM'])\n",
"\n",
"# Calculate accuracy\n",
"accuracy = accuracy_score(y_test, y_pred)\n",
"\n",
"# Calculate per-class precision, recall, and F1 score\n",
"precision, recall, f1, _ = precision_recall_fscore_support(y_test, y_pred, average=None, labels=['Wakefulness', 'REM', 'NREM'])\n",
"\n",
"# Display results\n",
"print(f\"\\nOverall Accuracy: {accuracy:.3f}\")\n",
"\n",
"# Print the confusion matrix\n",
"print(\"Confusion Matrix:\")\n",
"print(cm)\n",
"\n",
"# Visualize the confusion matrix\n",
"plt.figure(figsize=(5, 4))\n",
"sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',\n",
"            xticklabels=['Wakefulness', 'REM', 'NREM'],\n",
"            yticklabels=['Wakefulness', 'REM', 'NREM'])\n",
"plt.title('Confusion Matrix')\n",
"plt.xlabel('Predicted')\n",
"plt.ylabel('Actual')\n",
"plt.show()"
] }, { "cell_type": "markdown", "metadata": {}, "source": [
"## Precision, Recall and F1 score\n",
"\n",
"The precision, recall, and F1 scores provide additional insights into the model's performance for each sleep stage:\n",
"- Precision: Of the instances predicted as a given stage, the proportion that actually belong to that stage.\n",
"- Recall: Of the instances that actually belong to a given stage, the proportion that were correctly identified.\n",
"- F1 Score: The harmonic mean of precision and recall, providing a balanced measure of the model's performance.\n",
"\n",
"These scores were computed alongside the confusion matrix in the previous cell; the next cell displays them as a table."
] }, { "cell_type": "code", "execution_count": 4, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [
"\n",
"Precision, Recall, and F1 Score:\n",
"╒═════════════╤═════════════╤══════════╤════════════╕\n",
"│             │   Precision │   Recall │   F1 Score │\n",
"╞═════════════╪═════════════╪══════════╪════════════╡\n",
"│ Wakefulness │       0.889 │    0.883 │      0.886 │\n",
"├─────────────┼─────────────┼──────────┼────────────┤\n",
"│ REM         │       0.576 │    0.613 │      0.594 │\n",
"├─────────────┼─────────────┼──────────┼────────────┤\n",
"│ NREM        │       0.748 │    0.699 │      0.723 │\n",
"╘═════════════╧═════════════╧══════════╧════════════╛\n"
] } ], "source": [
"from tabulate import tabulate\n",
"\n",
"# Create a DataFrame for precision, recall, and F1 score\n",
"results_df = pd.DataFrame({\n",
"    'Precision': precision,\n",
"    'Recall': recall,\n",
"    'F1 Score': f1\n",
"}, index=['Wakefulness', 'REM', 'NREM'])\n",
"\n",
"# Format the DataFrame as a pretty table\n",
"table = tabulate(results_df, headers='keys', tablefmt='fancy_grid', floatfmt='.3f')\n",
"\n",
"# Display results as a table\n",
"print(\"\\nPrecision, Recall, and F1 Score:\")\n",
"print(table)"
] }, { "cell_type": "markdown", "metadata": {}, "source": [
"## Feature Importance Analysis\n",
"\n",
"After training our Decision Tree classifier and evaluating its performance, we now turn our attention to understanding which features contribute most to the model's decisions. This analysis is useful for two reasons:\n",
"\n",
"1. **Model Interpretability**: It helps us understand which aspects of the accelerometer and heart rate data are most influential in distinguishing between sleep stages.\n",
"\n",
"2. 
**Feature Selection**: Identifying the most important features can guide future data collection efforts and potentially simplify the model.\n",
"\n",
"We'll use the Decision Tree's built-in feature importance metric, which measures the total reduction in the split criterion (Gini impurity by default in scikit-learn) contributed by each feature, normalized so that the importances sum to 1.\n",
"\n",
"The following code will:\n",
"\n",
"1. Extract feature importances from the trained model.\n",
"2. Sort the features by importance.\n",
"3. Visualize the importances using a horizontal bar chart.\n",
"4. Print out the numerical values of feature importances.\n",
"\n",
"The horizontal bar chart provides a visual representation of the relative importance of each feature, while the printed values give us precise numerical data."
] }, { "cell_type": "code", "execution_count": 5, "metadata": { "scrolled": true }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [
"\n",
"Feature Importances:\n",
"--hr_mean: 0.6134\n",
"--accel_mean: 0.0720\n",
"--hr_max: 0.0443\n",
"--accel_std: 0.0435\n",
"--hrv_rmssd: 0.0423\n",
"--hr_min: 0.0415\n",
"--accel_dominant_freq: 0.0270\n",
"--accel_min: 0.0265\n",
"--hrv_sdnn: 0.0261\n",
"--accel_median: 0.0259\n",
"--accel_max: 0.0231\n",
"--hr_std: 0.0145\n",
"--hrv_pnn50: 0.0000\n"
] } ], "source": [
"# Feature importance\n",
"importances = clf.feature_importances_\n",
"feature_names = X.columns\n",
"\n",
"# Sort in ascending order so the most important feature ends up at the top of the horizontal bar chart\n",
"sorted_idx = np.argsort(importances)\n",
"sorted_importances = importances[sorted_idx]\n",
"sorted_feature_names = feature_names[sorted_idx]\n",
"\n",
"# Plot feature importances as a horizontal bar graph\n",
"plt.figure(figsize=(8, 4))\n",
"y_pos = np.arange(len(sorted_feature_names))\n",
"plt.barh(y_pos, sorted_importances)\n",
"plt.yticks(y_pos, sorted_feature_names)\n",
"plt.xlabel('Importance')\n",
"plt.title('Feature Importance')\n",
"plt.tight_layout()\n",
"plt.show()\n",
"\n",
"# Print feature importances in descending order\n",
"print(\"\\nFeature Importances:\")\n",
"for name, importance in sorted(zip(feature_names, importances), key=lambda x: x[1], reverse=True):\n",
"    print(\"--\"+f\"{name}: {importance:.4f}\")"
] }, { "cell_type": "markdown", "metadata": {}, "source": [
"## Imbalanced Data and the Limitations of Accuracy\n",
"\n",
"In many real-world scenarios, we encounter datasets where the distribution of classes is not equal. This is known as class imbalance. For example, in medical diagnosis, the number of healthy patients might vastly outnumber those with a rare condition. 
In our sleep stage classification, we simulate a scenario where \"Wakefulness\" is far more common than the \"REM\" or \"NREM\" stages.\n",
"\n",
"### Why Accuracy Can Be Misleading\n",
"\n",
"When dealing with imbalanced datasets, overall accuracy can be a misleading metric. Here's why:\n",
"\n",
"1. **Majority Class Bias**: A model can achieve high accuracy simply by always predicting the majority class. For instance, if 95% of our data is \"Wakefulness\", a model that always predicts \"Wakefulness\" would be 95% accurate, despite being useless for identifying the other sleep stages.\n",
"\n",
"2. **Overlooking Minority Classes**: Accuracy gives equal weight to every prediction, so in an imbalanced dataset the performance on minority classes has little impact on the overall score. A model could perform poorly on important but rare classes while maintaining high overall accuracy.\n",
"\n",
"3. **False Sense of Performance**: High accuracy might lead us to believe our model is performing well across all classes, when in reality it might be failing to identify critical minority cases.\n",
"\n",
"### Balanced Accuracy: A Better Metric for Imbalanced Data\n",
"\n",
"Balanced accuracy addresses these issues by giving equal weight to each class, regardless of its frequency in the dataset. Here's how it works:\n",
"\n",
"1. **Definition**: Balanced accuracy is the average of recall obtained on each class.\n",
"\n",
"2. **Calculation**: For each class, calculate the proportion of that class's instances that were predicted correctly (recall). Then, take the average of these recall values.\n",
"\n",
"3. **Interpretation**: A balanced accuracy of 0.5 for binary classification (or 1/n for n classes) corresponds to random guessing or to always predicting a single class. A value above this indicates better-than-chance performance on average across classes, although an individual class can still be poorly recognized.\n",
"\n",
"In the following example, we'll compare regular accuracy with balanced accuracy on an imbalanced (skewed) dataset, alongside a naive baseline that always predicts the majority class. 
This comparison will highlight how balanced accuracy provides a more realistic assessment of our model's performance, especially when dealing with class imbalance."
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [
"\n",
"--- Skewed Data Case ---\n",
"\n",
"Confusion Matrix (Skewed Data):\n",
"[[559  11   2]\n",
" [  8   5   5]\n",
" [  0   1   9]]\n"
] }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [
"\n",
"Precision, Recall, and F1 Score (Skewed Data):\n",
"╒═════════════╤═════════════╤══════════╤════════════╕\n",
"│             │   Precision │   Recall │   F1 Score │\n",
"╞═════════════╪═════════════╪══════════╪════════════╡\n",
"│ Wakefulness │       0.986 │    0.977 │      0.982 │\n",
"├─────────────┼─────────────┼──────────┼────────────┤\n",
"│ REM         │       0.294 │    0.278 │      0.286 │\n",
"├─────────────┼─────────────┼──────────┼────────────┤\n",
"│ NREM        │       0.562 │    0.900 │      0.692 │\n",
"╘═════════════╧═════════════╧══════════╧════════════╛\n",
"\n",
"Class Distribution in Test Set (Skewed Data):\n",
"Stage\n",
"Wakefulness    0.953333\n",
"REM            0.030000\n",
"NREM           0.016667\n",
"Name: proportion, dtype: float64\n",
"\n",
"Naive Classifier Accuracy (always predicting Wakefulness): 0.953\n",
"\n",
"Overall Accuracy (Skewed Data): 0.955\n",
"Balanced Accuracy (Skewed Data): 0.718\n",
"Naive Classifier Balanced Accuracy: 0.333\n"
] } ], "source": [
"print(\"\\n--- Skewed Data Case ---\\n\")\n",
"\n",
"# Generate skewed data\n",
"skewed_data = generate_synthetic_data(2000, class_distribution=[0.95, 0.03, 0.02])\n",
"\n",
"# Split data into features (X) and labels (y)\n",
"X_skewed = skewed_data.drop('Stage', axis=1)\n",
"y_skewed = skewed_data['Stage']\n",
"\n",
"# Split into training and testing sets\n",
"X_train_skewed, X_test_skewed, y_train_skewed, y_test_skewed = train_test_split(X_skewed, y_skewed, test_size=0.3, random_state=42)\n",
"\n",
"# Train a Decision Tree classifier\n",
"clf_skewed = DecisionTreeClassifier(random_state=42)\n",
"clf_skewed.fit(X_train_skewed, y_train_skewed)\n",
"\n",
"# Make predictions\n",
"y_pred_skewed = clf_skewed.predict(X_test_skewed)\n",
"\n",
"# Generate confusion matrix\n",
"cm_skewed = confusion_matrix(y_test_skewed, y_pred_skewed, labels=['Wakefulness', 'REM', 'NREM'])\n",
"\n",
"# Calculate precision, recall, and F1 score\n",
"precision_skewed, recall_skewed, f1_skewed, _ = precision_recall_fscore_support(y_test_skewed, y_pred_skewed, average=None, labels=['Wakefulness', 'REM', 'NREM'])\n",
"\n",
"# Display results\n",
"print(\"Confusion Matrix (Skewed Data):\")\n",
"print(cm_skewed)\n",
"\n",
"# Visualize the confusion matrix\n",
"plt.figure(figsize=(5, 4))\n",
"sns.heatmap(cm_skewed, annot=True, fmt='d', cmap='Blues',\n",
"            xticklabels=['Wakefulness', 'REM', 'NREM'],\n",
"            yticklabels=['Wakefulness', 'REM', 'NREM'])\n",
"plt.title('Confusion Matrix (Skewed Data)')\n",
"plt.xlabel('Predicted')\n",
"plt.ylabel('Actual')\n",
"plt.show()\n",
"\n",
"# Create a DataFrame for precision, recall, and F1 score\n",
"results_df_skewed = pd.DataFrame({\n",
"    'Precision': precision_skewed,\n",
"    'Recall': recall_skewed,\n",
"    'F1 Score': f1_skewed\n",
"}, index=['Wakefulness', 'REM', 'NREM'])\n",
"\n",
"# Format the DataFrame as a pretty table\n",
"table_skewed = tabulate(results_df_skewed, headers='keys', tablefmt='fancy_grid', floatfmt='.3f')\n",
"\n",
"# Display results as a table\n",
"print(\"\\nPrecision, Recall, and F1 Score (Skewed Data):\")\n",
"print(table_skewed)\n",
"\n",
"# Calculate and display class distribution\n",
"class_distribution_skewed = y_test_skewed.value_counts(normalize=True)\n",
"print(\"\\nClass Distribution in Test Set (Skewed Data):\")\n",
"print(class_distribution_skewed)\n",
"\n",
"# Naive classifier that always predicts the majority class\n",
"naive_pred_skewed = np.full_like(y_test_skewed, 'Wakefulness')\n",
"naive_accuracy_skewed = accuracy_score(y_test_skewed, naive_pred_skewed)\n",
"print(f\"\\nNaive Classifier Accuracy (always predicting Wakefulness): {naive_accuracy_skewed:.3f}\")\n",
"\n",
"# Calculate overall accuracy for the skewed case\n",
"accuracy_skewed = accuracy_score(y_test_skewed, y_pred_skewed)\n",
"print(f\"\\nOverall Accuracy (Skewed Data): {accuracy_skewed:.3f}\")\n",
"\n",
"# Calculate balanced accuracy (the mean of per-class recall) for the skewed case\n",
"balanced_accuracy_skewed = balanced_accuracy_score(y_test_skewed, y_pred_skewed)\n", "print(f\"Balanced Accuracy (Skewed Data): {balanced_accuracy_skewed:.3f}\")\n", "\n", "# Balanced accuracy for the naive classifier\n", "naive_balanced_accuracy_skewed = balanced_accuracy_score(y_test_skewed, naive_pred_skewed)\n", "print(f\"Naive Classifier Balanced Accuracy: {naive_balanced_accuracy_skewed:.3f}\")\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.2" } }, "nbformat": 4, "nbformat_minor": 2 }