Fix get_dummies: Only 3 region columns created instead of 4#1

Open
arzaanxeng wants to merge 1 commit into AkarshVyas:main from arzaanxeng:main

Conversation

@arzaanxeng

Problem

pd.get_dummies(..., drop_first=True) implicitly drops the first category level, which for region is northeast (first in alphabetical order). As a result, only 3 dummy columns were created for the 4 regions, and the code did not show which region was the reference category.

Changes Made

  • Changed drop_first=True to drop_first=False
  • Added explicit drop of region_northeast as the reference category (best practice for regression models)

The encoded data still ends up with 3 region columns, but the reference category is now named in the code instead of being implied by drop_first=True.
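
The change described above can be sketched roughly as follows. This is a minimal illustration, not the actual diff: the DataFrame name `df`, the `charges` column, and the sample rows are assumptions, since the PR text does not show the real data or the surrounding code.

```python
import pandas as pd

# Hypothetical sample data; the real DataFrame is not shown in this PR.
df = pd.DataFrame({
    "region": ["southwest", "southeast", "northwest", "northeast", "southeast"],
    "charges": [100.0, 200.0, 150.0, 175.0, 225.0],
})

# Before: drop_first=True removes the first level (region_northeast) implicitly.
implicit = pd.get_dummies(df, columns=["region"], drop_first=True)

# After: create all four dummies, then drop the reference category by name.
explicit = pd.get_dummies(df, columns=["region"], drop_first=False)
explicit = explicit.drop(columns=["region_northeast"])

# Region dummy columns that remain after the explicit drop.
region_cols = [c for c in explicit.columns if c.startswith("region_")]
print(region_cols)
```

On this sample both variants yield the same three region columns; the difference is that the second one states the reference category in the code.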

Changed drop_first=True to False and dropped region_northeast as reference category.
@arzaanxeng

Author

Hi Akarsh,

I saw that only 3 dummy columns were being created for the region column due to drop_first=True.

I’ve fixed it by creating all 4 columns and then explicitly dropping region_northeast as the reference category (standard practice in regression modelling, since keeping all 4 dummies alongside an intercept makes them perfectly collinear).
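
To illustrate the multicollinearity point, here is a small sketch on made-up region values (the sample rows and variable names are assumptions, not data from the repository). It checks that the four dummies sum to 1 in every row, which duplicates an intercept column, and compares the rank of the design matrix with and without the reference category.

```python
import numpy as np
import pandas as pd

# Hypothetical region values covering all four categories.
regions = pd.Series(
    ["northeast", "northwest", "southeast", "southwest", "southeast"],
    name="region",
)
dummies = pd.get_dummies(regions, prefix="region").astype(float)

# Every row has exactly one active dummy, so the four columns sum to 1.
row_sums = dummies.sum(axis=1)

intercept = np.ones(len(dummies))

# Intercept + all four dummies: 5 columns, but one is a linear combination of the rest.
X_full = np.column_stack([intercept, dummies.to_numpy()])

# Intercept + three dummies (region_northeast dropped as the reference category).
X_ref = np.column_stack(
    [intercept, dummies.drop(columns=["region_northeast"]).to_numpy()]
)

rank_full = np.linalg.matrix_rank(X_full)
rank_ref = np.linalg.matrix_rank(X_ref)
print(X_full.shape[1], rank_full)  # columns vs. rank with all four dummies
print(X_ref.shape[1], rank_ref)    # columns vs. rank with the reference dropped
```

With all four dummies the matrix has more columns than its rank, so the coefficients are not uniquely determined; dropping one category restores full column rank.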

Let me know if you have any feedback or want it done differently!

