Chapter 12: General Operations on Pandas Data Frames

Import Required Libraries

import os
import numpy as np
import pandas as pd

Set Working Directory

working_directory = 'E:/Python_For_DS_V2/Chapter12'
os.chdir(working_directory)

12.1 Head

  • Reference URL
  • This function returns the first n rows for the object based on position.
  • It is useful for quickly testing if your object has the right type of data in it.

Example

df = pd.read_csv('GCELL.txt',header=1)
df.head()
BSC Name BTS Name Cell Index Cell Name Freq. Band MCC MNC Cell LAC Cell CI NCC ... Number of PAGCH Blocks Number of PRACH Blocks VIP Cell MOCN Sharing Cell Support Dual High Frequency Bands Local Cell ID Remark Administrative State active status Operator Name
0 HDIKBSC02 4905_Qazia Abad Layyah 0 14905_Qazia Abad Layyah GSM900_DCS1800 410 3 54438 14905 7 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname
1 HDIKBSC02 4905_Qazia Abad Layyah 1 24905_Qazia Abad Layyah GSM900_DCS1800 410 3 54438 24905 0 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname
2 HDIKBSC02 6939_Chak No 90 Mor Layyah (3G-CII-4281) 2 16939_Chak No 90 Mor Layyah (3G-CII-4281) GSM900_DCS1800 410 3 54438 16939 0 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname
3 HDIKBSC02 6939_Chak No 90 Mor Layyah (3G-CII-4281) 3 26939_Chak No 90 Mor Layyah (3G-CII-4281) GSM900_DCS1800 410 3 54438 26939 0 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname
4 HDIKBSC02 6939_Chak No 90 Mor Layyah (3G-CII-4281) 4 36939_Chak No 90 Mor Layyah (3G-CII-4281) GSM900_DCS1800 410 3 54438 36939 0 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname

5 rows × 34 columns

Example

df = pd.read_csv('GCELL.txt',header=1)
df.head(n=2)
BSC Name BTS Name Cell Index Cell Name Freq. Band MCC MNC Cell LAC Cell CI NCC ... Number of PAGCH Blocks Number of PRACH Blocks VIP Cell MOCN Sharing Cell Support Dual High Frequency Bands Local Cell ID Remark Administrative State active status Operator Name
0 HDIKBSC02 4905_Qazia Abad Layyah 0 14905_Qazia Abad Layyah GSM900_DCS1800 410 3 54438 14905 7 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname
1 HDIKBSC02 4905_Qazia Abad Layyah 1 24905_Qazia Abad Layyah GSM900_DCS1800 410 3 54438 24905 0 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname

2 rows × 34 columns

Example

df = pd.read_csv('GCELL.txt',header=1)
df.head(2)
BSC Name BTS Name Cell Index Cell Name Freq. Band MCC MNC Cell LAC Cell CI NCC ... Number of PAGCH Blocks Number of PRACH Blocks VIP Cell MOCN Sharing Cell Support Dual High Frequency Bands Local Cell ID Remark Administrative State active status Operator Name
0 HDIKBSC02 4905_Qazia Abad Layyah 0 14905_Qazia Abad Layyah GSM900_DCS1800 410 3 54438 14905 7 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname
1 HDIKBSC02 4905_Qazia Abad Layyah 1 24905_Qazia Abad Layyah GSM900_DCS1800 410 3 54438 24905 0 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname

2 rows × 34 columns
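
The behaviour of head can also be sketched on a tiny frame that does not need the GCELL file. The frame demo and its column names are hypothetical stand-ins. Besides a positive n, head accepts a negative n, which returns every row except the last |n| rows.

```python
import pandas as pd

# Hypothetical six-row frame standing in for the GCELL data.
demo = pd.DataFrame({"cell": list("abcdef"), "lac": range(6)})

first_two = demo.head(2)          # rows at positions 0 and 1
all_but_last_two = demo.head(-2)  # every row except the final two

print(first_two)
print(all_but_last_two)
```

Calling head with no argument uses the default n=5, which is why the first example above prints 5 rows.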

12.2 Tail

Example

df = pd.read_csv('GCELL.txt',header=1)
df.tail()
BSC Name BTS Name Cell Index Cell Name Freq. Band MCC MNC Cell LAC Cell CI NCC ... Number of PAGCH Blocks Number of PRACH Blocks VIP Cell MOCN Sharing Cell Support Dual High Frequency Bands Local Cell ID Remark Administrative State active status Operator Name
11691 HKNWBSC02 CII-2553_Chak No 141 M Bahawalnagar Sahiwal (3... 201 CII-2553-2_Chak No 141 M Bahawalnagar Sahiwal ... GSM900_DCS1800 410 3 54436 22553 5 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname
11692 HKNWBSC02 CII-2553_Chak No 141 M Bahawalnagar Sahiwal (3... 202 CII-2553-3_Chak No 141 M Bahawalnagar Sahiwal ... GSM900_DCS1800 410 3 54436 32553 4 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname
11693 HKNWBSC02 CII-2560_Din Pur Khanewal (3G-CII-4423) 229 CII-2560-1_Din Pur Khanewal (3G-CII-4423) GSM900_DCS1800 410 3 54436 12560 4 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname
11694 HKNWBSC02 CII-2560_Din Pur Khanewal (3G-CII-4423) 249 CII-2560-2_Din Pur Khanewal (3G-CII-4423) GSM900_DCS1800 410 3 54436 22560 4 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname
11695 HKNWBSC02 CII-2560_Din Pur Khanewal (3G-CII-4423) 253 CII-2560-3_Din Pur Khanewal (3G-CII-4423) GSM900_DCS1800 410 3 54436 32560 4 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname

5 rows × 34 columns

Example

df = pd.read_csv('GCELL.txt',header=1)
df.tail(n=2)
BSC Name BTS Name Cell Index Cell Name Freq. Band MCC MNC Cell LAC Cell CI NCC ... Number of PAGCH Blocks Number of PRACH Blocks VIP Cell MOCN Sharing Cell Support Dual High Frequency Bands Local Cell ID Remark Administrative State active status Operator Name
11694 HKNWBSC02 CII-2560_Din Pur Khanewal (3G-CII-4423) 249 CII-2560-2_Din Pur Khanewal (3G-CII-4423) GSM900_DCS1800 410 3 54436 22560 4 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname
11695 HKNWBSC02 CII-2560_Din Pur Khanewal (3G-CII-4423) 253 CII-2560-3_Din Pur Khanewal (3G-CII-4423) GSM900_DCS1800 410 3 54436 32560 4 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname

2 rows × 34 columns

Example

df = pd.read_csv('GCELL.txt',header=1)
df.tail(2)
BSC Name BTS Name Cell Index Cell Name Freq. Band MCC MNC Cell LAC Cell CI NCC ... Number of PAGCH Blocks Number of PRACH Blocks VIP Cell MOCN Sharing Cell Support Dual High Frequency Bands Local Cell ID Remark Administrative State active status Operator Name
11694 HKNWBSC02 CII-2560_Din Pur Khanewal (3G-CII-4423) 249 CII-2560-2_Din Pur Khanewal (3G-CII-4423) GSM900_DCS1800 410 3 54436 22560 4 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname
11695 HKNWBSC02 CII-2560_Din Pur Khanewal (3G-CII-4423) 253 CII-2560-3_Din Pur Khanewal (3G-CII-4423) GSM900_DCS1800 410 3 54436 32560 4 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname

2 rows × 34 columns
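
As the outputs above show, tail mirrors head: it returns the last n rows by position, 5 by default. A minimal sketch on a hypothetical frame (demo and its columns are made up for illustration) shows a negative n and one common way to preview both ends of a frame at once.

```python
import pandas as pd

# Hypothetical six-row frame standing in for the GCELL data.
demo = pd.DataFrame({"cell": list("abcdef"), "lac": range(6)})

last_two = demo.tail(2)            # rows at positions 4 and 5
without_first_two = demo.tail(-2)  # every row except the first two

# Stack the first and last rows for a quick look at both ends.
preview = pd.concat([demo.head(2), demo.tail(2)])

print(last_two)
print(without_first_two)
print(preview)
```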

12.3 Sample

  • Reference URL
  • Returns a random sample of items from an axis of the object (rows by default).

Example

df = pd.read_csv('GCELL.txt',header=1)
df.sample()
BSC Name BTS Name Cell Index Cell Name Freq. Band MCC MNC Cell LAC Cell CI NCC ... Number of PAGCH Blocks Number of PRACH Blocks VIP Cell MOCN Sharing Cell Support Dual High Frequency Bands Local Cell ID Remark Administrative State active status Operator Name
6804 HBWPBSC03 4176_Moza Sui Wala Samma Satta Lodhran 69 14176_Moza Sui Wala Samma Satta Lodhran GSM900 410 3 54437 14176 2 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname

1 rows × 34 columns

Example

df = pd.read_csv('GCELL.txt',header=1)
df.sample(n=3)
BSC Name BTS Name Cell Index Cell Name Freq. Band MCC MNC Cell LAC Cell CI NCC ... Number of PAGCH Blocks Number of PRACH Blocks VIP Cell MOCN Sharing Cell Support Dual High Frequency Bands Local Cell ID Remark Administrative State active status Operator Name
2133 HLHRBSC09 CI-1611_Tariq Road Near Airport Lahore-Z2 (3G-... 302 CI-1611-3_Tariq Road Near Airport Lahore-Z2 (3... GSM900_DCS1800 410 3 57705 31611 2 ... 4 1 NO NO NO 4294967295 - UNLOCK DEACTIVATED opname
9134 HMLTBSC05 4122_Bosan Road Multan (3G-CII-3128) 120 34122_Bosan Road Multan (3G-CII-3128) GSM900_DCS1800 410 3 54402 34122 0 ... 4 1 NO NO NO 2 - UNLOCK ACTIVATED opname
3426 HHFZBSC01 3803_Pindi Bhattian Sargodha (3G-CI-2132) 75 33803_Pindi Bhattian Sargodha (3G-CI-2132) GSM900_DCS1800 410 3 53330 33803 2 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname

3 rows × 34 columns

Example

df = pd.read_csv('GCELL.txt',header=1)
df.sample(3)
BSC Name BTS Name Cell Index Cell Name Freq. Band MCC MNC Cell LAC Cell CI NCC ... Number of PAGCH Blocks Number of PRACH Blocks VIP Cell MOCN Sharing Cell Support Dual High Frequency Bands Local Cell ID Remark Administrative State active status Operator Name
1359 HGJWBSC04 3371_Haji Street Kamonki Gujranwala (3G-CI-4921) 11 33371_Haji Street Kamonki Gujranwala (3G-CI-4921) GSM900 410 3 53340 33371 7 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname
11595 HKNWBSC02 4550_Zamir Colony Kassowal Sahiwal (3G-CII-4422) 64 24550_Zamir Colony Kassowal Sahiwal (3G-CII-4422) GSM900_DCS1800 410 3 54436 24550 0 ... 4 1 NO NO NO 1 - UNLOCK ACTIVATED opname
11444 HBRWBSC03 4574_Gulshan e Ghani Colony Vehari (3G-CII-3521) 30 34574_Gulshan e Ghani Colony Vehari (3G-CII-3521) GSM900_DCS1800 410 3 54456 34574 3 ... 4 1 NO NO NO 2 - UNLOCK ACTIVATED opname

3 rows × 34 columns

12.3.1 random_state

  • Extract 3 random rows from the Data Frame. Note that we use random_state to make the examples reproducible.
df = pd.read_csv('GCELL.txt',header=1)
df.sample(n=3, random_state=1)
BSC Name BTS Name Cell Index Cell Name Freq. Band MCC MNC Cell LAC Cell CI NCC ... Number of PAGCH Blocks Number of PRACH Blocks VIP Cell MOCN Sharing Cell Support Dual High Frequency Bands Local Cell ID Remark Administrative State active status Operator Name
2897 HMWLBSC03 4309_Skaseer (Gold) Mianwali (3G-CII-3827) 32 34309_Skaseer (Gold) Mianwali (3G-CII-3827) GSM900_DCS1800 410 3 54407 34309 4 ... 4 1 NO NO NO 2 - UNLOCK ACTIVATED opname
5465 HFSDBSC07 CII-2190_Chak No 78 GB Jawddi Faisalabad-Z2 174 CII-2190-2_Chak No 78 GB Jawddi Faisalabad-Z2 GSM900_DCS1800 410 3 54404 22190 3 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname
6811 HBWPBSC03 5572_Jalapur Pirwala Multan 79 25572_Jalapur Pirwala Multan GSM900 410 3 54437 25572 0 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname

3 rows × 34 columns
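
The effect of random_state can be checked directly. The sketch below uses a hypothetical frame named demo instead of the GCELL file: two draws with the same seed select the same rows.

```python
import pandas as pd

# Hypothetical frame; any DataFrame behaves the same way.
demo = pd.DataFrame({"value": range(100)})

first_draw = demo.sample(n=3, random_state=1)
second_draw = demo.sample(n=3, random_state=1)

# Same seed, same rows in the same order.
print(first_draw.equals(second_draw))
```

Without random_state, repeated calls are free to return different rows, as the two earlier sample(3) outputs illustrate.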

12.3.2 frac

df = pd.read_csv('GCELL.txt',header=1)
df.sample(frac=0.001,random_state=1)
BSC Name BTS Name Cell Index Cell Name Freq. Band MCC MNC Cell LAC Cell CI NCC ... Number of PAGCH Blocks Number of PRACH Blocks VIP Cell MOCN Sharing Cell Support Dual High Frequency Bands Local Cell ID Remark Administrative State active status Operator Name
2897 HMWLBSC03 4309_Skaseer (Gold) Mianwali (3G-CII-3827) 32 34309_Skaseer (Gold) Mianwali (3G-CII-3827) GSM900_DCS1800 410 3 54407 34309 4 ... 4 1 NO NO NO 2 - UNLOCK ACTIVATED opname
5465 HFSDBSC07 CII-2190_Chak No 78 GB Jawddi Faisalabad-Z2 174 CII-2190-2_Chak No 78 GB Jawddi Faisalabad-Z2 GSM900_DCS1800 410 3 54404 22190 3 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname
6811 HBWPBSC03 5572_Jalapur Pirwala Multan 79 25572_Jalapur Pirwala Multan GSM900 410 3 54437 25572 0 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname
4288 HLHRBSC08 7069_Chamra Mandi Lahore-Z1 (3G-CI-4322) 249 37069_Chamra Mandi Lahore-Z1 (3G-CI-4322) GSM900_DCS1800 410 3 57704 37069 3 ... 4 1 NO NO NO 2 - UNLOCK ACTIVATED opname
11384 HHFZBSC02 3780_Bugga Hafizabad 111 33780_Bugga Hafizabad GSM900_DCS1800 410 3 53329 33780 2 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname
5800 HKASBSC03 7621_Rehman Pura (Gold) Kasur (3G-CI-4983) 88 37621_Rehman Pura (Gold) Kasur (3G-CI-4983) GSM900_DCS1800 410 3 53309 37621 2 ... 4 1 NO NO NO 2 - UNLOCK ACTIVATED opname
11654 HKNWBSC02 5501_Kot Islam Khanewal (3G-CII-4359) 123 35501_Kot Islam Khanewal (3G-CII-4359) GSM900_DCS1800 410 3 54436 35501 5 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname
5169 HLHRBSC15 7085_Salamat Pura Lahore-Z1 (3G-CI-4330) 179 37085_Salamat Pura Lahore-Z1 (3G-CI-4330) GSM900 410 3 57708 37085 3 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname
5660 HGJTBSC06 3766_Wazirabad Sialkot (3G-CI-3925) 260 23766_Wazirabad Sialkot (3G-CI-3925) GSM900_DCS1800 410 3 53308 23766 3 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname
2717 HFSDBSC06 4035_Officer Colony Faisalabad-Z2 (3G-CII-3041) 43 24035_Officer Colony Faisalabad-Z2 (3G-CII-3041) GSM900_DCS1800 410 3 54424 24035 1 ... 4 1 NO NO NO 1 - UNLOCK ACTIVATED opname
2391 HLHRBSC11 3542_Sheer e Rabbani Sheikhupura (3G-CI-3900) 66 33542_Sheer e Rabbani Sheikhupura (3G-CI-3900) GSM900_DCS1800 410 3 53316 33542 3 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname
6520 HDIKBSC03 4324_Imamia Road (Gold) DI Khan (3G-CII-3536) 100 24324_Imamia Road (Gold) DI Khan (3G-CII-3536) GSM900_DCS1800 410 3 54427 24324 2 ... 4 1 NO NO NO 1 - UNLOCK ACTIVATED opname

12 rows × 34 columns
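
With frac, the number of rows returned is the frame length multiplied by frac and rounded to a whole number. For the GCELL data, 0.001 × 11696 = 11.696, which rounds to the 12 rows shown above. A small sketch on a hypothetical ten-row frame:

```python
import pandas as pd

# Hypothetical ten-row frame.
demo = pd.DataFrame({"value": range(10)})

subset = demo.sample(frac=0.3, random_state=1)

# 0.3 * 10 rows gives 3 rows.
print(len(subset))
```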

12.3.3 replace

  • Upsampling the DataFrame with replacement. Note that the replace parameter must be True whenever frac is greater than 1.
df = pd.read_csv('GCELL.txt',header=1)
df.sample(frac=1.5, replace=True,random_state=1)
BSC Name BTS Name Cell Index Cell Name Freq. Band MCC MNC Cell LAC Cell CI NCC ... Number of PAGCH Blocks Number of PRACH Blocks VIP Cell MOCN Sharing Cell Support Dual High Frequency Bands Local Cell ID Remark Administrative State active status Operator Name
235 HDIKBSC02 5713_Muhalah Qazianwala Layyah (3G-CII-4151) 272 35713_Muhalah Qazianwala Layyah (3G-CII-4151) GSM900_DCS1800 410 3 54438 35713 2 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname
5192 HLHRBSC15 7088_PTCL Exchange Gulgasht Town Lahore-Z2 (3G... 250 27088_PTCL Exchange Gulgasht Town Lahore-Z2 (3... GSM900_DCS1800 410 3 57708 27088 5 ... 4 1 NO NO NO 1 - UNLOCK ACTIVATED opname
905 HLHRBSC14 CI-1629_Rehman Plaza Queen Road Lahore-Z2 (3G-... 179 CI-1629-1_Rehman Plaza Queen Road Lahore-Z2 (3... GSM900_DCS1800 410 3 57702 11629 5 ... 4 1 NO NO NO 0 - UNLOCK ACTIVATED opname
10955 HGJWBSC03 CI-1693_Sector Y Peoples Colony Gujranwala (3G... 325 CI-1693-3_Sector Y Peoples Colony Gujranwala (... GSM900_DCS1800 410 3 53303 31693 7 ... 4 1 NO NO NO 2 - UNLOCK ACTIVATED opname
7813 HMLTBSC08 5851_Vehari Road Multan (3G-CII-4039) 242 25851_Vehari Road Multan (3G-CII-4039) GSM900_DCS1800 410 3 54418 25851 2 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2748 HFSDBSC06 4079_Tech Society Faisalabad-Z2 (3G-CII-2000) 74 34079_Tech Society Faisalabad-Z2 (3G-CII-2000) GSM900_DCS1800 410 3 54424 34079 1 ... 4 1 NO NO NO 2 - UNLOCK ACTIVATED opname
934 HLHRBSC14 7662_Mochi Gate Lahore-Z0 (3G-CI-1761) 208 37662_Mochi Gate Lahore-Z0 (3G-CI-1761) GSM900_DCS1800 410 3 57702 37662 3 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname
11011 HGJTBSC04 7870_Fateh Pur Gujrat (3G-CI-2191) 39 27870_Fateh Pur Gujrat (3G-CI-2191) GSM900 410 3 53313 27870 7 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname
7784 HMLTBSC08 4262_Chowk Metla Bahawalpur Road Vehari (3G-CI... 213 34262_Chowk Metla Bahawalpur Road Vehari (3G-C... GSM900_DCS1800 410 3 51126 34262 4 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname
1096 HFSDBSC08 4086_Pul Dhengro Sargodha Road Jhang 63 34086_Pul Dhengro Sargodha Road Jhang GSM900 410 3 54423 34086 6 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname

17544 rows × 34 columns
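
The 17544 rows above are 1.5 × 11696. The sketch below, on a hypothetical four-row frame, shows what happens when replace is left at its default of False while frac exceeds 1: pandas raises a ValueError, because a frame cannot supply more distinct rows than it has.

```python
import pandas as pd

# Hypothetical four-row frame.
demo = pd.DataFrame({"value": range(4)})

raised = False
try:
    # frac > 1 without replacement is not possible.
    demo.sample(frac=1.5, random_state=1)
except ValueError as err:
    raised = True
    print("ValueError:", err)

# With replacement, rows may repeat, so 1.5 * 4 = 6 rows can be drawn.
upsampled = demo.sample(frac=1.5, replace=True, random_state=1)
print(len(upsampled))
```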

12.3.4 weights

Example-7

  • Using a DataFrame column as weights. Rows with larger value in the num_specimen_seen column are more likely to be sampled.
df = pd.DataFrame({'num_legs': [2, 4, 8, 0],
                   'num_wings': [2, 0, 0, 0],
                   'num_specimen_seen': [10, 2, 1, 8]},
                  index=['falcon', 'dog', 'spider', 'fish'])
df.sample(frac=0.5,replace=True,random_state=1, weights='num_specimen_seen')
num_legs num_wings num_specimen_seen
falcon 2 2 10
fish 0 0 8
df = pd.read_csv('GCELL.txt',header=1)
df.sample(frac=0.5,random_state=1, weights='Cell LAC')
BSC Name BTS Name Cell Index Cell Name Freq. Band MCC MNC Cell LAC Cell CI NCC ... Number of PAGCH Blocks Number of PRACH Blocks VIP Cell MOCN Sharing Cell Support Dual High Frequency Bands Local Cell ID Remark Administrative State active status Operator Name
4841 HLHRBSC12 CI-1383_Village karbath Lahore-Z2 (3G-CI-1798) 165 CI-1383-3_Village karbath Lahore-Z2 (3G-CI-1798) GSM900_DCS1800 410 3 57701 31383 1 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname
8384 HKNWBSC01 6996_Kohiwala Khanewal (3G-CII-4074) 51 36996_Kohiwala Khanewal GSM900_DCS1800 410 3 54421 36996 3 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname
1 HDIKBSC02 4905_Qazia Abad Layyah 1 24905_Qazia Abad Layyah GSM900_DCS1800 410 3 54438 24905 0 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname
3531 HSKTBSC04 7876_Kotli Loharan (Gold) Sialkot (3G-CI-1071) 145 17876_Kotli Loharan (Gold) Sialkot (3G-CI-1071) GSM900_DCS1800 410 3 53331 17876 2 ... 4 1 NO NO NO 0 - UNLOCK ACTIVATED opname
1709 HSWLBSC03 4672_Chak No 3 EB Pakpattan 17 34672_Chak No 3 EB Pakpattan GSM900_DCS1800 410 3 54440 34672 1 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3968 HLHRBSC10 CI-1800_Johar block Bahria Town Lahore (3G-CI-... 342 CI-1800-3_Johar block Bahria Town Lahore (3G-C... GSM900_DCS1800 410 3 53311 31800 5 ... 4 1 NO NO NO 2 - UNLOCK ACTIVATED opname
5736 HKASBSC03 7616_Parnawa Kasur (3G-CI-3916) 24 17616_Parnawa Kasur (3G-CI-3916) GSM900_DCS1800 410 3 53309 17616 0 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname
83 HDIKBSC02 CII-2016_Mahala Shah Jahaniya Layyah (3G-CII-4... 87 CII-2016-1_Mahala Shah Jahaniya Layyah (3G-CII... GSM900_DCS1800 410 3 54438 12016 0 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname
5097 HLHRBSC15 CI-1135_Tajpura New Lahore-Z2 (3G-CI-1787) 77 CI-1135-3_Tajpura New Lahore-Z2 (3G-CI-1787) GSM900_DCS1800 410 3 57708 31135 2 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname
2554 HLHRBSC11 7676_Balarkay Sheikhupura (3G-CI-2167) 229 17676_Balarkay Sheikhupura (3G-CI-2167) GSM900_DCS1800 410 3 53305 17676 4 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname

5848 rows × 34 columns
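
One consequence of weighted sampling is that rows with a weight of zero are never selected. The sketch below assumes a hypothetical frame with a column named weight; pandas normalises the weights so that they sum to 1 before drawing.

```python
import pandas as pd

# Hypothetical frame: rows "b" and "d" carry zero weight.
demo = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "weight": [5, 0, 3, 0],
})

# Draw many rows with replacement so the pattern is visible.
picked = demo.sample(n=200, replace=True, weights="weight", random_state=1)

# Only names with a non-zero weight can appear.
print(sorted(picked["name"].unique()))
print(picked["name"].value_counts())
```

In the value counts, "a" should appear more often than "c", in line with the 5 to 3 weighting.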

12.3.5 Random Sampling using skiprows

df = pd.read_csv('GCELL.txt',header=1)
df.shape
(11696, 34)
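df=pd.read_csv('GCELL.txt',header=1,skiprows=lambda x:x>1 and np.random.rand()>0.01)
df.shape
(120, 34)

How it works

  • skiprows accepts a function that is evaluated against each line number of the file (0-indexed); a line is skipped when the function returns True.
  • x>1 ensures that the first two lines are never skipped. This matters because header=1 is counted among the lines that remain after skipping, so both the line above the header and the header row itself must always be kept.
  • np.random.rand()>0.01 returns True about 99% of the time, so roughly 99% of the data rows are skipped. The exact row count therefore varies from run to run.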
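
The same idea can be tried without the GCELL file. The sketch below is an illustration under stated assumptions: the CSV text is built in memory, the header sits on line 0 (so header=0 is used), and a seeded NumPy generator named rng replaces np.random.rand() so that the draw is repeatable. None of these names come from the original notebook.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical in-memory CSV: one header line followed by 1000 data lines.
csv_text = "id,value\n" + "\n".join(f"{i},{i * 2}" for i in range(1000))

rng = np.random.default_rng(seed=1)  # seeded so the sample is repeatable

# The header is line 0 here, so only lines with x > 0 may be skipped.
# Each data line is skipped with probability 0.9, keeping about 10%.
sampled = pd.read_csv(
    io.StringIO(csv_text),
    header=0,
    skiprows=lambda x: x > 0 and rng.random() > 0.1,
)

print(sampled.shape)
print(list(sampled.columns))
```

Because rows are filtered while the file is being parsed, this approach avoids loading the full file into memory first, which is the main reason to prefer it over read_csv followed by sample on very large files.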

12.4 Shape of the Data Frame

  • Reference URL
  • Return a tuple representing the dimensionality of the DataFrame

Example-1

df = pd.read_csv('GCELL.txt',header=1)
df.shape
(11696, 34)

Example-2

df = pd.read_csv('GCELL.txt',header=1)
rows,columns=df.shape
print(rows,columns)
11696 34

Example-3

df = pd.read_csv('GCELL.txt',header=1)
rows,columns=df.shape
print("No of rows in the GCELL Data Set:")
print(rows)
print("No of columns in the GCELL Data Set:")
print(columns)
No of rows in the GCELL Data Set:
11696
No of columns in the GCELL Data Set:
34

12.5 Dimensions of the Data Frame

  • Reference URL
  • Return an int representing the number of axes / array dimensions.
  • Return 1 if Series. Otherwise return 2 if DataFrame.

Example-1

df = pd.read_csv('GCELL.txt',header=1)
df.ndim
2

Example-2

df = pd.Series({'a': 1, 'b': 2, 'c': 3})
df.ndim
1

12.6 Size of the Data Frame

Example-1

df = pd.read_csv('GCELL.txt',header=1)
df.size
397664

Example-2

df = pd.Series({'a': 1, 'b': 2, 'c': 3})
df.size
3
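
For a DataFrame, size is the number of rows multiplied by the number of columns; for the GCELL data that is 11696 × 34 = 397664, matching Example-1. For a Series it is simply the number of elements. A minimal check on a hypothetical frame:

```python
import pandas as pd

# Hypothetical 3 x 2 frame.
demo = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

rows, columns = demo.shape

# size counts every cell, so it equals rows * columns.
print(demo.size)
print(demo.size == rows * columns)
```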

12.7 Get Variables Name of the Data Frame

12.7.1 columns

  • The column labels of the DataFrame.
df = pd.read_csv('GCELL.txt',header=1)
df.columns
Index(['BSC Name', 'BTS Name', 'Cell Index', 'Cell Name', 'Freq. Band', 'MCC',
       'MNC', 'Cell LAC', 'Cell CI', 'NCC', 'BCC', 'Cell Extension Type',
       'Cell IUO Type', 'Enhanced Concentric Allowed',
       'Cell Inner/Extra Property', 'Same Group Cell Index',
       'BCCH IUO of Double Frequency Cell', 'Start Flex MAIO Switch',
       'HSN Modification Switch', 'CS Voice Service PRI',
       'CS Data Service PRI', 'PS High PRI Service PRI',
       'PS Low PRI Service PRI', 'Number of PBCCH Blocks',
       'Number of PAGCH Blocks', 'Number of PRACH Blocks', 'VIP Cell',
       'MOCN Sharing Cell', 'Support Dual High Frequency Bands',
       'Local Cell ID', 'Remark', 'Administrative State', 'active status',
       'Operator Name'],
      dtype='object')

12.7.2 keys

  • The DataFrame.keys() method returns the ‘info axis’ of the pandas object.
  • If the pandas object is a Series, it returns the index.
  • If the pandas object is a DataFrame, it returns the columns.
  • Reference URL
df = pd.read_csv('GCELL.txt',header=1)
df.keys()
Index(['BSC Name', 'BTS Name', 'Cell Index', 'Cell Name', 'Freq. Band', 'MCC',
       'MNC', 'Cell LAC', 'Cell CI', 'NCC', 'BCC', 'Cell Extension Type',
       'Cell IUO Type', 'Enhanced Concentric Allowed',
       'Cell Inner/Extra Property', 'Same Group Cell Index',
       'BCCH IUO of Double Frequency Cell', 'Start Flex MAIO Switch',
       'HSN Modification Switch', 'CS Voice Service PRI',
       'CS Data Service PRI', 'PS High PRI Service PRI',
       'PS Low PRI Service PRI', 'Number of PBCCH Blocks',
       'Number of PAGCH Blocks', 'Number of PRACH Blocks', 'VIP Cell',
       'MOCN Sharing Cell', 'Support Dual High Frequency Bands',
       'Local Cell ID', 'Remark', 'Administrative State', 'active status',
       'Operator Name'],
      dtype='object')
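
The relationship between keys, columns, and index described above can be verified on small hypothetical objects (the names frame and series are illustrative only):

```python
import pandas as pd

frame = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
series = pd.Series({"a": 1, "b": 2, "c": 3})

# For a DataFrame, keys() is the same as .columns.
print(frame.keys().equals(frame.columns))

# For a Series, keys() is the same as .index.
print(series.keys().equals(series.index))

# A plain Python list of the column names is often handier than an Index.
print(list(frame.columns))
```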

12.8 Index of the Data Frame

  • The index (row labels) of the DataFrame.
df = pd.read_csv('GCELL.txt',header=1)
df.index
RangeIndex(start=0, stop=11696, step=1)

12.9 axes of the Data Frame

  • Return a list representing the axes of the DataFrame.
  • It has the row axis labels and column axis labels as the only members.
  • They are returned in that order.
df = pd.read_csv('GCELL.txt',header=1)
df.axes
[RangeIndex(start=0, stop=11696, step=1),
 Index(['BSC Name', 'BTS Name', 'Cell Index', 'Cell Name', 'Freq. Band', 'MCC',
        'MNC', 'Cell LAC', 'Cell CI', 'NCC', 'BCC', 'Cell Extension Type',
        'Cell IUO Type', 'Enhanced Concentric Allowed',
        'Cell Inner/Extra Property', 'Same Group Cell Index',
        'BCCH IUO of Double Frequency Cell', 'Start Flex MAIO Switch',
        'HSN Modification Switch', 'CS Voice Service PRI',
        'CS Data Service PRI', 'PS High PRI Service PRI',
        'PS Low PRI Service PRI', 'Number of PBCCH Blocks',
        'Number of PAGCH Blocks', 'Number of PRACH Blocks', 'VIP Cell',
        'MOCN Sharing Cell', 'Support Dual High Frequency Bands',
        'Local Cell ID', 'Remark', 'Administrative State', 'active status',
        'Operator Name'],
       dtype='object')]
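
Because axes always returns the row labels first and the column labels second, the list can be unpacked into two names. A short sketch on a hypothetical frame:

```python
import pandas as pd

# Hypothetical 3 x 2 frame.
demo = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# First element: row labels (the index). Second element: column labels.
row_labels, column_labels = demo.axes

print(row_labels.equals(demo.index))
print(column_labels.equals(demo.columns))
```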

12.10 Set Index to a Specific Column

  • Set the DataFrame index (row labels) using one or more existing columns

Example-1

df = pd.read_csv('GCELL.txt',header=1)
df.set_index('BSC Name',inplace=True)
df.head(3)
BTS Name Cell Index Cell Name Freq. Band MCC MNC Cell LAC Cell CI NCC BCC ... Number of PAGCH Blocks Number of PRACH Blocks VIP Cell MOCN Sharing Cell Support Dual High Frequency Bands Local Cell ID Remark Administrative State active status Operator Name
BSC Name
HDIKBSC02 4905_Qazia Abad Layyah 0 14905_Qazia Abad Layyah GSM900_DCS1800 410 3 54438 14905 7 0 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname
HDIKBSC02 4905_Qazia Abad Layyah 1 24905_Qazia Abad Layyah GSM900_DCS1800 410 3 54438 24905 0 0 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname
HDIKBSC02 6939_Chak No 90 Mor Layyah (3G-CII-4281) 2 16939_Chak No 90 Mor Layyah (3G-CII-4281) GSM900_DCS1800 410 3 54438 16939 0 0 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname

3 rows × 33 columns

Example-2

df = pd.read_csv('GCELL.txt',header=1)
df.set_index(['BSC Name','Cell Name'],inplace=True)
df.head(2)
BTS Name Cell Index Freq. Band MCC MNC Cell LAC Cell CI NCC BCC Cell Extension Type ... Number of PAGCH Blocks Number of PRACH Blocks VIP Cell MOCN Sharing Cell Support Dual High Frequency Bands Local Cell ID Remark Administrative State active status Operator Name
BSC Name Cell Name
HDIKBSC02 14905_Qazia Abad Layyah 4905_Qazia Abad Layyah 0 GSM900_DCS1800 410 3 54438 14905 7 0 Normal_cell ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname
24905_Qazia Abad Layyah 4905_Qazia Abad Layyah 1 GSM900_DCS1800 410 3 54438 24905 0 0 Normal_cell ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname

2 rows × 32 columns
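
The examples above modify df in place, which is why the column count drops from 34 to 33 and then 32. The sketch below, on a hypothetical frame named cells, shows two alternatives: the default call, which returns a new frame and leaves the original untouched, and drop=False, which keeps the column in the data as well as using it as the index.

```python
import pandas as pd

# Hypothetical frame loosely modelled on the GCELL columns.
cells = pd.DataFrame({
    "bsc": ["B1", "B1", "B2"],
    "cell": ["c1", "c2", "c3"],
    "lac": [10, 10, 20],
})

indexed = cells.set_index("bsc")           # new frame; "bsc" moves to the index
kept = cells.set_index("bsc", drop=False)  # "bsc" is both index and column

# Label-based lookup on the new index returns every row labelled "B1".
print(indexed.loc["B1"])
print(list(kept.columns))
print(list(cells.columns))  # the original frame is unchanged
```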

12.11 Re-Set Index

df = pd.read_csv('GCELL.txt',header=1)
df.set_index(['BSC Name','Cell Name'],inplace=True)
df.head(2)
BTS Name Cell Index Freq. Band MCC MNC Cell LAC Cell CI NCC BCC Cell Extension Type ... Number of PAGCH Blocks Number of PRACH Blocks VIP Cell MOCN Sharing Cell Support Dual High Frequency Bands Local Cell ID Remark Administrative State active status Operator Name
BSC Name Cell Name
HDIKBSC02 14905_Qazia Abad Layyah 4905_Qazia Abad Layyah 0 GSM900_DCS1800 410 3 54438 14905 7 0 Normal_cell ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname
24905_Qazia Abad Layyah 4905_Qazia Abad Layyah 1 GSM900_DCS1800 410 3 54438 24905 0 0 Normal_cell ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname

2 rows × 32 columns

df.reset_index(['BSC Name','Cell Name'],inplace=True)
df.head(2)
BSC Name Cell Name BTS Name Cell Index Freq. Band MCC MNC Cell LAC Cell CI NCC ... Number of PAGCH Blocks Number of PRACH Blocks VIP Cell MOCN Sharing Cell Support Dual High Frequency Bands Local Cell ID Remark Administrative State active status Operator Name
0 HDIKBSC02 14905_Qazia Abad Layyah 4905_Qazia Abad Layyah 0 GSM900_DCS1800 410 3 54438 14905 7 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname
1 HDIKBSC02 24905_Qazia Abad Layyah 4905_Qazia Abad Layyah 1 GSM900_DCS1800 410 3 54438 24905 0 ... 4 1 NO NO NO 4294967295 - UNLOCK ACTIVATED opname

2 rows × 34 columns
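
reset_index has a few variations worth knowing. The sketch below uses a hypothetical frame named cells: called with no arguments it moves every index level back into columns, level= moves only the named level, and drop=True discards the index values instead of restoring them as columns.

```python
import pandas as pd

# Hypothetical frame with a two-level index.
cells = pd.DataFrame({
    "bsc": ["B1", "B1", "B2"],
    "cell": ["c1", "c2", "c3"],
    "lac": [10, 10, 20],
})
indexed = cells.set_index(["bsc", "cell"])

restored = indexed.reset_index()             # both levels become columns again
partial = indexed.reset_index(level="cell")  # only "cell" becomes a column
dropped = indexed.reset_index(drop=True)     # index values are discarded

print(list(restored.columns))
print(partial.index.name, list(partial.columns))
print(list(dropped.columns))
```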

12.12 set_axis

  • Assign desired index to given axis.
  • Indexes for column or row labels can be changed by assigning a list-like or Index

Example-1

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
df
A B
0 1 4
1 2 5
2 3 6
# set_axis returns a new DataFrame; its inplace argument was removed in pandas 2.0
df = df.set_axis(['a', 'b', 'c'], axis='index')
df
A B
a 1 4
b 2 5
c 3 6

Example-2

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
df
A B
0 1 4
1 2 5
2 3 6
# set_axis returns a new DataFrame; its inplace argument was removed in pandas 2.0
df = df.set_axis(['A1', 'B1'], axis='columns')
df
A1 B1
0 1 4
1 2 5
2 3 6

12.13 first_valid_index

  • Return index for first non-NA/null value.
df = pd.DataFrame({
                   "A1":[np.nan,np.nan,np.nan,5,6,np.nan,np.nan],  # np.NaN was removed in NumPy 2.0
                   })
df
A1
0 NaN
1 NaN
2 NaN
3 5.0
4 6.0
5 NaN
6 NaN
df.first_valid_index()
3

12.14 last_valid_index

  • Return index for last non-NA/null value.
df = pd.DataFrame({
                   "A1":[np.nan,np.nan,np.nan,5,6,np.nan,np.nan],  # np.NaN was removed in NumPy 2.0
                   })
df
A1
0 NaN
1 NaN
2 NaN
3 5.0
4 6.0
5 NaN
6 NaN
df.last_valid_index()
4
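
One practical use of the two methods together is trimming leading and trailing missing values. The sketch below reuses the same data as above under the hypothetical name demo and also shows that both methods return None when there is no valid value at all.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({"A1": [np.nan, np.nan, np.nan, 5, 6, np.nan, np.nan]})

start = demo.first_valid_index()  # label of the first non-NA row
stop = demo.last_valid_index()    # label of the last non-NA row

# .loc slicing is inclusive at both ends, so this keeps rows 3 and 4.
trimmed = demo.loc[start:stop]
print(trimmed)

# With no valid values, there is no label to return.
all_missing = pd.DataFrame({"A1": [np.nan, np.nan]})
print(all_missing.first_valid_index())
```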

12.15 Why do some pandas commands end with parentheses (and others don't)?