Chapter 12: General Operations on Pandas Data Frames

Import Required Libraries

import os
import numpy as np
import pandas as pd

Set Working Directory

working_directory = 'E:/Python_For_DS_V2/Chapter12'
os.chdir(working_directory)

12.1 Head

Reference URL
head() returns the first n rows of the object, by position; n defaults to 5.
It is useful for quickly checking that the object holds the kind of data you expect.

Example

df = pd.read_csv('GCELL.txt',header=1)
df.head()

	BSC Name	BTS Name	Cell Index	Cell Name	Freq. Band	MCC	MNC	Cell LAC	Cell CI	NCC	...	Number of PAGCH Blocks	Number of PRACH Blocks	VIP Cell	MOCN Sharing Cell	Support Dual High Frequency Bands	Local Cell ID	Remark	Administrative State	active status	Operator Name
0	HDIKBSC02	4905_Qazia Abad Layyah	0	14905_Qazia Abad Layyah	GSM900_DCS1800	410	3	54438	14905	7	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname
1	HDIKBSC02	4905_Qazia Abad Layyah	1	24905_Qazia Abad Layyah	GSM900_DCS1800	410	3	54438	24905	0	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname
2	HDIKBSC02	6939_Chak No 90 Mor Layyah (3G-CII-4281)	2	16939_Chak No 90 Mor Layyah (3G-CII-4281)	GSM900_DCS1800	410	3	54438	16939	0	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname
3	HDIKBSC02	6939_Chak No 90 Mor Layyah (3G-CII-4281)	3	26939_Chak No 90 Mor Layyah (3G-CII-4281)	GSM900_DCS1800	410	3	54438	26939	0	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname
4	HDIKBSC02	6939_Chak No 90 Mor Layyah (3G-CII-4281)	4	36939_Chak No 90 Mor Layyah (3G-CII-4281)	GSM900_DCS1800	410	3	54438	36939	0	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname

5 rows × 34 columns

Example

df = pd.read_csv('GCELL.txt',header=1)
df.head(n=2)

	BSC Name	BTS Name	Cell Index	Cell Name	Freq. Band	MCC	MNC	Cell LAC	Cell CI	NCC	...	Number of PAGCH Blocks	Number of PRACH Blocks	VIP Cell	MOCN Sharing Cell	Support Dual High Frequency Bands	Local Cell ID	Remark	Administrative State	active status	Operator Name
0	HDIKBSC02	4905_Qazia Abad Layyah	0	14905_Qazia Abad Layyah	GSM900_DCS1800	410	3	54438	14905	7	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname
1	HDIKBSC02	4905_Qazia Abad Layyah	1	24905_Qazia Abad Layyah	GSM900_DCS1800	410	3	54438	24905	0	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname

2 rows × 34 columns

Example

df = pd.read_csv('GCELL.txt',header=1)
df.head(2)

	BSC Name	BTS Name	Cell Index	Cell Name	Freq. Band	MCC	MNC	Cell LAC	Cell CI	NCC	...	Number of PAGCH Blocks	Number of PRACH Blocks	VIP Cell	MOCN Sharing Cell	Support Dual High Frequency Bands	Local Cell ID	Remark	Administrative State	active status	Operator Name
0	HDIKBSC02	4905_Qazia Abad Layyah	0	14905_Qazia Abad Layyah	GSM900_DCS1800	410	3	54438	14905	7	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname
1	HDIKBSC02	4905_Qazia Abad Layyah	1	24905_Qazia Abad Layyah	GSM900_DCS1800	410	3	54438	24905	0	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname

2 rows × 34 columns

12.2 Tail

Reference URL
tail() returns the last n rows of the object, by position; n defaults to 5.

Example

df = pd.read_csv('GCELL.txt',header=1)
df.tail()

	BSC Name	BTS Name	Cell Index	Cell Name	Freq. Band	MCC	MNC	Cell LAC	Cell CI	NCC	...	Number of PAGCH Blocks	Number of PRACH Blocks	VIP Cell	MOCN Sharing Cell	Support Dual High Frequency Bands	Local Cell ID	Remark	Administrative State	active status	Operator Name
11691	HKNWBSC02	CII-2553_Chak No 141 M Bahawalnagar Sahiwal (3...	201	CII-2553-2_Chak No 141 M Bahawalnagar Sahiwal ...	GSM900_DCS1800	410	3	54436	22553	5	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname
11692	HKNWBSC02	CII-2553_Chak No 141 M Bahawalnagar Sahiwal (3...	202	CII-2553-3_Chak No 141 M Bahawalnagar Sahiwal ...	GSM900_DCS1800	410	3	54436	32553	4	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname
11693	HKNWBSC02	CII-2560_Din Pur Khanewal (3G-CII-4423)	229	CII-2560-1_Din Pur Khanewal (3G-CII-4423)	GSM900_DCS1800	410	3	54436	12560	4	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname
11694	HKNWBSC02	CII-2560_Din Pur Khanewal (3G-CII-4423)	249	CII-2560-2_Din Pur Khanewal (3G-CII-4423)	GSM900_DCS1800	410	3	54436	22560	4	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname
11695	HKNWBSC02	CII-2560_Din Pur Khanewal (3G-CII-4423)	253	CII-2560-3_Din Pur Khanewal (3G-CII-4423)	GSM900_DCS1800	410	3	54436	32560	4	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname

5 rows × 34 columns

Example

df = pd.read_csv('GCELL.txt',header=1)
df.tail(n=2)

	BSC Name	BTS Name	Cell Index	Cell Name	Freq. Band	MCC	MNC	Cell LAC	Cell CI	NCC	...	Number of PAGCH Blocks	Number of PRACH Blocks	VIP Cell	MOCN Sharing Cell	Support Dual High Frequency Bands	Local Cell ID	Remark	Administrative State	active status	Operator Name
11694	HKNWBSC02	CII-2560_Din Pur Khanewal (3G-CII-4423)	249	CII-2560-2_Din Pur Khanewal (3G-CII-4423)	GSM900_DCS1800	410	3	54436	22560	4	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname
11695	HKNWBSC02	CII-2560_Din Pur Khanewal (3G-CII-4423)	253	CII-2560-3_Din Pur Khanewal (3G-CII-4423)	GSM900_DCS1800	410	3	54436	32560	4	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname

2 rows × 34 columns

Example

df = pd.read_csv('GCELL.txt',header=1)
df.tail(2)

	BSC Name	BTS Name	Cell Index	Cell Name	Freq. Band	MCC	MNC	Cell LAC	Cell CI	NCC	...	Number of PAGCH Blocks	Number of PRACH Blocks	VIP Cell	MOCN Sharing Cell	Support Dual High Frequency Bands	Local Cell ID	Remark	Administrative State	active status	Operator Name
11694	HKNWBSC02	CII-2560_Din Pur Khanewal (3G-CII-4423)	249	CII-2560-2_Din Pur Khanewal (3G-CII-4423)	GSM900_DCS1800	410	3	54436	22560	4	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname
11695	HKNWBSC02	CII-2560_Din Pur Khanewal (3G-CII-4423)	253	CII-2560-3_Din Pur Khanewal (3G-CII-4423)	GSM900_DCS1800	410	3	54436	32560	4	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname

2 rows × 34 columns
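Both head() and tail() also accept a negative n, which means "everything except n rows". The sketch below illustrates this on a small made-up frame; the name demo and its columns are assumptions for illustration and are not part of the GCELL data.

```python
import pandas as pd

# Small hypothetical frame standing in for the GCELL data
demo = pd.DataFrame({"cell": ["a", "b", "c", "d", "e", "f"],
                     "lac": [1, 2, 3, 4, 5, 6]})

first_two = demo.head(2)            # rows 0 and 1
last_two = demo.tail(2)             # rows 4 and 5
all_but_last_two = demo.head(-2)    # rows 0 to 3
all_but_first_two = demo.tail(-2)   # rows 2 to 5

print(first_two.index.tolist())          # [0, 1]
print(last_two.index.tolist())           # [4, 5]
print(all_but_last_two.index.tolist())   # [0, 1, 2, 3]
print(all_but_first_two.index.tolist())  # [2, 3, 4, 5]
```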

12.3 Sample

Reference URL
sample() returns a random sample of items from an axis of the object. By default it draws one row, without replacement.

Example

df = pd.read_csv('GCELL.txt',header=1)
df.sample()

	BSC Name	BTS Name	Cell Index	Cell Name	Freq. Band	MCC	MNC	Cell LAC	Cell CI	NCC	...	Number of PAGCH Blocks	Number of PRACH Blocks	VIP Cell	MOCN Sharing Cell	Support Dual High Frequency Bands	Local Cell ID	Remark	Administrative State	active status	Operator Name
6804	HBWPBSC03	4176_Moza Sui Wala Samma Satta Lodhran	69	14176_Moza Sui Wala Samma Satta Lodhran	GSM900	410	3	54437	14176	2	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname

1 rows × 34 columns

Example

df = pd.read_csv('GCELL.txt',header=1)
df.sample(n=3)

	BSC Name	BTS Name	Cell Index	Cell Name	Freq. Band	MCC	MNC	Cell LAC	Cell CI	NCC	...	Number of PAGCH Blocks	Number of PRACH Blocks	VIP Cell	MOCN Sharing Cell	Support Dual High Frequency Bands	Local Cell ID	Remark	Administrative State	active status	Operator Name
2133	HLHRBSC09	CI-1611_Tariq Road Near Airport Lahore-Z2 (3G-...	302	CI-1611-3_Tariq Road Near Airport Lahore-Z2 (3...	GSM900_DCS1800	410	3	57705	31611	2	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	DEACTIVATED	opname
9134	HMLTBSC05	4122_Bosan Road Multan (3G-CII-3128)	120	34122_Bosan Road Multan (3G-CII-3128)	GSM900_DCS1800	410	3	54402	34122	0	...	4	1	NO	NO	NO	2	-	UNLOCK	ACTIVATED	opname
3426	HHFZBSC01	3803_Pindi Bhattian Sargodha (3G-CI-2132)	75	33803_Pindi Bhattian Sargodha (3G-CI-2132)	GSM900_DCS1800	410	3	53330	33803	2	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname

3 rows × 34 columns

Example

df = pd.read_csv('GCELL.txt',header=1)
df.sample(3)

	BSC Name	BTS Name	Cell Index	Cell Name	Freq. Band	MCC	MNC	Cell LAC	Cell CI	NCC	...	Number of PAGCH Blocks	Number of PRACH Blocks	VIP Cell	MOCN Sharing Cell	Support Dual High Frequency Bands	Local Cell ID	Remark	Administrative State	active status	Operator Name
1359	HGJWBSC04	3371_Haji Street Kamonki Gujranwala (3G-CI-4921)	11	33371_Haji Street Kamonki Gujranwala (3G-CI-4921)	GSM900	410	3	53340	33371	7	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname
11595	HKNWBSC02	4550_Zamir Colony Kassowal Sahiwal (3G-CII-4422)	64	24550_Zamir Colony Kassowal Sahiwal (3G-CII-4422)	GSM900_DCS1800	410	3	54436	24550	0	...	4	1	NO	NO	NO	1	-	UNLOCK	ACTIVATED	opname
11444	HBRWBSC03	4574_Gulshan e Ghani Colony Vehari (3G-CII-3521)	30	34574_Gulshan e Ghani Colony Vehari (3G-CII-3521)	GSM900_DCS1800	410	3	54456	34574	3	...	4	1	NO	NO	NO	2	-	UNLOCK	ACTIVATED	opname

3 rows × 34 columns

12.3.1 random_state

Extract 3 random rows from the Data Frame. Setting random_state seeds the random number generator, so the same rows are returned on every run and the examples are reproducible.

df = pd.read_csv('GCELL.txt',header=1)
df.sample(n=3, random_state=1)

	BSC Name	BTS Name	Cell Index	Cell Name	Freq. Band	MCC	MNC	Cell LAC	Cell CI	NCC	...	Number of PAGCH Blocks	Number of PRACH Blocks	VIP Cell	MOCN Sharing Cell	Support Dual High Frequency Bands	Local Cell ID	Remark	Administrative State	active status	Operator Name
2897	HMWLBSC03	4309_Skaseer (Gold) Mianwali (3G-CII-3827)	32	34309_Skaseer (Gold) Mianwali (3G-CII-3827)	GSM900_DCS1800	410	3	54407	34309	4	...	4	1	NO	NO	NO	2	-	UNLOCK	ACTIVATED	opname
5465	HFSDBSC07	CII-2190_Chak No 78 GB Jawddi Faisalabad-Z2	174	CII-2190-2_Chak No 78 GB Jawddi Faisalabad-Z2	GSM900_DCS1800	410	3	54404	22190	3	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname
6811	HBWPBSC03	5572_Jalapur Pirwala Multan	79	25572_Jalapur Pirwala Multan	GSM900	410	3	54437	25572	0	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname

3 rows × 34 columns
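To see the effect of random_state in isolation, the sketch below samples a small made-up frame twice with the same seed. The frame demo and the column name cell_index are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical frame with 100 rows
demo = pd.DataFrame({"cell_index": range(100)})

a = demo.sample(n=3, random_state=1)
b = demo.sample(n=3, random_state=1)   # same seed, same rows
c = demo.sample(n=3)                   # unseeded: rows may change between runs

print(a.equals(b))  # True
print(len(c))       # 3
```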

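12.3.2 frac

frac sets the fraction of rows to return, as an alternative to an absolute count n; the two cannot be combined in one call.
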
df = pd.read_csv('GCELL.txt',header=1)
df.sample(frac=0.001,random_state=1)

	BSC Name	BTS Name	Cell Index	Cell Name	Freq. Band	MCC	MNC	Cell LAC	Cell CI	NCC	...	Number of PAGCH Blocks	Number of PRACH Blocks	VIP Cell	MOCN Sharing Cell	Support Dual High Frequency Bands	Local Cell ID	Remark	Administrative State	active status	Operator Name
2897	HMWLBSC03	4309_Skaseer (Gold) Mianwali (3G-CII-3827)	32	34309_Skaseer (Gold) Mianwali (3G-CII-3827)	GSM900_DCS1800	410	3	54407	34309	4	...	4	1	NO	NO	NO	2	-	UNLOCK	ACTIVATED	opname
5465	HFSDBSC07	CII-2190_Chak No 78 GB Jawddi Faisalabad-Z2	174	CII-2190-2_Chak No 78 GB Jawddi Faisalabad-Z2	GSM900_DCS1800	410	3	54404	22190	3	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname
6811	HBWPBSC03	5572_Jalapur Pirwala Multan	79	25572_Jalapur Pirwala Multan	GSM900	410	3	54437	25572	0	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname
4288	HLHRBSC08	7069_Chamra Mandi Lahore-Z1 (3G-CI-4322)	249	37069_Chamra Mandi Lahore-Z1 (3G-CI-4322)	GSM900_DCS1800	410	3	57704	37069	3	...	4	1	NO	NO	NO	2	-	UNLOCK	ACTIVATED	opname
11384	HHFZBSC02	3780_Bugga Hafizabad	111	33780_Bugga Hafizabad	GSM900_DCS1800	410	3	53329	33780	2	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname
5800	HKASBSC03	7621_Rehman Pura (Gold) Kasur (3G-CI-4983)	88	37621_Rehman Pura (Gold) Kasur (3G-CI-4983)	GSM900_DCS1800	410	3	53309	37621	2	...	4	1	NO	NO	NO	2	-	UNLOCK	ACTIVATED	opname
11654	HKNWBSC02	5501_Kot Islam Khanewal (3G-CII-4359)	123	35501_Kot Islam Khanewal (3G-CII-4359)	GSM900_DCS1800	410	3	54436	35501	5	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname
5169	HLHRBSC15	7085_Salamat Pura Lahore-Z1 (3G-CI-4330)	179	37085_Salamat Pura Lahore-Z1 (3G-CI-4330)	GSM900	410	3	57708	37085	3	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname
5660	HGJTBSC06	3766_Wazirabad Sialkot (3G-CI-3925)	260	23766_Wazirabad Sialkot (3G-CI-3925)	GSM900_DCS1800	410	3	53308	23766	3	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname
2717	HFSDBSC06	4035_Officer Colony Faisalabad-Z2 (3G-CII-3041)	43	24035_Officer Colony Faisalabad-Z2 (3G-CII-3041)	GSM900_DCS1800	410	3	54424	24035	1	...	4	1	NO	NO	NO	1	-	UNLOCK	ACTIVATED	opname
2391	HLHRBSC11	3542_Sheer e Rabbani Sheikhupura (3G-CI-3900)	66	33542_Sheer e Rabbani Sheikhupura (3G-CI-3900)	GSM900_DCS1800	410	3	53316	33542	3	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname
6520	HDIKBSC03	4324_Imamia Road (Gold) DI Khan (3G-CII-3536)	100	24324_Imamia Road (Gold) DI Khan (3G-CII-3536)	GSM900_DCS1800	410	3	54427	24324	2	...	4	1	NO	NO	NO	1	-	UNLOCK	ACTIVATED	opname

12 rows × 34 columns
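The number of rows returned follows from frac times the length of the frame. In the GCELL example above, 0.001 × 11696 = 11.696 and the output shows 12 rows, which suggests the count is rounded to the nearest integer. A minimal sketch on a made-up frame (demo is a hypothetical name) where the product is a whole number:

```python
import pandas as pd

# Hypothetical frame with 200 rows
demo = pd.DataFrame({"cell_index": range(200)})

# A quarter of 200 rows is exactly 50 rows
quarter = demo.sample(frac=0.25, random_state=1)

print(len(quarter))                      # 50
print(quarter.index.is_unique)           # True: sampling is without replacement by default
```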

12.3.3 replace

Upsample the DataFrame by sampling with replacement. Note that replace must be True whenever frac is greater than 1, because more rows are requested than exist.

df = pd.read_csv('GCELL.txt',header=1)
df.sample(frac=1.5, replace=True,random_state=1)

	BSC Name	BTS Name	Cell Index	Cell Name	Freq. Band	MCC	MNC	Cell LAC	Cell CI	NCC	...	Number of PAGCH Blocks	Number of PRACH Blocks	VIP Cell	MOCN Sharing Cell	Support Dual High Frequency Bands	Local Cell ID	Remark	Administrative State	active status	Operator Name
235	HDIKBSC02	5713_Muhalah Qazianwala Layyah (3G-CII-4151)	272	35713_Muhalah Qazianwala Layyah (3G-CII-4151)	GSM900_DCS1800	410	3	54438	35713	2	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname
5192	HLHRBSC15	7088_PTCL Exchange Gulgasht Town Lahore-Z2 (3G...	250	27088_PTCL Exchange Gulgasht Town Lahore-Z2 (3...	GSM900_DCS1800	410	3	57708	27088	5	...	4	1	NO	NO	NO	1	-	UNLOCK	ACTIVATED	opname
905	HLHRBSC14	CI-1629_Rehman Plaza Queen Road Lahore-Z2 (3G-...	179	CI-1629-1_Rehman Plaza Queen Road Lahore-Z2 (3...	GSM900_DCS1800	410	3	57702	11629	5	...	4	1	NO	NO	NO	0	-	UNLOCK	ACTIVATED	opname
10955	HGJWBSC03	CI-1693_Sector Y Peoples Colony Gujranwala (3G...	325	CI-1693-3_Sector Y Peoples Colony Gujranwala (...	GSM900_DCS1800	410	3	53303	31693	7	...	4	1	NO	NO	NO	2	-	UNLOCK	ACTIVATED	opname
7813	HMLTBSC08	5851_Vehari Road Multan (3G-CII-4039)	242	25851_Vehari Road Multan (3G-CII-4039)	GSM900_DCS1800	410	3	54418	25851	2	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
2748	HFSDBSC06	4079_Tech Society Faisalabad-Z2 (3G-CII-2000)	74	34079_Tech Society Faisalabad-Z2 (3G-CII-2000)	GSM900_DCS1800	410	3	54424	34079	1	...	4	1	NO	NO	NO	2	-	UNLOCK	ACTIVATED	opname
934	HLHRBSC14	7662_Mochi Gate Lahore-Z0 (3G-CI-1761)	208	37662_Mochi Gate Lahore-Z0 (3G-CI-1761)	GSM900_DCS1800	410	3	57702	37662	3	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname
11011	HGJTBSC04	7870_Fateh Pur Gujrat (3G-CI-2191)	39	27870_Fateh Pur Gujrat (3G-CI-2191)	GSM900	410	3	53313	27870	7	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname
7784	HMLTBSC08	4262_Chowk Metla Bahawalpur Road Vehari (3G-CI...	213	34262_Chowk Metla Bahawalpur Road Vehari (3G-C...	GSM900_DCS1800	410	3	51126	34262	4	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname
1096	HFSDBSC08	4086_Pul Dhengro Sargodha Road Jhang	63	34086_Pul Dhengro Sargodha Road Jhang	GSM900	410	3	54423	34086	6	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname

17544 rows × 34 columns
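The constraint on replace can be checked directly. The sketch below, on a made-up 10-row frame (demo is a hypothetical name), shows that frac > 1 without replacement raises a ValueError, while the same call with replace=True returns repeated rows.

```python
import pandas as pd

# Hypothetical frame with 10 rows
demo = pd.DataFrame({"cell_index": range(10)})

# Asking for 15 rows out of 10 without replacement cannot work
try:
    demo.sample(frac=1.5, random_state=1)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# With replacement, rows may be drawn more than once
upsampled = demo.sample(frac=1.5, replace=True, random_state=1)

print(len(upsampled))                   # 15
print(upsampled.index.has_duplicates)   # True: 15 draws from 10 rows must repeat
```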

12.3.4 weights

Example

Using a DataFrame column as weights. Rows with a larger value in the num_specimen_seen column are more likely to be sampled.

df = pd.DataFrame({'num_legs': [2, 4, 8, 0],
                   'num_wings': [2, 0, 0, 0],
                   'num_specimen_seen': [10, 2, 1, 8]},
                  index=['falcon', 'dog', 'spider', 'fish'])
df.sample(frac=0.5,replace=True,random_state=1, weights='num_specimen_seen')

	num_legs	num_wings	num_specimen_seen
falcon	2	2	10
fish	0	0	8

df = pd.read_csv('GCELL.txt',header=1)
df.sample(frac=0.5,random_state=1, weights='Cell LAC')

	BSC Name	BTS Name	Cell Index	Cell Name	Freq. Band	MCC	MNC	Cell LAC	Cell CI	NCC	...	Number of PAGCH Blocks	Number of PRACH Blocks	VIP Cell	MOCN Sharing Cell	Support Dual High Frequency Bands	Local Cell ID	Remark	Administrative State	active status	Operator Name
4841	HLHRBSC12	CI-1383_Village karbath Lahore-Z2 (3G-CI-1798)	165	CI-1383-3_Village karbath Lahore-Z2 (3G-CI-1798)	GSM900_DCS1800	410	3	57701	31383	1	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname
8384	HKNWBSC01	6996_Kohiwala Khanewal (3G-CII-4074)	51	36996_Kohiwala Khanewal	GSM900_DCS1800	410	3	54421	36996	3	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname
1	HDIKBSC02	4905_Qazia Abad Layyah	1	24905_Qazia Abad Layyah	GSM900_DCS1800	410	3	54438	24905	0	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname
3531	HSKTBSC04	7876_Kotli Loharan (Gold) Sialkot (3G-CI-1071)	145	17876_Kotli Loharan (Gold) Sialkot (3G-CI-1071)	GSM900_DCS1800	410	3	53331	17876	2	...	4	1	NO	NO	NO	0	-	UNLOCK	ACTIVATED	opname
1709	HSWLBSC03	4672_Chak No 3 EB Pakpattan	17	34672_Chak No 3 EB Pakpattan	GSM900_DCS1800	410	3	54440	34672	1	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
3968	HLHRBSC10	CI-1800_Johar block Bahria Town Lahore (3G-CI-...	342	CI-1800-3_Johar block Bahria Town Lahore (3G-C...	GSM900_DCS1800	410	3	53311	31800	5	...	4	1	NO	NO	NO	2	-	UNLOCK	ACTIVATED	opname
5736	HKASBSC03	7616_Parnawa Kasur (3G-CI-3916)	24	17616_Parnawa Kasur (3G-CI-3916)	GSM900_DCS1800	410	3	53309	17616	0	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname
83	HDIKBSC02	CII-2016_Mahala Shah Jahaniya Layyah (3G-CII-4...	87	CII-2016-1_Mahala Shah Jahaniya Layyah (3G-CII...	GSM900_DCS1800	410	3	54438	12016	0	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname
5097	HLHRBSC15	CI-1135_Tajpura New Lahore-Z2 (3G-CI-1787)	77	CI-1135-3_Tajpura New Lahore-Z2 (3G-CI-1787)	GSM900_DCS1800	410	3	57708	31135	2	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname
2554	HLHRBSC11	7676_Balarkay Sheikhupura (3G-CI-2167)	229	17676_Balarkay Sheikhupura (3G-CI-2167)	GSM900_DCS1800	410	3	53305	17676	4	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname

5848 rows × 34 columns
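One way to see what weights does is to draw many rows with replacement and count how often each row appears. This is a sketch on made-up data: the frame demo and the columns site and traffic are assumptions for illustration. A row with weight 0 is never drawn, and rows with larger weights appear more often.

```python
import pandas as pd

# Hypothetical frame: 'traffic' is used as the sampling weight
demo = pd.DataFrame({"site": ["a", "b", "c", "d"],
                     "traffic": [10, 0, 1, 8]})

# Draw 1000 rows with replacement, weighted by 'traffic'
picked = demo.sample(n=1000, replace=True, weights="traffic", random_state=1)
counts = picked["site"].value_counts()

print(counts)
print("b" in counts.index)   # False: weight 0 means the row is never selected
```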

12.3.5 Random Sampling using skiprows

df = pd.read_csv('GCELL.txt',header=1)
df.shape

(11696, 34)

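df=pd.read_csv('GCELL.txt',header=1,skiprows=lambda x:x>1 and np.random.rand()>0.01)
df.shape

(120, 34)

How it works

skiprows accepts a function that is evaluated against each line's integer index; a line is skipped when the function returns True.
x>1 ensures that the first two lines are never skipped. The header row is looked up among the lines that remain after skipping, so with header=1 both line 0 and the header on line 1 must be kept.
np.random.rand()>0.01 returns True about 99% of the time, so roughly 99% of the data rows are skipped. The resulting row count therefore varies from run to run.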
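The same skiprows idea can be tried on a small in-memory file. This is a minimal sketch, assuming a hypothetical CSV with a single header line on line 0 (so the default header=0 applies) and a seeded NumPy generator; the names csv_text, rng and sampled are illustrative only.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV text: one header line followed by 1000 data lines
csv_text = "id,value\n" + "\n".join(f"{i},{i * 2}" for i in range(1000))

rng = np.random.default_rng(0)  # seeded so the run is repeatable

sampled = pd.read_csv(
    io.StringIO(csv_text),
    # Keep line 0 (the header); skip each later line with probability 0.9
    skiprows=lambda i: i > 0 and rng.random() > 0.1,
)

print(sampled.columns.tolist())  # ['id', 'value']
print(sampled.shape)             # roughly 10% of the 1000 data rows
```

Because rows are dropped while the file is being read, this approach avoids loading the full file into memory first, at the cost of an approximate rather than exact sample size.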

12.4 Shape of the Data Frame

Reference URL
Returns a tuple representing the dimensionality of the DataFrame: (number of rows, number of columns).

Example-1

df = pd.read_csv('GCELL.txt',header=1)
df.shape

(11696, 34)

Example-2

df = pd.read_csv('GCELL.txt',header=1)
rows,columns=df.shape
print(rows,columns)

11696 34

Example-3

df = pd.read_csv('GCELL.txt',header=1)
rows,columns=df.shape
print("No of rows in the GCELL Data Set:")
print(rows)
print("No of columns in the GCELL Data Set:")
print(columns)

No of rows in the GCELL Data Set:
11696
No of columns in the GCELL Data Set:
34
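The two entries of shape line up with other ways of counting rows and columns. A small sketch on a made-up frame (frame and its columns are assumptions for illustration):

```python
import pandas as pd

# Hypothetical 3 x 2 frame
frame = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

n_rows, n_cols = frame.shape

print(len(frame) == n_rows)           # True: len() counts rows
print(len(frame.columns) == n_cols)   # True
print(frame[frame["a"] > 1].shape)    # (2, 2): filtering changes the row count only
```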

12.5 Dimensions of the Data Frame

Reference URL
Returns an int representing the number of axes (array dimensions): 1 for a Series, 2 for a DataFrame.

Example-1

df = pd.read_csv('GCELL.txt',header=1)
df.ndim

Example-2

s = pd.Series({'a': 1, 'b': 2, 'c': 3})
s.ndim
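The examples above do not show their output, so here is a small sketch with the values printed. It uses a made-up frame (frame is a hypothetical name) and also shows how the way a column is selected decides whether a Series or a DataFrame comes back.

```python
import pandas as pd

# Hypothetical frame
frame = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

print(frame.ndim)          # 2: a DataFrame has rows and columns
print(frame["a"].ndim)     # 1: selecting one column by name gives a Series
print(frame[["a"]].ndim)   # 2: selecting with a list keeps a DataFrame
```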

12.6 Size of the Data Frame

Reference URL
Returns the number of elements in the object: the number of rows for a Series, or rows times columns for a DataFrame.

Example-1

df = pd.read_csv('GCELL.txt',header=1)
df.size

Example-2

s = pd.Series({'a': 1, 'b': 2, 'c': 3})
s.size
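As a sketch of how size relates to shape, the made-up frame below (frame is a hypothetical name) also contains missing values, to show that size counts every cell whereas count() counts only the non-missing ones.

```python
import numpy as np
import pandas as pd

# Hypothetical 3 x 2 frame with two missing cells
frame = pd.DataFrame({"a": [1, np.nan, 3], "b": [4, 5, np.nan]})

rows, cols = frame.shape

print(frame.size)                  # 6
print(frame.size == rows * cols)   # True
print(int(frame.count().sum()))    # 4: non-missing cells only
```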

12.7 Get Variable Names of the Data Frame

12.7.1 columns

The column labels of the DataFrame.

df = pd.read_csv('GCELL.txt',header=1)
df.columns

Index(['BSC Name', 'BTS Name', 'Cell Index', 'Cell Name', 'Freq. Band', 'MCC',
       'MNC', 'Cell LAC', 'Cell CI', 'NCC', 'BCC', 'Cell Extension Type',
       'Cell IUO Type', 'Enhanced Concentric Allowed',
       'Cell Inner/Extra Property', 'Same Group Cell Index',
       'BCCH IUO of Double Frequency Cell', 'Start Flex MAIO Switch',
       'HSN Modification Switch', 'CS Voice Service PRI',
       'CS Data Service PRI', 'PS High PRI Service PRI',
       'PS Low PRI Service PRI', 'Number of PBCCH Blocks',
       'Number of PAGCH Blocks', 'Number of PRACH Blocks', 'VIP Cell',
       'MOCN Sharing Cell', 'Support Dual High Frequency Bands',
       'Local Cell ID', 'Remark', 'Administrative State', 'active status',
       'Operator Name'],
      dtype='object')

12.7.2 keys

keys() returns the ‘info axis’ of the pandas object.
For a Series, it returns the index.
For a DataFrame, it returns the columns.
Reference URL

df = pd.read_csv('GCELL.txt',header=1)
df.keys()

Index(['BSC Name', 'BTS Name', 'Cell Index', 'Cell Name', 'Freq. Band', 'MCC',
       'MNC', 'Cell LAC', 'Cell CI', 'NCC', 'BCC', 'Cell Extension Type',
       'Cell IUO Type', 'Enhanced Concentric Allowed',
       'Cell Inner/Extra Property', 'Same Group Cell Index',
       'BCCH IUO of Double Frequency Cell', 'Start Flex MAIO Switch',
       'HSN Modification Switch', 'CS Voice Service PRI',
       'CS Data Service PRI', 'PS High PRI Service PRI',
       'PS Low PRI Service PRI', 'Number of PBCCH Blocks',
       'Number of PAGCH Blocks', 'Number of PRACH Blocks', 'VIP Cell',
       'MOCN Sharing Cell', 'Support Dual High Frequency Bands',
       'Local Cell ID', 'Remark', 'Administrative State', 'active status',
       'Operator Name'],
      dtype='object')
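The relationship between keys(), columns and index can be checked on small made-up objects (frame and series are hypothetical names used only for this sketch):

```python
import pandas as pd

frame = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
series = pd.Series({"x": 1, "y": 2})

print(frame.keys().equals(frame.columns))   # True: DataFrame keys are the columns
print(series.keys().equals(series.index))   # True: Series keys are the index
print(list(frame))                          # ['a', 'b']: iterating a DataFrame yields column labels
```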

12.8 Index of the Data Frame

The index (row labels) of the DataFrame.

df = pd.read_csv('GCELL.txt',header=1)
df.index

RangeIndex(start=0, stop=11696, step=1)

12.9 axes of the Data Frame

Return a list representing the axes of the DataFrame.
It has the row axis labels and column axis labels as the only members.
They are returned in that order.

df = pd.read_csv('GCELL.txt',header=1)
df.axes

[RangeIndex(start=0, stop=11696, step=1),
 Index(['BSC Name', 'BTS Name', 'Cell Index', 'Cell Name', 'Freq. Band', 'MCC',
        'MNC', 'Cell LAC', 'Cell CI', 'NCC', 'BCC', 'Cell Extension Type',
        'Cell IUO Type', 'Enhanced Concentric Allowed',
        'Cell Inner/Extra Property', 'Same Group Cell Index',
        'BCCH IUO of Double Frequency Cell', 'Start Flex MAIO Switch',
        'HSN Modification Switch', 'CS Voice Service PRI',
        'CS Data Service PRI', 'PS High PRI Service PRI',
        'PS Low PRI Service PRI', 'Number of PBCCH Blocks',
        'Number of PAGCH Blocks', 'Number of PRACH Blocks', 'VIP Cell',
        'MOCN Sharing Cell', 'Support Dual High Frequency Bands',
        'Local Cell ID', 'Remark', 'Administrative State', 'active status',
        'Operator Name'],
       dtype='object')]
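Since axes is a two-element list, it can be unpacked into row labels and column labels. A minimal sketch on a made-up frame (frame is a hypothetical name):

```python
import pandas as pd

frame = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# First element is the row axis, second is the column axis
row_labels, column_labels = frame.axes

print(row_labels.equals(frame.index))        # True
print(column_labels.equals(frame.columns))   # True
print(tuple(len(axis) for axis in frame.axes) == frame.shape)  # True
```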

12.10 Set Index to a Specific Column

Set the DataFrame index (row labels) using one or more existing columns.

Example-1

df = pd.read_csv('GCELL.txt',header=1)
df.set_index('BSC Name',inplace=True)
df.head(3)

	BTS Name	Cell Index	Cell Name	Freq. Band	MCC	MNC	Cell LAC	Cell CI	NCC	BCC	...	Number of PAGCH Blocks	Number of PRACH Blocks	VIP Cell	MOCN Sharing Cell	Support Dual High Frequency Bands	Local Cell ID	Remark	Administrative State	active status	Operator Name
BSC Name
HDIKBSC02	4905_Qazia Abad Layyah	0	14905_Qazia Abad Layyah	GSM900_DCS1800	410	3	54438	14905	7	0	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname
HDIKBSC02	4905_Qazia Abad Layyah	1	24905_Qazia Abad Layyah	GSM900_DCS1800	410	3	54438	24905	0	0	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname
HDIKBSC02	6939_Chak No 90 Mor Layyah (3G-CII-4281)	2	16939_Chak No 90 Mor Layyah (3G-CII-4281)	GSM900_DCS1800	410	3	54438	16939	0	0	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname

3 rows × 33 columns

Example-2

df = pd.read_csv('GCELL.txt',header=1)
df.set_index(['BSC Name','Cell Name'],inplace=True)
df.head(2)

		BTS Name	Cell Index	Freq. Band	MCC	MNC	Cell LAC	Cell CI	NCC	BCC	Cell Extension Type	...	Number of PAGCH Blocks	Number of PRACH Blocks	VIP Cell	MOCN Sharing Cell	Support Dual High Frequency Bands	Local Cell ID	Remark	Administrative State	active status	Operator Name
BSC Name	Cell Name
HDIKBSC02	14905_Qazia Abad Layyah	4905_Qazia Abad Layyah	0	GSM900_DCS1800	410	3	54438	14905	7	0	Normal_cell	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname
HDIKBSC02	24905_Qazia Abad Layyah	4905_Qazia Abad Layyah	1	GSM900_DCS1800	410	3	54438	24905	0	0	Normal_cell	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname

2 rows × 32 columns
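The examples above use inplace=True. As an alternative sketch, set_index can also return a new DataFrame and leave the original untouched, and drop=False keeps the column as well as using it for the index. The frame cells and its columns are made up for illustration.

```python
import pandas as pd

# Hypothetical frame loosely modelled on the GCELL columns
cells = pd.DataFrame({"bsc": ["B1", "B1", "B2"],
                      "cell": ["c1", "c2", "c3"],
                      "lac": [10, 10, 20]})

indexed = cells.set_index("bsc")            # new frame; 'cells' is unchanged
kept = cells.set_index("bsc", drop=False)   # 'bsc' is both index and column

print(indexed.loc["B1"])        # label lookup returns both rows indexed by 'B1'
print(kept.columns.tolist())    # ['bsc', 'cell', 'lac']
print(cells.columns.tolist())   # ['bsc', 'cell', 'lac']
```

Once a column is the index, rows can be looked up by label with .loc, which is a common reason for setting the index in the first place.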

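12.11 Re-Set Index

reset_index() moves index levels back into ordinary columns and restores the default integer index. The example first sets a two-level index, then resets it.
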
df = pd.read_csv('GCELL.txt',header=1)
df.set_index(['BSC Name','Cell Name'],inplace=True)
df.head(2)

		BTS Name	Cell Index	Freq. Band	MCC	MNC	Cell LAC	Cell CI	NCC	BCC	Cell Extension Type	...	Number of PAGCH Blocks	Number of PRACH Blocks	VIP Cell	MOCN Sharing Cell	Support Dual High Frequency Bands	Local Cell ID	Remark	Administrative State	active status	Operator Name
BSC Name	Cell Name
HDIKBSC02	14905_Qazia Abad Layyah	4905_Qazia Abad Layyah	0	GSM900_DCS1800	410	3	54438	14905	7	0	Normal_cell	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname
HDIKBSC02	24905_Qazia Abad Layyah	4905_Qazia Abad Layyah	1	GSM900_DCS1800	410	3	54438	24905	0	0	Normal_cell	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname

2 rows × 32 columns

df.reset_index(['BSC Name','Cell Name'],inplace=True)
df.head(2)

	BSC Name	Cell Name	BTS Name	Cell Index	Freq. Band	MCC	MNC	Cell LAC	Cell CI	NCC	...	Number of PAGCH Blocks	Number of PRACH Blocks	VIP Cell	MOCN Sharing Cell	Support Dual High Frequency Bands	Local Cell ID	Remark	Administrative State	active status	Operator Name
0	HDIKBSC02	14905_Qazia Abad Layyah	4905_Qazia Abad Layyah	0	GSM900_DCS1800	410	3	54438	14905	7	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname
1	HDIKBSC02	24905_Qazia Abad Layyah	4905_Qazia Abad Layyah	1	GSM900_DCS1800	410	3	54438	24905	0	...	4	1	NO	NO	NO	4294967295	-	UNLOCK	ACTIVATED	opname

2 rows × 34 columns
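reset_index has a few variations worth knowing. The sketch below, on a made-up frame (cells and its columns are assumptions), resets all levels, resets a single level, and discards the index with drop=True.

```python
import pandas as pd

# Hypothetical frame with a two-level index
cells = pd.DataFrame({"bsc": ["B1", "B1", "B2"],
                      "cell": ["c1", "c2", "c3"],
                      "lac": [10, 10, 20]})
multi = cells.set_index(["bsc", "cell"])

restored = multi.reset_index()              # both levels become columns again
partial = multi.reset_index(level="cell")   # only 'cell' becomes a column
dropped = multi.reset_index(drop=True)      # index levels are thrown away

print(restored.columns.tolist())   # ['bsc', 'cell', 'lac']
print(partial.index.name)          # bsc
print(dropped.columns.tolist())    # ['lac']
```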

12.12 set_axis

Assign the desired index to the given axis.
The column or row labels can be changed by assigning a list-like or Index.

Example-1

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
df

	A	B
0	1	4
1	2	5
2	3	6

df = df.set_axis(['a', 'b', 'c'], axis='rows')  # the inplace argument was removed in pandas 2.0
df

	A	B
a	1	4
b	2	5
c	3	6

Example-2

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
df

	A	B
0	1	4
1	2	5
2	3	6

df = df.set_axis(['A1', 'B1'], axis='columns')  # the inplace argument was removed in pandas 2.0
df

	A1	B1
0	1	4
1	2	5
2	3	6
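Because set_axis returns a new object, calls can be chained and the original frame is left as it was. A minimal sketch, assuming a recent pandas version; the names frame and relabelled are illustrative. The new labels must match the axis length, otherwise a ValueError is raised.

```python
import pandas as pd

frame = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Relabel rows, then columns, in one chained expression
relabelled = (frame
              .set_axis(["a", "b", "c"], axis="index")
              .set_axis(["A1", "B1"], axis="columns"))

print(relabelled.index.tolist())     # ['a', 'b', 'c']
print(relabelled.columns.tolist())   # ['A1', 'B1']
print(frame.columns.tolist())        # ['A', 'B']: the original is unchanged

# One label for two columns does not fit
try:
    frame.set_axis(["only_one"], axis="columns")
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)               # True
```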

12.13 first_valid_index

Return index for first non-NA/null value.

df = pd.DataFrame({
                   "A1":[np.nan,np.nan,np.nan,5,6,np.nan,np.nan],
                   })
df

	A1
0	NaN
1	NaN
2	NaN
3	5.0
4	6.0
5	NaN
6	NaN

df.first_valid_index()
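For the single-column frame above, the first non-missing value sits at index 3. With several columns, a row counts as valid as soon as any of its columns holds a value. The sketch below illustrates this on a made-up two-column frame (frame is a hypothetical name) and also shows the result when nothing is valid.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: B1 has a value one row earlier than A1
frame = pd.DataFrame({"A1": [np.nan, np.nan, 5.0, 6.0],
                      "B1": [np.nan, 7.0, np.nan, np.nan]})

print(frame.first_valid_index())         # 1: row 1 has a value in B1
print(frame["A1"].first_valid_index())   # 2: first value in column A1 alone

# With no valid values at all, the result is None
all_missing = pd.Series([np.nan, np.nan])
print(all_missing.first_valid_index())   # None
```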

12.14 last_valid_index

Return index for last non-NA/null value.

df = pd.DataFrame({
                   "A1":[np.nan,np.nan,np.nan,5,6,np.nan,np.nan],
                   })
df

	A1
0	NaN
1	NaN
2	NaN
3	5.0
4	6.0
5	NaN
6	NaN

df.last_valid_index()
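For the frame above, the last non-missing value sits at index 4. One possible use of the two functions together, sketched here on a made-up Series (series is a hypothetical name), is trimming leading and trailing missing values with a label slice.

```python
import numpy as np
import pandas as pd

# Same values as the A1 column above, as a Series
series = pd.Series([np.nan, np.nan, np.nan, 5, 6, np.nan, np.nan])

first = series.first_valid_index()   # 3
last = series.last_valid_index()     # 4

# .loc slices by label and includes both end points
trimmed = series.loc[first:last]

print(first, last)        # 3 4
print(trimmed.tolist())   # [5.0, 6.0]
```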

12.15 Why do some pandas commands end with parentheses (and others don't)?

Reference Video
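In short, names such as head, tail and sample are methods: they perform an action and are called with parentheses, optionally with arguments. Names such as shape, ndim, size and columns are attributes: they describe the object and are read without parentheses. A small sketch on a made-up frame (frame is a hypothetical name):

```python
import pandas as pd

frame = pd.DataFrame({"a": [1, 2, 3]})

# Methods are callable: they do something and may take arguments
print(frame.head(2))
print(callable(frame.head))    # True

# Attributes are plain values: they describe the object
print(frame.shape)             # (3, 1)
print(callable(frame.shape))   # False

# Calling an attribute is an error, because a tuple is not a function
try:
    frame.shape()
    call_failed = False
except TypeError:
    call_failed = True
print(call_failed)             # True
```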
