Kaggle - Pandas Course - Chapter 1 Creating, Reading and Writing

DataFrame

import pandas as pd
# Each dict value must be a list of column entries; pd.DataFrame() takes no name argument
pd.DataFrame({'Bob': ['I liked it', 'It was awful'], 'Sue': ['Pretty awful', 'Bland']}, index=['Product A', 'Product B'])

import pandas as pd
pd.DataFrame({'Yes' :[50, 21], 'No':[31,2]})

import pandas as pd
pd.DataFrame({"Bob":[11,23,44],"Matt":[45,11,56]})

import pandas as pd
# Same table as above, spread over several lines (name= is only valid for a Series)
pd.DataFrame({
    'Bob': ['I liked it', 'It was awful'],
    'Sue': ['Pretty awful', 'Bland']},
    index=['Product A', 'Product B'])

import pandas as pd
pd.DataFrame({"Bob":[125,56,234],"Ken":[147,14,78]})

fruit_sales = pd.DataFrame({"Apples":[35,41],"Bananas":[21,34]},index=["2017 Sales","2018 Sales"])

animals = pd.DataFrame({'Cows': [12, 20], 'Goats': [22, 19]}, index=['Year 1', 'Year 2'])
animals

animals.to_csv("cows_and_goats.csv")
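A minimal sketch of the full write-then-read round trip for the animals table above. The temporary directory is an assumption made here so the example leaves no file behind; the notes themselves write straight to the working directory.

```python
import os
import tempfile

import pandas as pd

animals = pd.DataFrame({'Cows': [12, 20], 'Goats': [22, 19]}, index=['Year 1', 'Year 2'])

# Write into a throwaway directory (an assumption for this sketch only)
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "cows_and_goats.csv")
    animals.to_csv(csv_path)
    # index_col=0 tells pandas to treat the first CSV column as the row labels
    restored = pd.read_csv(csv_path, index_col=0)

print(restored)
```

Without index_col=0 the row labels would come back as an ordinary data column rather than as the index.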

--------------------------------------------------------------------------------------------------

Series
A Series, by contrast, is a sequence of data values. If a DataFrame is a table, a Series is a list. And in fact you can create one with nothing more than a list:

pd.Series([1, 2, 3, 4, 5])

A Series is, in essence, a single column of a DataFrame. So you can assign row labels to the Series the same way as before, using an index parameter. However, a Series does not have a column name; it has only one overall name:

pd.Series([30, 35, 40], index=['2015 Sales', '2016 Sales', '2017 Sales'], name='Product A')

The Series and the DataFrame are intimately related. It's helpful to think of a DataFrame as actually being just a bunch of Series "glued together". We'll see more of this in the next section of this tutorial.

items = ["Flour","Milk","Eggs","Spam"]
quantities = ["4 cups", "1 cup","2 large","1 can"]
recipe = pd.Series(quantities, index=items, name="Dinner")
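The "glued together" picture can be sketched roughly as follows, reusing the fruit sales numbers from the DataFrame section. Using pd.concat with axis=1 is just one way to do the gluing, chosen here for illustration; it is not taken from the course text.

```python
import pandas as pd

apples = pd.Series([35, 41], index=["2017 Sales", "2018 Sales"], name="Apples")
bananas = pd.Series([21, 34], index=["2017 Sales", "2018 Sales"], name="Bananas")

# Concatenating along axis=1 places the Series side by side as columns;
# each Series name becomes the corresponding column name
fruit_sales = pd.concat([apples, bananas], axis=1)

# Going the other way: selecting a single column hands back a Series
apples_column = fruit_sales["Apples"]

print(fruit_sales)
print(type(apples_column))
```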

--------------------------------------------------------------------------------------------------

CSV
Let's now set aside our toy datasets and see what a real dataset looks like when we read it into a DataFrame. We'll use the pd.read_csv() function to read the data into a DataFrame:

wine_reviews = pd.read_csv("../input/wine-reviews/winemag-data-130k-v2.csv")

We can use the shape attribute to check how large the resulting DataFrame is:

wine_reviews.shape
output:
(129971, 14)

So our new DataFrame has almost 130,000 records (129,971) split across 14 different columns. That's about 1.8 million entries!
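The entry count is simply rows times columns. A small sketch of that arithmetic, using a toy table from earlier in these notes in place of the wine data (which is not loaded here):

```python
import pandas as pd

# Toy stand-in for a real dataset
small = pd.DataFrame({'Yes': [50, 21], 'No': [31, 2]})

# shape is a (rows, columns) tuple
n_rows, n_cols = small.shape
total_entries = n_rows * n_cols

# The size attribute reports the same product directly
print(n_rows, n_cols, total_entries, small.size)

# The same arithmetic applied to the shape reported for the wine reviews
wine_entries = 129971 * 14
print(wine_entries)
```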

We can examine the contents of the resulting DataFrame using the head() method, which returns the first five rows:

wine_reviews.head()

The pd.read_csv() function is feature-rich, with over 30 optional parameters you can specify. For example, you can see in this dataset that the CSV file has a built-in index, which pandas did not pick up on automatically. To make pandas use that column for the index (instead of creating a new one from scratch), we can specify an index_col.

wine_reviews = pd.read_csv("../input/wine-reviews/winemag-data-130k-v2.csv", index_col=0)
wine_reviews.head()

reviews = pd.read_csv('../input/wine-reviews/winemag-data_first150k.csv', index_col=0)
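To see what index_col changes without needing the Kaggle input files, here is a small sketch that reads an in-memory CSV. The CSV text and its column names are made up for illustration; only the leading unnamed index column mirrors the situation described above.

```python
import io

import pandas as pd

# Hypothetical CSV whose first column is an unnamed, built-in index
csv_text = ",country,points\n0,Italy,87\n1,Portugal,87\n"

# Default read: the index column is kept as data and a fresh index is created
default_read = pd.read_csv(io.StringIO(csv_text))

# With index_col=0 the first column becomes the row labels instead
indexed_read = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(default_read.columns.tolist())
print(indexed_read.columns.tolist())
```

In the default read, pandas labels the unnamed column "Unnamed: 0", which is usually the hint that index_col=0 is wanted.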

--------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------

next - Kaggle - Pandas Course - Chapter 2 Indexing, Selecting & Assigning