Fire up GraphLab Create

We always start with this line before using any part of GraphLab Create. It can take up to 30 seconds to load the GraphLab library - be patient!

The first time you use GraphLab Create, you must enter a product key to license the software for non-commercial academic use. To register for a free one-year academic license and obtain your key, go to dato.com.

import graphlab
graphlab.get_dependencies()
# Set product key on this computer. After running this cell, you will not need to re-enter your product key. 
graphlab.product_key.set_product_key('YOUR_PRODUCT_KEY')

# Limit number of worker processes. This preserves system memory, which prevents hosted notebooks from crashing.
graphlab.set_runtime_config('GRAPHLAB_DEFAULT_NUM_PYLAMBDA_WORKERS', 4)

Load a tabular data set

sf = graphlab.SFrame('people-example.csv')
Parsing completed. Parsed 7 lines in 0.04311 secs.
------------------------------------------------------
Inferred types from first 100 line(s) of file as 
column_type_hints=[str,str,str,long]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------
Parsing completed. Parsed 7 lines in 0.01905 secs.
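
The log above reports that the parser guessed each column's type from the first rows of the file. A minimal sketch of that idea in plain Python follows. It uses only the standard library and is not GraphLab's actual implementation: `infer_column_types` is a hypothetical helper, and the inline CSV text is a stand-in for a few rows of people-example.csv, assuming the file is comma-separated with a header row.

```python
import csv
import io

# Inline stand-in for people-example.csv (assumed comma-separated with a header).
CSV_TEXT = """First Name,Last Name,Country,age
Bob,Smith,United States,24
Alice,Williams,Canada,23
Malcolm,Jone,England,22
"""

def infer_column_types(text, sample_rows=100):
    """Guess int or str for each column from the first `sample_rows` data rows."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    types = [int] * len(header)  # start optimistic, downgrade to str on failure
    for row_number, row in enumerate(reader):
        if row_number >= sample_rows:
            break
        for i, value in enumerate(row):
            if types[i] is int:
                try:
                    int(value)
                except ValueError:
                    types[i] = str  # one non-integer value makes the column str
    return dict(zip(header, types))

column_types = infer_column_types(CSV_TEXT)
```

Only the age column survives as an integer here, which is consistent with the `[str,str,str,long]` hint list printed by the parser.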

SFrame basics

sf  # view the first few lines of the table
First Name Last Name Country age
Bob Smith United States 24
Alice Williams Canada 23
Malcolm Jone England 22
Felix Brown USA 23
Alex Cooper Poland 23
Tod Campbell United States 22
Derek Ward Switzerland 25
[7 rows x 4 columns]
sf.tail()  # view end of the table
First Name Last Name Country age
Bob Smith United States 24
Alice Williams Canada 23
Malcolm Jone England 22
Felix Brown USA 23
Alex Cooper Poland 23
Tod Campbell United States 22
Derek Ward Switzerland 25
[7 rows x 4 columns]

GraphLab Canvas

# .show() visualizes any data structure in GraphLab Create
# If you want Canvas visualization to show up on this notebook, 
# add this line:
graphlab.canvas.set_target('ipynb')
sf['age'].show(view='Categorical')

Inspect columns of dataset

sf['Country']
dtype: str
Rows: 7
['United States', 'Canada', 'England', 'USA', 'Poland', 'United States', 'Switzerland']
sf['age']
dtype: int
Rows: 7
[24L, 23L, 22L, 23L, 23L, 22L, 25L]

Some simple columnar operations

sf['age'].mean()
23.142857142857146
sf['age'].max()
25L
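
To check what these column operations compute, the same aggregates can be sketched in plain Python without GraphLab. The `ages` list below is copied from the `sf['age']` output shown earlier; the variable names are illustrative only.

```python
# Values of sf['age'] as printed earlier in this notebook.
ages = [24, 23, 22, 23, 23, 22, 25]

# float() keeps the division a true division on Python 2 as well as Python 3.
mean_age = sum(ages) / float(len(ages))
max_age = max(ages)
```

The last digits of the mean may differ slightly from the value GraphLab prints, since floating-point results depend on how the sum is accumulated.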

Create new columns in our SFrame

sf
First Name Last Name Country age
Bob Smith United States 24
Alice Williams Canada 23
Malcolm Jone England 22
Felix Brown USA 23
Alex Cooper Poland 23
Tod Campbell United States 22
Derek Ward Switzerland 25
[7 rows x 4 columns]
sf['Full Name'] = sf['First Name'] + ' ' + sf['Last Name']
sf
First Name Last Name Country age Full Name
Bob Smith United States 24 Bob Smith
Alice Williams Canada 23 Alice Williams
Malcolm Jone England 22 Malcolm Jone
Felix Brown USA 23 Felix Brown
Alex Cooper Poland 23 Alex Cooper
Tod Campbell United States 22 Tod Campbell
Derek Ward Switzerland 25 Derek Ward
[7 rows x 5 columns]
sf['age'] * sf['age']
dtype: int
Rows: 7
[576L, 529L, 484L, 529L, 529L, 484L, 625L]
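
The `+` and `*` operators above act element by element: row i of the result is built from row i of each input column. A rough plain-Python equivalent, using lists copied from the table above in place of SFrame columns, might look like this:

```python
# Column values copied from the table above; plain lists stand in for SArrays.
first_names = ['Bob', 'Alice', 'Malcolm', 'Felix', 'Alex', 'Tod', 'Derek']
last_names = ['Smith', 'Williams', 'Jone', 'Brown', 'Cooper', 'Campbell', 'Ward']
ages = [24, 23, 22, 23, 23, 22, 25]

# Pair up the values row by row, as sf['First Name'] + ' ' + sf['Last Name'] does.
full_names = [first + ' ' + last for first, last in zip(first_names, last_names)]

# Square each age, as sf['age'] * sf['age'] does.
ages_squared = [age * age for age in ages]
```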

Use the apply function to do an advanced transformation of our data

sf['Country']
dtype: str
Rows: 7
['United States', 'Canada', 'England', 'USA', 'Poland', 'United States', 'Switzerland']
sf['Country'].show()
def transform_country(country):
    if country == 'USA':
        return 'United States'
    else:
        return country
transform_country('Brazil')
'Brazil'
transform_country('Brasil')
'Brasil'
transform_country('USA')
'United States'
sf['Country'].apply(transform_country)
dtype: str
Rows: 7
['United States', 'Canada', 'England', 'United States', 'Poland', 'United States', 'Switzerland']
sf['Country'] = sf['Country'].apply(transform_country)
sf
First Name Last Name Country age Full Name
Bob Smith United States 24 Bob Smith
Alice Williams Canada 23 Alice Williams
Malcolm Jone England 22 Malcolm Jone
Felix Brown United States 23 Felix Brown
Alex Cooper Poland 23 Alex Cooper
Tod Campbell United States 22 Tod Campbell
Derek Ward Switzerland 25 Derek Ward
[7 rows x 5 columns]
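
If more spelling variants need to be merged, the if/else in `transform_country` could be swapped for a lookup table. The sketch below is one possible alternative, not part of the original notebook: `COUNTRY_ALIASES` and `normalize_country` are hypothetical names, and a plain list stands in for the Country column so the example runs without GraphLab.

```python
# Hypothetical alias table: maps each variant spelling to its canonical name.
COUNTRY_ALIASES = {'USA': 'United States'}

def normalize_country(country):
    """Return the canonical name, or the input unchanged if it has no alias."""
    return COUNTRY_ALIASES.get(country, country)

# Country values as they appeared before the transformation.
countries = ['United States', 'Canada', 'England', 'USA',
             'Poland', 'United States', 'Switzerland']
normalized = [normalize_country(c) for c in countries]
```

With GraphLab Create available, `normalize_country` could be passed to `sf['Country'].apply(...)` in place of `transform_country`. Adding another variant then means adding one dictionary entry rather than another branch.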