A Berkeley library for introductory data science.
Written by Professor John DeNero, Professor David Culler, Sam Lau, and Alvin Wan.
For an example of usage, see the Berkeley Data 8 class.
Use pip to install the package:
pip install datascience
To verify that the package is installed correctly, run:
python -c "import datascience; print(datascience.__version__)"
After installing the package, you can start using datascience by importing it in Python:
from datascience import Table

# Create a simple table
data = Table().with_columns(
    "Name", ["Alice", "Bob", "Charlie"],
    "Age", [25, 30, 35]
)

# Display the table
data.show()

# Add a column of heights; with_column returns a new table, so reassign it
data = data.with_column("Height (cm)", [165, 180, 175])

# Sort rows by age, oldest first, and display the result
sorted_data = data.sort("Age", descending=True)
sorted_data.show()
Commonly used Table methods:
- Table() : Creates an empty table.
- Table.with_columns(column_name, values, ...) : Returns a new table with multiple columns added, given as alternating name/values pairs.
- Table.with_column(column_name, values) : Returns a new table with a single column added.
- Table.drop(column_name) : Returns a new table without the given column.
- Table.sort(column_name, descending=False) : Returns a new table with rows sorted by a column.
- Table.plot(column_x, column_y) : Plots a line graph of column_y against column_x.
- Table.hist(column) : Draws a histogram of a column.
- Table.scatter(column_x, column_y) : Draws a scatter plot of column_y against column_x.
Problem: ModuleNotFoundError: No module named 'datascience'
Solution: Ensure the package is installed in the same Python environment you are running:
pip install --upgrade datascience
Problem: ImportError: cannot import name 'Table' from 'datascience'
Solution: Verify the installation and check the installed version by running:
python -c "import datascience; print(datascience.__version__)"
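For either import problem above, a small diagnostic script can show which Python interpreter is running and where, if anywhere, it finds datascience. This is a sketch: `diagnose_datascience` is a hypothetical helper, not part of the package. It relies only on the standard library's `importlib.util.find_spec`, and one common cause it can reveal is a local file or folder named `datascience` shadowing the installed package.

```python
import importlib.util
import sys


def diagnose_datascience():
    """Hypothetical helper: report the interpreter and where datascience is found."""
    report = {
        "python": sys.executable,  # the interpreter pip must install into
        "found": False,
        "version": None,
        "location": None,
        "has_table": False,
    }
    spec = importlib.util.find_spec("datascience")
    if spec is None:
        # Not importable here; installing with `python -m pip install datascience`
        # targets this same interpreter
        return report

    import datascience

    report["found"] = True
    report["version"] = getattr(datascience, "__version__", None)
    # A path inside your own project (not site-packages) suggests shadowing
    report["location"] = spec.origin
    report["has_table"] = hasattr(datascience, "Table")
    return report


if __name__ == "__main__":
    for key, value in diagnose_datascience().items():
        print(f"{key}: {value}")
```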
Problem: Tables are not displaying correctly in Jupyter Notebook.
Solution: Make sure IPython and Jupyter Notebook are installed:
pip install ipython notebook