Commit 3aae942

authored
use_packages.rst more packages
1 parent 50fc71e commit 3aae942

1 file changed

Lines changed: 109 additions & 3 deletions

docs/day2/use_packages.rst

@@ -19,11 +19,33 @@ Why Python packages are important
1919
Python packages are pieces of tested Python code.
2020
Prefer using a Python package over writing your own code.
2121

22-
Why software modules are important
23-
----------------------------------
22+
.. admonition:: Some definitions
23+
24+
- Library: A collection of code used by a program.
25+
- Package: A library that has been made easily installable and reusable. Often published on public repositories such as the Python Package Index (PyPI).
26+
- Dependency: A requirement of another program, not included in that program.
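
To make the notion of a dependency concrete, here is a minimal sketch that asks an installed package which other packages it declares as requirements. It assumes only the standard-library module ``importlib.metadata``; the helper name ``declared_dependencies`` and the package name ``pandas`` are illustrative choices, not part of this lesson.

```python
from importlib import metadata


def declared_dependencies(package_name):
    """Return the requirements a package declares, or None if it is not installed.

    Hypothetical helper for illustration only.
    """
    try:
        requirements = metadata.requires(package_name)
    except metadata.PackageNotFoundError:
        return None
    # metadata.requires() gives None when a package declares no requirements.
    return requirements or []


# "pandas" is only an example; it may or may not be installed on your system.
deps = declared_dependencies("pandas")
if deps is None:
    print("pandas is not installed")
else:
    print(f"pandas declares {len(deps)} requirement(s)")
```

Each returned entry is a requirement string as written by the package author, so the exact output depends on the installed version.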
27+
28+
What packages are out there
29+
---------------------------
30+
31+
- Core numerics libraries: Ex ``numpy``
32+
- Plotting: Ex ``matplotlib`` and ``seaborn``
33+
- Data analysis and other important core packages: Ex ``pandas``, ``dask``, ``xarray``
34+
- Interactive computing and human interface: Ex ``Jupyter``, ``spyder``
35+
- Data format support and data ingestion: Ex ``h5py``
36+
- Speeding up code and parallelism: Ex ``mpi4py``, ``numba``, ``dask``
37+
- Machine learning: Ex ``scikit-learn``
38+
- Deep learning: Ex ``pytorch``, ``tensorflow``, ``keras``
39+
40+
Plan of the week:
41+
42+
- Cover the use of the above packages in more or less detail
43+
44+
Why software modules are important on an HPC cluster
45+
----------------------------------------------------
2446

2547
Software modules allow users of any HPC cluster
26-
to activate their favorite software of any version.
48+
to activate their favorite software and/or packages of any version.
2749
This helps to ensure reproducible research.
2850

2951
How to see which Python packages are installed
@@ -297,6 +319,8 @@ Exercise 1: using Python packages
297319
In all cases, the package is now installed.
298320
Well done!
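
One way to double-check the result from within Python is to query the installed version. The following is a minimal sketch, assuming only the standard-library module ``importlib.metadata``; the helper name ``installed_version`` and both package names are illustrative, so substitute the package you actually installed.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if absent.

    Hypothetical helper for illustration only.
    """
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# "numpy" is an example; "example-missing-package" is a placeholder name
# that is assumed not to be installed.
for name in ("numpy", "example-missing-package"):
    version = installed_version(name)
    if version is None:
        print(f"{name}: not installed")
    else:
        print(f"{name}: version {version}")
```

Recording the reported versions alongside your results is one simple way to support reproducibility.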
299321

322+
323+
300324
.. admonition:: **Done?**
301325

302326
When done, and if you haven't done so yet,
@@ -307,3 +331,85 @@ Well done!
307331
You can easily navigate there by pressing the 'Next' arrow
308332
at the bottom of this page, at the right-hand side
309333

334+
.. admonition:: Core numerics libraries
335+
336+
- numpy - Arrays and array math.
337+
- scipy - Software for math, science, and engineering.
338+
339+
.. admonition:: Plotting
340+
341+
- matplotlib - Base plotting package, somewhat low level but almost everything builds on it.
342+
- seaborn - Higher level plotting interface; statistical graphics.
343+
- Vega-Altair - Declarative Python plotting.
344+
- mayavi - 3D plotting
345+
- Plotly - Big graphing library.
346+
347+
.. admonition:: Data analysis and other important core packages
348+
349+
- pandas - Columnar data analysis.
350+
- `polars <https://pola.rs/>`_ - Alternative to pandas that uses a similar API, but is re-imagined for more speed.
351+
- Vaex - Alternative for pandas that uses similar API for lazy-loading and processing huge DataFrames.
352+
- Dask - Alternative to Pandas that uses similar API and can do analysis in parallel.
353+
- xarray - Framework for working with multi-dimensional arrays.
354+
- statsmodels - Statistical models and tests.
355+
- SymPy - Symbolic math.
356+
- networkx - Graph and network analysis.
357+
- graph-tool - Graph and network analysis toolkit implemented in C++.
358+
359+
.. admonition:: Interactive computing and human interface
360+
361+
- Interactive computing
362+
- IPython - Nicer interactive interpreter
363+
- Jupyter - Web-based interface to IPython and other languages (includes projects such as jupyter notebook, lab, hub, …)
364+
- Testing
365+
- pytest - Automated testing interface
366+
- Documentation
367+
- Sphinx - Documentation generator (also used for this lesson…)
368+
- Development environments
369+
- Spyder - Interactive Python development environment.
370+
- Visual Studio Code - Microsoft’s flagship code editor.
371+
- PyCharm - JetBrains’s Python IDE.
372+
- Binder - load any git repository in Jupyter automatically, good for reproducible research
373+
374+
.. admonition:: Data format support and data ingestion
375+
376+
- pillow - Image manipulation. The original PIL is no longer maintained; the new “Pillow” is a drop-in replacement.
377+
- h5py and PyTables - Interfaces to the HDF5 file format.
378+
379+
.. admonition:: Speeding up code and parallelism
380+
381+
- MPI for Python (mpi4py) - Message Passing Interface (MPI) in Python for parallelizing jobs.
382+
- cython - easily make C extensions for Python, also interface to C libraries
383+
- numba - just in time compiling of functions for speed-up
384+
- PyPy - Alternative Python interpreter with a just-in-time compiler, itself written in a subset of Python so that it can internally optimize more.
385+
- Dask - Distributed array data structure for distributed computation
386+
- Joblib - Easy embarrassingly parallel computing
387+
- IPyParallel - Easy parallel task engine.
388+
- numexpr - Fast evaluation of array expressions by automatically compiling the arithmetic.
389+
390+
.. admonition:: Machine learning
391+
392+
- nltk - Natural language processing toolkit.
393+
- scikit-learn - Traditional machine learning toolkit.
394+
- xgboost - Toolkit for gradient boosting algorithms.
395+
396+
.. admonition:: Deep learning
397+
398+
- tensorflow - Deep learning library by Google.
399+
- pytorch - Currently the most popular deep learning library.
400+
- keras - Simple library for doing deep learning.
401+
- huggingface - Ecosystem for sharing and running deep learning models and datasets. Includes packages like transformers, datasets, accelerate, etc.
402+
- jax - Google’s Python library for running NumPy and automatic differentiation on GPUs.
403+
- flax - Neural network framework built on Jax.
404+
- equinox - Another neural network framework built on Jax.
405+
- DeepSpeed - Algorithms for running massive scale trainings. Included in many of the frameworks.
406+
- PyTorch Lightning - Framework for creating and training PyTorch models.
407+
- `TensorBoard <https://www.tensorflow.org/tensorboard/>`_ - Tool for visualizing model training on a web page.
408+
409+
.. admonition:: Other packages for special cases
410+
411+
- dateutil and pytz - Date arithmetic and handling, timezone database and conversion.
412+
413+
414+
415+
