Python packages are pieces of tested Python code.
Prefer using a Python package over writing your own code.

.. admonition:: Some definitions

   - Library: A collection of code used by a program.
   - Package: A library that has been made easy to install and reuse. Packages are often published on public repositories such as the Python Package Index (PyPI).
   - Dependency: Code that a program requires but that is not included in the program itself.

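To make the *Dependency* definition concrete: an installed package records which other packages it depends on, and the standard library's ``importlib.metadata`` module (Python 3.8 and later) can read that record. This is a minimal sketch that assumes the package was installed with a standard installer such as pip; ``declared_dependencies`` is a hypothetical helper name, and ``pandas`` is only an example.

```python
from importlib.metadata import PackageNotFoundError, requires
from typing import List, Optional


def declared_dependencies(package: str) -> Optional[List[str]]:
    """Return the requirements a package declares, or None if it is not installed."""
    try:
        # requires() returns the requirement strings from the package metadata,
        # or None when the package declares no dependencies at all.
        return requires(package) or []
    except PackageNotFoundError:
        return None


# Example: list what pandas depends on (prints None if pandas is not installed).
print(declared_dependencies("pandas"))
```

The returned strings can include environment markers, for example for dependencies that are only needed when an optional extra is requested.
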
What packages are out there
---------------------------

- Core numerics libraries: e.g. ``numpy``
- Plotting: e.g. ``matplotlib`` and ``seaborn``
- Data analysis and other important core packages: e.g. ``pandas``, ``dask``, ``xarray``
- Interactive computing and human interface: e.g. ``Jupyter``, ``spyder``
- Data format support and data ingestion: e.g. ``h5py``
- Speeding up code and parallelism: e.g. ``mpi4py``, ``numba``, ``dask``
- Machine learning: e.g. ``scikit-learn``
- Deep learning: e.g. ``pytorch``, ``tensorflow``, ``keras``

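A quick way to check which of the packages above are available in the Python you are currently running is to ask the import system whether it can find them, without actually importing them. This is a minimal sketch using the standard library's ``importlib.util.find_spec``. The ``IMPORT_NAMES`` mapping is written out by hand for this example, because some packages are imported under a different name than the one they are installed under.

```python
import importlib.util

# Package name (as listed above) -> name used in an ``import`` statement.
# A few differ: scikit-learn is imported as sklearn, pytorch as torch.
IMPORT_NAMES = {
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "pandas": "pandas",
    "dask": "dask",
    "xarray": "xarray",
    "h5py": "h5py",
    "mpi4py": "mpi4py",
    "numba": "numba",
    "scikit-learn": "sklearn",
    "pytorch": "torch",
    "tensorflow": "tensorflow",
    "keras": "keras",
}


def available_packages(import_names):
    """Map each package name to True if Python can find it, else False."""
    return {
        package: importlib.util.find_spec(module) is not None
        for package, module in import_names.items()
    }


for package, found in available_packages(IMPORT_NAMES).items():
    print(f"{package:15} {'available' if found else 'not found'}")
```

Finding a package this way does not guarantee that it imports cleanly; a broken installation can still fail at ``import`` time.
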
Plan for the week:

- Cover the use of the packages above, some in more detail than others.

Why software modules are important on an HPC cluster
----------------------------------------------------

Software modules allow users of an HPC cluster
to load the software and packages they need,
in any of the versions installed on that cluster.
This helps ensure reproducible research.

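Loading a module pins the software versions, but it also helps to record exactly which versions a result was produced with. A minimal sketch of that idea using only the standard library; ``environment_report`` is a hypothetical helper, and the package list is just an example.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def environment_report(packages):
    """Return the Python version and the installed version of each package."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


# Print this at the start of a job script so the versions end up in its output log.
for name, ver in environment_report(["numpy", "pandas", "matplotlib"]).items():
    print(f"{name}: {ver}")
```

Note that ``version()`` expects the installed distribution name (for example ``scikit-learn``), not the import name (``sklearn``).
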
How to see which Python packages are installed

[…]

In all cases, the package is now installed.
Well done!

.. admonition:: **Done?**

   When done, and if you haven't done so yet,
   […]
   You can easily navigate there by pressing the 'Next' arrow
   at the bottom of this page, on the right-hand side.

.. admonition:: Core numerics libraries

   - numpy - Arrays and array math.
   - scipy - Software for math, science, and engineering.

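As a small illustration of what "arrays and array math" means in NumPy: arithmetic on an array applies to every element at once, without a Python loop. The temperature values below are made up for the example, and NumPy is assumed to be installed.

```python
import numpy as np

# Made-up temperatures in degrees Celsius.
celsius = np.array([0.0, 20.0, 37.0, 100.0])

# One expression converts every element; NumPy does the looping in compiled code.
fahrenheit = celsius * 9 / 5 + 32

print(fahrenheit)
print("mean:", fahrenheit.mean(), "max:", fahrenheit.max())
```

The same calculation written as a ``for`` loop over a Python list gives the same numbers, but is typically much slower for large arrays.
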
.. admonition:: Plotting

   - matplotlib - Base plotting package; somewhat low-level, but almost everything else builds on it.
   - seaborn - Higher-level plotting interface for statistical graphics.
   - Vega-Altair - Declarative Python plotting.
   - mayavi - 3D plotting.
   - Plotly - Interactive graphing library.

.. admonition:: Data analysis and other important core packages

   - pandas - Columnar data analysis.
   - `polars <https://pola.rs/>`__ - Alternative to pandas with a similar API, re-imagined for more speed.
   - Vaex - Alternative to pandas with a similar API, for lazy loading and processing of huge DataFrames.
   - Dask - Alternative to pandas with a similar API that can run the analysis in parallel.
   - xarray - Framework for working with multi-dimensional arrays.
   - statsmodels - Statistical models and tests.
   - SymPy - Symbolic math.
   - networkx - Graph and network analysis.
   - graph-tool - Graph and network analysis toolkit implemented in C++.

.. admonition:: Interactive computing and human interface

   - Interactive computing

     - IPython - Nicer interactive interpreter.
     - Jupyter - Web-based interface to IPython and other languages (includes projects such as Jupyter Notebook, JupyterLab, JupyterHub, …).

   - Testing

     - pytest - Automated testing interface.

   - Documentation

     - Sphinx - Documentation generator (also used for this lesson…).

   - Development environments

     - Spyder - Interactive Python development environment.
     - Visual Studio Code - Microsoft’s flagship code editor.
     - PyCharm - JetBrains’s Python IDE.

   - Binder - Loads any Git repository in Jupyter automatically; good for reproducible research.

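To show what automated testing with pytest looks like in practice: with its default settings, pytest collects functions whose names start with ``test_`` from files named ``test_*.py`` or ``*_test.py``, and reports any plain ``assert`` that fails. The function under test, ``normalize``, is a hypothetical example, and the test file itself does not need to import pytest.

```python
def normalize(values):
    """Scale a list of non-negative numbers so that they sum to 1."""
    total = sum(values)
    if total == 0:
        raise ValueError("cannot normalize values that sum to zero")
    return [v / total for v in values]


def test_normalize_sums_to_one():
    # Allow for floating-point rounding in the sum.
    assert abs(sum(normalize([1, 2, 3, 4])) - 1.0) < 1e-12


def test_normalize_keeps_proportions():
    assert normalize([1, 3]) == [0.25, 0.75]
```

Saved as, for example, ``test_normalize.py``, both tests run when you type ``pytest`` in the same directory.
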
.. admonition:: Data format support and data ingestion

   - pillow - Image manipulation. The original PIL is no longer maintained; the newer “Pillow” is a drop-in replacement.
   - h5py and PyTables - Interfaces to the HDF5 file format.

.. admonition:: Speeding up code and parallelism

   - MPI for Python (mpi4py) - Message Passing Interface (MPI) in Python for parallelizing jobs.
   - cython - Easily write C extensions for Python; also an interface to C libraries.
   - numba - Just-in-time compilation of functions for speed-up.
   - PyPy - Alternative Python implementation, itself written in (a subset of) Python, with a just-in-time compiler that can optimize code internally.
   - Dask - Distributed array data structure for distributed computation.
   - Joblib - Easy embarrassingly parallel computing.
   - IPyParallel - Easy parallel task engine.
   - numexpr - Fast evaluation of array expressions by automatically compiling the arithmetic.

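Several of the tools above, Joblib and IPyParallel in particular, target *embarrassingly parallel* work: many tasks that are completely independent of each other, such as one simulation per parameter value. The sketch below shows that pattern with the standard library's ``concurrent.futures`` instead of Joblib, and ``simulate`` is a hypothetical stand-in for a real task. It uses threads only to stay short and portable. For CPU-bound pure-Python code in standard CPython builds, the global interpreter lock (GIL) prevents threads from running Python code in parallel, so a process pool (``ProcessPoolExecutor``, or Joblib's default process-based backend) would be the better choice there.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(seed: int) -> int:
    """Hypothetical stand-in for one independent task (e.g. one parameter value)."""
    x = seed
    # A toy workload: iterate a linear congruential generator.
    for _ in range(10_000):
        x = (1103515245 * x + 12345) % 2**31
    return x


def run_all(seeds, max_workers=4):
    """Run simulate() for every seed, spreading the tasks over a pool of workers."""
    # The tasks share no data, so any worker can run any task in any order;
    # executor.map still returns the results in the same order as `seeds`.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(simulate, seeds))


results = run_all(range(8))
print(results)
```

Because no task depends on another, switching the executor class is all it takes to move from threads to processes. The structure of ``run_all`` stays the same.
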
.. admonition:: Machine learning

   - nltk - Natural language processing toolkit.
   - scikit-learn - Traditional machine learning toolkit.
   - xgboost - Toolkit for gradient boosting algorithms.

.. admonition:: Deep learning

   - tensorflow - Deep learning library by Google.
   - pytorch - Currently the most popular deep learning library.
   - keras - Simple library for doing deep learning.
   - Hugging Face - Ecosystem for sharing and running deep learning models and datasets. Includes packages such as transformers, datasets, accelerate, etc.
   - jax - Google’s Python library for running NumPy-style code and automatic differentiation on GPUs.
   - flax - Neural network framework built on JAX.
   - equinox - Another neural network framework built on JAX.
   - DeepSpeed - Algorithms for running massive-scale training. Integrated into many of these frameworks.
   - PyTorch Lightning - Framework for creating and training PyTorch models.
   - `TensorBoard <https://www.tensorflow.org/tensorboard/>`__ - Tool for visualizing model training on a web page.

.. admonition:: Other packages for special cases

   - dateutil and pytz - Date arithmetic and handling, timezone database and conversion.