Commit 49ef3d9

Fix spelling
1 parent 325a268 commit 49ef3d9

16 files changed

Lines changed: 34 additions & 32 deletions

.wordlist.txt

Lines changed: 2 additions & 0 deletions

@@ -1116,3 +1116,5 @@ VZhV
 whitespaces
 wp
 Ctrl
+rescaled
+UIs

docs/day2/IDEs.rst

Lines changed: 1 addition & 1 deletion

@@ -923,7 +923,7 @@ VS Code
 .. figure:: ../img/vscode_connected_to_rackham.png

 When you first establish the ssh connection to Rackham, your VSCode server directory .vscode-server will be created in your home folder /home/[username].
-This also where VS Code will install all your extentions that can quickly fill up your home directory.
+This also where VS Code will install all your extensions that can quickly fill up your home directory.

 Features
 ########

docs/day2/IDEs_cmd.rst

Lines changed: 3 additions & 3 deletions

@@ -569,7 +569,7 @@ Principles

 Spyder is not available on Dardel.

-- Use the conda env you created in Exercise 2 in `Use isolated environemnts <https://uppmax.github.io/HPC-python/day2/use_isolated_environments.html#exercises>`_
+- Use the conda env you created in Exercise 2 in `Use isolated environments <https://uppmax.github.io/HPC-python/day2/use_isolated_environments.html#exercises>`_

 .. code-block:: console

@@ -585,7 +585,7 @@ Principles

 Spyder is not available centrally on Rackham.

-- Use the conda env you created in Exercise 2 in `Use isolated environemnts <https://uppmax.github.io/HPC-python/day2/use_isolated_environments.html#exercises>`_
+- Use the conda env you created in Exercise 2 in `Use isolated environments <https://uppmax.github.io/HPC-python/day2/use_isolated_environments.html#exercises>`_

 .. code-block:: console

@@ -697,7 +697,7 @@ Install VS Code on your local machine and follow the steps below to connect to t
 .. figure:: ../img/vscode_connected_to_rackham.png

 When you first establish the ssh connection to the cluster, your VSCode server directory .vscode-server will be created in your home folder /home/[username].
-This also where VS Code will install all your extentions that can quickly fill up your home directory.
+This also where VS Code will install all your extensions that can quickly fill up your home directory.

 Exercises with step-by-step instructions
 ----------------------------------------

docs/day2/install_packages.rst

Lines changed: 1 addition & 1 deletion

@@ -21,7 +21,7 @@ There are 2 ways to install missing python packages at a HPC cluster.
 - Local installation, always available for the version of Python you had active when doing the installation
 - ``pip install --user [package name]``
 - Isolated environment. See next session.
-- virtual environents provided by python
+- virtual environments provided by python
 - conda

 Normally you want reproducibility and the safe way to go is with isolated environments specific to your different projects.
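As an aside, the isolated-environment route described in this hunk can be verified from inside Python itself. A minimal standard-library sketch (the function name is mine, not from the course material):

```python
import sys

def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv/virtualenv.

    Inside an environment, sys.prefix points at the environment,
    while sys.base_prefix still points at the base interpreter.
    """
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)

print("virtual environment active:", in_virtualenv())
```

This is a handy sanity check after ``source <env>/bin/activate``: the same interpreter prints ``False`` before activation and ``True`` after.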

docs/day2/may2024/install_packages.rst

Lines changed: 1 addition & 1 deletion

@@ -623,7 +623,7 @@ More info

 - With a virtual environment you can tailor an environment with specific versions for Python and packages, not interfering with other installed python versions and packages.
 - Make it for each project you have for reproducibility.
-- There are different tools to create virtual environemnts.
+- There are different tools to create virtual environments.

 - UPPMAX has ``conda`` and ``venv`` and ``virtualenv``
 - HPC2N has ``venv`` and ``virtualenv``

docs/day2/use_isolated_environments_old.rst

Lines changed: 3 additions & 3 deletions

@@ -52,7 +52,7 @@ What happens at activation?
 - Check with ``which python``, should show at path to the environment.
 - In conda you can define python version as well
 - Since ``venv`` is part of Python you will get the python version used when running the ``venv`` command.
-- Packages are defined by the environent.
+- Packages are defined by the environment.
 - Check with ``pip list``
 - Conda can only see what you installed for it.
 - venv and virtualenv also see other packages if you allowed for that when creating the environment (``--system-site-packages``).

@@ -196,7 +196,7 @@ The next points will be the same for all clusters
 .. note::

 - You can use "pip list" on the command line (after loading the python module) to see which packages are available and which versions.
-- Some packaegs may be inhereted from the moduels yopu have loaded
+- Some packages may be inherited from the modules yopu have loaded
 - You can do ``pip list --local`` to see what is installed by you in the environment.
 - Some IDE:s like Spyder may only find those "local" packages

@@ -238,7 +238,7 @@ Conda

 .. tip::

-- The conda environemnts inclusing many small files are by default stored in ``~/.conda`` folder that is in your $HOME directory with limited storage.
+- The conda environments including many small files are by default stored in ``~/.conda`` folder that is in your $HOME directory with limited storage.
 - Move your ``.conda`` directory to your project folder and make a soft link to it from ``$HOME``
 - Do the following (``mkdir -p`` ignores error output and will not recfreate anothe folder if it already exists):
 - (replace what is inside ``<>`` with relevant path)
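The move-and-symlink tip above is normally a pair of shell commands; purely as an illustration, the same steps can be sketched with Python's standard library. Paths and the function name here are placeholders, not the course's actual project paths:

```python
import shutil
from pathlib import Path

def relocate_conda_dir(home: Path, project: Path) -> Path:
    """Move <home>/.conda into <project> and leave a soft link behind."""
    src = home / ".conda"
    dest = project / ".conda"
    dest.parent.mkdir(parents=True, exist_ok=True)  # like mkdir -p
    shutil.move(str(src), str(dest))                # move the real directory
    src.symlink_to(dest, target_is_directory=True)  # soft link from $HOME
    return dest
```

After the move, anything that writes to ``~/.conda`` transparently lands in the project folder, so the small-file quota of ``$HOME`` is no longer consumed.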

docs/day3/big_data.rst

Lines changed: 7 additions & 7 deletions

@@ -33,7 +33,7 @@ High-Performance Data Analytics (HPDA)
 .. admonition:: What is it?
    :class: dropdown

-- **High-performace data analytics (HPDA)**, a subset of high-performance computing which focuses on working with **large data**.
+- **High-performance data analytics (HPDA)**, a subset of high-performance computing which focuses on working with **large data**.

 - The data can come from either computer models and simulations or from experiments and observations, and the goal is to preprocess, analyse and visualise it to generate scientific results.

@@ -102,7 +102,7 @@ Allocating RAM

 .. important::

-- You do not have to explicitely run threads or other parallelism.
+- You do not have to explicitly run threads or other parallelism.
 - Allocating several nodes for one one big problem is not useful.
 - Note that shared memory among the cores works within node only.

@@ -216,7 +216,7 @@ Exercise: Memory allocation (10 min)

 - Slurm flag ``-n <number of cores>``

-.. challenge:: Actually start an interactive sesion with 4 cores for 3 hours.
+.. challenge:: Actually start an interactive session with 4 cores for 3 hours.

 - We will use it for the exercises later.
 - Since it may take some time to get the allocation we do it now already!
@@ -635,7 +635,7 @@ Xarray package
 - It also **borrows heavily from the Pandas package for labelled tabular data** and integrates tightly with dask for parallel computing.

 - Xarray is particularly tailored to working with NetCDF files.
-- But work for aother files as well
+- But work for another files as well

 - Explore it a bit in the (optional) exercise below!
@@ -699,7 +699,7 @@ Big file → split into chunks → parallel workers → results combined.

 .. admonition:: To think of

-- Chunk size and number of them affect the performance due to overhad/administration of the chunking and combination.
+- Chunk size and number of them affect the performance due to overhead/administration of the chunking and combination.
 - Briefly explain what happens when a Dask job runs on multiple cores.

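The pipeline in the hunk header above (big file → split into chunks → parallel workers → results combined) is what Dask automates. A dependency-free sketch of the same pattern, with a toy sum of squares standing in for the real per-chunk work (function names and chunk size are mine, not Dask's API):

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(data, size):
    """Split a list into fixed-size chunks (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def partial_result(chunk):
    """The per-chunk work that each worker runs independently."""
    return sum(x * x for x in chunk)

def sum_of_squares(data, chunk_size=1000):
    chunks = split_into_chunks(data, chunk_size)
    # Threads keep this sketch self-contained; Dask would instead
    # schedule workers across the cores you allocated with Slurm.
    with ThreadPoolExecutor(max_workers=4) as pool:
        partials = pool.map(partial_result, chunks)
    return sum(partials)  # combine the partial results
```

Smaller chunks give more parallelism but more scheduling/combination overhead, which is exactly the trade-off the "To think of" admonition points at.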
@@ -720,7 +720,7 @@ Big file → split into chunks → parallel workers → results combined.
 Polars package
 ..............

-- ``polars`` is a Python package that presnts itself as **Blazingly Fast DataFrame Library**
+- ``polars`` is a Python package that presents itself as **Blazingly Fast DataFrame Library**
 - Utilizes all available cores on your machine.
 - Optimizes queries to reduce unneeded work/memory allocations.
 - Handles datasets much larger than your available RAM.
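The larger-than-RAM claim in the list above rests on lazy, streaming evaluation. This is not polars code, but the underlying idea can be sketched with plain Python generators, which likewise process one row at a time instead of materializing the whole dataset:

```python
def read_rows(n):
    """Stand-in for streaming rows from a big file: yields lazily."""
    for i in range(n):
        yield {"value": i}

def pipeline(rows):
    """Filter + transform without holding all rows in memory at once."""
    return (row["value"] * 2 for row in rows if row["value"] % 2 == 0)

# Nothing is computed until the reduction at the end pulls rows through.
total = sum(pipeline(read_rows(1_000_000)))
```

A lazy engine such as polars additionally inspects the whole query before running it, so it can drop unused columns and fuse steps; the generator sketch only captures the streaming half of the story.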
@@ -988,7 +988,7 @@ Set up the environment

 - https://stackoverflow.com/questions/72155514/when-to-use-xarray-over-numpy-for-medium-rank-multidimensional-data

-- Browse: https://docs.xarray.dev/en/v2024.11.0/getting-started-guide/why-xarray.html or change to more applicabe version in drop-down menu to lower right.
+- Browse: https://docs.xarray.dev/en/v2024.11.0/getting-started-guide/why-xarray.html or change to more applicable version in drop-down menu to lower right.
 - find something interesting for you! Test some lines if you want to!
 - tips:
 - Pandas: https://docs.xarray.dev/en/v2024.11.0/getting-started-guide/faq.html#why-is-pandas-not-enough

docs/day3/big_data_old.rst

Lines changed: 3 additions & 3 deletions

@@ -17,7 +17,7 @@ High-Performance Data Analytics (HPDA)
 .. admonition:: What is it?
    :class: dropdown

-- **High-performace data analytics (HPDA)**, a subset of high-performance computing which focuses on working with large data.
+- **High-performance data analytics (HPDA)**, a subset of high-performance computing which focuses on working with large data.

 - The data can come from either computer models and simulations or from experiments and observations, and the goal is to preprocess, analyse and visualise it to generate scientific results.

@@ -351,7 +351,7 @@ Allocating RAM
 .. important::

 - Allocate many cores or a full node!
-- You do not have to explicitely run threads or other parallelism.
+- You do not have to explicitly run threads or other parallelism.

 - Note that shared memory among the cores works within node only.

@@ -622,7 +622,7 @@ Exercises
 ssh nid001057

-Use the conda env you created in Exercise 2 in `Use isolated environemnts <https://uppmax.github.io/HPC-python/day2/use_isolated_environments.html#exercises>`_
+Use the conda env you created in Exercise 2 in `Use isolated environments <https://uppmax.github.io/HPC-python/day2/use_isolated_environments.html#exercises>`_

 .. code-block:: console
docs/day3/not_used/Seaborn-Intro.rst

Lines changed: 1 addition & 1 deletion

@@ -353,7 +353,7 @@ For the ``map_`` commands, the kwargs depend on the type of plot that was passed
 Heatmap and Clustermap
 ^^^^^^^^^^^^^^^^^^^^^^

-Sometimes you have too many variables to look at with pairplots or corner plots, and the best you can do is map the correlation coeffcients between different parameters. Alternatively, you might have a DataFrame with a comparable number of numeric rows and columns, and you want to see how the rows and columns correlate. Either way, the DataFrame must be able to be coerced to ``ndarray``.
+Sometimes you have too many variables to look at with pairplots or corner plots, and the best you can do is map the correlation coefficients between different parameters. Alternatively, you might have a DataFrame with a comparable number of numeric rows and columns, and you want to see how the rows and columns correlate. Either way, the DataFrame must be able to be coerced to ``ndarray``.

 Once again, this type of plot is extremely tedious to make in pure Matplotlib, but in Seaborn, it can require as little as one line of code. There are two functions that do this: ``sb.heatmap()`` and ``sb.clustermap()``. The main difference between the two is that the latter attempts to rearrange variables such that those that are correlated are positioned next to each other on the plot, while the former simply lists the variables in the order they were given in the DataFrame.
docs/day3/not_used/old-pandas.rst

Lines changed: 3 additions & 3 deletions

@@ -12,7 +12,7 @@ Intro to Pandas
 * A simple interface with the Seaborn plotting library, and increasingly also Matplotlib.
 * Easy multi-threading with Numba.

-**Limitations.** Pandas alone has somewhat limited support for parallelization, N-dimensional data structures, and datasets much larger than 3 GiB. Fortunately, there are packages like ``dask`` and ``polars`` that can help. In partcular, ``dask`` will be covered in a later lecture in this workshop. There is also the ``xarray`` package that provides many similar functions to Pandas for higher-dimensional data structures, but that is outside the scope of this workshop.
+**Limitations.** Pandas alone has somewhat limited support for parallelization, N-dimensional data structures, and datasets much larger than 3 GiB. Fortunately, there are packages like ``dask`` and ``polars`` that can help. In particular, ``dask`` will be covered in a later lecture in this workshop. There is also the ``xarray`` package that provides many similar functions to Pandas for higher-dimensional data structures, but that is outside the scope of this workshop.

 .. admonition:: Get today's tarball!

@@ -197,7 +197,7 @@ Load and Run
 ssh nid001057

-Use the conda env you created in Exercise 2 in `Use isolated environemnts <https://uppmax.github.io/HPC-python/day2/use_isolated_environments.html#exercises>`_
+Use the conda env you created in Exercise 2 in `Use isolated environments <https://uppmax.github.io/HPC-python/day2/use_isolated_environments.html#exercises>`_

 .. code-block:: console

@@ -579,7 +579,7 @@ Iteration over DataFrames, Series, and GroupBy objects is slow and should be avo
 * ``.str.upper()``/``.lower()``
 * ``.str.<r>strip()``
 * ``.str.<r>split(' ', n=None, expand=False)`` can return outputs of several different shapes depending on ``expand`` (bool, whether to return split strings as lists in 1 column or substrings in multiple columns) and ``n`` (maximum number of columns to return).
-* Unlike for regular strings, ``df.str.replace()`` does not accept dict-type input where keys are existing substrings and values are replacements. For multiple simulataneous replacements via dictionary input, use ``df.replace()`` without the ``.str``.
+* Unlike for regular strings, ``df.str.replace()`` does not accept dict-type input where keys are existing substrings and values are replacements. For multiple simultaneous replacements via dictionary input, use ``df.replace()`` without the ``.str``.

 **Statistics.** Nearly all NumPy statistical functions and a few ``scipy.mstats`` functions can be called as aggregate methods of DataFrames, Series, any subsets thereof, or GroupBy objects. All of them ignore NaNs by default. For DataFrames and GroupBy objects, you must set ``numeric_only=True`` to exclude non-numeric data, and specify whether to aggregate along rows (``axis=0``) or columns (``axis=1``) .
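The ``.str.replace()`` vs ``.replace()`` distinction in the changed line above fits in a few lines; a sketch assuming pandas is installed (the Series contents are made up):

```python
import pandas as pd

s = pd.Series(["cat", "dog", "cat"])

# .str.replace() takes one pattern and one replacement, not a dict:
one_by_one = s.str.replace("cat", "feline")

# For several simultaneous whole-value replacements via a dict,
# use .replace() without the .str accessor:
mapping = {"cat": "feline", "dog": "canine"}
many_at_once = s.replace(mapping)
```

Note that dict-based ``.replace()`` matches whole values by default; pass ``regex=True`` if the keys should be treated as substrings/patterns.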