
Commit 8751935

applying the minimal template

1 parent b4c3c48 commit 8751935
3 files changed: 24 additions & 57 deletions

curation/index.md

Lines changed: 10 additions & 24 deletions
````diff
@@ -1,16 +1,9 @@
-<p align="center">&nbsp;&nbsp;&nbsp;
-<a href="../glottocodes/index.md">Glottocode tutorial &nbsp;&nbsp;</a>
-&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;
-<a href="../README.md">Overview</a>
-&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;
-</p>
-
 # Data curation
 
 This tutorial shows how to turn the language polygons from the [Digitising tutorial](../digitising/index.md), along with their attributes and metadata from the [Attributes and Metadata tutorial](../metadata/index.md), into a dataset ready for [Glottography](https://github.com/Glottography). Data curation aggregates the polygons into languages and language families according to Glottolog.
 
 
-## Requirements
+### Requirements
 
 **Software:**
````

````diff
@@ -29,7 +22,7 @@ A BibTeX file containing a reference to the source publication in BibTeX format.
 A local clone of the latest [Glottolog](https://github.com/glottolog) repository (see below). This data will be used to assign the language polygons to languoids classified as languages or families according to Glottolog.
 
 
-## Overview
+### Overview
 
 Before we can run the `pyglottography` scripts and curate the language polygons, a bit of housekeeping and data prep is needed. This tutorial covers the following steps:
````

````diff
@@ -40,7 +33,7 @@
 - [Run the data curation script](#running-the-data-curation-script).
 
 
-## Installing the required packages
+### Installing the required packages
 
 Creating a Glottography dataset requires the [`pyglottography`](https://pypi.org/project/pyglottography/) package, which can be installed from the command line or terminal:
````

````diff
@@ -62,8 +55,8 @@ Finally, the GDAL library for handling different geospatial data formats is also
 
 
 
-## Gathering data in proper format
-### Converting the language polygons to GeoJSON format
+### Gathering data in proper format
+#### Converting the language polygons to GeoJSON format
 
 When digitising the language polygons, we stored them in GeoPackage format, a well-supported format in QGIS for handling spatial data. `pyglottography`, however, requires GeoJSON, a lightweight, human-readable format for representing geographic features, so we need to convert the GeoPackage. This task takes little more than a line of Python code:
````

````diff
@@ -77,7 +70,7 @@ You might be wondering why we don't use GeoJSON from the start. GeoPackages allo
 
 Note that the script above assumes the GeoPackage data is already in the `EPSG:4326` CRS, which is true for our dataset. If a different CRS was used during digitisation, the data must be reprojected to `EPSG:4326` first. Note also that the output file name (`dataset.geojson`) already hints at an important aspect of running the `pyglottography` data curation: the script expects all input data to follow specific naming conventions and be placed in designated locations on your computer.
 
-### Cloning the Glottolog data
+#### Cloning the Glottolog data
 
 The `pyglottography` package uses Glottolog to align the polygons with languages and language families. To do this, it requires a local copy of the Glottolog raw data. Cloning creates a full copy of the Glottolog repository on your computer. Navigate to a suitable folder and clone the current release of the Glottolog raw data from GitHub using the command line or terminal:
````

````diff
@@ -95,7 +88,7 @@ git pull
 This checks the status of your local repository and pulls the latest changes from GitHub.
 
 
-## Initiating a Glottography dataset
+### Initiating a Glottography dataset
 
 Next, we initiate a new Glottography dataset from the command line or terminal:
````

````diff
@@ -128,7 +121,7 @@ The three main folders are still mostly empty:
 - `raw`: in this folder, the curation script expects the (raw) language polygons in GeoJSON format
 - `cldf`: in this folder, the curation script stores the CLDF datasets, i.e. the polygons aggregated to the Glottolog languages and language families
 
-## Distributing the data into their designated folders
+### Distributing the data into their designated folders
 
 Next, we distribute the language polygons, attribute data, and reference into their designated folders. The `pyglottography` curation script requires the data to follow specific file-naming conventions and to be stored in the correct folders:
````

````diff
@@ -150,7 +143,7 @@ The screenshot below shows the `raw` and the `etc` folder after distributing the
 &nbsp;
 
 
-## Running the data curation script
+### Running the data curation script
 
 With all data in place, we can now run the curation process. From a command-line terminal, navigate into the Glottography dataset folder and invoke the `makecldf` command, pointing it to the dataset script. The `--glottolog` flag specifies the path to your local clone of the Glottolog data:
````

````diff
@@ -160,7 +153,7 @@ cldfbench makecldf cldfbench_schapper2020papuan.py --glottolog PATH_TO_GLOTTOLOG
 ```
 The `makecldf` command is part of the cldfbench workflow. It takes care of assembling the CLDF dataset from the language polygons in the `raw` folder and the attributes and reference in the `etc` folder.
 
-## Output
+### Output
 
 The CLDF folder includes three sets of vector geometries enriched with Glottocodes at three levels of aggregation in GeoJSON format:
````

````diff
@@ -172,13 +165,6 @@ The `cldf` folder includes three sets of vector geometries, each enriched with G
 
 **Family areas:** Speaker areas aggregated at the language family level according to Glottolog's classification (`families.geojson`). The Family areas GeoJSON file of the Alor–Pantar languages map can be downloaded [here](out/families.geojson).
 
----------
-<p align="center">&nbsp;&nbsp;&nbsp;
-<a href="../glottocodes/index.md">Glottocode tutorial &nbsp;&nbsp;</a>
-&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;
-<a href="../README.md">Overview</a>
-&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;
-</p>
````

glottocodes/index.md

Lines changed: 13 additions & 32 deletions
````diff
@@ -1,18 +1,9 @@
-<p align="center">&nbsp;&nbsp;&nbsp;
-<a href="../metadata/index.md">Attributes & Metadata tutorial &nbsp;&nbsp;</a>
-&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;
-<a href="../README.md">Overview</a>
-&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;
-<a href="../curation/index.md">&nbsp;&nbsp; Data curation tutorial</a>
-&nbsp;&nbsp;&nbsp;
-</p>
-
-# Finding Glottocodes
+# Glottocodes
 
 In this tutorial, we will find the Glottocodes for the language areas shown on the Alor-Pantar map (Schapper, 2020). We georeferenced the map in the [Georeferencing tutorial](../georeferencing/index.md), digitised the language areas in the [Digitising tutorial](../digitising/index.md) and recorded attributes and metadata in the [Attributes and Metadata tutorial](../metadata/index.md).
 
 
-## Requirements
+### Requirements
 
 **Software:** [Python 3](https://www.python.org/) is a high-level free and open-source programming language. This tutorial uses version 3.12 with the `guess_glottocode` package installed. For installation instructions, see below.
````

````diff
@@ -21,7 +12,7 @@ In this tutorial, we will find the Glottocodes for the language areas shown on t
 **API keys:** The `guess_glottocode` package sends requests to a large language model (LLM) provider to find Glottocodes. Currently supported providers are [Google Gemini](https://aistudio.google.com/apikey) and [Anthropic](https://console.anthropic.com/settings/keys). To use either service, you must first create an API key from the provider (see below).
 
 
-## What is a Glottocode?
+### What is a Glottocode?
 
 Glottocodes are unique identifiers for languages, dialects, and language families, maintained by [Glottolog](https://glottolog.org).
 The simplest way to find a Glottocode is to look it up manually:
````
````diff
@@ -34,7 +25,7 @@ This works fine for a few languages, but it quickly becomes tedious when you nee
 Instead, we can add Glottocodes programmatically using the [`guess_glottocode` package](https://github.com/derpetermann/guess_glottocode) in Python. The package can guess a language's Glottocode either using a large language model (LLM) via an API or by querying Wikipedia. This tutorial focuses on finding Glottocodes with an LLM.
 
 
-## Install the `guess_glottocode` package
+### Install the `guess_glottocode` package
 
 The package requires Python 3.12+ and depends on several other packages, including:
````

````diff
@@ -51,20 +42,20 @@ pip install git+https://github.com/derpetermann/guess_glottocode.git
 
 Full installation guidelines are available in the project's [README file](https://github.com/derpetermann/guess_glottocode/blob/main/README.md).
 
-## API keys
+### API keys
 
-When using a large language model (LLM) to find a Glottocode, the package sends a request to an LLM provider. Currently supported providers are **Google Gemini** and **Anthropic**. To use either service, you must first create an API key.
+When using a large language model (LLM) to find a Glottocode, the package sends a request to an LLM provider. Currently supported providers are Google Gemini and Anthropic. To use either service, you must first create an API key.
 
-- **Google Gemini** - Create an API key at [https://aistudio.google.com/apikey](https://aistudio.google.com/apikey).
+- Google Gemini - Create an API key at [https://aistudio.google.com/apikey](https://aistudio.google.com/apikey).
 You'll need a Google account and must be logged in.
-- **Anthropic** - Create an API key at [https://console.anthropic.com/settings/keys](https://console.anthropic.com/settings/keys).
+- Anthropic - Create an API key at [https://console.anthropic.com/settings/keys](https://console.anthropic.com/settings/keys).
 You'll need to sign up for and log in to an Anthropic account.
 
 Moderate use of the package should not incur third-party API costs, but heavy usage may.
 
 The first time you call `llm.guess_glottocode` with Gemini or Anthropic, the package will prompt you to enter your API key. The key is stored securely on your local machine via the `keyring` package, so you won't need to enter it again in future sessions.
 
-## Load the data
+### Load the data
 
 First, we load the Alor-Pantar GeoPackage file using GeoPandas' `read_file()` function. GeoPandas is a Python library for working with geospatial data in tabular form. Its `read_file()` function imports spatial data into a GeoDataFrame, preserving both attribute data and geometry, including the coordinate reference system (CRS).
````

````diff
@@ -93,7 +84,7 @@ print(polygons.head(10))
 9  10  Kamang  Map3  2020  MULTIPOLYGON (((124.8879 -8.16338, 124.8897 -8...
 
 
-## Finding a Suitable Glottocode Using an LLM
+### Finding a Suitable Glottocode Using an LLM
 
 While large language models (LLMs) can sometimes guess a language's Glottocode from its name, this approach is unreliable. LLMs may hallucinate nonexistent codes or confuse languages with similar names. A more reliable approach is to
````

````diff
@@ -124,7 +115,7 @@ def glottocode_per_row(row):
 polygons['unverified_glottocode'] = polygons.apply(glottocode_per_row, axis=1)
 ```
 
-## Verify the Glottocode match
+### Verify the Glottocode match
 
 We can verify Glottocode matches using additional information maintained by Glottolog. Each Glottocode is linked to a GitHub page containing the language's primary name and any alternative names. The `verify_glottocode_guess` function queries that page and checks whether the language name appears as the primary name or among the alternatives. If it does, the function returns `True`; otherwise, it returns `False`.
````

````diff
@@ -179,7 +170,7 @@ print(polygons[['name', 'glottocode']].head())
 The approach successfully identified and verified Glottocodes for 18 out of 25 languages. The entries with `None` indicate that no verified Glottocode was found automatically, so these will need to be added manually or through further refinement, such as a larger buffer size.
 
 
-## Export to file
+### Export to file
 
 Once the Glottocodes are verified, we remove the column `unverified_glottocode` and export the GeoDataFrame to a `GeoPackage` file.
````

````diff
@@ -194,18 +185,8 @@ We also export the attribute information as a `CSV` file. For details on why thi
 polygons.drop(columns="geometry").to_csv("schapper2020papuan.csv", index=False)
 ```
 
-## Output
+### Output
 
 A GeoPackage file containing the language polygons (see the [Digitising tutorial](../digitising/index.md)), attributes, and Glottocodes (see also the [Attributes and metadata tutorial](../metadata/index.md)). The Alor–Pantar language polygons, including attribute data and Glottocodes, can be downloaded [here](../digitising/out/schapper2020papuan.gpkg). Note that in this file some Glottocodes were added manually.
 
 A CSV file containing the attribute and Glottocode data, linked to the digitised polygons via the `id` column. The CSV file for the Alor–Pantar language polygons can be downloaded [here](../metadata/out/schapper2020papuan.csv).
-
---------------
-<p align="center">&nbsp;&nbsp;&nbsp;
-<a href="../metadata/index.md">Attributes & Metadata tutorial &nbsp;&nbsp;</a>
-&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;
-<a href="../README.md">Overview</a>
-&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;
-<a href="../curation/index.md">&nbsp;&nbsp; Data curation tutorial</a>
-&nbsp;&nbsp;&nbsp;
-</p>
````

metadata/index.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -2,7 +2,7 @@
 
 This tutorial introduces the attributes and metadata required when digitising Glottography language areas from source publications. Glottography uses BibTeX entries to uniquely reference each source publication, and Glottocodes to identify the languages depicted in their maps. Because Glottocodes were introduced only relatively recently, many source publications — especially older ones — likely do not include them. As a result, assigning the correct Glottocodes to a language area can be time-consuming and may require additional effort. To assist with this process, a separate [Glottocode tutorial](../glottocodes/index.md) explains how to automatically query and assign Glottocodes to a language area based on language name and geographic location.
 
-## Requirements
+### Requirements
 **Software**: [QGIS](https://qgis.org) is a free and open-source geographic information system (GIS). This tutorial uses QGIS version 3.34.4-Prizren.
 
 **Data:** Digitised language polygons in GeoPackage format (`.gpkg`). In this tutorial, we use the digitised Alor–Pantar language polygons from the [Digitising tutorial](../digitising/index.md), which can be downloaded [here](../digitising/out/schapper2020papuan.gpkg).
````

0 commit comments
