
Commit a4eb457

Release preparation for v0.5 (#10)
1 parent e86adf3 commit a4eb457

16 files changed

Lines changed: 180 additions & 152 deletions

.gitignore

Lines changed: 4 additions & 0 deletions

````diff
@@ -87,6 +87,7 @@ raw/othanieljen/
 raw/polyglottaafricana/
 raw/robbeetstriangulation/
 raw/robinsonap/
+raw/rutulbasiclexicon/
 raw/servamalagasy/
 raw/sohartmannchin/
 raw/spagnolmaltese/
@@ -105,3 +106,6 @@ raw/yuchinese/
 raw/zgraggenmadang/
 raw/zhoubizic/
 
+.DS_Store
+.idea/
+
````
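
The `raw/` entries in `.gitignore` appear to be kept in alphabetical order: the new `raw/rutulbasiclexicon/` line is inserted between `raw/robinsonap/` and `raw/servamalagasy/`. Assuming that ordering is intentional, a small pre-release check could flag entries that break it. This is a sketch only; `unsorted_raw_entries` and `check_gitignore` are hypothetical helpers, not part of the clics4 repository.

```python
from pathlib import Path


def unsorted_raw_entries(lines):
    """Return (previous, current) pairs of raw/ entries that break alphabetical order."""
    # Keep only the dataset directories under raw/, in file order.
    raw = [line.strip() for line in lines if line.strip().startswith("raw/")]
    # Compare each entry with its predecessor and report pairs that are out of order.
    return [(a, b) for a, b in zip(raw, raw[1:]) if a > b]


def check_gitignore(path=".gitignore"):
    """Read a .gitignore file and return its out-of-order raw/ entries (hypothetical helper)."""
    return unsorted_raw_entries(Path(path).read_text(encoding="utf-8").splitlines())


if __name__ == "__main__":
    sample = [
        "raw/robinsonap/",
        "raw/rutulbasiclexicon/",
        "raw/servamalagasy/",
        ".DS_Store",
        ".idea/",
    ]
    print(unsorted_raw_entries(sample))  # []
```

Non-`raw/` lines such as `.DS_Store` and `.idea/` are ignored, so the editor-specific entries added in this commit do not affect the check.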

.zenodo.json

Lines changed: 4 additions & 4 deletions

````diff
@@ -1,5 +1,5 @@
 {
-"title": "CLICS\u2074",
+"title": "CLICS 4",
 "access_right": "open",
 "keywords": [
 "cldf:Wordlist",
@@ -10,10 +10,10 @@
 "name": "Annika Tjuka"
 },
 {
-"name": "Christoph Rzymski"
+"name": "Robert Forkel"
 },
 {
-"name": "Robert Forkel"
+"name": "Christoph Rzymski"
 },
 {
 "name": "Johann-Mattis List"
@@ -29,5 +29,5 @@
 "license": {
 "id": "CC-BY-4.0"
 },
-"description": "<p>Cite the source of the dataset as:</p>\n\n<blockquote>\n<p>Tjuka, Annika; Forkel, Robert; Rzymski, Christoph; and List, Johann-Mattis (2025): CLICS\u2074: An Improved Database of Cross-Linguistic Colexifications [Dataset, Version 0.4]. Passau: MCL Chair at the University of Passau.</p>\n</blockquote>"
+"description": "<p>Cite the source of the dataset as:</p>\n\n<blockquote>\n<p>Tjuka, Annika; Forkel, Robert; Rzymski, Christoph; and List, Johann-Mattis (2025): CLICS 4: An Improved Database of Cross-Linguistic Colexifications [Dataset, Version 0.5]. Passau: MCL Chair at the University of Passau.</p>\n</blockquote>"
 }
````
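
In `.zenodo.json`, `\u2074` is the JSON escape for U+2074 SUPERSCRIPT FOUR, so the old title displayed as "CLICS⁴"; this release replaces it with plain "CLICS 4". The commit also swaps the Zenodo creators so that Robert Forkel precedes Christoph Rzymski, matching the author order of the citation. One way to keep the two in sync is to derive the citation from the creators list. The following is a minimal sketch under stated assumptions: `format_citation` and `citation_from_zenodo` are hypothetical helpers, the names are assumed to sit under Zenodo's `creators` key, and each family name is assumed to be the last space-separated token.

```python
import json
from pathlib import Path


def format_citation(creators, year, title, version,
                    subtitle="An Improved Database of Cross-Linguistic Colexifications",
                    publisher="Passau: MCL Chair at the University of Passau"):
    """Build the dataset citation in the pattern used by this release (hypothetical helper).

    `creators` holds "First Last" names in citation order.
    """
    # Invert "First Last" to "Last, First"; assumes the family name is the last token.
    inverted = []
    for name in creators:
        given, _, family = name.rpartition(" ")
        inverted.append(f"{family}, {given}")
    # Semicolon-separated list with "and" before the final author.
    if len(inverted) > 1:
        authors = "; ".join(inverted[:-1]) + "; and " + inverted[-1]
    else:
        authors = inverted[0]
    return (f"{authors} ({year}): {title}: {subtitle} "
            f"[Dataset, Version {version}]. {publisher}.")


def citation_from_zenodo(path, year, version):
    """Derive the citation from a .zenodo.json file (assumes a "creators" list of {"name": ...})."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    names = [creator["name"] for creator in data["creators"]]
    return format_citation(names, year, data["title"], version)


# Prints the same citation string that this commit writes into .zenodo.json.
print(format_citation(
    ["Annika Tjuka", "Robert Forkel", "Christoph Rzymski", "Johann-Mattis List"],
    2025, "CLICS 4", "0.5"))
```

Because the author order comes straight from the list, reordering the creators in one place is enough to change every generated citation.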

NOTES.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -31,7 +31,7 @@ All you need to install the packages required is to install the current package
 ```
 $ git clone https://github.com/clics/clics4.git
 $ cd clics4
-$ git checkout v0.4
+$ git checkout v0.5
 $ pip install -e .
 ```
 
````

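This commit bumps the same version number in several files: the `git checkout` line in NOTES.md and README.md, and the `[Dataset, Version 0.5]` part of every citation. A sketch of a consistency check follows, assuming these two textual patterns are the only places where a release version appears. `version_mentions` and `stale_mentions` are hypothetical names, not repository code.

```python
import re

# Textual patterns assumed to carry the release version in this repository.
VERSION_PATTERNS = (
    re.compile(r"Version (\d+(?:\.\d+)+)"),        # e.g. "[Dataset, Version 0.5]"
    re.compile(r"git checkout v(\d+(?:\.\d+)+)"),  # e.g. "$ git checkout v0.5"
)


def version_mentions(text):
    """Return the version numbers found in `text`, grouped by pattern."""
    found = []
    for pattern in VERSION_PATTERNS:
        found.extend(pattern.findall(text))
    return found


def stale_mentions(texts, expected):
    """Map each file name to the version numbers in its text that differ from `expected`."""
    report = {}
    for name, text in texts.items():
        stale = [v for v in version_mentions(text) if v != expected]
        if stale:
            report[name] = stale
    return report
```

If this check were run on the pre-commit README, it would also flag the old heading that mentions "Version 1.0". A real check would therefore probably need an allow-list.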
README.md

Lines changed: 30 additions & 20 deletions

````diff
@@ -1,75 +1,85 @@
-# CLICS
+# CLICS 4
 
 ## How to cite
 
 If you use these data please cite
 - the original source
-> Tjuka, Annika; Forkel, Robert; Rzymski, Christoph; and List, Johann-Mattis (2025): CLICS: An Improved Database of Cross-Linguistic Colexifications [Dataset, Version 0.4]. Passau: MCL Chair at the University of Passau.
+> Tjuka, Annika; Forkel, Robert; Rzymski, Christoph; and List, Johann-Mattis (2025): CLICS 4: An Improved Database of Cross-Linguistic Colexifications [Dataset, Version 0.5]. Passau: MCL Chair at the University of Passau.
 - the derived dataset using the DOI of the [particular released version](../../releases/) you were using
 
 ## Description
 
 
 This dataset is licensed under a CC-BY-4.0 license
 
-Available online at https://clics.clld.org
+Available online at https://github.com/clics/clics4
 
 ## Notes
 
-# CLICS4: Workflow
+# CLICS 4: Workflow for Data Generation
 
-The CLICS4 workflow differs slightly from the workflow we have used in [CLICS3](https://github.com/clics/clics3). We now have drastically increased the number of datasets, but we have also made sure to use stricter selection criteria for the languages to be included. This also results in different numbers with respect to the number of concepts and the number of language varieties.
+The CLICS 4 workflow differs slightly from the workflow we have used in [CLICS3](https://github.com/clics/clics3). We now have drastically increased the number of datasets, but we have also made sure to use stricter selection criteria for the languages to be included. This also results in different numbers with respect to the number of concepts and the number of language varieties.
 
-## How to Cite CLICS?
+## How to Cite CLICS 4?
 
-> Tjuka, Annika; Forkel, Robert; Rzymski, Christoph; and List, Johann-Mattis (2025): CLICS⁴: An Improved Database of Cross-Linguistic Colexifications [Dataset Version 0.4]. Passau: MCL Chair at the University of Passau.
+If you use the data in your work, make sure to cite the correct version that you are using. For the currently most recent version, we recommend to cite it as follows:
 
-## W1: What is New in Comparison with CLICS³?
+> Tjuka, Annika; Forkel, Robert; Rzymski, Christoph; and List, Johann-Mattis (2025): CLICS 4: An Improved Database of Cross-Linguistic Colexifications [Dataset Version 0.5]. Passau: MCL Chair at the University of Passau. https://github.com/clics/clics4/
+
+Since the whole workflow underlying CLICS 4 regardless of the individual versions will be presented in a freely available publication, we also appreciate if you cite this forthcoming paper (already available as preprint):
+
+> Tjuka, Annika; Forkel, Robert; Rzymski, Christoph; and List, Johann-Mattis (forthcoming): Advancing the Database of Cross-Linguistic Colexifications with New Workflows and Data. Proceedings of the 16th International Conference on Computational Semantics (IWCS). Düsseldorf: Association for Computational Linguistics. 1-15. Preprint: https://doi.org/10.48550/arXiv.2503.11377
+
+## What is New in Comparison with CLICS³?
 
 The following points summarize major differences between CLICS³ and CLICS⁴:
 
-- more datasets in CLICS⁴: CLICS⁴ now uses 98 datasets, while CLICS³ used 30
-- fully transcribed data instead of data in orthography: CLICS⁴ now uses data fully transcribed to IPA, ignoring all datasets that only offer orthography (this results in fewer languages at times, despite the increase in datasets)
--
+- more datasets in CLICS 4: CLICS 4 now uses 98 datasets, while CLICS³ used 30
+- fully transcribed data instead of data in orthography: CLICS 4 now uses data fully transcribed to IPA, ignoring all datasets that only offer orthography (this results in fewer languages at times, despite the increase in datasets)
+- treatment of concepts: we now model some "hidden" colexifications that have been ignored before, since the concept identifiers in Concepticon cover two separate concepts that are frequently colexified as one single concept, as separate concepts (these are marked in the CLDF representation)
+- we provide a full-fledged CLDF dataset now, in which the concept network is also modeled with the help of CLDF
 
+## Workflow for Data Aggregation
 
-## W.1: Install Packages
+In the following, we run those interested in trying the workflow that we applied to construct CLICS 4 on their own machines through the workflow in due detail. To run the workflow, we assume that users have enough experience with Python in order to know how to create their own fresh virtual environment and know how to run commands in the terminal.
+
+### W1: Install Packages
 
 All you need to install the packages required is to install the current package with [PIP](https://pypi.org/project/pip) as follows (using a fresh virtual environment), after having downloaded the `clics4` package with [GIT](https://git-scm.com). The following lines also obtain the version that we used in this demo.
 ```
 $ git clone https://github.com/clics/clics4.git
 $ cd clics4
-$ git checkout v0.4
+$ git checkout v0.5
 $ pip install -e .
 ```
 
 ## W2: Download Data
 
-In order to do a fresh download of all the data that we use in CLICS, you need to run the following command:
+In order to do a fresh download of all the data that we use in CLICS 4, you need to run the following command:
 
 ```
 $ cldfbench download lexibank_clics4.py
 ```
 
-## W3: Create CLICS4 Dataset
+## W3: Create CLICS 4 Dataset
 
 Before you can run the code, you must make sure to have downloaded all data and also obtained actual copies of Glottolog, Concepticon, and CLTS. An easy way to obtain these with the help of `cldfbench` is to run the command `cldfbench catconfig` and follow instructions there. If you use a Windows machine, you will need some additional preparations (see [Snee 2024](https://calc.hypotheses.org/7852)), so we kindly ask you to follow the respective instructions in Snee (2024).
 
 If you have successfully run the `catconfig` subcommand, just type:
 
 ```
-$ cldfbench lexibank.makecldf --glottolog-version=v5.1 --concepticon-version=v3.3.0 --clts-version=v2.3.0 lexibank_clics4.py
+$ cldfbench lexibank.makecldf --glottolog-version=v5.2.1 --concepticon-version=v3.4.0 --clts-version=v2.3.0 lexibank_clics4.py
 ```
 
 In the other case, specify the explicit locations of the repositories for Glottolog, Concepticon, and CLTS as follwo.
 
 ```
-cldfbench lexibank.makecldf --glottolog-repos=Path2Glottolog --concepticon-repos=Path2Concepticon --clts-repos=Path2Clics --glottolog-version=v5.1 --concepticon-version=v3.3.0 --clts-version=v2.3.0 lexibank_clics4.py
+cldfbench lexibank.makecldf --glottolog-repos=Path2Glottolog --concepticon-repos=Path2Concepticon --clts-repos=Path2Clics --glottolog-version=v5.2.1 --concepticon-version=v3.4.0 --clts-version=v2.3.0 lexibank_clics4.py
 ```
 
-## W4: What needs to be done before we publish CLICS4 as Version 1.0
+## W4: CLLD Version of CLICS 4
 
-This release is a CLICSdataset that we consider generally good enough with respect to the data to be used in publications (small errors would always be possible with such large numbers of data aggregated from different sources). However, we emphasize that there are a couple of shortcomings for now that we will try to handle before publishing a new version of CLICS that succeeds the current version 3.0 at https://clics.clld.org. Before publishing this new CLLD version of CLICS, we will implement a new representation of the data in order to adhere to the representation of ParameterNetworks in the new [CLDF specification](https://cldf.clld.org).
+This release is a CLICS 4 dataset that we consider generally good enough with respect to the data to be used in publications (small errors would always be possible with such large numbers of data aggregated from different sources). However, we emphasize that there are a couple of shortcomings for now that we will try to handle before publishing a new web-based version of CLICS that succeeds the current version 3.0 at https://clics.clld.org. Before publishing this new CLLD version of CLICS 4, we will implement a new representation of the data in order to adhere to the representation of ParameterNetworks in the new [CLDF specification](https://cldf.clld.org).
 
 
 
@@ -112,8 +122,8 @@ This release is a CLICS⁴ dataset that we consider generally good enough with r
 Name | GitHub user | Description | Role
 --- | --- | --- | ---
 Annika Tjuka | @annikatjuka | maintainer | Author
-Christoph Rzymski | @chrzyki | maintainer | Author
 Robert Forkel | @xrotwang | maintainer | Author
+Christoph Rzymski | @chrzyki | maintainer | Author
 Johann-Mattis List | @LinguList | maintainer | Author
 
 
````

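Step W3 pins the catalog versions (Glottolog v5.2.1, Concepticon v3.4.0, CLTS v2.3.0) directly on the command line, so this release had to update both command variants. A minimal sketch that keeps the pins in one mapping and assembles the same argument list for `subprocess.run` is shown below. It uses only the flags that appear in the README. `CATALOG_VERSIONS`, `makecldf_args`, and `run_makecldf` are hypothetical names, not part of clics4 or cldfbench.

```python
import subprocess

# Catalog versions pinned for the v0.5 build (taken from the README's W3 commands).
CATALOG_VERSIONS = {"glottolog": "v5.2.1", "concepticon": "v3.4.0", "clts": "v2.3.0"}
CATALOGS = ("glottolog", "concepticon", "clts")


def makecldf_args(repos=None, versions=None, script="lexibank_clics4.py"):
    """Assemble the README's `cldfbench lexibank.makecldf` call (hypothetical helper).

    `repos` optionally maps catalog names to local clone paths; without it the
    command relies on the locations registered via `cldfbench catconfig`.
    """
    versions = versions or CATALOG_VERSIONS
    repos = repos or {}
    args = ["cldfbench", "lexibank.makecldf"]
    # Explicit repository locations first, in the order used by the README.
    args += [f"--{name}-repos={repos[name]}" for name in CATALOGS if name in repos]
    # Then the pinned release of each catalog.
    args += [f"--{name}-version={versions[name]}" for name in CATALOGS]
    args.append(script)
    return args


def run_makecldf(**kwargs):
    """Execute the command; assumes clics4 was installed as described in W1."""
    subprocess.run(makecldf_args(**kwargs), check=True)
```

Called without arguments, `makecldf_args()` reproduces the first W3 command. Passing `repos={"glottolog": "Path2Glottolog", "concepticon": "Path2Concepticon", "clts": "Path2Clics"}` reproduces the second one.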
cldf/README.md

Lines changed: 14 additions & 14 deletions

````diff
@@ -1,25 +1,25 @@
 # CLDF datasets
 
-- [Wordlist: CLICS](#ds-wordlistmetadatajson)
-- [StructureDataset: CLICS](#ds-structuredatasetmetadatajson)
+- [Wordlist: CLICS 4](#ds-wordlistmetadatajson)
+- [StructureDataset: CLICS 4](#ds-structuredatasetmetadatajson)
 
 <a name="ds-wordlistmetadatajson"> </a>
 
-# Wordlist CLICS
+# Wordlist CLICS 4
 
 **CLDF Metadata**: [Wordlist-metadata.json](./Wordlist-metadata.json)
 
 **Sources**: [sources.bib](./sources.bib)
 
 property | value
 --- | ---
-[dc:bibliographicCitation](http://purl.org/dc/terms/bibliographicCitation) | Tjuka, Annika; Forkel, Robert; Rzymski, Christoph; and List, Johann-Mattis (2025): CLICS: An Improved Database of Cross-Linguistic Colexifications [Dataset, Version 0.4]. Passau: MCL Chair at the University of Passau.
+[dc:bibliographicCitation](http://purl.org/dc/terms/bibliographicCitation) | Tjuka, Annika; Forkel, Robert; Rzymski, Christoph; and List, Johann-Mattis (2025): CLICS 4: An Improved Database of Cross-Linguistic Colexifications [Dataset, Version 0.5]. Passau: MCL Chair at the University of Passau.
 [dc:conformsTo](http://purl.org/dc/terms/conformsTo) | [CLDF Wordlist](http://cldf.clld.org/v1.0/terms.rdf#Wordlist)
-[dc:identifier](http://purl.org/dc/terms/identifier) | https://clics.clld.org
+[dc:identifier](http://purl.org/dc/terms/identifier) | https://github.com/clics/clics4
 [dc:license](http://purl.org/dc/terms/license) | https://creativecommons.org/licenses/by/4.0/
-[dcat:accessURL](http://www.w3.org/ns/dcat#accessURL) | git@github.com:clics/clics4
-[prov:wasDerivedFrom](http://www.w3.org/ns/prov#wasDerivedFrom) | <ol><li><a href="git@github.com:clics/clics4/tree/dd245b2">git@github.com:clics/clics4 v0.3-10-gdd245b2</a></li><li><a href="git@github.com:glottolog/glottolog/tree/v5.2.1">Glottolog v5.2.1</a></li><li><a href="git@github.com:concepticon/concepticon-data/tree/v3.4.0">Concepticon v3.4.0</a></li><li><a href="https://github.com/cldf-clts/clts/tree/v2.3.0">CLTS v2.3.0</a></li></ol>
-[prov:wasGeneratedBy](http://www.w3.org/ns/prov#wasGeneratedBy) | <ol><li><strong>lingpy-rcParams</strong>: <a href="./lingpy-rcParams.json">lingpy-rcParams.json</a></li><li><strong>python</strong>: 3.13.3</li><li><strong>python-packages</strong>: <a href="./requirements.txt">requirements.txt</a></li></ol>
+[dcat:accessURL](http://www.w3.org/ns/dcat#accessURL) | https://github.com/clics/clics4
+[prov:wasDerivedFrom](http://www.w3.org/ns/prov#wasDerivedFrom) | <ol><li><a href="https://github.com/clics/clics4/tree/e86adf3">clics/clics4 v0.4-10-ge86adf3</a></li><li><a href="https://github.com/glottolog/glottolog/tree/v5.2.1">Glottolog v5.2.1</a></li><li><a href="https://github.com/concepticon/concepticon-data/tree/v3.4.0">Concepticon v3.4.0</a></li><li><a href="https://github.com/cldf-clts/clts/tree/v2.3.0">CLTS v2.3.0</a></li></ol>
+[prov:wasGeneratedBy](http://www.w3.org/ns/prov#wasGeneratedBy) | <ol><li><strong>lingpy-rcParams</strong>: <a href="./lingpy-rcParams.json">lingpy-rcParams.json</a></li><li><strong>python</strong>: 3.13.6</li><li><strong>python-packages</strong>: <a href="./requirements.txt">requirements.txt</a></li></ol>
 [rdf:ID](http://www.w3.org/1999/02/22-rdf-syntax-ns#ID) | clics4
 [rdf:type](http://www.w3.org/1999/02/22-rdf-syntax-ns#type) | http://www.w3.org/ns/dcat#Distribution
 
@@ -134,21 +134,21 @@ Name/Property | Datatype | Description
 
 <a name="ds-structuredatasetmetadatajson"> </a>
 
-# StructureDataset CLICS
+# StructureDataset CLICS 4
 
 **CLDF Metadata**: [StructureDataset-metadata.json](./StructureDataset-metadata.json)
 
 **Sources**: [sources.bib](./sources.bib)
 
 property | value
 --- | ---
-[dc:bibliographicCitation](http://purl.org/dc/terms/bibliographicCitation) | Tjuka, Annika; Forkel, Robert; Rzymski, Christoph; and List, Johann-Mattis (2025): CLICS: An Improved Database of Cross-Linguistic Colexifications [Dataset, Version 0.4]. Passau: MCL Chair at the University of Passau.
+[dc:bibliographicCitation](http://purl.org/dc/terms/bibliographicCitation) | Tjuka, Annika; Forkel, Robert; Rzymski, Christoph; and List, Johann-Mattis (2025): CLICS 4: An Improved Database of Cross-Linguistic Colexifications [Dataset, Version 0.5]. Passau: MCL Chair at the University of Passau.
 [dc:conformsTo](http://purl.org/dc/terms/conformsTo) | [CLDF StructureDataset](http://cldf.clld.org/v1.0/terms.rdf#StructureDataset)
-[dc:identifier](http://purl.org/dc/terms/identifier) | https://clics.clld.org
+[dc:identifier](http://purl.org/dc/terms/identifier) | https://github.com/clics/clics4
 [dc:license](http://purl.org/dc/terms/license) | https://creativecommons.org/licenses/by/4.0/
-[dcat:accessURL](http://www.w3.org/ns/dcat#accessURL) | git@github.com:clics/clics4
-[prov:wasDerivedFrom](http://www.w3.org/ns/prov#wasDerivedFrom) | <ol><li><a href="git@github.com:clics/clics4/tree/dd245b2">git@github.com:clics/clics4 v0.3-10-gdd245b2</a></li><li><a href="git@github.com:glottolog/glottolog/tree/v5.2.1">Glottolog v5.2.1</a></li><li><a href="git@github.com:concepticon/concepticon-data/tree/v3.4.0">Concepticon v3.4.0</a></li><li><a href="https://github.com/cldf-clts/clts/tree/v2.3.0">CLTS v2.3.0</a></li></ol>
-[prov:wasGeneratedBy](http://www.w3.org/ns/prov#wasGeneratedBy) | <ol><li><strong>python</strong>: 3.13.3</li><li><strong>python-packages</strong>: <a href="./requirements.txt">requirements.txt</a></li></ol>
+[dcat:accessURL](http://www.w3.org/ns/dcat#accessURL) | https://github.com/clics/clics4
+[prov:wasDerivedFrom](http://www.w3.org/ns/prov#wasDerivedFrom) | <ol><li><a href="https://github.com/clics/clics4/tree/e86adf3">clics/clics4 v0.4-10-ge86adf3</a></li><li><a href="https://github.com/glottolog/glottolog/tree/v5.2.1">Glottolog v5.2.1</a></li><li><a href="https://github.com/concepticon/concepticon-data/tree/v3.4.0">Concepticon v3.4.0</a></li><li><a href="https://github.com/cldf-clts/clts/tree/v2.3.0">CLTS v2.3.0</a></li></ol>
+[prov:wasGeneratedBy](http://www.w3.org/ns/prov#wasGeneratedBy) | <ol><li><strong>python</strong>: 3.13.6</li><li><strong>python-packages</strong>: <a href="./requirements.txt">requirements.txt</a></li></ol>
 [rdf:ID](http://www.w3.org/1999/02/22-rdf-syntax-ns#ID) | clics4
 [rdf:type](http://www.w3.org/1999/02/22-rdf-syntax-ns#type) | http://www.w3.org/ns/dcat#Distribution
 
````

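The provenance entries record the repository state as `git describe` output. `v0.4-10-ge86adf3` reads as "10 commits after tag v0.4, at abbreviated commit e86adf3". The `g` prefix marks a git object name, and e86adf3 is the parent commit listed at the top of this page. Below is a sketch of splitting such a string. `Describe` and `parse_describe` are hypothetical names, and suffixes such as `-dirty` are not handled.

```python
import re
from typing import NamedTuple, Optional


class Describe(NamedTuple):
    tag: str
    commits_since_tag: int
    abbrev_hash: Optional[str]


# Long form "<tag>-<n>-g<hash>", as produced by `git describe --tags --long`.
_DESCRIBE = re.compile(r"^(?P<tag>.+)-(?P<n>\d+)-g(?P<hash>[0-9a-f]+)$")


def parse_describe(value: str) -> Describe:
    """Split a `git describe` string into tag, distance from the tag, and abbreviated hash."""
    match = _DESCRIBE.match(value)
    if match is None:
        # An exact tag match (e.g. "v0.5") carries no distance or hash suffix.
        return Describe(value, 0, None)
    return Describe(match["tag"], int(match["n"]), match["hash"])


print(parse_describe("v0.4-10-ge86adf3"))
# Describe(tag='v0.4', commits_since_tag=10, abbrev_hash='e86adf3')
```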
cldf/StructureDataset-metadata.json

Lines changed: 9 additions & 9 deletions

````diff
@@ -6,30 +6,30 @@
 }
 ],
 "aboutUrl": null,
-"dc:bibliographicCitation": "Tjuka, Annika; Forkel, Robert; Rzymski, Christoph; and List, Johann-Mattis (2025): CLICS\u2074: An Improved Database of Cross-Linguistic Colexifications [Dataset, Version 0.4]. Passau: MCL Chair at the University of Passau.",
+"dc:bibliographicCitation": "Tjuka, Annika; Forkel, Robert; Rzymski, Christoph; and List, Johann-Mattis (2025): CLICS 4: An Improved Database of Cross-Linguistic Colexifications [Dataset, Version 0.5]. Passau: MCL Chair at the University of Passau.",
 "dc:conformsTo": "http://cldf.clld.org/v1.0/terms.rdf#StructureDataset",
-"dc:identifier": "https://clics.clld.org",
+"dc:identifier": "https://github.com/clics/clics4",
 "dc:isVersionOf": null,
 "dc:license": "https://creativecommons.org/licenses/by/4.0/",
 "dc:related": null,
 "dc:source": "sources.bib",
-"dc:title": "CLICS\u2074",
-"dcat:accessURL": "git@github.com:clics/clics4",
+"dc:title": "CLICS 4",
+"dcat:accessURL": "https://github.com/clics/clics4",
 "prov:wasDerivedFrom": [
 {
-"rdf:about": "git@github.com:clics/clics4",
+"rdf:about": "https://github.com/clics/clics4",
 "rdf:type": "prov:Entity",
-"dc:created": "v0.3-10-gdd245b2",
+"dc:created": "v0.4-10-ge86adf3",
 "dc:title": "Repository"
 },
 {
-"rdf:about": "git@github.com:glottolog/glottolog",
+"rdf:about": "https://github.com/glottolog/glottolog",
 "rdf:type": "prov:Entity",
 "dc:created": "v5.2.1",
 "dc:title": "Glottolog"
 },
 {
-"rdf:about": "git@github.com:concepticon/concepticon-data",
+"rdf:about": "https://github.com/concepticon/concepticon-data",
 "rdf:type": "prov:Entity",
 "dc:created": "v3.4.0",
 "dc:title": "Concepticon"
@@ -44,7 +44,7 @@
 "prov:wasGeneratedBy": [
 {
 "dc:title": "python",
-"dc:description": "3.13.3"
+"dc:description": "3.13.6"
 },
 {
 "dc:title": "python-packages",
````

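The metadata changes replace SSH-style remotes such as `git@github.com:clics/clics4` with HTTPS URLs. The HTTPS form resolves in a browser, which also fixes the `<a href="git@github.com:...">` links rendered in cldf/README.md. A sketch of that normalization is shown below. It assumes that only the scp-like `git@host:owner/repo` form needs rewriting. `to_https` and `normalize_provenance` are hypothetical helpers, and this is not a claim about how cldfbench itself records these URLs.

```python
import re

# scp-like SSH remote: git@<host>:<owner>/<repo>, optionally ending in ".git".
_SCP_REMOTE = re.compile(r"^git@(?P<host>[^:]+):(?P<path>.+?)(?:\.git)?$")


def to_https(url: str) -> str:
    """Rewrite an SSH-style git remote as an HTTPS URL; other URLs pass through unchanged."""
    match = _SCP_REMOTE.match(url)
    if match is None:
        return url
    return f"https://{match['host']}/{match['path']}"


def normalize_provenance(metadata: dict) -> dict:
    """Apply to_https to dcat:accessURL and every rdf:about in prov:wasDerivedFrom (in place)."""
    if "dcat:accessURL" in metadata:
        metadata["dcat:accessURL"] = to_https(metadata["dcat:accessURL"])
    for entity in metadata.get("prov:wasDerivedFrom", []):
        if "rdf:about" in entity:
            entity["rdf:about"] = to_https(entity["rdf:about"])
    return metadata
```

URLs that are already HTTPS, such as the CLTS entry `https://github.com/cldf-clts/clts`, are returned unchanged. Running the function twice is therefore harmless.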