1 | | -[](https://zenodo.org/badge/latestdoi/187497611) |
2 | 1 |
3 | 2 | # FormulaCloudData |
| 3 | + |
| 4 | +This repository contains the results of the distributional analysis of Mathematical Objects of Interest (MOI) for the datasets [arXMLiv 08/2018](https://sigmathling.kwarc.info/resources/arxmliv-dataset-082018/) and [zbMATH](https://zbmath.org/). |
| 5 | + |
| 6 | +## Download the Data |
| 7 | + |
| 8 | +To download the data, either use `wget` or `curl`, or go to the [releases of this GitHub repository](https://github.com/ag-gipp/FormulaCloudData/releases) and download it manually. |
| 9 | + |
| 10 | +#### arXMLiv |
| 11 | +Unzipped data requires 6.2GB free disk space. |
| 12 | +``` sh |
| 13 | +user@pc:~/zbmath$ wget https://github.com/ag-gipp/FormulaCloudData/releases/download/2.0-arxiv/arxmliv-distributions.zip |
| 14 | +user@pc:~/zbmath$ unzip arxmliv-distributions.zip |
| 15 | +``` |
| 16 | + |
| 17 | +#### zbMATH |
| 18 | +Unzipped data requires 1.1GB free disk space. |
| 19 | +```sh |
| 20 | +user@pc:~/zbmath$ wget https://github.com/ag-gipp/FormulaCloudData/releases/download/1.0-zb/zbmath-distributions.zip |
| 21 | +user@pc:~/zbmath$ unzip zbmath-distributions.zip |
| 22 | +``` |
| 23 | + |
| 24 | +## Explore the Data |
| 25 | + |
| 26 | +Each dataset contains multiple numbered files without file extensions. You can simply peek into one of the files to explore the general structure. Each entry contains the string representation (SR) of a unique expression in the dataset, its complexity value (C), its total term frequency (TF) in the dataset, and its document frequency (DF). All files are CSV files with semicolon-separated fields. For example, the first line of the file `1` in zbMATH looks as follows: |
| 27 | +``` sh |
| 28 | +user@pc:~/zbmath$ head -1 1 |
| 29 | +"mfrac(mi:d,mrow(mn:1,mo:+,mi:d))";3;1;1 |
| 30 | +``` |
| 31 | + |
| 32 | +If you want to search for specific expressions, say the mass-energy equivalence, we recommend using `grep`. Here is an example that searches for the entry in zbMATH: |
| 33 | +``` sh |
| 34 | +user@pc:~/zbmath$ grep '"mrow(mi:E,mo:=,mrow(mi:m,mo:ivt,msup(mi:c,mn:2)))"' * |
| 35 | +12:"mrow(mi:E,mo:=,mrow(mi:m,mo:ivt,msup(mi:c,mn:2)))";4;63;49 |
| 36 | +``` |
| 37 | +As we can see, `E=mc^2` is in file `12`, has a complexity of 4, a total term frequency of 63, and a document frequency of 49. |
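The term frequency column can also be used to rank entries. As a rough sketch under the same assumption about the line layout (`SR;C;TF;DF`), the hypothetical helper `moi_top` below prints the `n` entries with the highest TF from the given files. It prefixes each line with its TF, sorts numerically in descending order, and strips the prefix again.

```sh
# Hypothetical helper (not part of the repository): print the n entries
# with the highest total term frequency (TF) from the given files.
moi_top() {
  n=$1
  shift
  # Prefix each line with its TF (second field from the right) and a tab,
  # sort by that number in descending order, then drop the prefix again.
  awk -F';' 'NF >= 4 { print $(NF-1) "\t" $0 }' "$@" \
    | sort -k1,1nr \
    | head -n "$n" \
    | cut -f2-
}

# Usage, run inside the unzipped dataset directory:
#   moi_top 10 *
```

Sorting all files at once may take a while on the larger arXMLiv dataset, so trying it on a single file first is a reasonable starting point.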
| 38 | + |
| 39 | +With `grep` you can also use simple regular expressions to search for patterns. Let's check whether the dataset contains expressions that substitute something else for `E` on the left-hand side. |
| 40 | +``` sh |
| 41 | +user@pc:~/zbmath$ grep '"mrow(.*,mo:=,mrow(mi:m,mo:ivt,msup(mi:c,mn:2)))"' * |
| 42 | +1:"mrow(msub(mi:E,mn:0),mo:=,mrow(mi:m,mo:ivt,msup(mi:c,mn:2)))";4;1;1 |
| 43 | +11:"mrow(msup(mi:β,mrow(mo:-,mn:1)),mo:=,mrow(mi:m,mo:ivt,msup(mi:c,mn:2)))";4;1;1 |
| 44 | +12:"mrow(mi:E,mo:=,mrow(mi:m,mo:ivt,msup(mi:c,mn:2)))";4;63;49 |
| 45 | +2:"mrow(mi:ℰ,mo:=,mrow(mi:m,mo:ivt,msup(mi:c,mn:2)))";4;1;1 |
| 46 | +9:"mrow(mi:e,mo:=,mrow(mi:m,mo:ivt,msup(mi:c,mn:2)))";4;3;3 |
| 47 | +9:"mrow(mrow(mi:h,mo:ivt,mi:ν),mo:=,mrow(mi:m,mo:ivt,msup(mi:c,mn:2)))";4;1;1 |
| 48 | +``` |
| 49 | +We can see there are actually 6 distinct left-hand sides in zbMATH: |
| 50 | +1) `E_0 = mc^2` |
| 51 | +2) `\beta^{-1} = mc^2` |
| 52 | +3) `E = mc^2` |
| 53 | +4) `\mathcal{E} = mc^2` |
| 54 | +5) `e = mc^2` |
| 55 | +6) `h\nu = mc^2` |
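To summarize such a result set, the matches can be piped into `awk`. The sketch below assumes the `SR;C;TF;DF` layout shown earlier. The helper name `moi_sum_tf` is hypothetical. It reads matching lines from standard input and prints the number of matching entries followed by their combined term frequency. The `file:` prefix that `grep` adds does not interfere, because the fields are counted from the right.

```sh
# Hypothetical helper (not part of the repository): read entries from
# standard input and print "<number of entries> <sum of TF>".
moi_sum_tf() {
  awk -F';' 'NF >= 4 { n++; tf += $(NF-1) } END { print n + 0, tf + 0 }'
}

# Usage, run inside the unzipped zbMATH directory:
#   grep '"mrow(.*,mo:=,mrow(mi:m,mo:ivt,msup(mi:c,mn:2)))"' * | moi_sum_tf
```

Applied to the six lines listed above, this prints `6 70`.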