korpling
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 24 additions & 16 deletions
diff --git a/cli/tests/cli.rs b/cli/tests/cli.rs
Lines changed: 1 addition & 1 deletion
diff --git a/cli/tests/snapshots/cli__list_corpora_fully_loaded.snap b/cli/tests/snapshots/cli__list_corpora_fully_loaded.snap
Lines changed: 1 addition & 3 deletions
diff --git a/cli/tests/snapshots/cli__list_corpora_partially_loaded.snap b/cli/tests/snapshots/cli__list_corpora_partially_loaded.snap
Lines changed: 1 addition & 3 deletions
diff --git a/cli/tests/snapshots/cli__show_corpus_info.snap b/cli/tests/snapshots/cli__show_corpus_info.snap
Lines changed: 1 addition & 3 deletions
diff --git a/core/src/graph/mod.rs b/core/src/graph/mod.rs
Lines changed: 10 additions & 3 deletions
diff --git a/docs/src/rest/configuration.md b/docs/src/rest/configuration.md
Lines changed: 13 additions & 7 deletions
@@ -5,10 +5,18 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
-## Fixed
+### Added
+
+- New optional `file` option for the `[logging]` section in the webservice
+configuration. Can be used to additionally output all log messages to the given
+file.
+- `Graph::ensure_loaded_parallel` returns the list of components that exist and
+were actually loaded.
+
+### Fixed
 
-- Crash could occur when finding inversed connected nodes in PrePost graph
-  storage due to a subtraction resulting in negative number.
+- Log corpus cache status updates less frequently. Previously, every corpus
+access could trigger a log entry, which is undesirable under heavy load.
 
 ## [3.7.1] - 2025-04-14
 
@@ -53,7 +61,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Fixed
 
 - Fixed out of bounds error parsing legacy meta queries with multiple
-  alternatives (https://github.com/korpling/graphANNIS/pull/308) 
+  alternatives (https://github.com/korpling/graphANNIS/pull/308)
 
 ## [3.5.0] - 2024-09-02
 
@@ -75,7 +83,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Fixed
 
 - Do not use recursion to calculate the indirect coverage edges in the model
-  index, since this could fail for deeply nested structures. 
+  index, since this could fail for deeply nested structures.
 
 ## [3.3.3] - 2024-07-12
 
@@ -85,7 +93,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   importer.
 - Fix `FileTooLarge` error when searching for token precedence where the
   statistics indicate that this search is impossible.
-  
+
 ## [3.3.2] - 2024-07-04
 
 ### Fixed
@@ -295,7 +303,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Compile releases on Ubuntu 20.04 instead of 18.04, which means the minimal
   GLIBC version is 2.31. This is necessary, since GitHub actions deprecated this
   Ubuntu version.
-  
+
 
 ### Fixed
 
@@ -375,7 +383,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   first token of context regions in `subgraph` when the returned context regions
   do not overlap. This allows sorting the context regions that belong to the
   same data source but are not connected by ordinary `Ordering/annis/` edges.
-  
+
 
 ## [2.2.2] - 2022-07-26
 
@@ -487,7 +495,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   C-API), so this release is not technically backwards-compatible. Adapting to
   the updated API should be restricted to handle the errors returned by the
   functions.
-- The changes to the error handling also affects the C-API. These following 
+- The changes to the error handling also affects the C-API. These following
   functions have now a `ErrorList` argument:
   * `annis_cs_list_node_annotations`
   * `annis_cs_list_edge_annotations`
@@ -525,7 +533,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - RelANNIS version 3.3 files with segmentation might also have a missing "span" column.
   In case the "span" column is null, always attempt to reconstruct the actual value from
   the corresponding node annotation instead of failing directly.
-  
+
 ### Changed
 
 - Avoid unnecessary compacting of disk tables when collecting graph updates during import.
@@ -542,7 +550,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   adjacency lists. This improves search for tokens because the Coverage components
   are typically adjacency lists, and we need to make sure the token nodes don't
   have any outgoing edges.
-- Fixed miscalculation of whitespace string capacity which could lead to 
+- Fixed miscalculation of whitespace string capacity which could lead to
   `memory allocation failed` error.
 
 ## [1.4.0] - 2021-12-03
@@ -554,9 +562,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Fixed
 
 - Limit the used main memory cache per `DiskTable` by only using a disk block cache for the C1 table.
-  Since we use a lot of disk-based maps during import of relANNIS files, the previous behavior could 
+  Since we use a lot of disk-based maps during import of relANNIS files, the previous behavior could
   add up to > 1GB easily, which amongst other issues caused #205 to happen.
-  With this change, during relANNIS import the main memory usage should be limited to be less than 4GB, 
+  With this change, during relANNIS import the main memory usage should be limited to be less than 4GB,
   which seems more reasonable than the previous 20+GB
 - Reduce memory footprint during import when corpus contains a lot of escaped strings (as in #205)
 - Avoid creating small fragmented main memory when importing corpora from relANNIS to help to fix #205
@@ -592,7 +600,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Added
 
-- Added generic operator negation without existence assumption, 
+- Added generic operator negation without existence assumption,
   if only one side of the negated operator is optional  (#187).
 
 ## [1.1.0] - 2021-09-09
@@ -674,7 +682,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Removed
 
-- Replaced the `update_statistics` function in `CorpusStorage` with the more general `reoptimize_implementation` function. 
+- Replaced the `update_statistics` function in `CorpusStorage` with the more general `reoptimize_implementation` function.
   The new function is available via the `re-optimize` command in the CLI.
 
 ### Added
@@ -684,7 +692,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Fixed
 
-- Importing a relANNIS corpus could fail because the integer would wrap around from negative to a large value when calculating the `tok-whitespace-after` annotation value. This large value would then be used to allocate memory, which will fail. 
+- Importing a relANNIS corpus could fail because the integer would wrap around from negative to a large value when calculating the `tok-whitespace-after` annotation value. This large value would then be used to allocate memory, which will fail.
 - Adding `\$` to the escaped input sequence in the relANNIS import, fixing issues with some old SFB 632 corpora
 - Unbound near-by-operator (`^*`) was not limited to 50 in quirks mode
 - Workaround for duplicated document names when importing invalid relANNIS corpora
 
@@ -13,7 +13,7 @@ fn standard_filter() -> Settings {
     // Filter out the time stamps
     settings.add_filter("[0-9]+:[0-9]+:[0-9]+ ", "12:00:00");
     // The loaded and also total available RAM size can vary
-    settings.add_filter("[0-9.]+ [MG]B / [0-9.]+ [MG]B", "100 / 300 MB");
+    settings.add_filter("[0-9.]+[MG]B / [0-9.]+[MG]B", "100MB / 300MB");
     // The loading and time can vary
     settings.add_filter("in [0-9]+ ms", "in 10 ms");
     settings
 
@@ -15,8 +15,7 @@ success: true
 exit_code: 0
 ----- stdout -----
 12:00:00[INFO] Loaded corpus sample-disk-based-3.3
-12:00:00[INFO] Total cache size is 100 / 300 MB and loaded corpora are: sample-disk-based-3.3.
-12:00:00[INFO] Total cache size is 100 / 300 MB and loaded corpora are: sample-disk-based-3.3.
+12:00:00[INFO] Corpus cache after preloading sample-disk-based-3.3: 100MB / 300MB - loaded corpora [sample-disk-based-3.3]
 12:00:00[INFO] Preloaded corpus in 10 ms
 sample-disk-based-1.5 (not loaded)
 sample-disk-based-3.2 (not loaded)
@@ -27,4 +26,3 @@ sample-memory-based-3.3 (not loaded)
 graphANNIS says good-bye!
 
 ----- stderr -----
-
@@ -15,8 +15,7 @@ success: true
 exit_code: 0
 ----- stdout -----
 12:00:00[INFO] Loaded corpus sample-disk-based-3.3
-12:00:00[INFO] Total cache size is 100 / 300 MB and loaded corpora are: sample-disk-based-3.3.
-12:00:00[INFO] Total cache size is 100 / 300 MB and loaded corpora are: sample-disk-based-3.3.
+12:00:00[INFO] Corpus cache after loading components: 100MB / 300MB - loaded corpora [sample-disk-based-3.3]
 12:00:00[INFO] Executed query in 10 ms
 result: 44 matches in 4 documents
 sample-disk-based-1.5 (not loaded)
@@ -28,4 +27,3 @@ sample-memory-based-3.3 (not loaded)
 graphANNIS says good-bye!
 
 ----- stderr -----
-
@@ -15,8 +15,7 @@ success: true
 exit_code: 0
 ----- stdout -----
 12:00:00[INFO] Loaded corpus sample-disk-based-3.3
-12:00:00[INFO] Total cache size is 100 / 300 MB and loaded corpora are: sample-disk-based-3.3.
-12:00:00[INFO] Total cache size is 100 / 300 MB and loaded corpora are: sample-disk-based-3.3.
+12:00:00[INFO] Corpus cache after preloading sample-disk-based-3.3: 100MB / 300MB - loaded corpora [sample-disk-based-3.3]
 12:00:00[INFO] Preloaded corpus in 10 ms
 Status: "fully loaded"
 Token search shortcut possible: true
@@ -75,4 +74,3 @@ Status: "fully loaded"
 graphANNIS says good-bye!
 
 ----- stderr -----
-
@@ -927,8 +927,13 @@ impl<CT: ComponentType> Graph<CT> {
     }
 
     /// Ensure that the graph storage for the given components is loaded and ready to use.
-    /// Loading is done in paralell.
-    pub fn ensure_loaded_parallel(&mut self, components_to_load: &[Component<CT>]) -> Result<()> {
+    /// Loading is done in parallel.
+    ///
+    /// Returns the list of actually loaded (and existing) components.
+    pub fn ensure_loaded_parallel(
+        &mut self,
+        components_to_load: &[Component<CT>],
+    ) -> Result<Vec<Component<CT>>> {
         // We only load known components, so check the map if the entry exists
         // and that is not loaded yet.
         let components_to_load: Vec<_> = components_to_load
@@ -959,11 +964,13 @@ impl<CT: ComponentType> Graph<CT> {
             .collect();
 
         // insert all the loaded components
+        let mut result = Vec::with_capacity(loaded_components.len());
         for (c, gs) in loaded_components {
             let gs = gs?;
             self.components.insert(c.clone(), Some(gs));
+            result.push(c.clone());
         }
-        Ok(())
+        Ok(result)
     }
 
     pub fn optimize_impl(&mut self, disk_based: bool) -> Result<()> {
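
The changed return value of `ensure_loaded_parallel` can be illustrated with a minimal, self-contained sketch. It is not the graphANNIS implementation: `Registry`, `ComponentName`, `Storage` and `load_storage` are hypothetical stand-ins, and scoped threads from the standard library are assumed in place of whatever parallelism the real code uses. It only shows the pattern from the hunk above: skip unknown or already loaded components, load the rest in parallel, and return the ones that were actually loaded.

```rust
use std::collections::BTreeMap;
use std::thread;

/// Hypothetical stand-in for the real `Component<CT>` key type.
type ComponentName = String;
/// Hypothetical stand-in for a loaded graph storage.
type Storage = Vec<u64>;

/// Simplified registry: a value of `None` means "known, but not loaded yet".
pub struct Registry {
    pub components: BTreeMap<ComponentName, Option<Storage>>,
}

/// Placeholder loader (assumption): the real code would read from disk.
fn load_storage(name: &str) -> Result<Storage, String> {
    if name.is_empty() {
        Err("empty component name".to_string())
    } else {
        Ok(vec![name.len() as u64])
    }
}

impl Registry {
    /// Loads all requested components that are known and not loaded yet
    /// and returns the names of the components that were actually loaded.
    pub fn ensure_loaded_parallel(
        &mut self,
        requested: &[ComponentName],
    ) -> Result<Vec<ComponentName>, String> {
        // Only known components that are still unloaded need any work.
        let to_load: Vec<&ComponentName> = requested
            .iter()
            .filter(|c| matches!(self.components.get(*c), Some(None)))
            .collect();

        // Load in parallel, one scoped thread per component.
        let loaded: Vec<(ComponentName, Result<Storage, String>)> = thread::scope(|s| {
            let handles: Vec<_> = to_load
                .iter()
                .map(|&c| s.spawn(move || (c.clone(), load_storage(c))))
                .collect();
            handles
                .into_iter()
                .map(|h| h.join().expect("loader thread panicked"))
                .collect()
        });

        // Insert the loaded storages and remember which ones were added.
        let mut result = Vec::with_capacity(loaded.len());
        for (c, gs) in loaded {
            let gs = gs?;
            self.components.insert(c.clone(), Some(gs));
            result.push(c);
        }
        Ok(result)
    }
}
```

A caller can use the returned list to decide whether anything changed, for example to write a cache status message only when the list is non-empty. Whether the callers in this PR do exactly that is not shown in the hunks here.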
 
@@ -17,6 +17,9 @@ cache = {PercentOfFreeMemory = 25.0}
 
 [logging]
 debug = false
+# Optional path to a logging file.
+# If not given, only log to stdout/stderr
+file = "/var/log/graphannis.log"
 
 [auth]
 anonymous_access_all_corpora = false
@@ -38,25 +41,28 @@ A new database file will be created at this path when the service is started and
 Also, you can decide if you want to prefer disk-based storage of annotations by setting the value for the `disk_based` key to `true`.
 
 You can configure how much memory is used by the service for caching loaded corpora with the `cache` key.
-There are two types of strategies: 
+There are two types of strategies:
 
-- `PercentOfFreeMemory` estimates the free space of memory for the system during startup and only uses the given value (as percent) of the available free space. 
+- `PercentOfFreeMemory` estimates the free space of memory for the system during startup and only uses the given value (as percent) of the available free space.
 - `FixedMaxMemory` will use at most the given value in Megabytes.
 
 For example, setting the configuration value to
 ```toml
 cache = {PercentOfFreeMemory = 80.0}
-``` 
-will use 80% of the available free memory and 
+```
+will use 80% of the available free memory and
 ```toml
 cache = {FixedMaxMemory = 8000}
-``` 
+```
 at most 8 GB of RAM.
 
 ## [logging] section
 
-Per default, graphANNIS will only output information, warning and error messages.
-To also enable debug output, set the value for the `debug` field to `true`.
+By default, graphANNIS will only output information, warning and error
+messages. To also enable debug output, set the value for the `debug` field to
+`true`. You can set the optional value `file` to a file path to also write the
+log messages to the given file. **The log file is not emptied automatically, so
+you have to clean it regularly**, e.g. with `logrotate` on a Linux server.
 
 ## [auth] section