SQLite-Based Binary Logging by alex-maiorov · Pull Request #10633 · The-OpenROAD-Project/OpenROAD

alex-maiorov · 2026-06-10T21:15:37Z

Summary

I added a way to dump large amounts of data from various components into an SQLite database. The logger works by calling a logging function analogous to the existing logger functions, and then the data is processed by a background thread for actual writing. The actual database has separate tables for each numeric log(logToDb or logToDbBulk), and those methods are designed for low latency. There is also logToDbMetadata, which is a simple and non-optimized text to text key-value logging mechanism that goes into a single table.

Type of Change

New Feature: SQLite Binary Logging.

Impact

If turned on and called, qualitatively I have observed the performance impact to be fairly minimal for what I was doing. Given the use-case of large dumps, I would like suggestions for writing reasonable performance and correctness tests.

At runtime, this is an entirely opt-in feature. However, this does still require sqlite3 as a compile time dependency, though SQLite is almost universally available across different platforms. I also modified find_messages.py so that it enforces the uniqueness of the tool id/message id for those methods.

Note on Incompleteness

I am aware that this code is not quite ready for merging. I originally wrote it as a debugging and data analysis tool on the side for a research project I was working on, however it ended up being very useful so I would like to get it to a merge-ready state eventually.

Verification

I have verified that the local build succeeds (./etc/Build.sh).
I have run the relevant tests and they pass.
My code follows the repository's formatting guidelines.
I have included tests to prevent regressions.
I have signed my commits (DCO).

Related Issues

None that I know of, but @povik was interested in this functionality.

Signed-off-by: Alex Maiorov <alexander.mmaiorov@gmail.com>

… ones Signed-off-by: Alex Maiorov <alexander.mmaiorov@gmail.com>

Signed-off-by: Alex Maiorov <alexander.mmaiorov@gmail.com>

github-actions

Welcome to OpenROAD! Thanks for opening your first PR.
Before we review:

Contribution Guide: https://openroad.readthedocs.io/en/latest/contrib/contributing.html
Build Instructions: https://openroad.readthedocs.io/en/latest/contrib/BuildWithCMake.html

Please ensure:

CI passes
Code is properly formatted
Tests are included where applicable
A maintainer will review shortly!

gemini-code-assist

Code Review

This pull request introduces a parallel logging mechanism to stream large quantities of numeric data into an SQLite database, featuring a background thread, type-erased queues, memory pressure management, and corresponding TCL/C++ APIs. The code review identifies several critical issues and improvement opportunities, including an uncommitted transaction bug when draining empty queues, a database connection leak during shutdown due to retained shared pointers in the schema registry, a SQL injection risk from unvalidated table names, the use of deprecated atomic shared pointer functions, a potential race condition in concurrent calls to stopLogDb, and a regex limitation in the message finder script for multi-line template headers.

gemini-code-assist · 2026-06-10T21:19:32Z

+  size_t drain_to_db(sqlite3* db, size_t max_records) override
+  {
+    // The latter means the statement could not be built, so fail the whole thing. Otherwise use cached stmt.
+    if (!stmt_ && !build_insert_stmt(db)) {
+      return 0;
+    }
+
+    sqlite3_exec(db, "BEGIN", nullptr, nullptr, nullptr);
+
+    size_t count = 0;
+    bool ok = true;
+
+    while (count < max_records) {
+      row_type row;
+      {
+        std::lock_guard<std::mutex> lock(mutex_);
+        if (queue_.empty()) {
+          break;
+        }
+        row = std::move(queue_.front());
+        queue_.pop_front();
+        item_count_.fetch_sub(1, std::memory_order_release);
+      }
+
+      // C++17 template magic to pass the column name strings as column keys for sqlite
+      bind_row(stmt_, row, std::index_sequence_for<Args...>{});
+      int rc = sqlite3_step(stmt_);
+      sqlite3_reset(stmt_);
+      if (rc != SQLITE_DONE) {
+        ok = false;
+        break;
+      }
+      ++count;
+    }
+
+    // Rollback if there was a failure.
+    if (count > 0) {
+      sqlite3_exec(db, ok ? "COMMIT" : "ROLLBACK",
+                   nullptr, nullptr, nullptr);
+    }


If drain_to_db is called when the queue is empty, sqlite3_exec(db, "BEGIN", ...) is executed, but the loop terminates immediately with count == 0. Consequently, the matching COMMIT or ROLLBACK is never executed because of the if (count > 0) guard. This leaves an uncommitted transaction open on the database connection, which will cause subsequent database operations to fail or behave unexpectedly.

We should only start the transaction (BEGIN) when we are about to process the first item in the loop.

size_t drain_to_db(sqlite3* db, size_t max_records) override { // The latter means the statement could not be built, so fail the whole thing. Otherwise use cached stmt. if (!stmt_ && !build_insert_stmt(db)) { return 0; } size_t count = 0; bool ok = true; while (count < max_records) { row_type row; { std::lock_guard<std::mutex> lock(mutex_); if (queue_.empty()) { break; } row = std::move(queue_.front()); queue_.pop_front(); item_count_.fetch_sub(1, std::memory_order_release); } if (count == 0) { sqlite3_exec(db, "BEGIN", nullptr, nullptr, nullptr); } // C++17 template magic to pass the column name strings as column keys for sqlite bind_row(stmt_, row, std::index_sequence_for<Args...>{}); int rc = sqlite3_step(stmt_); sqlite3_reset(stmt_); if (rc != SQLITE_DONE) { ok = false; break; } ++count; } // Rollback if there was a failure. if (count > 0) { sqlite3_exec(db, ok ? "COMMIT" : "ROLLBACK", nullptr, nullptr, nullptr); } last_flush_ms_.store( std::chrono::duration_cast<std::chrono::milliseconds>( std::chrono::steady_clock::now().time_since_epoch()) .count(), std::memory_order_release); return count; }

gemini-code-assist · 2026-06-10T21:19:32Z

+  // Destroy queues before closing so TypedQueue destructors can finalize
+  // their prepared statements.
+  local_registry.clear();


During shutdown, local_registry.clear() is called to destroy the local queues and trigger their destructors (which finalize the prepared statements). However, schema_registry_ still holds std::shared_ptr references to these same queues. Because of this, the reference count of the queues does not drop to zero, their destructors are not called, and the prepared statements remain active. As a result, sqlite3_close(db_) will fail and return SQLITE_BUSY, leading to a leaked database connection.

To fix this, we must clear schema_registry_ as well before clearing local_registry.

Suggested change

// Destroy queues before closing so TypedQueue destructors can finalize

// their prepared statements.

local_registry.clear();

// Destroy queues before closing so TypedQueue destructors can finalize

// their prepared statements.

schema_registry_.clear();

local_registry.clear();

gemini-code-assist · 2026-06-10T21:19:32Z

+SchemaInfo Logger::logToDbBuildSchemaInfo(sqlite3* db,
+                                          SchemaKey key,
+                                          const std::string& table_name,
+                                          std::string_view header,
+                                          const std::vector<SQLiteType>& types)
+{
+  SchemaInfo info;


The table_name parameter passed to logToDb is a runtime string that is directly concatenated into SQL statements (e.g., CREATE TABLE IF NOT EXISTS " + info.table_name and INSERT INTO " + info_.table_name). If a developer passes a table name containing spaces, hyphens, or SQL keywords, it will cause SQL syntax errors. If the table name is dynamically constructed from user/design inputs, it could also present a SQL injection risk.

We should validate that table_name contains only alphanumeric characters and underscores. If validation fails, we can report a terminal error using this->error which will terminate the program.

SchemaInfo Logger::logToDbBuildSchemaInfo(sqlite3* db, SchemaKey key, const std::string& table_name, std::string_view header, const std::vector<SQLiteType>& types) { for (char c : table_name) { if (!std::isalnum(c) && c != '_') { this->error(UTL, 111, "Invalid database table name '{}'. Only alphanumeric characters and underscores are allowed.", table_name); } } SchemaInfo info;

References

Calls to logger_->error are expected to terminate the program, so additional error handling like returning an error code or throwing an exception is not necessary.

gemini-code-assist · 2026-06-10T21:19:32Z

+class SchemaRegistry {
+public:
+  using RegistryMap
+      = std::unordered_map<SchemaKey, std::shared_ptr<AbstractQueue>, SchemaKeyHasher>;
+
+  SchemaRegistry() : registry_ptr_(std::make_shared<const RegistryMap>()) {}
+
+  std::shared_ptr<const RegistryMap> get_map() const {
+    return std::atomic_load(&registry_ptr_);
+  }
+
+  std::shared_ptr<AbstractQueue> register_schema(
+      SchemaKey key,
+      std::shared_ptr<AbstractQueue> queue)
+  {
+    std::lock_guard<std::mutex> lock(registration_mutex_);
+    auto current_map = get_map();
+    auto it = current_map->find(key);
+    if (it != current_map->end()) {
+      return it->second;
+    }
+    auto new_map = std::make_shared<RegistryMap>(*current_map);
+    (*new_map)[key] = queue;
+    std::atomic_store(
+        &registry_ptr_,
+        std::const_pointer_cast<const RegistryMap>(new_map));
+    return queue;
+  }
+
+  void remove_schema(SchemaKey key) {
+    std::lock_guard<std::mutex> lock(registration_mutex_);
+    auto current_map = get_map();
+    auto it = current_map->find(key);
+    if (it == current_map->end()) {
+      return;
+    }
+    auto new_map = std::make_shared<RegistryMap>(*current_map);
+    new_map->erase(key);
+    std::atomic_store(
+        &registry_ptr_,
+        std::const_pointer_cast<const RegistryMap>(new_map));
+  }
+
+private:
+  std::shared_ptr<const RegistryMap> registry_ptr_;
+  std::mutex registration_mutex_;
+};


In C++20, std::atomic_load and std::atomic_store for std::shared_ptr are deprecated. We should modernize SchemaRegistry to use std::atomic<std::shared_ptr<const RegistryMap>> which is standard in C++20. Additionally, we need to add a clear() method to allow releasing all registered queues during database shutdown to prevent SQLITE_BUSY on close.

// unordered map with some helpers class SchemaRegistry { public: using RegistryMap = std::unordered_map<SchemaKey, std::shared_ptr<AbstractQueue>, SchemaKeyHasher>; SchemaRegistry() : registry_ptr_(std::make_shared<const RegistryMap>()) {} std::shared_ptr<const RegistryMap> get_map() const { return registry_ptr_.load(); } std::shared_ptr<AbstractQueue> register_schema( SchemaKey key, std::shared_ptr<AbstractQueue> queue) { std::lock_guard<std::mutex> lock(registration_mutex_); auto current_map = get_map(); auto it = current_map->find(key); if (it != current_map->end()) { return it->second; } auto new_map = std::make_shared<RegistryMap>(*current_map); (*new_map)[key] = queue; registry_ptr_.store(std::const_pointer_cast<const RegistryMap>(new_map)); return queue; } void remove_schema(SchemaKey key) { std::lock_guard<std::mutex> lock(registration_mutex_); auto current_map = get_map(); auto it = current_map->find(key); if (it == current_map->end()) { return; } auto new_map = std::make_shared<RegistryMap>(*current_map); new_map->erase(key); registry_ptr_.store(std::const_pointer_cast<const RegistryMap>(new_map)); } void clear() { std::lock_guard<std::mutex> lock(registration_mutex_); registry_ptr_.store(std::make_shared<const RegistryMap>()); } private: std::atomic<std::shared_ptr<const RegistryMap>> registry_ptr_; std::mutex registration_mutex_; };

gemini-code-assist · 2026-06-10T21:19:32Z

+void Logger::stopLogDb()
+{
+  if (!log_db_running_) {
+    return;
+  }
+  log_db_running_ = false;
+  if (log_db_thread_.joinable()) {
+    log_db_thread_.join();
+  }


If stopLogDb is called concurrently by multiple threads (or if a user-triggered stop races with the destructor), both threads might attempt to call log_db_thread_.join(). Calling join() concurrently on the same std::thread object is undefined behavior and typically throws std::system_error.

We can make this completely thread-safe by using compare_exchange_strong on log_db_running_ to ensure only one thread performs the shutdown and join.

Suggested change

void Logger::stopLogDb()

{

if (!log_db_running_) {

return;

}

log_db_running_ = false;

if (log_db_thread_.joinable()) {

log_db_thread_.join();

}

void Logger::stopLogDb()

{

bool expected = true;

if (!log_db_running_.compare_exchange_strong(expected, false)) {

return;

}

if (log_db_thread_.joinable()) {

log_db_thread_.join();

}

gemini-code-assist · 2026-06-10T21:19:32Z

+logtodb_regexp_c = re.compile(
+    r"""
+      (?:->|\.)                                        # deref
+      logToDb(?:Bulk)?                                  # logToDb or logToDbBulk
+      <.+?>                                            # template header
+      \s*\(\s*                                         # (
+      (?:utl::)?(?P<tool>[A-Z]{3})                     # tool
+      \s*,\s*                                          # ,
+      (?P<id>\d+)                                      # id
+    """,
+    re.VERBOSE | re.MULTILINE,
+)


The regex logtodb_regexp_c uses .+? to match the template header. Since the script does not compile the regex with re.DOTALL, it will fail to match any logToDb or logToDbBulk calls where the template header spans multiple lines (e.g., if a developer formats the column names list with newlines).

Using [\s\S]+? instead of .+? is a robust way to match any character including newlines without needing to change the global regex flags.

Suggested change

logtodb_regexp_c = re.compile(

r"""

(?:->|\.) # deref

logToDb(?:Bulk)? # logToDb or logToDbBulk

<.+?> # template header

\s*\(\s* # (

(?:utl::)?(?P<tool>[A-Z]{3}) # tool

\s*,\s* # ,

(?P<id>\d+) # id

""",

re.VERBOSE | re.MULTILINE,

)

logtodb_regexp_c = re.compile(

r"""

(?:->|\\.) # deref

logToDb(?:Bulk)? # logToDb or logToDbBulk

<[\\s\\S]+?> # template header

\\s*\\(\\s* # (

(?:utl::)?(?P<tool>[A-Z]{3}) # tool

\\s*,\\s* # ,

(?P<id>\\d+) # id

""",

re.VERBOSE | re.MULTILINE,

)

povik · 2026-06-10T21:42:27Z

@alex-maiorov Thanks. It's worth opening the PR for so it's preserved if anyone needs it. I would hold off on adding tests or polishing unless @maliberty signals interest in merging.

Can you post example of use from the C++ side? I.e. what a method call emitting into the database looks like.

maliberty · 2026-06-10T21:53:37Z

Please say more about what this feature is useful for that is not provided by text (or even more structured text) logging.

alex-maiorov · 2026-06-11T00:47:22Z

Please say more about what this feature is useful for that is not provided by text (or even more structured text) logging.

There are a few reasons, first I wanted to avoid any possible latency costs of writing data especially if I was doing it very often, avoiding both serialization overhead and possible disk flushes at the point where the log functions are called by buffering it(though the same method can be used for text logging). For my usecase the main benefit is that I got a lot of functionality for free from SQLite in post-analysis which made it much more efficient, such as having one file with an easily readable schema, performant cross indexing with joins and other SQL features, and the ability to build indeces over tables for fast lookups, and also to be able to build derived data tables if necessary and store them easily.

There are also some disk size and bandwidth reductions, though SQLite does not optimally pack everything. To be honest, for pure speed and space efficiency there might be better solutions, but SQLite is nice for being easy to analyze later without needing to provide separate schema files for something like Protobufs or similar. Besides lower overhead, the benefit over more advanced text logging is we avoid the tradeoff of file clutter and performance. With text logging we would have to either create many files for all data sources(of which there might be many, even within one tool), or we would need to somehow collect all relevant information in a large batch. Largely, text logging becomes constrained by the requirement to keep all files in append-only mode as storing everything in memory is impractical. I will upload some example C++ code tomorrow that will hopefully more clearly demonstrate this point.

alex-maiorov added 6 commits June 10, 2026 13:41

SQLite-based logging mechanism added to logger

625b3e8

Signed-off-by: Alex Maiorov <alexander.mmaiorov@gmail.com>

Fixed installer script

85904a6

Signed-off-by: Alex Maiorov <alexander.mmaiorov@gmail.com>

Fixed bazel and refactored large worker function into three different…

55000d9

… ones Signed-off-by: Alex Maiorov <alexander.mmaiorov@gmail.com>

updated find_messages.py for logtodb

72583f6

Signed-off-by: Alex Maiorov <alexander.mmaiorov@gmail.com>

Updated documentation and did minor refactoring

bad1af2

Signed-off-by: Alex Maiorov <alexander.mmaiorov@gmail.com>

fixed typo in documentation

31df979

Signed-off-by: Alex Maiorov <alexander.mmaiorov@gmail.com>

alex-maiorov requested review from a team as code owners June 10, 2026 21:15

alex-maiorov requested a review from maliberty June 10, 2026 21:15

github-actions Bot added the size/XL label Jun 10, 2026

github-actions Bot reviewed Jun 10, 2026

View reviewed changes

gemini-code-assist Bot reviewed Jun 10, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SQLite-Based Binary Logging #10633

SQLite-Based Binary Logging #10633
alex-maiorov wants to merge 6 commits into
The-OpenROAD-Project:masterfrom
alex-maiorov:binary_logging_signed

alex-maiorov commented Jun 10, 2026

Uh oh!

github-actions Bot left a comment

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 10, 2026

Uh oh!

gemini-code-assist Bot Jun 10, 2026

Uh oh!

gemini-code-assist Bot Jun 10, 2026

Uh oh!

gemini-code-assist Bot Jun 10, 2026

Uh oh!

gemini-code-assist Bot Jun 10, 2026

Uh oh!

gemini-code-assist Bot Jun 10, 2026

Uh oh!

povik commented Jun 10, 2026

Uh oh!

maliberty commented Jun 10, 2026

Uh oh!

alex-maiorov commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

alex-maiorov commented Jun 10, 2026

Summary

Type of Change

Impact

Note on Incompleteness

Verification

Related Issues

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

povik commented Jun 10, 2026

Uh oh!

maliberty commented Jun 10, 2026

Uh oh!

alex-maiorov commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants