SQLite-Based Binary Logging #10633
Conversation
Signed-off-by: Alex Maiorov <alexander.mmaiorov@gmail.com>
Signed-off-by: Alex Maiorov <alexander.mmaiorov@gmail.com>
… ones Signed-off-by: Alex Maiorov <alexander.mmaiorov@gmail.com>
Signed-off-by: Alex Maiorov <alexander.mmaiorov@gmail.com>
Signed-off-by: Alex Maiorov <alexander.mmaiorov@gmail.com>
Signed-off-by: Alex Maiorov <alexander.mmaiorov@gmail.com>
There was a problem hiding this comment.
Welcome to OpenROAD! Thanks for opening your first PR.
Before we review:
- Contribution Guide: https://openroad.readthedocs.io/en/latest/contrib/contributing.html
- Build Instructions: https://openroad.readthedocs.io/en/latest/contrib/BuildWithCMake.html
Please ensure:
- CI passes
- Code is properly formatted
- Tests are included where applicable
A maintainer will review shortly!
There was a problem hiding this comment.
Code Review
This pull request introduces a parallel logging mechanism to stream large quantities of numeric data into an SQLite database, featuring a background thread, type-erased queues, memory pressure management, and corresponding TCL/C++ APIs. The code review identifies several critical issues and improvement opportunities, including an uncommitted transaction bug when draining empty queues, a database connection leak during shutdown due to retained shared pointers in the schema registry, a SQL injection risk from unvalidated table names, the use of deprecated atomic shared pointer functions, a potential race condition in concurrent calls to stopLogDb, and a regex limitation in the message finder script for multi-line template headers.
| size_t drain_to_db(sqlite3* db, size_t max_records) override | ||
| { | ||
| // The latter means the statement could not be built, so fail the whole thing. Otherwise use cached stmt. | ||
| if (!stmt_ && !build_insert_stmt(db)) { | ||
| return 0; | ||
| } | ||
|
|
||
| sqlite3_exec(db, "BEGIN", nullptr, nullptr, nullptr); | ||
|
|
||
| size_t count = 0; | ||
| bool ok = true; | ||
|
|
||
| while (count < max_records) { | ||
| row_type row; | ||
| { | ||
| std::lock_guard<std::mutex> lock(mutex_); | ||
| if (queue_.empty()) { | ||
| break; | ||
| } | ||
| row = std::move(queue_.front()); | ||
| queue_.pop_front(); | ||
| item_count_.fetch_sub(1, std::memory_order_release); | ||
| } | ||
|
|
||
| // C++17 template magic to pass the column name strings as column keys for sqlite | ||
| bind_row(stmt_, row, std::index_sequence_for<Args...>{}); | ||
| int rc = sqlite3_step(stmt_); | ||
| sqlite3_reset(stmt_); | ||
| if (rc != SQLITE_DONE) { | ||
| ok = false; | ||
| break; | ||
| } | ||
| ++count; | ||
| } | ||
|
|
||
| // Rollback if there was a failure. | ||
| if (count > 0) { | ||
| sqlite3_exec(db, ok ? "COMMIT" : "ROLLBACK", | ||
| nullptr, nullptr, nullptr); | ||
| } |
There was a problem hiding this comment.
If drain_to_db is called when the queue is empty, sqlite3_exec(db, "BEGIN", ...) is executed, but the loop terminates immediately with count == 0. Consequently, the matching COMMIT or ROLLBACK is never executed because of the if (count > 0) guard. This leaves an uncommitted transaction open on the database connection, which will cause subsequent database operations to fail or behave unexpectedly.
We should only start the transaction (BEGIN) when we are about to process the first item in the loop.
size_t drain_to_db(sqlite3* db, size_t max_records) override
{
// The latter means the statement could not be built, so fail the whole thing. Otherwise use cached stmt.
if (!stmt_ && !build_insert_stmt(db)) {
return 0;
}
size_t count = 0;
bool ok = true;
while (count < max_records) {
row_type row;
{
std::lock_guard<std::mutex> lock(mutex_);
if (queue_.empty()) {
break;
}
row = std::move(queue_.front());
queue_.pop_front();
item_count_.fetch_sub(1, std::memory_order_release);
}
if (count == 0) {
sqlite3_exec(db, "BEGIN", nullptr, nullptr, nullptr);
}
// C++17 template magic to pass the column name strings as column keys for sqlite
bind_row(stmt_, row, std::index_sequence_for<Args...>{});
int rc = sqlite3_step(stmt_);
sqlite3_reset(stmt_);
if (rc != SQLITE_DONE) {
ok = false;
break;
}
++count;
}
// Rollback if there was a failure.
if (count > 0) {
sqlite3_exec(db, ok ? "COMMIT" : "ROLLBACK",
nullptr, nullptr, nullptr);
}
last_flush_ms_.store(
std::chrono::duration_cast<std::chrono::milliseconds>(
std::chrono::steady_clock::now().time_since_epoch())
.count(),
std::memory_order_release);
return count;
}| // Destroy queues before closing so TypedQueue destructors can finalize | ||
| // their prepared statements. | ||
| local_registry.clear(); |
There was a problem hiding this comment.
During shutdown, local_registry.clear() is called to destroy the local queues and trigger their destructors (which finalize the prepared statements). However, schema_registry_ still holds std::shared_ptr references to these same queues. Because of this, the reference count of the queues does not drop to zero, their destructors are not called, and the prepared statements remain active. As a result, sqlite3_close(db_) will fail and return SQLITE_BUSY, leading to a leaked database connection.
To fix this, we must clear schema_registry_ as well before clearing local_registry.
| // Destroy queues before closing so TypedQueue destructors can finalize | |
| // their prepared statements. | |
| local_registry.clear(); | |
| // Destroy queues before closing so TypedQueue destructors can finalize | |
| // their prepared statements. | |
| schema_registry_.clear(); | |
| local_registry.clear(); |
| SchemaInfo Logger::logToDbBuildSchemaInfo(sqlite3* db, | ||
| SchemaKey key, | ||
| const std::string& table_name, | ||
| std::string_view header, | ||
| const std::vector<SQLiteType>& types) | ||
| { | ||
| SchemaInfo info; |
There was a problem hiding this comment.
The table_name parameter passed to logToDb is a runtime string that is directly concatenated into SQL statements (e.g., CREATE TABLE IF NOT EXISTS " + info.table_name and INSERT INTO " + info_.table_name). If a developer passes a table name containing spaces, hyphens, or SQL keywords, it will cause SQL syntax errors. If the table name is dynamically constructed from user/design inputs, it could also present a SQL injection risk.
We should validate that table_name contains only alphanumeric characters and underscores. If validation fails, we can report a terminal error using this->error which will terminate the program.
SchemaInfo Logger::logToDbBuildSchemaInfo(sqlite3* db,
SchemaKey key,
const std::string& table_name,
std::string_view header,
const std::vector<SQLiteType>& types)
{
for (char c : table_name) {
if (!std::isalnum(c) && c != '_') {
this->error(UTL, 111, "Invalid database table name '{}'. Only alphanumeric characters and underscores are allowed.", table_name);
}
}
SchemaInfo info;References
- Calls to
logger_->errorare expected to terminate the program, so additional error handling like returning an error code or throwing an exception is not necessary.
| class SchemaRegistry { | ||
| public: | ||
| using RegistryMap | ||
| = std::unordered_map<SchemaKey, std::shared_ptr<AbstractQueue>, SchemaKeyHasher>; | ||
|
|
||
| SchemaRegistry() : registry_ptr_(std::make_shared<const RegistryMap>()) {} | ||
|
|
||
| std::shared_ptr<const RegistryMap> get_map() const { | ||
| return std::atomic_load(®istry_ptr_); | ||
| } | ||
|
|
||
| std::shared_ptr<AbstractQueue> register_schema( | ||
| SchemaKey key, | ||
| std::shared_ptr<AbstractQueue> queue) | ||
| { | ||
| std::lock_guard<std::mutex> lock(registration_mutex_); | ||
| auto current_map = get_map(); | ||
| auto it = current_map->find(key); | ||
| if (it != current_map->end()) { | ||
| return it->second; | ||
| } | ||
| auto new_map = std::make_shared<RegistryMap>(*current_map); | ||
| (*new_map)[key] = queue; | ||
| std::atomic_store( | ||
| ®istry_ptr_, | ||
| std::const_pointer_cast<const RegistryMap>(new_map)); | ||
| return queue; | ||
| } | ||
|
|
||
| void remove_schema(SchemaKey key) { | ||
| std::lock_guard<std::mutex> lock(registration_mutex_); | ||
| auto current_map = get_map(); | ||
| auto it = current_map->find(key); | ||
| if (it == current_map->end()) { | ||
| return; | ||
| } | ||
| auto new_map = std::make_shared<RegistryMap>(*current_map); | ||
| new_map->erase(key); | ||
| std::atomic_store( | ||
| ®istry_ptr_, | ||
| std::const_pointer_cast<const RegistryMap>(new_map)); | ||
| } | ||
|
|
||
| private: | ||
| std::shared_ptr<const RegistryMap> registry_ptr_; | ||
| std::mutex registration_mutex_; | ||
| }; |
There was a problem hiding this comment.
In C++20, std::atomic_load and std::atomic_store for std::shared_ptr are deprecated. We should modernize SchemaRegistry to use std::atomic<std::shared_ptr<const RegistryMap>> which is standard in C++20. Additionally, we need to add a clear() method to allow releasing all registered queues during database shutdown to prevent SQLITE_BUSY on close.
// unordered map with some helpers
class SchemaRegistry {
public:
using RegistryMap
= std::unordered_map<SchemaKey, std::shared_ptr<AbstractQueue>, SchemaKeyHasher>;
SchemaRegistry() : registry_ptr_(std::make_shared<const RegistryMap>()) {}
std::shared_ptr<const RegistryMap> get_map() const {
return registry_ptr_.load();
}
std::shared_ptr<AbstractQueue> register_schema(
SchemaKey key,
std::shared_ptr<AbstractQueue> queue)
{
std::lock_guard<std::mutex> lock(registration_mutex_);
auto current_map = get_map();
auto it = current_map->find(key);
if (it != current_map->end()) {
return it->second;
}
auto new_map = std::make_shared<RegistryMap>(*current_map);
(*new_map)[key] = queue;
registry_ptr_.store(std::const_pointer_cast<const RegistryMap>(new_map));
return queue;
}
void remove_schema(SchemaKey key) {
std::lock_guard<std::mutex> lock(registration_mutex_);
auto current_map = get_map();
auto it = current_map->find(key);
if (it == current_map->end()) {
return;
}
auto new_map = std::make_shared<RegistryMap>(*current_map);
new_map->erase(key);
registry_ptr_.store(std::const_pointer_cast<const RegistryMap>(new_map));
}
void clear() {
std::lock_guard<std::mutex> lock(registration_mutex_);
registry_ptr_.store(std::make_shared<const RegistryMap>());
}
private:
std::atomic<std::shared_ptr<const RegistryMap>> registry_ptr_;
std::mutex registration_mutex_;
};| void Logger::stopLogDb() | ||
| { | ||
| if (!log_db_running_) { | ||
| return; | ||
| } | ||
| log_db_running_ = false; | ||
| if (log_db_thread_.joinable()) { | ||
| log_db_thread_.join(); | ||
| } |
There was a problem hiding this comment.
If stopLogDb is called concurrently by multiple threads (or if a user-triggered stop races with the destructor), both threads might attempt to call log_db_thread_.join(). Calling join() concurrently on the same std::thread object is undefined behavior and typically throws std::system_error.
We can make this completely thread-safe by using compare_exchange_strong on log_db_running_ to ensure only one thread performs the shutdown and join.
| void Logger::stopLogDb() | |
| { | |
| if (!log_db_running_) { | |
| return; | |
| } | |
| log_db_running_ = false; | |
| if (log_db_thread_.joinable()) { | |
| log_db_thread_.join(); | |
| } | |
| void Logger::stopLogDb() | |
| { | |
| bool expected = true; | |
| if (!log_db_running_.compare_exchange_strong(expected, false)) { | |
| return; | |
| } | |
| if (log_db_thread_.joinable()) { | |
| log_db_thread_.join(); | |
| } |
| logtodb_regexp_c = re.compile( | ||
| r""" | ||
| (?:->|\.) # deref | ||
| logToDb(?:Bulk)? # logToDb or logToDbBulk | ||
| <.+?> # template header | ||
| \s*\(\s* # ( | ||
| (?:utl::)?(?P<tool>[A-Z]{3}) # tool | ||
| \s*,\s* # , | ||
| (?P<id>\d+) # id | ||
| """, | ||
| re.VERBOSE | re.MULTILINE, | ||
| ) |
There was a problem hiding this comment.
The regex logtodb_regexp_c uses .+? to match the template header. Since the script does not compile the regex with re.DOTALL, it will fail to match any logToDb or logToDbBulk calls where the template header spans multiple lines (e.g., if a developer formats the column names list with newlines).
Using [\s\S]+? instead of .+? is a robust way to match any character including newlines without needing to change the global regex flags.
| logtodb_regexp_c = re.compile( | |
| r""" | |
| (?:->|\.) # deref | |
| logToDb(?:Bulk)? # logToDb or logToDbBulk | |
| <.+?> # template header | |
| \s*\(\s* # ( | |
| (?:utl::)?(?P<tool>[A-Z]{3}) # tool | |
| \s*,\s* # , | |
| (?P<id>\d+) # id | |
| """, | |
| re.VERBOSE | re.MULTILINE, | |
| ) | |
| logtodb_regexp_c = re.compile( | |
| r""" | |
| (?:->|\\.) # deref | |
| logToDb(?:Bulk)? # logToDb or logToDbBulk | |
| <[\\s\\S]+?> # template header | |
| \\s*\\(\\s* # ( | |
| (?:utl::)?(?P<tool>[A-Z]{3}) # tool | |
| \\s*,\\s* # , | |
| (?P<id>\\d+) # id | |
| """, | |
| re.VERBOSE | re.MULTILINE, | |
| ) |
|
@alex-maiorov Thanks. It's worth opening the PR for so it's preserved if anyone needs it. I would hold off on adding tests or polishing unless @maliberty signals interest in merging. Can you post example of use from the C++ side? I.e. what a method call emitting into the database looks like. |
|
Please say more about what this feature is useful for that is not provided by text (or even more structured text) logging. |
There are a few reasons, first I wanted to avoid any possible latency costs of writing data especially if I was doing it very often, avoiding both serialization overhead and possible disk flushes at the point where the log functions are called by buffering it(though the same method can be used for text logging). For my usecase the main benefit is that I got a lot of functionality for free from SQLite in post-analysis which made it much more efficient, such as having one file with an easily readable schema, performant cross indexing with joins and other SQL features, and the ability to build indeces over tables for fast lookups, and also to be able to build derived data tables if necessary and store them easily. There are also some disk size and bandwidth reductions, though SQLite does not optimally pack everything. To be honest, for pure speed and space efficiency there might be better solutions, but SQLite is nice for being easy to analyze later without needing to provide separate schema files for something like Protobufs or similar. Besides lower overhead, the benefit over more advanced text logging is we avoid the tradeoff of file clutter and performance. With text logging we would have to either create many files for all data sources(of which there might be many, even within one tool), or we would need to somehow collect all relevant information in a large batch. Largely, text logging becomes constrained by the requirement to keep all files in append-only mode as storing everything in memory is impractical. I will upload some example C++ code tomorrow that will hopefully more clearly demonstrate this point. |
Summary
I added a way to dump large amounts of data from various components into an SQLite database. The logger works by calling a logging function analogous to the existing logger functions, and then the data is processed by a background thread for actual writing. The actual database has separate tables for each numeric log(logToDb or logToDbBulk), and those methods are designed for low latency. There is also logToDbMetadata, which is a simple and non-optimized text to text key-value logging mechanism that goes into a single table.
Type of Change
New Feature: SQLite Binary Logging.
Impact
If turned on and called, qualitatively I have observed the performance impact to be fairly minimal for what I was doing. Given the use-case of large dumps, I would like suggestions for writing reasonable performance and correctness tests.
At runtime, this is an entirely opt-in feature. However, this does still require sqlite3 as a compile time dependency, though SQLite is almost universally available across different platforms. I also modified find_messages.py so that it enforces the uniqueness of the tool id/message id for those methods.
Note on Incompleteness
I am aware that this code is not quite ready for merging. I originally wrote it as a debugging and data analysis tool on the side for a research project I was working on, however it ended up being very useful so I would like to get it to a merge-ready state eventually.
Verification
./etc/Build.sh).Related Issues
None that I know of, but @povik was interested in this functionality.