HPCC-36047 hthor refactor temporary file handling#21128
HPCC-36047 hthor refactor temporary file handling#21128streeterd wants to merge 1 commit intohpcc-systems:candidate-10.2.xfrom
Conversation
🔄 Upmerge Test ResultsStatus: ✅ All branches merged successfully ✅ Successful Branches (2)
|
🔄 Upmerge Test ResultsStatus: ❌ Conflicts detected How To Resolve❗: Create a PR against conflicting branches and resolve the conflict manually, after the conflicts have been resolved rerun the Upmerge test. ❌ Conflicting Branches (1)
|
🔄 Upmerge Test ResultsStatus: ❌ Conflicts detected How To Resolve❗: Create a PR against conflicting branches and resolve the conflict manually, after the conflicts have been resolved rerun the Upmerge test. ❌ Conflicting Branches (1)
|
🔄 Upmerge Test ResultsStatus: ❌ Conflicts detected How To Resolve❗: Create a PR against conflicting branches and resolve the conflict manually, after the conflicts have been resolved rerun the Upmerge test. ❌ Conflicting Branches (1)
|
🔄 Upmerge Test ResultsStatus: ✅ All branches merged successfully ✅ Successful Branches (2)
|
🔄 Upmerge Test ResultsStatus: ✅ All branches merged successfully ✅ Successful Branches (2)
|
🔄 Upmerge Test ResultsStatus: ✅ All branches merged successfully ✅ Successful Branches (2)
|
🔄 Upmerge Test ResultsStatus: ✅ All branches merged successfully ✅ Successful Branches (2)
|
🔄 Upmerge Test ResultsStatus: ✅ All branches merged successfully ✅ Successful Branches (2)
|
🔄 Upmerge Test ResultsStatus: ✅ All branches merged successfully ✅ Successful Branches (2)
|
🔄 Upmerge Test ResultsStatus: ✅ All branches merged successfully ✅ Successful Branches (2)
|
🔄 Upmerge Test ResultsStatus: ✅ All branches merged successfully ✅ Successful Branches (2)
|
🔄 Upmerge Test ResultsStatus: ✅ All branches merged successfully ✅ Successful Branches (2)
|
🔄 Upmerge Test ResultsStatus: ✅ All branches merged successfully ✅ Successful Branches (2)
|
🔄 Upmerge Test ResultsStatus: ✅ All branches merged successfully ✅ Successful Branches (2)
|
04683e3 to
ad2255d
Compare
🔄 Upmerge Test ResultsStatus: ✅ All branches merged successfully ✅ Successful Branches (2)
|
4225fb8 to
5987f12
Compare
🔄 Upmerge Test ResultsStatus: ✅ All branches merged successfully ✅ Successful Branches (2)
|
Testing 26th March 2026:ecl was executed to generate spill files which were generated and removed as expected, see Spill test 26th March 2026 10.04 |
5987f12 to
4e029c2
Compare
🔄 Upmerge Test ResultsStatus: ✅ All branches merged successfully ✅ Successful Branches (2)
|
4e029c2 to
978d4b5
Compare
🔄 Upmerge Test ResultsStatus: ✅ All branches merged successfully ✅ Successful Branches (2)
|
jakesmith
left a comment
There was a problem hiding this comment.
@streeterd - please see feedback.
| private: | ||
| friend class EclAgentWorkflowMachine; | ||
|
|
||
| struct CTempFileInfo : public CInterfaceOf<IInterface> |
There was a problem hiding this comment.
somewhat defunct havnig a struct of 1 element.
The tempFileInfoMap could be naturally expressed as:
std::map<std::string, std::strrng> tempFileInfoMap
``
| return tempFiles.item(idx); | ||
| } | ||
|
|
||
| Linked<CTempFileInfo> tempFileInfo = tempFileInfoMap.getValue(fname); |
There was a problem hiding this comment.
can be simplified/avoid object etc. - see map comment.
| base.append(PATHSEPCHAR); | ||
| StringBuffer tail; | ||
| splitFilename(fname, nullptr, nullptr, &tail, nullptr); | ||
| tempFileInfo->tempFileName.append(base).append(tail).append(tempOwnerId); |
There was a problem hiding this comment.
can be significantly simplied I think.
| if (tempFileName) | ||
| { | ||
| Owned<IFile> file = createIFile(tempFileName); | ||
| if (file->isFile() != fileBool::foundYes) |
There was a problem hiding this comment.
probably a bit OTT, we know it's a 'file' already.
I thnk should just call file->remove() (NB: createIFile can't return nullptr).
| { | ||
| remove(tempFiles.item(idx)); | ||
| HashIterator it(tempFileInfoMap); | ||
| ForEach(it) |
There was a problem hiding this comment.
again, would be quite a bit simpler with a std::map, for auto, std::strnig etc.
| interface ITempFileHandler | ||
| { | ||
| virtual const char *noteTemporaryFile(const char * fname) = 0; | ||
| virtual const char *queryTemporaryFile(const char * fname) = 0; |
There was a problem hiding this comment.
minor: could be a const method.
There was a problem hiding this comment.
fail is not a const, so could not easily change to const, see your comment: #21128 (comment)
| { | ||
| StringBuffer tempfilename; | ||
| getTempfileBase(tempfilename).append(PATHSEPCHAR).append(fname); | ||
| if (isEmptyString(fname)) |
There was a problem hiding this comment.
highly improbably, so if anything, this check can be a dbgassertex(!isEmptyString(fname)
| getTempfileBase(base); | ||
| base.append(PATHSEPCHAR); | ||
| StringBuffer tail; | ||
| splitFilename(fname, nullptr, nullptr, &tail, nullptr); |
There was a problem hiding this comment.
hmm, passing in a fully qualified name (presumably because might already have temp dir as prefix?), splittnig it, then reconstituing a fully qualified..
The callers should not pass in a fully qualified name, they should pass in a name, and no nothing about the temp dir..
The return is the fully qualified name unique name, which the callers can pass back in to removeTemporaryFile
| interface ITempFileHandler | ||
| { | ||
| virtual const char *noteTemporaryFile(const char * fname) = 0; | ||
| virtual const char *queryTemporaryFile(const char * fname) = 0; |
There was a problem hiding this comment.
noteTemporaryFile and queryTemporaryFile should take a name , think of it as id. The result is a fully qualified path.
They (the callers) then later call back to removeTemporaryFile with that resulting filename.
So param names and semantics should be clear.
| { | ||
| unsigned pid = (unsigned)GetCurrentProcessId(); | ||
| unsigned __int64 startEpoch = (unsigned __int64)time(nullptr); | ||
| tempOwnerId.append('.').append(pid).append('.').append(startEpoch); |
There was a problem hiding this comment.
does it make sense for all temp files to have the same date stamp, irrespective of when noteTemporaryFile is called?
Not sure it does, or make it unique enough.. a simple counter is probably better.
There was a problem hiding this comment.
We need the pid and startEpoch are required to find out if the process is still alive. pid on it's own is not good enough as the pid is reused. It is also more convienient to have both pid and startEpoch at the end of the file name so that they can be checked easily with an ends with test in the event of file removal after a crash.
| MapStringToMyClass<CTempFileInfo> tempFileInfoMap; // Map of original temp file name to the actual temp file name on disk | ||
| StringBuffer tempOwnerId; // Holds the current pid and start epoch of the process for temp files in map tempFileInfoMap | ||
| CriticalSection tfsect; | ||
| TempFilePathByIdMap tempFileNameById; // Map of original temp id to the actual temp file name on disk |
There was a problem hiding this comment.
minor: don't think 'hiding' std::map<std::string, std::string> here with a typedef actually improved readability...
| const char *EclAgent::queryTemporaryFile(const char *name) const | ||
| { | ||
| dbgassertex(!isEmptyString(fname)); | ||
| DISLOG("DJPS: queryTemporaryFile(%s)", name); |
There was a problem hiding this comment.
you prob. meant to use DBGLOG :
not disaster operator logging:
* 2) For fatal errors or unrecoverable errors: *
* DISLOG([LogMsgCode,] format,..) - uses MCoperatorDisaster
There was a problem hiding this comment.
Only for testing as the PR was a WIP, now removed.
| StringBuffer errmsg; | ||
| errmsg.append("Attempt to read temp file that has not yet been registered: ").append(tempFileInfo->tempFileName.str()); | ||
| fail(0, errmsg.str()); | ||
| errmsg.append("Attempt to read temp file that has not yet been registered: ").append(name); |
There was a problem hiding this comment.
trivial: VStringBuffer is a nicer pattern for things lke this:
VStringBuffer errmsg("Attempt to read temp file that has not yet been registered: %s", name);
🔄 Upmerge Test ResultsStatus: ✅ All branches merged successfully ✅ Successful Branches (2)
|
b5f3457 to
591f31b
Compare
🔄 Upmerge Test ResultsStatus: ✅ All branches merged successfully ✅ Successful Branches (2)
|
591f31b to
38f3547
Compare
jakesmith
left a comment
There was a problem hiding this comment.
@streeterd - looks close. Please see comments.
| return nullptr; | ||
| } | ||
|
|
||
| return inserted.first->c_str(); |
There was a problem hiding this comment.
This is thread-safe afaics, as long as tempFileSet entries are not removed or altered (which should be the case). It is probably worth adding a clarifying comment to make it clear to the casual reader of the semantics.
| agentMachineCost = getMachineCostRate(); | ||
|
|
||
| getTempfileBase(tempFilePrefix); | ||
| tempFilePrefix.append(PATHSEPCHAR); |
There was a problem hiding this comment.
minor: not sure what point of this tempFilePrefix is now, it's same as getTempFileBase, except it has a trailing PATHSEPCHAR ..
getTempFileBase could add a trailing PATHSEPCHAR, and all the places that call it and add one, wouldn't need to (and this code wouldn't be needed).
🔄 Upmerge Test ResultsStatus: ✅ All branches merged successfully ✅ Successful Branches (1)
|
| } | ||
|
|
||
| // The lock is intentionally scoped to insertion; the checks below use only | ||
| // the local insertion result and do not access shared state. |
There was a problem hiding this comment.
This is thread-safe afaics, as long as tempFileSet entries are not removed or altered (which should be the case). It is probably worth adding a clarifying comment to make it clear to the casual reader of the semantics.
the new comment doesn't really cover the point I was maknig, but my comment was a bit misleading too, it isn't a thread safety issue as such.
The point I was trying to make was that after a call to noteTemporaryFile the returned const char * is not guaranteed to be valid, e.g. if you call it, then call removeTemporaryFile for the file, the const char * will have been free and using it will likely crash.
Please revise the comment to make it clear that the returned value is only valid if the entry isn't removed.
| StringBuffer &EclAgent::getTempfileBase(StringBuffer &buff) | ||
| { | ||
| return buff.append(queryTempfilePath()).append(PATHSEPCHAR).appendLower(wuid); | ||
| return buff.append(queryTempfilePath()).append(PATHSEPCHAR).appendLower(wuid).append(PATHSEPCHAR); |
There was a problem hiding this comment.
you haven't checked all uses of getTempFileBase to make sure they don't add double PATHSEPCHAR's now it has been added here.
hint: look at the 'agent.getTempfileBase(tempBase);' use cases in hthor.cpp
jakesmith
left a comment
There was a problem hiding this comment.
@streeterd - please see comments.
🔄 Upmerge Test ResultsStatus: ✅ All branches merged successfully ✅ Successful Branches (1)
|
| RoxieSortAlgorithm sortAlgorithm = isStable ? stableSpillingQuickSortAlgorithm : spillingQuickSortAlgorithm; | ||
| StringBuffer tempBase; | ||
| agent.getTempfileBase(tempBase); | ||
| tempBase.setLength(tempBase.length() - 1); // remove the path separator, as the sort does not want it |
There was a problem hiding this comment.
this is a fragile approach - blindly removing last character ((as in things will break if assumption change).
It would be better to use removeTrailingPathSepChar
jakesmith
left a comment
There was a problem hiding this comment.
@streeterd - If buildTempFilename is the only call to getTempfileBase now.. then it would be more sensible to add the PATHSEPCHAR there I think, and revert the other changes.
🔄 Upmerge Test ResultsStatus: ✅ All branches merged successfully ✅ Successful Branches (1)
|
@jakesmith fixed in 013d676 |
ecd4149 to
af8b227
Compare
jakesmith
left a comment
There was a problem hiding this comment.
@streeterd - looks good. Please squash.
New temporary file handling interface created. hthor changed to use the new interface. Signed-off-by: Dave Streeter <dave.streeter@lexisnexisrisk.com>
af8b227 to
1d09e97
Compare
@jakesmith squashed. |
🔄 Upmerge Test ResultsStatus: ✅ All branches merged successfully ✅ Successful Branches (1)
|
🔄 Upmerge Test ResultsStatus: ✅ All branches merged successfully ✅ Successful Branches (1)
|
🔄 Upmerge Test ResultsStatus: ✅ All branches merged successfully ✅ Successful Branches (1)
|
Change
A new interface for handling temporary files was created.
hthor was changed to use the new temporary file handler.
Type of change:
Checklist:
Smoketest: