Open
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fix local memory address aliasing in simx simulator
When I tried to run OpenCL kernels that use
__localbuffers larger than 2048 bytes, I found that writes to addresses beyond the 2048-byte boundary silently overwrote data at lower addresses. This Pull Request fixes a bug insim/simx/local_mem.cppwhere the address mask used to index the local memory RAM was too narrow.Bug:
to_local_addr()uses line count instead of capacity for address bitsThe function
to_local_addr()extracts the lower N bits of the incoming address to produce a RAM index. N was computed aslog2ceil(capacity / line_size)(the number of cache lines) instead oflog2ceil(capacity)(the number of addressable bytes).With the default configuration (
capacity = 16384,line_size = 8):total_lines = 16384 / 8 = 2048,line_bits_ = log2ceil(2048) = 11, mask =0x7FFaddr_bits_ = log2ceil(16384) = 14, mask =0x3FFFThe 11-bit mask means any two byte addresses that differ by exactly 2048 map to the same RAM location. For example,
A[0](offset 0) andA[512](offset 2048) alias each other, so writingA[512]silently overwritesA[0].Fix
Compute
addr_bits_directly fromconfig.capacityinstead of from the number of lines:I also added a test which reproduces the aliasing:
each thread
twritesA[t] = t+1, thenA[t+512] = -(t+1).With the bug,
A[t+512]at byte offset2048 + t*4aliasesA[t], so readingA[t]returns-(t+1)instead oft+1.With the fix, all reads return the correct values.