Stim compiler #3305

Open
joao-boechat wants to merge 61 commits into main from joaoboechat/stim-compiler

@joao-boechat joao-boechat commented Jun 11, 2026

Contributor

This is an initial PR for the Stim compiler; it sets up the infrastructure for supporting the language. We still need to add tests, better error handling, and more language features, among other things, but the compiler introduced by this PR is intended to be fully functional, with as few faults as possible.

All of the code was designed around the Stim language, which is mostly defined by these two documents:

Stim/doc/file_format_stim_circuit.md at main · quantumlib/Stim
Stim/doc/gates.md at main · quantumlib/Stim

Comment thread source/compiler/qsc_stim_parser/src/lex.rs Dismissed
Comment thread source/compiler/qsc_stim_parser/src/lex.rs Dismissed
Comment thread source/compiler/qsc_stim_parser/src/parser.rs Fixed
Comment thread source/compiler/qsc_stim_parser/src/parser.rs Dismissed
Comment thread source/compiler/qsc_stim_parser/src/parser.rs Dismissed
Comment thread source/compiler/qsc_stim_parser/src/parser.rs Dismissed
Comment thread source/compiler/qsc_stim_parser/examples/lex_stim.rs Outdated

@amcasey amcasey left a comment

Member

Some questions about the lexer. I'm just learning, so don't take any of this as blocking.

Comment thread source/compiler/qsc_stim_parser/src/lex.rs
};

#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub struct Token {

Member

Is there a concept of a known-but-erroneous token? For example, in many languages 0.2 is valid, but .2 is not, but you'd want both to appear as Double tokens for error recovery purposes. (This is almost certainly out of scope for this proof-of-concept implementation.)
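
One way such a token could be represented is sketched below. The names (`TokenKind::MalformedDouble`, `classify_double`) are hypothetical and not taken from this PR, and treating `2.` as malformed is an arbitrary choice for illustration, not a statement about Stim's grammar.

```rust
/// Hypothetical token kinds for illustration; not the `TokenKind` in this PR.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub enum TokenKind {
    /// A well-formed double such as `0.2`.
    Double,
    /// Recognizably a double, but missing digits on one side (e.g. `.2`).
    /// The parser can report an error and still treat it as a number.
    MalformedDouble,
    /// Anything this sketch does not classify as a double.
    Unknown,
}

/// Classifies a whole lexeme as a double, a malformed double, or unknown.
pub fn classify_double(lexeme: &str) -> TokenKind {
    let Some((int_part, frac_part)) = lexeme.split_once('.') else {
        return TokenKind::Unknown;
    };
    // Note: `all` on an empty string is true, so emptiness is checked below.
    let all_digits = |s: &str| s.chars().all(|c| c.is_ascii_digit());
    if !all_digits(int_part) || !all_digits(frac_part) {
        return TokenKind::Unknown;
    }
    match (int_part.is_empty(), frac_part.is_empty()) {
        (false, false) => TokenKind::Double,
        // `.2` or `2.`: clearly meant as a double, but digits are missing.
        (true, false) | (false, true) => TokenKind::MalformedDouble,
        // A lone `.` is not a number at all.
        (true, true) => TokenKind::Unknown,
    }
}
```

With this shape, the parser can keep going after `.2` and report a single targeted diagnostic instead of cascading errors.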

Comment thread source/compiler/qsc_stim_parser/src/lex.rs
Comment thread source/compiler/qsc_stim_parser/src/lex.rs
        self.eat_while(|c| c.is_ascii_digit());
        let mut is_double = false;
        if self.chars.next_if(|(_, c)| *c == '.').is_some() {
            self.eat_while(|c| c.is_ascii_digit());

Member

Is this guaranteed to consume at least one digit? Or does the language allow 2. as a valid double?

Member

Looks like we allow scientific notation. Surely, 2.e isn't allowed?
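
If the answer to both questions is "no", the scanner could count the digits it consumes after the `.` and after the exponent marker. The sketch below assumes that at least one digit is required in each position and that the exponent may carry an optional sign; neither assumption is confirmed by this PR, and `NumberKind`, `eat_digits` and this standalone `scan_number` are hypothetical names.

```rust
use std::iter::Peekable;
use std::str::CharIndices;

/// Hypothetical result type for this sketch.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub enum NumberKind {
    Int,
    Double,
    Malformed,
}

/// Consumes consecutive ASCII digits and returns how many were consumed.
fn eat_digits(chars: &mut Peekable<CharIndices<'_>>) -> usize {
    let mut count = 0;
    while chars.next_if(|(_, c)| c.is_ascii_digit()).is_some() {
        count += 1;
    }
    count
}

/// Scans `input` as a single number, rejecting forms like `2.` and `2.e`.
pub fn scan_number(input: &str) -> NumberKind {
    let mut chars = input.char_indices().peekable();
    // Assumption: at least one digit must come first.
    let mut ok = eat_digits(&mut chars) > 0;
    let mut is_double = false;
    if chars.next_if(|(_, c)| *c == '.').is_some() {
        is_double = true;
        // Assumption: the fractional part needs at least one digit.
        ok &= eat_digits(&mut chars) > 0;
    }
    if chars.next_if(|(_, c)| matches!(*c, 'e' | 'E')).is_some() {
        is_double = true;
        // Assumption: the exponent may have an optional sign.
        let _ = chars.next_if(|(_, c)| matches!(*c, '+' | '-'));
        ok &= eat_digits(&mut chars) > 0;
    }
    // Anything left over means the input was not a single number.
    if chars.peek().is_some() {
        ok = false;
    }
    match (ok, is_double) {
        (false, _) => NumberKind::Malformed,
        (true, true) => NumberKind::Double,
        (true, false) => NumberKind::Int,
    }
}
```

Returning a `Malformed` kind instead of bailing out keeps the lexer total, so the decision about how to report `2.e` can live in the parser.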

        self.whitespace();
    }

    fn scan_number(&mut self) -> TokenKind {

Member

Can there be a sign for the whole number? +2?

Contributor Author

nope! numbers are very basic in stim, nothing fancy (not even computations)

Member

So, to confirm, there are no negative angles? You have to normalize to a positive angle?

    }

    fn scan_identifier(&mut self, lo: usize) -> TokenKind {
        self.eat_while(|c| c.is_alphanumeric() || c == '_');

Member

A lot of languages don't allow identifiers to start with digits. Not sure if that's true of stim.

Contributor Author

It is! per their grammar this is what a name can be: [a-zA-Z][a-zA-Z0-9_]*
granted the "identifier" concept isn't exactly from stim, but I used it to simplify the code. Will have to revisit it later for correctness, though

Member

The regex you provided doesn't seem to allow the identifier to start with a digit.
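
For reference, a direct check against the quoted pattern `[a-zA-Z][a-zA-Z0-9_]*` might look like the sketch below. `is_valid_name` is a hypothetical helper, not a function from this PR. Note that the pattern is ASCII-only, whereas `char::is_alphanumeric` (used in `scan_identifier` above) also accepts non-ASCII letters and digits.

```rust
/// Returns true if `name` matches `[a-zA-Z][a-zA-Z0-9_]*`.
/// Hypothetical helper for illustration only.
pub fn is_valid_name(name: &str) -> bool {
    let mut chars = name.chars();
    match chars.next() {
        // The first character must be an ASCII letter.
        Some(first) if first.is_ascii_alphabetic() => {
            // The rest may be ASCII letters, ASCII digits, or underscores.
            chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
        }
        // Empty names, and names starting with a digit or `_`, are rejected.
        _ => false,
    }
}
```

Under this check `X_ERROR` is accepted, while `2X` and `_x` are not.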

            .map_or(self.input_len as usize, |(i, _)| *i);
        // TODO: What if some identifier starts with "rec" but is not a rec token?
        match &self.input[lo..hi] {
            "rec" => {

Member

I'm probably just blanking, but where did we check for the open [?

Contributor Author

The three cases we could have [] are:
1- rec[...]
2- sweep[...]
3- tags! For example in the statement: X_ERRORa 3 4

But parsing the brackets individually added a ton of complexity to distinguish between these three cases, so I chose to just consume them as a whole with those tokens, and then strip them away for the content. Will also revisit this later!

Member

I'm not sure I understand the complexity. But it looks like maybe the single token includes the contents of the square brackets and isn't a keyword followed by punctuation, etc?

Comment thread source/compiler/qsc_stim_parser/src/lex.rs
        while self.chars.next_if(|i| f(i.1)).is_some() {}
    }

    fn whitespace(&mut self) {

Member

Is this going to consume newlines without creating corresponding tokens?
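
If newlines are meant to be significant (the parser elsewhere expects a `TokenKind::Newline`), one option is for the whitespace skipper to stop at `\n`. A minimal sketch, written against a `&str` rather than the peekable iterator used in the PR; `skip_inline_whitespace` is a hypothetical name.

```rust
/// Skips leading whitespace but stops at the first newline, so the
/// caller can still emit a newline token for it.
/// Hypothetical helper for illustration only.
pub fn skip_inline_whitespace(input: &str) -> &str {
    input.trim_start_matches(|c: char| c.is_whitespace() && c != '\n')
}
```

Spaces, tabs and carriage returns are dropped, while the `\n` itself is left for the tokenizer.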

@amcasey amcasey left a comment

Member

Parser comments. Still non-blocking. Happy to chat if my questions don't make sense (which is reasonably likely).


#[derive(Debug)]
pub struct Line {
    pub span: Span,

Member

Is this different from the span that's in the Instruction?

Comment thread source/compiler/qsc_stim_parser/src/parser.rs
Comment thread source/compiler/qsc_stim_parser/src/parser.rs
                None => break,
            }
        }
        let closing_brace = self.expect(TokenKind::Close(Brace));

Member

In the future, we might want to synthesize a missing closing brace for recovery purposes.
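
A rough sketch of what that recovery could look like: when the closing brace is absent, record a diagnostic and return the block as if the brace were there. The `Token`, `Block` and `parse_block` definitions below are simplified, hypothetical stand-ins and do not mirror the parser in this PR.

```rust
/// Simplified, hypothetical token type for this sketch.
#[derive(Clone, Debug, Eq, PartialEq)]
pub enum Token {
    OpenBrace,
    CloseBrace,
    Instruction(&'static str),
}

/// A parsed `{ ... }` block.
#[derive(Debug, Eq, PartialEq)]
pub struct Block {
    pub body: Vec<&'static str>,
    /// True when the closing brace was missing and had to be synthesized.
    pub close_synthesized: bool,
}

/// Parses `{ instruction* }` starting at `tokens[0]`.
/// Returns `None` only if the opening brace is missing.
pub fn parse_block(tokens: &[Token], errors: &mut Vec<String>) -> Option<Block> {
    if tokens.first() != Some(&Token::OpenBrace) {
        return None;
    }
    let mut pos = 1;
    let mut body = Vec::new();
    // Collect instructions until something else (or end of input) appears.
    while let Some(Token::Instruction(name)) = tokens.get(pos) {
        body.push(*name);
        pos += 1;
    }
    // Recovery: if `}` is not next, report it and pretend it was present.
    let close_synthesized = tokens.get(pos) != Some(&Token::CloseBrace);
    if close_synthesized {
        errors.push(format!(
            "expected `}}` at token {pos}; continuing as if it were present"
        ));
    }
    Some(Block { body, close_synthesized })
}
```

The caller still gets a usable `Block`, and the error list carries the single missing-brace diagnostic.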

    }

    fn parse_line(&mut self, instruction: Instruction) -> Line {
        self.expect(TokenKind::Newline);

Member

Personally, I find it a little strange to start a line with a newline, rather than end it with a newline. Does this cause any problems at file boundaries?

    }

    fn extract_uint(&mut self, token: Token, span: Option<Span>) -> u32 {
        self.extract_string(token, span).parse().unwrap()

Member

I think this panics if the number is too large to fit in a u32, for example?
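
A non-panicking variant could surface the failure as a value instead. The sketch below uses the standard library's `IntErrorKind` to tell overflow apart from other failures; `UintError` and `parse_uint` are hypothetical names, and how the error would be wired into the parser's diagnostics is left open.

```rust
use std::num::IntErrorKind;

/// Hypothetical error type for this sketch.
#[derive(Debug, Eq, PartialEq)]
pub enum UintError {
    /// The digits are valid but the value does not fit in a `u32`.
    TooLarge,
    /// The text is empty or contains a non-digit character.
    NotANumber,
}

/// Parses `text` as a `u32` without panicking on bad input.
pub fn parse_uint(text: &str) -> Result<u32, UintError> {
    text.parse::<u32>().map_err(|e| match e.kind() {
        IntErrorKind::PosOverflow => UintError::TooLarge,
        // `IntErrorKind` is non-exhaustive, so a catch-all arm is required.
        _ => UintError::NotANumber,
    })
}
```

For example, `parse_uint("4294967296")` (one past `u32::MAX`) yields `Err(UintError::TooLarge)` rather than a panic.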

@joao-boechat joao-boechat changed the base branch from main to oscarpuente/add-loss-to-fault-strings June 15, 2026 19:59
Comment thread source/qdk_package/qdk/stim/__init__.py Outdated
from typing import List, Literal, Optional, Tuple


def compile(src: str, noise: Optional[NoiseConfig]) -> Tuple[str, NoiseConfig]:

Member

I'm not a fan of this returning a tuple. I keep forgetting to destructure the results and wondering why I have a list. Also qsharp.compile and openqasm.compile return a QirInputData. We should be consistent.

@billti billti Jun 16, 2026

Member

(It's OK if we return the QIR string for now rather than QirInputData, but let's still just return the QIR and not a tuple)

@joao-boechat joao-boechat force-pushed the joaoboechat/stim-compiler branch from e1767ae to 44f52b7 Compare June 16, 2026 17:24
@joao-boechat joao-boechat marked this pull request as ready for review June 19, 2026 02:55
Base automatically changed from oscarpuente/add-loss-to-fault-strings to main June 19, 2026 23:20
    shot.unitary[1] = op.unitary[1];
    shot.unitary[4] = cplxNeg(op.unitary[4]);
    shot.unitary[5] = cplxNeg(op.unitary[5]);
} else if (rand < (p_x + p_z + p_y)) {

Member

Why did this get moved? Was there an issue to fix or optimization to be had?


5 participants