A domain-adapted large language model for mechanical engineering specifications, material property retrieval, engineering calculations, and GD&T interpretation. Built by fine-tuning Qwen2.5-3B-Instruct with LoRA on ~5,000 expert-authored Q&A pairs covering real-world engineering problems.
```
            +------------------------------------------+
            |            MechSpec Pipeline             |
            +------------------------------------------+
                                 |
         +-----------------------+-----------------------+
         |                       |                       |
+--------v---------+    +--------v---------+    +--------v---------+
|  Data Pipeline   |    | Training (LoRA)  |    |   Evaluation     |
+------------------+    +------------------+    +------------------+
|generate_qa_pairs |    | Qwen2.5-3B-Inst  |    | MechEval Bench   |
| validate_answers |    | 4-bit NF4 quant  |    |  - Mat Retrieval |
| format_dataset   |    | LoRA r=32        |    |  - Calculations  |
| split            |    | SFTTrainer       |    |  - GD&T Interp   |
| fetch_mp_api     |    | Cosine schedule  |    | Radar chart      |
+------------------+    +------------------+    +------------------+
         |                       |                       |
         v                       v                       v
+------------------+    +------------------+    +------------------+
| 5000+ Q&A Pairs  |    | LoRA Adapters    |    | Comparison       |
| Alpaca format    |    | Merged model     |    | Report           |
| 85/10/5 split    |    | safetensors      |    | Per-task scores  |
+------------------+    +------------------+    +------------------+
```
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-3B-Instruct |
| Fine-tuning method | LoRA (PEFT) |
| LoRA rank | 32 |
| LoRA alpha | 64 |
| LoRA dropout | 0.05 |
| Target modules | q, k, v, o, gate, up, down projections |
| Quantization | 4-bit NF4 (bitsandbytes) |
| Epochs | 3 |
| Batch size | 2 (x8 gradient accumulation = effective 16) |
| Learning rate | 2e-4 (cosine schedule, 5% warmup) |
| Precision | fp16 (T4 compatible) |
| Max sequence length | 2048 tokens |
| Training hardware | 1x NVIDIA T4 (16 GB VRAM) |
| Training time | ~30 minutes |
| Trainable parameters | ~48M / 3B total (1.6%) |
| Category | Count | Examples |
|---|---|---|
| Material property retrieval | ~2,000 | Yield strength, UTS, modulus, density, CTE |
| Beam deflection | ~500 | Cantilever, simply supported, distributed load |
| Stress analysis | ~500 | Von Mises, Mohr's circle, axial, torsion |
| GD&T interpretation | ~500 | Position, flatness, runout, concentricity |
| Thermal calculations | ~400 | Expansion, constrained stress, conduction |
| Pressure vessel design | ~300 | Cylindrical, spherical, ASME BPVC |
| Other engineering | ~800 | Fatigue, springs, bolts, bearings, welds, buckling |
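Each pair uses the Alpaca schema (`instruction`/`input`/`output`). The record below is illustrative only, not an actual dataset entry; it shows the kind of closed-form beam-deflection problem the calculation categories cover:

```python
import json

# Illustrative Alpaca-format record (hypothetical content, not from the dataset).
example = {
    "instruction": (
        "A cantilever beam of length L = 1.0 m carries a point load "
        "P = 500 N at its free end. E = 200 GPa, I = 8.0e-7 m^4. "
        "Find the tip deflection."
    ),
    "input": "",
    "output": (
        "delta = P*L^3 / (3*E*I) "
        "= 500 * 1.0^3 / (3 * 200e9 * 8.0e-7) "
        "= 1.04e-3 m (about 1.04 mm)"
    ),
}

# Sanity-check the closed-form arithmetic in the answer.
delta = 500 * 1.0**3 / (3 * 200e9 * 8.0e-7)
print(json.dumps(example, indent=2))
print(f"{delta * 1000:.2f} mm")  # 1.04 mm
```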
| Model | Material Retrieval | Calculations | GD&T | Aggregate |
|---|---|---|---|---|
| Qwen2.5-3B (base) | XX% | XX% | XX% | XX% |
| MechSpec-Qwen-3B | XX% | XX% | XX% | XX% |
| Qwen2.5-7B (base) | XX% | XX% | XX% | XX% |
| GPT-4o (reference) | XX% | XX% | XX% | XX% |
Results pending full evaluation. Run `make eval` to generate scores.
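The scoring logic lives in `src/mechspec/eval`. One plausible rule for the calculation tasks — an assumption for illustration, not the actual implementation — is to count a numeric answer correct when it falls within a relative tolerance of the reference value:

```python
def within_tolerance(pred: float, ref: float, rel_tol: float = 0.05) -> bool:
    """Count a numeric answer as correct if within rel_tol of the reference."""
    return abs(pred - ref) <= rel_tol * abs(ref)

# e.g. a predicted yield strength of 880 MPa vs. a 900 MPa reference passes at 5%
print(within_tolerance(880.0, 900.0))  # True
print(within_tolerance(700.0, 900.0))  # False
```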
Install the package, then run the data and training stages:

```
pip install -e .
mechspec-data --output data/generated/qa_pairs.json --num-pairs 5000
mechspec-train --config configs/training_config.yaml
```

Query the merged model with `transformers`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "outputs/merged"  # or HuggingFace Hub ID
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_path)

messages = [
    {"role": "system", "content": "You are a mechanical engineering expert."},
    {"role": "user", "content": "What is the yield strength of Ti-6Al-4V?"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)  # greedy decoding
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```

For interactive use or a full benchmark run:

```
mechspec-generate --model-path outputs/merged --interactive
mechspec-eval --config configs/eval_config.yaml --model-path outputs/merged
```

Project layout:

```
mechspec-qwen/
+-- configs/                 Training and evaluation YAML configs
+-- src/mechspec/
|   +-- data/                Data generation, validation, formatting
|   +-- training/            LoRA fine-tuning and model merging
|   +-- eval/                MechEval benchmark and scoring
|   +-- inference/           Single-query and batch inference
+-- data/eval_benchmark/     Curated evaluation datasets (JSON)
+-- notebooks/               Colab training notebook
+-- tests/                   Pytest test suite
+-- docs/                    Training log and data source documentation
```
Run the entire pipeline from data generation to evaluation:

```
make pipeline
```

Or step by step:

```
make data         # Generate Q&A pairs
make validate     # Validate generated data
make format-data  # Format for training
make split        # Create train/val/test splits
make train        # Fine-tune with LoRA
make merge        # Merge adapters into base model
make eval         # Run MechEval benchmark
make report       # Generate evaluation report
```

For development, install the dev extras and use the QA targets:

```
pip install -e ".[dev]"
make lint         # Run ruff linter
make type-check   # Run mypy
make test         # Run pytest
make test-cov     # Run pytest with coverage
```

MechSpec-Qwen is designed for:
- Retrieving material properties for common engineering alloys
- Solving introductory-to-intermediate engineering calculation problems
- Interpreting GD&T (Geometric Dimensioning and Tolerancing) specifications
- Providing step-by-step engineering analysis with proper units
Limitations:

- Not a replacement for FEA/simulation software. The model performs closed-form analytical calculations, not finite element analysis.
- Material property values are typical/nominal. Actual values vary by heat treatment, manufacturing process, and material lot. Always verify against certified material test reports (CMTRs) for critical applications.
- Not validated for safety-critical decisions. Do not use model output as the sole basis for structural design or safety analysis. A licensed professional engineer (PE) must review all calculations used in practice.
- Limited to training data coverage. The model is trained on ~20 common engineering alloys and standard textbook-level problems. It may hallucinate values for uncommon materials or advanced analysis methods.
- English only. The model is fine-tuned on English-language engineering content.
- Engineering calculations require professional verification before use in real-world applications.
- Incorrect material properties or calculations could lead to structural failures if used without review.
- This model should be used as a productivity tool, not as a substitute for engineering education or professional practice.
Training was performed on a single NVIDIA T4 GPU for approximately 30 minutes. Estimated energy consumption: ~0.05 kWh. Estimated CO2 emissions: ~0.02 kg CO2eq (assuming US average grid carbon intensity of 0.4 kg CO2/kWh).
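The emissions estimate follows directly from the stated assumptions (energy use and US-average grid intensity):

```python
energy_kwh = 0.05      # estimated energy for ~30 min on one T4
grid_intensity = 0.4   # kg CO2eq per kWh, US average (stated assumption)
emissions = energy_kwh * grid_intensity
print(f"{emissions:.2f} kg CO2eq")  # 0.02 kg CO2eq
```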
This project is licensed under the Apache License 2.0. See LICENSE for details.
The base model (Qwen2.5-3B-Instruct) is licensed under the Qwen License.