SMILE (Statistical Machine Intelligence & Learning Engine) is a comprehensive, high-performance machine learning framework for the JVM. SMILE v5+ requires Java 25; v4.x requires Java 21; all previous versions require Java 8. SMILE also provides idiomatic APIs for Scala and Kotlin. With advanced data structures and algorithms, SMILE delivers state-of-the-art performance across every aspect of machine learning.
- Features
- Module Map
- Installation
- Quick Start
- SMILE Studio & Shell
- Model Serialization
- Visualization
- License
- Issues & Discussions
- Contributing
- Maintainers
- Gallery
| Area | Highlights |
|---|---|
| LLM | LLaMA-3 inference, tiktoken BPE tokenizer, OpenAI-compatible REST server, SSE chat streaming |
| Deep Learning | LibTorch/GPU backend, EfficientNet-V2 image classification, custom layer API |
| Classification | SVM, Decision Trees, Random Forest, AdaBoost, Gradient Boosting, Logistic Regression, Neural Networks, RBF Networks, MaxEnt, KNN, Naïve Bayes, LDA/QDA/RDA |
| Regression | SVR, Gaussian Process, Regression Trees, GBDT, Random Forest, RBF, OLS, LASSO, ElasticNet, Ridge |
| Clustering | BIRCH, CLARANS, DBSCAN, DENCLUE, Deterministic Annealing, K-Means, X-Means, G-Means, Neural Gas, Growing Neural Gas, Hierarchical, SIB, SOM, Spectral, Min-Entropy |
| Manifold Learning | IsoMap, LLE, Laplacian Eigenmap, t-SNE, UMAP, PCA, Kernel PCA, Probabilistic PCA, GHA, Random Projection, ICA |
| Feature Engineering | Genetic Algorithm selection, Ensemble selection, TreeSHAP, SNR, Sum-Squares ratio, data transformations, formula API |
| NLP | Sentence / word tokenization, Bigram test, Phrase & Keyword extraction, Stemmer, POS tagging, Relevance ranking |
| Association Rules | FP-growth frequent itemset mining |
| Sequence Learning | Hidden Markov Model, Conditional Random Field |
| Nearest Neighbor | BK-Tree, Cover Tree, KD-Tree, SimHash, LSH |
| Numerical Methods | Linear algebra, numerical optimization (BFGS, L-BFGS), interpolation, wavelets, RBF, distributions, hypothesis tests |
| Visualization | Swing plots (scatter, line, bar, box, histogram, surface, heatmap, contour, …) and declarative Vega-Lite charts |
Each module has its own detailed user guide. Click the README link for the module overview, or drill into individual topic guides.
Data structures, math, linear algebra, statistical utilities, I/O
| Document | Topics |
|---|---|
| README | Module overview and dependency setup |
| DATA_FRAME.md | DataFrame API — creation, selection, transformation |
| DATA_IO.md | CSV, JSON, Parquet, Arrow, JDBC, Avro readers/writers |
| DATA_TRANSFORMATION.md | Scalers, encoders, imputers, feature transforms |
| DATASET.md | Built-in benchmark and real-world datasets |
| FORMULA.md | R-style formula language for model matrices |
| DISTRIBUTIONS.md | Probability distributions (Normal, Poisson, Beta, …) |
| HYPOTHESIS_TESTING.md | t-test, chi-squared, ANOVA, KS-test, … |
| DISTANCES.md | Euclidean, Mahalanobis, Hamming, edit distance, … |
| NEAREST_NEIGHBOR.md | KD-Tree, Cover Tree, BK-Tree, LSH |
| KERNELS.md | Gaussian, polynomial, Laplacian, and other kernel functions |
| RBF.md | Radial basis function networks |
| INTERPOLATION.md | Linear, cubic spline, bilinear, bicubic |
| GRAPH.md | Adjacency list/matrix graph, BFS/DFS, spanning trees |
| SORT.md | Quick sort, heap sort, counting sort, index sort |
| HASH.md | Locality-sensitive hashing, SimHash |
| RNG.md | Random number generators, sampling, permutations |
| BFGS.md | L-BFGS and BFGS numerical optimizers |
| ICA.md | Independent Component Analysis |
| TENSOR.md | N-dimensional array (CPU tensor without LibTorch) |
| WAVELET.md | DWT, CWT, and wavelet families |
| GAP.md | GAP statistic for optimal cluster count estimation |
| COMPRESSED_SENSING.md | Compressed sensing and basis pursuit |
Classification, regression, clustering, manifold learning, and more
| Document | Topics |
|---|---|
| README | Module overview |
| CLASSIFICATION.md | SVM, Random Forest, AdaBoost, GBDT, KNN, Naïve Bayes, LDA, … |
| REGRESSION.md | SVR, Gaussian Process, LASSO, Ridge, ElasticNet, GBDT, … |
| CLUSTERING.md | K-Means, DBSCAN, BIRCH, SOM, Spectral Clustering, … |
| FEATURE_ENGINEERING.md | Feature selection, PCA, ICA, projection, encoding |
| MANIFOLD.md | t-SNE, UMAP, IsoMap, LLE, Laplacian Eigenmap |
| ANOMALY_DETECTION.md | IsolationForest, one-class SVM, local outlier factor |
| ASSOCIATION_RULE_MINING.md | FP-growth, association rules, frequent itemsets |
| SEQUENCE.md | HMM (Baum-Welch, Viterbi), CRF |
| TIME_SERIES.md | ARIMA, box-plots, autocorrelation |
| REGRESSION.md | Full regression API reference |
| TRAINING.md | Cross-validation, bootstrap, hyper-parameter search |
| VALIDATION.md | Hold-out, k-fold, leave-one-out evaluation |
| VALIDATION_METRICS.md | Accuracy, AUC, F1, RMSE, MAE, confusion matrix |
| HYPER_PARAMETER_OPTIMIZATION.md | Grid search, random search, Bayesian optimization |
| VECTOR_QUANTIZATION.md | LVQ, Neural Gas, SOM as vector quantizers |
| ONNX.md | Exporting and importing models via ONNX |
LibTorch-backed GPU/CPU tensor operations, neural network layers, LLaMA-3 inference, EfficientNet
| Document | Topics |
|---|---|
| README | Full deep-learning & LLM user guide (tensors, layers, loss, optimizer, EfficientNet, LLaMA) |
The deep/README.md covers:
- smile.deep.tensor: Tensor factory, indexing, arithmetic, AutoScope memory management, dtype/device
- smile.deep.layer: Linear, Conv2d, pooling, normalization (BN/GN/RMS), dropout, embedding, sequential blocks
- smile.deep.activation: ReLU, GELU, SiLU, Tanh, Sigmoid, Softmax, GLU, HardShrink, …
- smile.deep.Loss: MSE, cross-entropy, BCE, Huber, KL, hinge, and more
- smile.deep.Optimizer: SGD, Adam, AdamW, RMSprop
- smile.deep.Model: Abstract base class + training loop
- smile.deep.metric: Accuracy, Precision, Recall, F1Score with macro/micro/weighted averaging
- smile.llm: Message, Role, FinishReason, ChatCompletion records; sinusoidal & RoPE positional encodings
- smile.llm.tokenizer: Tokenizer interface, Tiktoken BPE implementation (LLaMA-3 compatible)
- smile.llm.llama: Full LLaMA-3 stack: Llama.build(), generate(), chat(), streaming via SubmissionPublisher
- smile.vision: VisionModel, ImageDataset, EfficientNet.V2S/M/L() pretrained models, ImageNet labels
- smile.vision.transform: Transform interface, ImageClassification pipeline, resize/crop/toTensor helpers
Text normalization, tokenization, POS tagging, stemming, relevance ranking
| Document | Topics |
|---|---|
| README | Module overview |
| TOKENIZER.md | Sentence splitter, word tokenizer, regex tokenizer |
| POS.md | Part-of-speech tagging (Brill tagger, HMM tagger) |
| STEM.md | Porter, Lancaster, Lovins stemmers; lemmatization |
| COLLOCATION.md | Bigram/trigram statistical tests, phrase extraction |
| RELEVANCE.md | TF-IDF, BM25, keyword extraction |
| TAXONOMY.md | WordNet integration, synsets, hypernyms |
Swing-based interactive plots and declarative Vega-Lite charts
| Document | Topics |
|---|---|
| README | Swing plotting API — scatter, line, bar, box, histogram, heatmap, surface, contour, wireframe |
| VEGA.md | Declarative smile.plot.vega (Vega-Lite) — JSON spec generation, web/Jupyter rendering |
Quarkus-based REST inference service with OpenAI-compatible API and SSE streaming
| Document | Topics |
|---|---|
| README | Building and running the server, /chat/completions endpoint, SSE streaming, configuration |
REPL / notebook environment for Java, Scala, and Kotlin
| Document | Topics |
|---|---|
| README.md | Desktop Studio notebook UI, cell types, output rendering |
| CLI | CLI entry points (smile, smile shell, smile scala, smile kotlin, smile server) |
Idiomatic Scala shim — concise wrappers, symbolic operators, Scala collections integration
| Document | Topics |
|---|---|
| README | API overview, smile.classification, smile.regression, smile.clustering, smile.plot in Scala |
Idiomatic Kotlin shim — extension functions, named parameters, builder DSLs
| Document | Topics |
|---|---|
| README | API overview, extension functions, Kotlin-style builders |
| packages.md | Full package-by-package listing of all Kotlin extension functions |
Lightweight zero-dependency JSON library for Scala with a clean DSL
| Document | Topics |
|---|---|
| README | Parsing, building, pattern matching, path navigation, serialization |
Use SMILE models inside Spark ML pipelines
| Document | Topics |
|---|---|
| README | SmileTransformer, SmileClassifier, SmileRegressor; training and scoring in Spark DataFrames |
<!-- Core ML algorithms -->
<dependency>
<groupId>com.github.haifengl</groupId>
<artifactId>smile-core</artifactId>
<version>6.0.0</version>
</dependency>
<!-- Deep learning + LLMs (requires LibTorch) -->
<dependency>
<groupId>com.github.haifengl</groupId>
<artifactId>smile-deep</artifactId>
<version>6.0.0</version>
</dependency>
<!-- Natural language processing -->
<dependency>
<groupId>com.github.haifengl</groupId>
<artifactId>smile-nlp</artifactId>
<version>6.0.0</version>
</dependency>
<!-- Data visualization -->
<dependency>
<groupId>com.github.haifengl</groupId>
<artifactId>smile-plot</artifactId>
<version>6.0.0</version>
</dependency>

libraryDependencies += "com.github.haifengl" %% "smile-scala" % "6.0.0"

dependencies {
    implementation("com.github.haifengl:smile-kotlin:6.0.0")
}

Several algorithms (manifold learning, Gaussian Process, MLP, some clustering) require BLAS and LAPACK.
Linux (Ubuntu / Debian)
sudo apt update
sudo apt install libopenblas-dev libarpack2

macOS (Homebrew)
brew install arpack
# If macOS SIP strips DYLD_LIBRARY_PATH, copy the dylib to your working dir:
cp /opt/homebrew/lib/libarpack.dylib .

Windows: pre-built DLLs are included in the bin/ directory of the release package. Add that directory to PATH.
GPU (CUDA) — make sure the LibTorch CUDA native libraries are on
java.library.path and that your Bytedeco pytorch classifier matches
your CUDA version (e.g., linux-x86_64-gpu-cuda12.4).
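
When a native library fails to load, it can help to see exactly which directories the JVM is searching. The following is a minimal diagnostic sketch using only the JDK; it is not part of SMILE. The class name `NativePathCheck` and the default library name `example` are illustrative placeholders, so pass the actual file name you are looking for as the first argument.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class NativePathCheck {
    /** Splits a path-list string (such as java.library.path) into its non-blank entries. */
    static List<String> entries(String pathList) {
        List<String> result = new ArrayList<>();
        if (pathList == null) {
            return result;
        }
        for (String entry : pathList.split(File.pathSeparator)) {
            if (!entry.isBlank()) {
                result.add(entry);
            }
        }
        return result;
    }

    /** Returns the directories in pathList that contain a regular file named fileName. */
    static List<String> dirsContaining(String pathList, String fileName) {
        List<String> hits = new ArrayList<>();
        for (String dir : entries(pathList)) {
            try {
                if (Files.isRegularFile(Path.of(dir, fileName))) {
                    hits.add(dir);
                }
            } catch (InvalidPathException e) {
                // Skip entries that are not valid paths on this platform.
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String libraryPath = System.getProperty("java.library.path");
        // Placeholder default: the platform-specific file name for a library called "example".
        String fileName = args.length > 0 ? args[0] : System.mapLibraryName("example");

        System.out.println("java.library.path entries:");
        entries(libraryPath).forEach(e -> System.out.println("  " + e));
        System.out.println("Directories containing " + fileName + ": "
                + dirsContaining(libraryPath, fileName));
    }
}
```

Run it with the same JVM options you use for your application (for example, the same `-Djava.library.path=...` setting), so the output reflects the search path your program actually sees. An empty result for the file you expect suggests the directory is missing from the path.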
import smile.classification.RandomForest;
import smile.data.formula.Formula;
import smile.io.Read;
// Load data
var data = Read.csv("src/test/resources/iris.csv");
// Train a random forest
var forest = RandomForest.fit(Formula.lhs("species"), data);
// Predict
int label = forest.predict(data.get(0));
System.out.println("Predicted class: " + label);

For deep learning and LLM examples, see deep/README.md. For visualization examples, see plot/README.md.
SMILE ships with an interactive desktop Studio (notebook-style) and a set of CLI shells. See studio/README.md for full documentation.
Download a pre-packaged release from the releases page, then:
cd bin
./setup # install required native dependencies
./smile # launch SMILE Studio (desktop GUI)

Other entry points:
| Command | Description |
|---|---|
| ./smile | Desktop notebook IDE |
| ./smile shell | Java REPL with all SMILE packages pre-imported |
| ./smile scala | Scala REPL |
| ./smile train | Train a supervised learning model |
| ./smile predict | Predict on a file using a saved model |
| ./smile serve | Start the LLM inference server |
To increase the JVM heap:
./smile -J-Xmx30G

Most SMILE models implement java.io.Serializable. You can serialize a trained model to disk and load it in a production environment or inside a Spark job:
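import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Save the model trained in the Quick Start example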
try (var out = new ObjectOutputStream(new FileOutputStream("model.ser"))) {
out.writeObject(forest);
}
// Load
try (var in = new ObjectInputStream(new FileInputStream("model.ser"))) {
var loaded = (RandomForest) in.readObject();
}

SMILE provides two visualization layers:
- smile.plot.swing: Swing-based interactive 2D/3D plots. See plot/README.md.
- smile.plot.vega: Declarative Vega-Lite charts for browsers and Jupyter. See plot/VEGA.md.
<dependency>
<groupId>com.github.haifengl</groupId>
<artifactId>smile-plot</artifactId>
<version>6.0.0</version>
</dependency>

SMILE employs a dual license model designed to meet the development and distribution needs of both commercial distributors (OEMs, ISVs, VARs) and open source projects. For details, see LICENSE. To acquire a commercial license, contact smile.sales@outlook.com.
| Channel | Purpose |
|---|---|
| GitHub Discussions | Questions, ideas, show-and-tell |
| Stack Overflow [smile] | Technical Q&A |
| Issue Tracker | Bug reports and feature requests |
| Online Docs | Tutorials and programming guides |
| Java API · Scala API · Kotlin API · Clojure API | API Javadoc |
Please read CONTRIBUTING.md for build and test instructions.

























