Statistical Machine Intelligence & Learning Engine SMILE

SMILE (Statistical Machine Intelligence & Learning Engine) is a comprehensive, high-performance machine learning framework for the JVM. SMILE v5+ requires Java 25; v4.x requires Java 21; all previous versions require Java 8. SMILE also provides idiomatic APIs for Scala and Kotlin. With advanced data structures and algorithms, SMILE delivers state-of-the-art performance across every aspect of machine learning.
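The Java requirements above can be checked at startup. The following is a minimal sketch in plain Java (no SMILE classes); the class name `SmileJavaCheck` and the label `pre-4.x` for "all previous versions" are illustrative choices, not part of SMILE.

```java
/** Sketch: which SMILE release line fits a JVM, using the minimums stated in this README. */
final class SmileJavaCheck {

    /** Maps a Java feature version (8, 21, 25, ...) to the newest usable SMILE line. */
    static String newestSmileLine(int javaFeature) {
        if (javaFeature >= 25) return "5+";
        if (javaFeature >= 21) return "4.x";
        if (javaFeature >= 8)  return "pre-4.x"; // illustrative label for older releases
        return "none";
    }

    public static void main(String[] args) {
        // Runtime.version().feature() is available on Java 10 and later.
        int feature = Runtime.version().feature();
        System.out.println("Java " + feature + " -> SMILE " + newestSmileLine(feature));
    }
}
```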


Table of Contents

  1. Features
  2. Module Map
  3. Installation
  4. Quick Start
  5. SMILE Studio & Shell
  6. Model Serialization
  7. Visualization
  8. License
  9. Issues & Discussions
  10. Contributing
  11. Maintainers
  12. Gallery

Features

| Area | Highlights |
| --- | --- |
| LLM | LLaMA-3 inference, tiktoken BPE tokenizer, OpenAI-compatible REST server, SSE chat streaming |
| Deep Learning | LibTorch/GPU backend, EfficientNet-V2 image classification, custom layer API |
| Classification | SVM, Decision Trees, Random Forest, AdaBoost, Gradient Boosting, Logistic Regression, Neural Networks, RBF Networks, MaxEnt, KNN, Naïve Bayes, LDA/QDA/RDA |
| Regression | SVR, Gaussian Process, Regression Trees, GBDT, Random Forest, RBF, OLS, LASSO, ElasticNet, Ridge |
| Clustering | BIRCH, CLARANS, DBSCAN, DENCLUE, Deterministic Annealing, K-Means, X-Means, G-Means, Neural Gas, Growing Neural Gas, Hierarchical, SIB, SOM, Spectral, Min-Entropy |
| Manifold Learning | IsoMap, LLE, Laplacian Eigenmap, t-SNE, UMAP, PCA, Kernel PCA, Probabilistic PCA, GHA, Random Projection, ICA |
| Feature Engineering | Genetic Algorithm selection, Ensemble selection, TreeSHAP, SNR, Sum-Squares ratio, data transformations, formula API |
| NLP | Sentence / word tokenization, Bigram test, Phrase & Keyword extraction, Stemmer, POS tagging, Relevance ranking |
| Association Rules | FP-growth frequent itemset mining |
| Sequence Learning | Hidden Markov Model, Conditional Random Field |
| Nearest Neighbor | BK-Tree, Cover Tree, KD-Tree, SimHash, LSH |
| Numerical Methods | Linear algebra, numerical optimization (BFGS, L-BFGS), interpolation, wavelets, RBF, distributions, hypothesis tests |
| Visualization | Swing plots (scatter, line, bar, box, histogram, surface, heatmap, contour, …) and declarative Vega-Lite charts |

Module Map

Each module has its own detailed user guide. Click the README link for the module overview, or drill into individual topic guides.

base/ — Foundation

Data structures, math, linear algebra, statistical utilities, I/O

| Document | Topics |
| --- | --- |
| README | Module overview and dependency setup |
| DATA_FRAME.md | DataFrame API — creation, selection, transformation |
| DATA_IO.md | CSV, JSON, Parquet, Arrow, JDBC, Avro readers/writers |
| DATA_TRANSFORMATION.md | Scalers, encoders, imputers, feature transforms |
| DATASET.md | Built-in benchmark and real-world datasets |
| FORMULA.md | R-style formula language for model matrices |
| DISTRIBUTIONS.md | Probability distributions (Normal, Poisson, Beta, …) |
| HYPOTHESIS_TESTING.md | t-test, chi-squared, ANOVA, KS-test, … |
| DISTANCES.md | Euclidean, Mahalanobis, Hamming, edit distance, … |
| NEAREST_NEIGHBOR.md | KD-Tree, Cover Tree, BK-Tree, LSH |
| KERNELS.md | Gaussian, polynomial, Laplacian, and other kernel functions |
| RBF.md | Radial basis function networks |
| INTERPOLATION.md | Linear, cubic spline, bilinear, bicubic |
| GRAPH.md | Adjacency list/matrix graph, BFS/DFS, spanning trees |
| SORT.md | Quick sort, heap sort, counting sort, index sort |
| HASH.md | Locality-sensitive hashing, SimHash |
| RNG.md | Random number generators, sampling, permutations |
| BFGS.md | L-BFGS and BFGS numerical optimizers |
| ICA.md | Independent Component Analysis |
| TENSOR.md | N-dimensional array (CPU tensor without LibTorch) |
| WAVELET.md | DWT, CWT, and wavelet families |
| GAP.md | GAP statistic for optimal cluster count estimation |
| COMPRESSED_SENSING.md | Compressed sensing and basis pursuit |

core/ — Machine Learning Algorithms

Classification, regression, clustering, manifold learning, and more

| Document | Topics |
| --- | --- |
| README | Module overview |
| CLASSIFICATION.md | SVM, Random Forest, AdaBoost, GBDT, KNN, Naïve Bayes, LDA, … |
| REGRESSION.md | SVR, Gaussian Process, LASSO, Ridge, ElasticNet, GBDT, … |
| CLUSTERING.md | K-Means, DBSCAN, BIRCH, SOM, Spectral Clustering, … |
| FEATURE_ENGINEERING.md | Feature selection, PCA, ICA, projection, encoding |
| MANIFOLD.md | t-SNE, UMAP, IsoMap, LLE, Laplacian Eigenmap |
| ANOMALY_DETECTION.md | IsolationForest, one-class SVM, local outlier factor |
| ASSOCIATION_RULE_MINING.md | FP-growth, association rules, frequent itemsets |
| SEQUENCE.md | HMM (Baum-Welch, Viterbi), CRF |
| TIME_SERIES.md | ARIMA, box-plots, autocorrelation |
| REGRESSION.md | Full regression API reference |
| TRAINING.md | Cross-validation, bootstrap, hyper-parameter search |
| VALIDATION.md | Hold-out, k-fold, leave-one-out evaluation |
| VALIDATION_METRICS.md | Accuracy, AUC, F1, RMSE, MAE, confusion matrix |
| HYPER_PARAMETER_OPTIMIZATION.md | Grid search, random search, Bayesian optimization |
| VECTOR_QUANTIZATION.md | LVQ, Neural Gas, SOM as vector quantizers |
| ONNX.md | Exporting and importing models via ONNX |
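
Several of the validation metrics listed above (accuracy, F1, confusion matrix) reduce to four counts for a binary problem. The sketch below shows that arithmetic in plain Java. It is not the SMILE metrics API; the class `BinaryMetrics` and the convention that label `1` is the positive class are assumptions made for illustration.

```java
/** Plain-Java sketch of binary classification metrics; not the SMILE API. */
final class BinaryMetrics {
    final int tp, fp, fn, tn; // the four cells of a 2x2 confusion matrix

    /** Counts outcomes, treating label 1 as positive and anything else as negative. */
    BinaryMetrics(int[] truth, int[] prediction) {
        if (truth.length != prediction.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        int truePos = 0, falsePos = 0, falseNeg = 0, trueNeg = 0;
        for (int i = 0; i < truth.length; i++) {
            boolean actual = truth[i] == 1;
            boolean predicted = prediction[i] == 1;
            if (actual && predicted) truePos++;
            else if (predicted) falsePos++;
            else if (actual) falseNeg++;
            else trueNeg++;
        }
        tp = truePos; fp = falsePos; fn = falseNeg; tn = trueNeg;
    }

    double accuracy()  { return (double) (tp + tn) / (tp + fp + fn + tn); }
    double precision() { return tp + fp == 0 ? 0.0 : (double) tp / (tp + fp); }
    double recall()    { return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn); }

    /** Harmonic mean of precision and recall; 0 when both are 0. */
    double f1() {
        double p = precision(), r = recall();
        return p + r == 0.0 ? 0.0 : 2 * p * r / (p + r);
    }
}
```

For real work, prefer the library's own metric classes documented in VALIDATION_METRICS.md; this sketch only makes the definitions concrete.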

deep/ — Deep Learning & LLMs

LibTorch-backed GPU/CPU tensor operations, neural network layers, LLaMA-3 inference, EfficientNet

| Document | Topics |
| --- | --- |
| README | Full deep-learning & LLM user guide (tensors, layers, loss, optimizer, EfficientNet, LLaMA) |

The deep/README.md covers:

  • smile.deep.tensor — Tensor factory, indexing, arithmetic, AutoScope memory management, dtype/device
  • smile.deep.layer — Linear, Conv2d, pooling, normalization (BN/GN/RMS), dropout, embedding, sequential blocks
  • smile.deep.activation — ReLU, GELU, SiLU, Tanh, Sigmoid, Softmax, GLU, HardShrink, …
  • smile.deep.Loss — MSE, cross-entropy, BCE, Huber, KL, hinge, and more
  • smile.deep.Optimizer — SGD, Adam, AdamW, RMSprop
  • smile.deep.Model — Abstract base class + training loop
  • smile.deep.metric — Accuracy, Precision, Recall, F1Score with macro/micro/weighted averaging
  • smile.llm — Message, Role, FinishReason, ChatCompletion records; sinusoidal & RoPE positional encodings
  • smile.llm.tokenizer — Tokenizer interface, Tiktoken BPE implementation (LLaMA-3 compatible)
  • smile.llm.llama — Full LLaMA-3 stack: Llama.build(), generate(), chat(), streaming via SubmissionPublisher
  • smile.vision — VisionModel, ImageDataset, EfficientNet.V2S/M/L() pretrained models, ImageNet labels
  • smile.vision.transform — Transform interface, ImageClassification pipeline, resize/crop/toTensor helpers
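
The list above mentions streaming via SubmissionPublisher. For readers unfamiliar with that JDK class, here is a minimal sketch of consuming a token stream with `java.util.concurrent.SubmissionPublisher` alone. No SMILE classes are used: the class `TokenStreamSketch` and the sample tokens are hypothetical, and how the LLaMA API actually hands the publisher to callers is documented in deep/README.md, not here.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.SubmissionPublisher;

/** Sketch: publish text fragments and assemble them on the subscriber side. */
final class TokenStreamSketch {

    /** Publishes each token, then returns everything the consumer received, in order. */
    static String collect(List<String> tokens) {
        StringBuilder text = new StringBuilder();
        SubmissionPublisher<String> publisher = new SubmissionPublisher<>();

        // Subscribe BEFORE submitting: items submitted with no subscriber are dropped.
        CompletableFuture<Void> done = publisher.consume(token -> text.append(token));

        for (String token : tokens) {
            publisher.submit(token); // a generator would emit tokens here as they are produced
        }
        publisher.close();           // signals onComplete to the consumer
        done.join();                 // wait until every token has been delivered
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(collect(List.of("Hel", "lo", ", ", "world")));
    }
}
```

The consumer runs on another thread, so the caller waits on the returned future before reading the accumulated text.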

nlp/ — Natural Language Processing

Text normalization, tokenization, POS tagging, stemming, relevance ranking

| Document | Topics |
| --- | --- |
| README | Module overview |
| TOKENIZER.md | Sentence splitter, word tokenizer, regex tokenizer |
| POS.md | Part-of-speech tagging (Brill tagger, HMM tagger) |
| STEM.md | Porter, Lancaster, Lovins stemmers; lemmatization |
| COLLOCATION.md | Bigram/trigram statistical tests, phrase extraction |
| RELEVANCE.md | TF-IDF, BM25, keyword extraction |
| TAXONOMY.md | WordNet integration, synsets, hypernyms |

plot/ — Data Visualization

Swing-based interactive plots and declarative Vega-Lite charts

| Document | Topics |
| --- | --- |
| README | Swing plotting API — scatter, line, bar, box, histogram, heatmap, surface, contour, wireframe |
| VEGA.md | Declarative smile.plot.vega (Vega-Lite) — JSON spec generation, web/Jupyter rendering |

serve/ — Inference Server

Quarkus-based REST inference service with OpenAI-compatible API and SSE streaming

| Document | Topics |
| --- | --- |
| README | Building and running the server, /chat/completions endpoint, SSE streaming, configuration |
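
A client of an SSE stream mostly needs to pick out the `data:` lines. The sketch below does that in plain Java under two assumptions the README does not confirm: that the server ends a stream with the OpenAI-style `data: [DONE]` sentinel, and that each event carries a single `data:` line. The class `SseSketch` is hypothetical; see serve/README.md for the actual response format.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: extract payloads from a server-sent-events body (simplified parsing). */
final class SseSketch {

    /** Returns the payload of each "data:" line, stopping at the assumed "[DONE]" sentinel. */
    static List<String> dataPayloads(String stream) {
        List<String> payloads = new ArrayList<>();
        for (String line : stream.split("\r?\n")) {
            if (!line.startsWith("data:")) {
                continue; // blank separators, ":" comments, and other fields are ignored
            }
            String payload = line.substring("data:".length()).strip();
            if (payload.equals("[DONE]")) {
                break;    // OpenAI-style end-of-stream marker (assumption)
            }
            payloads.add(payload);
        }
        return payloads;
    }
}
```

Each returned payload would normally be a JSON chunk to hand to a JSON parser; parsing is left out to keep the sketch dependency-free.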

studio/ — Interactive Shell & Desktop IDE

REPL / notebook environment for Java, Scala, and Kotlin

| Document | Topics |
| --- | --- |
| README.md | Desktop Studio notebook UI, cell types, output rendering |
| CLI | CLI entry points (smile, smile shell, smile scala, smile kotlin, smile server) |

scala/ — Scala API

Idiomatic Scala shim — concise wrappers, symbolic operators, Scala collections integration

| Document | Topics |
| --- | --- |
| README | API overview, smile.classification, smile.regression, smile.clustering, smile.plot in Scala |

kotlin/ — Kotlin API

Idiomatic Kotlin shim — extension functions, named parameters, builder DSLs

| Document | Topics |
| --- | --- |
| README | API overview, extension functions, Kotlin-style builders |
| packages.md | Full package-by-package listing of all Kotlin extension functions |

json/ — JSON Library (Scala)

Lightweight zero-dependency JSON library for Scala with a clean DSL

| Document | Topics |
| --- | --- |
| README | Parsing, building, pattern matching, path navigation, serialization |

spark/ — Apache Spark Integration

Use SMILE models inside Spark ML pipelines

| Document | Topics |
| --- | --- |
| README | SmileTransformer, SmileClassifier, SmileRegressor; training and scoring in Spark DataFrames |

Installation

Maven

<!-- Core ML algorithms -->
<dependency>
  <groupId>com.github.haifengl</groupId>
  <artifactId>smile-core</artifactId>
  <version>6.0.0</version>
</dependency>

<!-- Deep learning + LLMs (requires LibTorch) -->
<dependency>
  <groupId>com.github.haifengl</groupId>
  <artifactId>smile-deep</artifactId>
  <version>6.0.0</version>
</dependency>

<!-- Natural language processing -->
<dependency>
  <groupId>com.github.haifengl</groupId>
  <artifactId>smile-nlp</artifactId>
  <version>6.0.0</version>
</dependency>

<!-- Data visualization -->
<dependency>
  <groupId>com.github.haifengl</groupId>
  <artifactId>smile-plot</artifactId>
  <version>6.0.0</version>
</dependency>

SBT (Scala)

libraryDependencies += "com.github.haifengl" %% "smile-scala" % "6.0.0"

Gradle (Kotlin)

dependencies {
    implementation("com.github.haifengl:smile-kotlin:6.0.0")
}

Native Libraries (BLAS / LAPACK)

Several algorithms (manifold learning, Gaussian Process, MLP, some clustering) require native BLAS and LAPACK libraries; the install commands below also pull in ARPACK.

Linux (Ubuntu / Debian)

sudo apt update
sudo apt install libopenblas-dev libarpack2

macOS (Homebrew)

brew install arpack
# If macOS SIP strips DYLD_LIBRARY_PATH, copy the dylib to your working dir:
cp /opt/homebrew/lib/libarpack.dylib .

Windows — pre-built DLLs are included in the bin/ directory of the release package. Add that directory to PATH.

GPU (CUDA) — make sure the LibTorch CUDA native libraries are on java.library.path and that your Bytedeco pytorch classifier matches your CUDA version (e.g., linux-x86_64-gpu-cuda12.4).
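
When a native library fails to load, a useful first step is to see which directories the JVM is actually searching. The following is a small plain-Java sketch that lists the entries of `java.library.path`. The class `LibraryPathSketch` is hypothetical, and the sketch only checks that each directory exists, not that it contains the right LibTorch or BLAS binaries.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Sketch: list the directories the JVM searches for native libraries. */
final class LibraryPathSketch {

    /** Splits a raw path string on the platform separator, skipping empty entries. */
    static List<Path> entries(String rawPath) {
        List<Path> dirs = new ArrayList<>();
        for (String part : rawPath.split(Pattern.quote(File.pathSeparator))) {
            if (!part.isBlank()) {
                dirs.add(Path.of(part));
            }
        }
        return dirs;
    }

    public static void main(String[] args) {
        for (Path dir : entries(System.getProperty("java.library.path", ""))) {
            // Flag entries that do not exist so typos in -Djava.library.path stand out.
            System.out.println(dir + (Files.isDirectory(dir) ? "" : "  (missing)"));
        }
    }
}
```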


Quick Start

import smile.classification.RandomForest;
import smile.data.formula.Formula;
import smile.io.Read;

// Load data
var data = Read.csv("src/test/resources/iris.csv");

// Train a random forest
var forest = RandomForest.fit(Formula.lhs("species"), data);

// Predict
int label = forest.predict(data.get(0));
System.out.println("Predicted class: " + label);

For deep learning and LLM examples, see deep/README.md. For visualization examples, see plot/README.md.


SMILE Studio & Shell

SMILE ships with an interactive desktop Studio (notebook-style) and a set of CLI shells. See studio/README.md for full documentation.

Download a pre-packaged release from the releases page, then:

cd bin
./setup      # install required native dependencies
./smile      # launch SMILE Studio (desktop GUI)

Other entry points:

| Command | Description |
| --- | --- |
| ./smile | Desktop notebook IDE |
| ./smile shell | Java REPL with all SMILE packages pre-imported |
| ./smile scala | Scala REPL |
| ./smile train | Train a supervised learning model |
| ./smile predict | Predict on a file using a saved model |
| ./smile serve | Start the LLM inference server |

To increase the JVM heap:

./smile -J-Xmx30G
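
To confirm the heap setting took effect, the JVM can report its own limit. This is a minimal sketch in plain Java, assuming the launcher forwards the `-J` flag to the JVM as the example above implies; the class `HeapSketch` is hypothetical. The reported figure can be slightly below the requested size depending on the garbage collector.

```java
import java.util.Locale;

/** Sketch: report the maximum heap the running JVM will try to use. */
final class HeapSketch {

    /** Formats a byte count as gibibytes with one decimal place. */
    static String toGiB(long bytes) {
        return String.format(Locale.ROOT, "%.1f GiB", bytes / (1024.0 * 1024.0 * 1024.0));
    }

    public static void main(String[] args) {
        // maxMemory() reflects -Xmx when it is set, otherwise the JVM's default limit.
        System.out.println("Max heap: " + toGiB(Runtime.getRuntime().maxMemory()));
    }
}
```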

Model Serialization

Most SMILE models implement java.io.Serializable. You can serialize a trained model to disk and load it in a production environment or inside a Spark job:

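import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import smile.classification.RandomForest;

// Save the trained model (here, the forest from the Quick Start)
try (var out = new ObjectOutputStream(new FileOutputStream("model.ser"))) {
    out.writeObject(forest);
}

// Load: readObject() returns Object, so cast back to the model type
try (var in = new ObjectInputStream(new FileInputStream("model.ser"))) {
    var loaded = (RandomForest) in.readObject();
}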
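
The save/load pattern can be tried without training anything. The sketch below round-trips a stand-in object through Java serialization in memory. `ThresholdModel` is a hypothetical placeholder, not a SMILE class; any model that implements java.io.Serializable would go through the same two streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Sketch: serialize an object to bytes and read it back. */
final class SerializationSketch {

    /** Hypothetical stand-in for a trained model. */
    static final class ThresholdModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final double threshold;

        ThresholdModel(double threshold) { this.threshold = threshold; }

        int predict(double x) { return x > threshold ? 1 : 0; }
    }

    /** Writes the object to an in-memory buffer and deserializes a copy from it. */
    static <T extends Serializable> T roundTrip(T object) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(object);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        ThresholdModel loaded = roundTrip(new ThresholdModel(0.5));
        System.out.println(loaded.predict(0.7));
    }
}
```

Java serialization ties the stored bytes to the class definition, so the loading side needs the same class version on its classpath; only deserialize files from sources you trust.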

Visualization

SMILE provides two visualization layers:

  • smile.plot.swing — Swing-based interactive 2D/3D plots. See plot/README.md.
  • smile.plot.vega — Declarative Vega-Lite charts for browsers and Jupyter. See plot/VEGA.md.

Both layers are provided by the smile-plot artifact:

<dependency>
  <groupId>com.github.haifengl</groupId>
  <artifactId>smile-plot</artifactId>
  <version>6.0.0</version>
</dependency>

License

SMILE employs a dual license model designed to meet the development and distribution needs of both commercial distributors (OEMs, ISVs, VARs) and open source projects. For details, see LICENSE. To acquire a commercial license, contact smile.sales@outlook.com.


Issues & Discussions

| Channel | Purpose |
| --- | --- |
| GitHub Discussions | Questions, ideas, show-and-tell |
| Stack Overflow [smile] | Technical Q&A |
| Issue Tracker | Bug reports and feature requests |
| Online Docs | Tutorials and programming guides |
| Java API · Scala API · Kotlin API · Clojure API | API Javadoc |

Contributing

Please read CONTRIBUTING.md for build and test instructions.


Maintainers


Gallery

  • Scatterplot Matrix
  • Scatter Plot
  • Line Plot
  • Surface Plot
  • Bar Plot
  • Box Plot
  • Histogram Heatmap
  • Rolling Average
  • Geo Map
  • UMAP
  • Text Plot
  • Heatmap with Contour
  • Hexmap
  • IsoMap
  • LLE
  • Kernel PCA
  • Neural Network
  • SVM
  • Hierarchical Clustering
  • SOM
  • DBSCAN
  • Neural Gas
  • Wavelet
  • Exponential Family Mixture
  • Teapot Wireframe
  • Grid Interpolation