From 979512019c3c9faac7c5cf529b71c2ea64078062 Mon Sep 17 00:00:00 2001
From: Claude
Date: Sat, 6 Jun 2026 22:12:29 +0000
Subject: [PATCH] =?UTF-8?q?Add=20TODO:=20Android=20AAR=20distribution=20+?=
 =?UTF-8?q?=20Kotlin=20fa=C3=A7ade=20+=20sample=20app?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Captures two related gaps in the existing Android arm64 packaging:
publish a Gradle-consumable AAR (AndroidManifest + jniLibs// layout)
alongside the current JAR-with-resources artifact, and provide a
first-party Kotlin-friendly façade (Flow adapter, suspend variants)
with a minimal sample app to give the AAR an exercised end-to-end path.

https://claude.ai/code/session_01CP5if6tGKcN7FGapf7Qugp
---
 TODO.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/TODO.md b/TODO.md
index ec52a26c..3e691551 100644
--- a/TODO.md
+++ b/TODO.md
@@ -30,6 +30,12 @@ These are JNI plumbing items for upstream API additions. Policy: add only after
 - Per-run timing line (`TimingsLogger` class + wire-in to `CompletionResponseParser` and `ChatResponseParser`; format mirrors what `llama.cpp` CLI prints — `prompt: N tok in X ms (Y tok/s) | gen: … | cache: N | draft: …`; dedicated SLF4J logger `net.ladenthin.llama.timings` so users can suppress it independently; 7 unit tests pin format + pipeline behaviour).
 - **Remaining first-batch items:** UTF-8 boundary-safe streaming decoder + jbang example.
 
+### Android distribution: AAR + Kotlin-friendly API + sample app
+
+- **Publish a proper Android AAR alongside the existing JAR-with-resources packaging.** Today java-llama.cpp already cross-compiles the Android arm64 native lib in two flavours (CPU-only, bundled into the main JAR; OpenCL/Adreno under classifier `opencl-android-aarch64`), but both ship as plain Maven JARs that bury `libjllama.so` under `net/ladenthin/llama/Linux-Android/aarch64/`. Android/Gradle consumers expect an `.aar` with an `AndroidManifest.xml`, the native lib under `jni/arm64-v8a/`, and Maven coordinates like `net.ladenthin:llama-android:@aar`. This is the format the [LLaMAndroid](https://github.com/Rattlyy/LLaMAndroid) integration referenced elsewhere in this file has to work around manually. Investigate using `com.android.library` via Gradle in a sibling module, or hand-rolling the AAR layout from the Maven build. Coordinate ABI coverage with any future armv7-a / x86_64 work so the AAR can declare multiple `jniLibs//` entries when those land.
+
+- **Provide a Kotlin-friendly façade + Android sample app.** The pure-Java `LlamaIterable` / `LlamaModel` API works on Android today (LLaMAndroid wraps it in a Kotlin `flow {}` block), but a small first-party Kotlin module — coroutine `Flow` adapters, `suspend` variants of the blocking calls, idiomatic `use {}` resource handling — would lower the integration cost meaningfully and serve as the canonical reference for downstream consumers. Pair it with a minimal sample app (single `Activity`, model picker, streaming text view) under e.g. `examples/android-sample/` so the AAR has an exercised end-to-end path in CI. Treat LLaMAndroid as the prior-art baseline; reuse patterns that already work there.
+
 ### GraalVM Native Image evaluation
 
 - **Evaluate GraalVM Native Image as an alternative distribution target.** Reference: [GraalVM Native Image](https://www.graalvm.org/latest/reference-manual/native-image/). The pure-Java sibling projects in the README's "Similar Projects" list (mukel's `llama3.java` / `gemma4.java` / `gptoss.java` / `qwen35.java` / `nemotron3.java`) demonstrate that single-jar, no-JNI Java inference is viable for individual model architectures. Native Image opens an orthogonal direction for THIS project: AOT-compile the Java layer + JNI bridge to a self-contained binary that bundles the libjllama.so (or per-OS equivalent) and starts in milliseconds without a JVM, which would make jllama usable in CLI tools, serverless functions, and short-lived processes where JVM startup is the dominant cost.