From 979512019c3c9faac7c5cf529b71c2ea64078062 Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Sat, 6 Jun 2026 22:12:29 +0000
Subject: [PATCH] =?UTF-8?q?Add=20TODO:=20Android=20AAR=20distribution=20+?=
 =?UTF-8?q?=20Kotlin=20fa=C3=A7ade=20+=20sample=20app?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Captures two related gaps in the existing Android arm64 packaging:
publish a Gradle-consumable AAR (AndroidManifest.xml + jni/<abi>/ layout)
alongside the current JAR-with-resources artifact, and provide a first-party
Kotlin-friendly façade (Flow adapter, suspend variants) with a minimal
sample app to give the AAR an exercised end-to-end path.

https://claude.ai/code/session_01CP5if6tGKcN7FGapf7Qugp
---
 TODO.md | 6 ++++++
 1 file changed, 6 insertions(+)
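
Note on the "hand-rolling the AAR layout" option mentioned in the new TODO
entry: the sketch below is one possible shape for it, not the project's build
logic. It takes the bytes of the existing JAR, moves `libjllama.so` from the
resource path named in the TODO to `jni/arm64-v8a/`, wraps the remaining
entries in `classes.jar`, and adds a bare `AndroidManifest.xml`. The class
name `AarLayoutSketch`, the `repackage` and `entryNames` helpers, and the
manifest contents are assumptions made for illustration; a real AAR would
likely need more metadata than this.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class AarLayoutSketch {
    // Resource path of the native lib inside the current JAR (from the TODO text).
    static final String JAR_LIB = "net/ladenthin/llama/Linux-Android/aarch64/libjllama.so";
    // Target path inside the AAR (from the TODO text).
    static final String AAR_LIB = "jni/arm64-v8a/libjllama.so";

    /** Hypothetical helper: repackages JAR bytes into a minimal AAR-shaped ZIP. */
    static byte[] repackage(byte[] jarBytes, String androidPackage) throws IOException {
        byte[] nativeLib = null;
        ByteArrayOutputStream classesJar = new ByteArrayOutputStream();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(jarBytes));
             ZipOutputStream classes = new ZipOutputStream(classesJar)) {
            for (ZipEntry e = in.getNextEntry(); e != null; e = in.getNextEntry()) {
                if (e.isDirectory()) continue;
                byte[] data = in.readAllBytes(); // reads only the current entry
                if (e.getName().equals(JAR_LIB)) {
                    nativeLib = data;            // lifted out of the classpath layout
                } else {
                    put(classes, e.getName(), data);
                }
            }
        }
        if (nativeLib == null) {
            throw new IllegalArgumentException("missing " + JAR_LIB);
        }
        // Assumed minimal manifest; a real build would generate this.
        String manifest = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
                + "<manifest xmlns:android=\"http://schemas.android.com/apk/res/android\"\n"
                + "    package=\"" + androidPackage + "\" />\n";
        ByteArrayOutputStream aar = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(aar)) {
            put(out, "AndroidManifest.xml", manifest.getBytes(StandardCharsets.UTF_8));
            put(out, "classes.jar", classesJar.toByteArray());
            put(out, AAR_LIB, nativeLib);
        }
        return aar.toByteArray();
    }

    private static void put(ZipOutputStream zip, String name, byte[] data) throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        zip.write(data);
        zip.closeEntry();
    }

    /** Lists entry names of a ZIP, to inspect the resulting layout. */
    static List<String> entryNames(byte[] zipBytes) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            for (ZipEntry e = in.getNextEntry(); e != null; e = in.getNextEntry()) {
                names.add(e.getName());
            }
        }
        return names;
    }
}
```

With the lib no longer on the classpath, the Java loader would presumably
need a `System.loadLibrary` path on Android; that part is not sketched here.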
diff --git a/TODO.md b/TODO.md
index ec52a26c..3e691551 100644
--- a/TODO.md
+++ b/TODO.md
@@ -30,6 +30,12 @@ These are JNI plumbing items for upstream API additions. Policy: add only after
     - Per-run timing line (`TimingsLogger` class + wire-in to `CompletionResponseParser` and `ChatResponseParser`; format mirrors what `llama.cpp` CLI prints — `prompt: N tok in X ms (Y tok/s) | gen: … | cache: N | draft: …`; dedicated SLF4J logger `net.ladenthin.llama.timings` so users can suppress it independently; 7 unit tests pin format + pipeline behaviour).
   - **Remaining first-batch items:** UTF-8 boundary-safe streaming decoder + jbang example.
 
+### Android distribution: AAR + Kotlin-friendly API + sample app
+
+- **Publish a proper Android AAR alongside the existing JAR-with-resources packaging.** Today java-llama.cpp already cross-compiles the Android arm64 native lib in two flavours (CPU-only, bundled into the main JAR; OpenCL/Adreno under classifier `opencl-android-aarch64`), but both ship as plain Maven JARs that bury `libjllama.so` under `net/ladenthin/llama/Linux-Android/aarch64/`. Android/Gradle consumers expect an `.aar` with an `AndroidManifest.xml`, the native lib under `jni/arm64-v8a/`, and Maven coordinates like `net.ladenthin:llama-android:<version>@aar`. This is the format the [LLaMAndroid](https://github.com/Rattlyy/LLaMAndroid) integration referenced elsewhere in this file has to work around manually. Investigate using `com.android.library` via Gradle in a sibling module, or hand-rolling the AAR layout from the Maven build. Coordinate ABI coverage with any future armv7-a / x86_64 work so the AAR can ship multiple `jni/<abi>/` directories when those land.
+
+- **Provide a Kotlin-friendly façade + Android sample app.** The pure-Java `LlamaIterable` / `LlamaModel` API works on Android today (LLaMAndroid wraps it in a Kotlin `flow {}` block), but a small first-party Kotlin module — coroutine `Flow<LlamaOutput>` adapters, `suspend` variants of the blocking calls, idiomatic `use {}` resource handling — would lower the integration cost meaningfully and serve as the canonical reference for downstream consumers. Pair it with a minimal sample app (single `Activity`, model picker, streaming text view) under e.g. `examples/android-sample/` so the AAR has an exercised end-to-end path in CI. Treat LLaMAndroid as the prior-art baseline; reuse patterns that already work there.
+
 ### GraalVM Native Image evaluation
 
 - **Evaluate GraalVM Native Image as an alternative distribution target.** Reference: [GraalVM Native Image](https://www.graalvm.org/latest/reference-manual/native-image/). The pure-Java sibling projects in the README's "Similar Projects" list (mukel's `llama3.java` / `gemma4.java` / `gptoss.java` / `qwen35.java` / `nemotron3.java`) demonstrate that single-jar, no-JNI Java inference is viable for individual model architectures. Native Image opens an orthogonal direction for THIS project: AOT-compile the Java layer + JNI bridge to a self-contained binary that bundles the libjllama.so (or per-OS equivalent) and starts in milliseconds without a JVM, which would make jllama usable in CLI tools, serverless functions, and short-lived processes where JVM startup is the dominant cost.