Commit 6f859ac

updated k-means and fixed bugs in refactoring
Fixed issues with k-means. Fixed a bug with the loop variable and line numbering in the IRWIN-HALL DISTRIBUTION routine in refactoring. Also fixed a small bug in line 1250.
1 parent: 329849a · commit: 6f859ac

2 files changed: 21 additions & 15 deletions

_apple-2-blog/k-means.md (9 additions & 3 deletions)

```diff
@@ -8,6 +8,7 @@ tags:
 - K-means
 - machine learning
 date: 2025-09-20
+last_modified_at: 2025-10-06
 ---
 
 Wait. Does k-means count as machine learning? Yes. Yes, it does.
```
```diff
@@ -32,7 +33,7 @@ At the end of each loop I draw a line between the latest estimates of cluster ce
 
 ## K-means explained
 
-[K-means clustering](https://en.wikipedia.org/wiki/K-means_clustering){:target="_blank"} is a recursive algorithm that aims to partition \\(n\\) observations into \\(k\\) clusters in which each observation belongs to the cluster with the nearest mean, called the cluster centroid.
+[K-means clustering](https://en.wikipedia.org/wiki/K-means_clustering){:target="_blank"} is an iterative algorithm that aims to partition \\(n\\) observations into \\(k\\) clusters in which each observation belongs to the cluster with the nearest mean, called the cluster centroid.
 
 |Step|Description|
 |--|--|
```
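The definition in the changed line above (assign each observation to the nearest centroid, then move each centroid to the mean of its cluster) can be sketched in a few lines of Python. This is an illustrative sketch only, not the post's BASIC listing; all names and the first-k initialization are my own choices.

```python
def kmeans(points, k, iters=20):
    """Plain 2-D k-means: alternate nearest-centroid assignment
    and mean-update until iters passes have run."""
    # Illustrative init: take the first k points as starting centroids.
    centroids = list(points[:k])
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance, no sqrt needed for argmin).
        clusters = [[] for _ in range(k)]
        for x, y in points:
            j = min(range(k),
                    key=lambda c: (x - centroids[c][0]) ** 2
                                + (y - centroids[c][1]) ** 2)
            clusters[j].append((x, y))
        # Update step: move each centroid to the mean of its cluster.
        for j, pts in enumerate(clusters):
            if pts:
                centroids[j] = (sum(p[0] for p in pts) / len(pts),
                                sum(p[1] for p in pts) / len(pts))
    return centroids

# Two well-separated blobs; the centroids settle near each blob's mean.
pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
print(sorted(kmeans(pts, 2)))
```

Like the post's BASIC version, this can converge to a local optimum depending on initialization; random restarts are the usual remedy.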
```diff
@@ -196,7 +197,7 @@ This code is a slog and it's not really critical to understanding ML but I thoug
 3030 IF KM%(I - 1,1) - KM%(I,1) <> 0 THEN M = -1 * (KM%(I - 1,0) - KM%(I,0)) / (KM%(I - 1,1) - KM%(I,1))
 3040 P%(0,0) = (KM%(I,0) - KM%(I - 1,0)) / 2 + KM%(I - 1,0)
 3050 P%(0,1) = (KM%(I,1) - KM%(I - 1,1)) / 2 + KM%(I - 1,1)
-3060 GOSUB 3500
+3060 GOSUB 3080
 3070 NEXT
 3080 REM -- DRAW LINE FROM SLOPE AND POINT --
 3090 NX = 1 : REM -- REM NUMBER OF INTERSECTIONS --
```
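For reference, lines 3030 to 3050 in the hunk above compute the perpendicular bisector between two consecutive centroids: its slope is the negative reciprocal of the segment's slope, and it passes through the segment's midpoint. A hypothetical Python equivalent of just that geometry (function name mine, not from the post):

```python
def perpendicular_bisector(p0, p1):
    """Slope and midpoint of the perpendicular bisector of
    segment p0-p1, mirroring BASIC lines 3030-3050. The slope
    is None when the segment is horizontal (vertical bisector)."""
    dx = p1[0] - p0[0]
    dy = p1[1] - p0[1]
    # Negative reciprocal of the segment slope dy/dx.
    slope = -dx / dy if dy != 0 else None
    midpoint = ((p0[0] + p1[0]) / 2, (p0[1] + p1[1]) / 2)
    return slope, midpoint

# Segment (0,0)-(4,2) has slope 1/2, so its bisector has slope -2
# and passes through the midpoint (2, 1).
print(perpendicular_bisector((0, 0), (4, 2)))
```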
```diff
@@ -230,4 +231,9 @@ I then iterate through the 4 sides of the 'box' on the screen, using the corners
 
 Yes! Yes, we can.
 
-While k-means is simple, it does not take advantage of our knowledge of the Gaussian nature of the data. If we know that the distributions are Gaussian, which is very frequently the case in machine learning, we can employ a more powerful algorithm: Expectation Maximization (EM). This post is already long enough, so we'll deal with that another day. Eventually, perhaps, we'll also get to deep learning, although developing back propagation for an arbitrary size neural net using APPLESOFT BASIC on an <span class="no-break">Apple ][+</span> is not going to be easy.
+While k-means is simple, it does not take advantage of our knowledge of the Gaussian nature of the data. If we know that the distributions are at least approximately Gaussian, which is frequently the case, we can employ a more powerful application of the Expectation Maximization (EM) framework that exploits this (k-means is a centroid-based clustering method whose iterative approach resembles EM with 'hard' cluster assignments). This post is already long enough, so we'll deal with that another day. Eventually, perhaps, we'll also get to deep learning, although developing back propagation for an arbitrary-size neural net using APPLESOFT BASIC on an <span class="no-break">Apple ]\[+</span> is not going to be easy.
+
+## Social media sharing
+I shared this post on [Hacker News](https://news.ycombinator.com/item?id=45415510){:target="_blank"} and, to my surprise and delight, got a decent amount of feedback, including some helpful corrections.
+
+My [post on Facebook](https://www.facebook.com/markdcramer/posts/pfbid0eG3AuPSxFTqBm4nggDAjqJrQe8VFv5VUEKf5SQ8qtnDjVFZ4qaSZvauDBcKRxSoFl){:target="_blank"} received _significantly_ less reaction. My friends are apparently a lot less interested in ML algorithms on retro hardware than randos on Hacker News. My share requests to the Facebook groups [Apple II Enthusiasts](https://www.facebook.com/groups/5251478676){:target="_blank"} and [Retro Microcomputers](https://www.facebook.com/groups/retrocomputers){:target="_blank"} were both rejected, which is a bit disappointing. Still, those are fun groups to follow. My [post on the VCF forum](https://forum.vcfed.org/index.php?threads/ml-on-an-apple.1254709/){:target="_blank"} also got zilch. (I attended the [Vintage Computer Festival](https://vcfed.org/events/vintage-computer-festival-west/){:target="_blank"} at the [Computer History Museum](https://computerhistory.org/){:target="_blank"}, which was a good time.) Perhaps I'll try Reddit and LinkedIn next to see what happens...
```

_apple-2-blog/refactoring.md (12 additions & 12 deletions)

```diff
@@ -135,7 +135,7 @@ The only remaining tweaks were for the plotting routines. The two routines below
 1220 IF X0% < 1 OR X0% > 270 THEN RETURN
 1230 IF X1% < 1 OR X1% > 150 THEN RETURN
 1240 HPLOT X0% - 1,X1% TO X0% + 1,X1%
-1250 HPLOT X0%,X1% - 1 TO X0%,X1 + 1
+1250 HPLOT X0%,X1% - 1 TO X0%,X1% + 1
 1260 RETURN
 
 1300 REM == DRAW BOX AT X0, X1 ==
```
````diff
@@ -153,17 +153,17 @@ The only remaining tweaks were for the plotting routines. The two routines below
 The final chunk of code is interesting. After sharing my [Box-Muller](/apple-2-blog/synthesizing-data#box-muller-to-the-rescue) routine with my friend, [Răzvan Surdulescu](https://www.linkedin.com/in/surdules/){:target="_blank"}, he suggested using the [Irwin-Hall Distribution](https://en.wikipedia.org/wiki/Irwin%E2%80%93Hall_distribution){:target="_blank"}. `SQR` and `COS` are expensive operations in BASIC and Irwin-Hall [approximates a normal distribution](https://en.wikipedia.org/wiki/Irwin%E2%80%93Hall_distribution#Approximating_a_Normal_distribution){:target="_blank"} by sampling 12 uniform random numbers.
 
 ```bbcbasic
-1400 REM == IRWIN-HALL DISTRIBUTION ==
-1490 Z0 = -6:Z1 = -6
-1500 FOR I = 1 TO 12
-1510 Z0 = Z0 + RND(1)
-1520 Z1 = Z1 + RND(1)
-1530 NEXT
-1540 RETURN
-1550 REM == RANDOM GEN SPEED TEST ==
-1560 FOR I = 1 TO 500
-1570 GOSUB 1400
-1580 NEXT
+1500 REM == IRWIN-HALL DISTRIBUTION ==
+1510 Z0 = -6:Z1 = -6
+1520 FOR Z = 1 TO 12
+1530 Z0 = Z0 + RND(1)
+1540 Z1 = Z1 + RND(1)
+1550 NEXT
+1560 RETURN
+1600 REM == RANDOM GEN SPEED TEST ==
+1610 FOR I = 1 TO 500
+1620 GOSUB 1500
+1630 NEXT
 ```
 I gave it a go and ran a comparison by generating 500 pairs of random numbers with each. This took 63s with Box-Muller and 90s with Irwin-Hall, so I will stick with the former. That being said, this was a nifty idea.
 
````
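The Irwin-Hall trick is easy to sanity-check off-hardware: summing 12 uniform(0,1) draws and subtracting 6 gives a variable with mean 0, variance 1, and support [-6, 6], which approximates a standard normal. A quick Python sketch, not part of the original posts:

```python
import random

def irwin_hall_normal(rng):
    """Approximate a standard normal draw by summing 12 uniform(0,1)
    samples and subtracting 6, as the BASIC routine does with Z0/Z1."""
    return sum(rng.random() for _ in range(12)) - 6.0

rng = random.Random(42)
samples = [irwin_hall_normal(rng) for _ in range(10000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(round(mean, 2), round(var, 2))  # both close to 0 and 1 respectively
```

Each uniform draw has mean 1/2 and variance 1/12, so the sum of 12 has mean 6 and variance 1; subtracting 6 centers it at zero.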