Fixed issues with k-means. Fixed a bug with the loop variable and line numbering in the IRWIN-HALL DISTRIBUTION routine in refactoring. Also fixed a small bug in line 1250.
`_apple-2-blog/k-means.md` (9 additions, 3 deletions)
```diff
@@ -8,6 +8,7 @@ tags:
   - K-means
   - machine learning
 date: 2025-09-20
+last_modified_at: 2025-10-06
 ---

 Wait. Does k-means count as machine learning? Yes. Yes, it does.
```
```diff
@@ -32,7 +33,7 @@ At the end of each loop I draw a line between the latest estimates of cluster ce
 ## K-means explained

-[K-means clustering](https://en.wikipedia.org/wiki/K-means_clustering){:target="_blank"} is a recursive algorithm that aims to partition \\(n\\) observations into \\(k\\) clusters in which each observation belongs to the cluster with the nearest mean, called the cluster centroid.
+[K-means clustering](https://en.wikipedia.org/wiki/K-means_clustering){:target="_blank"} is an iterative algorithm that aims to partition \\(n\\) observations into \\(k\\) clusters in which each observation belongs to the cluster with the nearest mean, called the cluster centroid.

 |Step|Description|
 |--|--|
```
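The corrected sentence describes a two-step loop: assign each observation to its nearest centroid, then move each centroid to the mean of its assigned points. A minimal Python sketch of that loop (illustrative only; the function name and toy data are mine, not from the post):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means on 2-D points: repeatedly (1) assign each point to
    the nearest centroid, (2) move each centroid to the mean of its points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: (p[0] - centroids[i][0]) ** 2
                                      + (p[1] - centroids[i][1]) ** 2)
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if its cluster emptied
                centroids[i] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return centroids
```

The empty-cluster guard is one of several conventions; another common choice is to re-seed an emptied centroid at a random point.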
```diff
@@ -196,7 +197,7 @@ This code is a slog and it's not really critical to understanding ML but I thoug
 3030 IF KM%(I - 1,1) - KM%(I,1) <> 0 THEN M = -1 * (KM%(I - 1,0) - KM%(I,0)) / (KM%(I - 1,1) - KM%(I,1))
 3090 NX = 1 : REM -- NUMBER OF INTERSECTIONS --
```
```diff
@@ -230,4 +231,9 @@ I then iterate through the 4 sides of the 'box' on the screen, using the corners
 Yes! Yes, we can.

-While k-means is simple, it does not take advantage of our knowledge of the Gaussian nature of the data. If we know that the distributions are Gaussian, which is very frequently the case in machine learning, we can employ a more powerful algorithm: Expectation Maximization (EM). This post is already long enough, so we'll deal with that another day. Eventually, perhaps, we'll also get to deep learning, although developing back propagation for an arbitrary size neural net using APPLESOFT BASIC on an <span class="no-break">Apple ][+</span> is not going to be easy.
+While k-means is simple, it does not take advantage of our knowledge of the Gaussian nature of the data. If we know that the distributions are at least approximately Gaussian, which is frequently the case, we can employ a more powerful application of the Expectation Maximization (EM) framework (k-means is a specific implementation of centroid-based clustering that uses an iterative approach similar to EM with 'hard' clustering) that takes advantage of this. This post is already long enough, so we'll deal with that another day. Eventually, perhaps, we'll also get to deep learning, although developing back propagation for an arbitrary size neural net using APPLESOFT BASIC on an <span class="no-break">Apple ]\[+</span> is not going to be easy.
+
+## Social media sharing
+I shared this post on [Hacker News](https://news.ycombinator.com/item?id=45415510){:target="_blank"} and, to my surprise and delight, got a decent amount of feedback, including some helpful corrections.
+
+My [post on Facebook](https://www.facebook.com/markdcramer/posts/pfbid0eG3AuPSxFTqBm4nggDAjqJrQe8VFv5VUEKf5SQ8qtnDjVFZ4qaSZvauDBcKRxSoFl){:target="_blank"} received _significantly_ less reaction. My friends are apparently a lot less interested in ML algorithms on retro hardware than randos on Hacker News. My share requests to Facebook groups [Apple II Enthusiasts](https://www.facebook.com/groups/5251478676){:target="_blank"} and [Retro Microcomputers](https://www.facebook.com/groups/retrocomputers){:target="_blank"} were both rejected, which is a bit disappointing. Still, those are fun groups to follow. My [post on the VCF forum](https://forum.vcfed.org/index.php?threads/ml-on-an-apple.1254709/){:target="_blank"} also got zilch. (I attend the [Vintage Computer Festival](https://vcfed.org/events/vintage-computer-festival-west/){:target="_blank"} at the [Computer History Museum](https://computerhistory.org/){:target="_blank"}, which was a good time.) Perhaps I'll try Reddit and LinkedIn next to see what happens...
```
`_apple-2-blog/refactoring.md` (12 additions, 12 deletions)
```diff
@@ -135,7 +135,7 @@ The only remaining tweaks were for the plotting routines. The two routines below
 1220 IF X0% < 1 OR X0% > 270 THEN RETURN
 1230 IF X1% < 1 OR X1% > 150 THEN RETURN
 1240 HPLOT X0% - 1,X1% TO X0% + 1,X1%
-1250 HPLOT X0%,X1% - 1 TO X0%,X1 + 1
+1250 HPLOT X0%,X1% - 1 TO X0%,X1% + 1
 1260 RETURN

 1300 REM == DRAW BOX AT X0, X1 ==
```
````diff
@@ -153,17 +153,17 @@ The only remaining tweaks were for the plotting routines. The two routines below
 The final chunk of code is interesting. After sharing my [Box-Muller](/apple-2-blog/synthesizing-data#box-muller-to-the-rescue) routine with my friend, [Răzvan Surdulescu](https://www.linkedin.com/in/surdules/){:target="_blank"}, he suggested using the [Irwin-Hall Distribution](https://en.wikipedia.org/wiki/Irwin%E2%80%93Hall_distribution){:target="_blank"}. `SQR` and `COS` are expensive operations in BASIC and Irwin-Hall [approximates a normal distribution](https://en.wikipedia.org/wiki/Irwin%E2%80%93Hall_distribution#Approximating_a_Normal_distribution){:target="_blank"} by sampling 12 uniform random numbers.

 ```bbcbasic
-1400 REM == IRWIN-HALL DISTRIBUTION ==
-1490 Z0 = -6:Z1 = -6
-1500 FOR I = 1 TO 12
-1510 Z0 = Z0 + RND(1)
-1520 Z1 = Z1 + RND(1)
-1530 NEXT
-1540 RETURN
-1550 REM == RANDOM GEN SPEED TEST ==
-1560 FOR I = 1 TO 500
-1570 GOSUB 1400
-1580 NEXT
+1500 REM == IRWIN-HALL DISTRIBUTION ==
+1510 Z0 = -6:Z1 = -6
+1520 FOR Z = 1 TO 12
+1530 Z0 = Z0 + RND(1)
+1540 Z1 = Z1 + RND(1)
+1550 NEXT
+1560 RETURN
+1600 REM == RANDOM GEN SPEED TEST ==
+1610 FOR I = 1 TO 500
+1620 GOSUB 1500
+1630 NEXT
 ```

 I gave it a go and ran a comparison by generating 500 pairs of random numbers with each. This took 63s with Box-Muller and 90s with Irwin-Hall, so I will stick with the former. That being said, this was a nifty idea.
````
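The Irwin-Hall trick the post describes is worth seeing outside BASIC: the sum of 12 uniform [0, 1) draws has mean 6 and variance 12 × (1/12) = 1, so subtracting 6 gives an approximate standard normal without any `SQR` or `COS`. A quick Python sketch (the function name is mine, not from the post):

```python
import random

def irwin_hall_normal(rng):
    """Approximate a standard normal: the sum of 12 uniform [0, 1) draws
    has mean 6 and variance 1, so subtracting 6 centers it at zero."""
    return sum(rng.random() for _ in range(12)) - 6.0

rng = random.Random(42)
samples = [irwin_hall_normal(rng) for _ in range(10_000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

One caveat of the approximation: unlike Box-Muller, its output is bounded to [-6, 6], so the extreme tails are clipped.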