
Mini-batch $k$-means provides a scalable baseline to partition the InSAR catalogue into coherent deformation regimes. By processing small batches, it keeps memory usage manageable while approximating the centroid updates of classic $k$-means.
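
For reference, the centroid update commonly used by mini-batch $k$-means can be written as below. This is the standard textbook formulation and not necessarily the exact variant used in this work; the symbols $x$, $c$, $n_c$ and $\eta$ are introduced here for illustration only. For each sample $x$ in a batch, let $c$ be its nearest centroid and $n_c$ the number of samples assigned to $c$ so far:

$$
n_c \leftarrow n_c + 1, \qquad \eta = \frac{1}{n_c}, \qquad c \leftarrow (1 - \eta)\, c + \eta\, x
$$

With this step size, $c$ equals the running mean of the samples assigned to it so far. Classic $k$-means instead recomputes each centroid as the mean of all samples currently assigned to it, which requires a full pass over the data per iteration.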

Why naive $k$-Means fails here

Batched $k$-Means overview

from sklearn.cluster import KMeans, MiniBatchKMeans

# Standard K-Means
X = load_entire_dataset()            # requires all samples in memory
model = KMeans(n_clusters=k)
model.fit(X)                         # multiple full passes over X
labels = model.predict(X)

# Batched Mini-Batch K-Means
model = MiniBatchKMeans(n_clusters=k)
init_pool = []
for batch in stream_dataset():       # batches come from DataModule
    init_pool.append(batch)
    if total_samples(init_pool) >= S:
        break
model.fit(concat(init_pool))         # warm-start centroids once

for batch in stream_dataset():       # second pass covers full dataset
    model.partial_fit(batch)         # incremental centroid updates
labels = []
for batch in stream_dataset():
    labels.append(model.predict(batch))
store_results(labels)

The batched version holds at most the initialisation pool (about $S$ samples) or a single batch in memory at any time. The centroids still settle into stable clusters, because each partial_fit call on MiniBatchKMeans moves them incrementally towards the samples in the new batch.
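
The batched workflow above can be tried end to end with the following minimal sketch. It assumes scikit-learn and NumPy are available, uses synthetic 2-D blobs in place of the InSAR features, and replaces the DataModule with a hypothetical `stream_dataset` generator over an in-memory array. The function name `batched_kmeans` and the parameters `batch_size` and `init_samples` are illustrative choices, not part of the original pipeline.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans


def stream_dataset(X, batch_size):
    """Hypothetical stand-in for the DataModule: yield consecutive batches."""
    for start in range(0, len(X), batch_size):
        yield X[start:start + batch_size]


def batched_kmeans(X, k, batch_size=256, init_samples=1024, seed=0):
    model = MiniBatchKMeans(n_clusters=k, batch_size=batch_size,
                            n_init=3, random_state=seed)

    # 1) Collect an initialisation pool of at least `init_samples` rows.
    pool, n_pooled = [], 0
    for batch in stream_dataset(X, batch_size):
        pool.append(batch)
        n_pooled += len(batch)
        if n_pooled >= init_samples:
            break
    model.fit(np.concatenate(pool))       # warm-start the centroids once

    # 2) One full pass of incremental centroid updates.
    for batch in stream_dataset(X, batch_size):
        model.partial_fit(batch)

    # 3) Assign labels batch by batch.
    labels = np.concatenate(
        [model.predict(batch) for batch in stream_dataset(X, batch_size)]
    )
    return model, labels


# Synthetic data: three well-separated 2-D blobs (illustration only).
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.concatenate([c + rng.normal(scale=0.5, size=(1000, 2)) for c in centres])
rng.shuffle(X)                            # mix the blobs across batches

model, labels = batched_kmeans(X, k=3)
```

The rows are shuffled before streaming so that the initialisation pool sees all three blobs. With an ordered stream the warm-start centroids would be biased towards whichever regime comes first. In a real pipeline the labels would be written out per batch rather than concatenated in memory.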

Results

The dynamic features are the Reservoir embeddings.

The static features in mainland Norway (Lyngen, Nordnes) are:

On Svalbard data, the static features are: