Understanding ELM

This is the methodology section: how ELM works under the hood, and why the chosen solution method is fast and flexible.

How It Works

ELM is a linear model applied to a non-linear random projection of the inputs: data passes through hidden neurons with fixed random weights and a non-linear function, and only the linear output layer is learned.

As with any linear model, the learning capacity grows with the number of inputs to its linear part; here that number is the number of hidden neurons, so more neurons give a more expressive model.

The best output weights are obtained easily, in closed form, by solving an ordinary least-squares problem; no iterative training is needed.
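
A minimal sketch of that idea in NumPy (all names are illustrative, not the library API):

    import numpy as np

    rng = np.random.default_rng(42)

    def fit_elm(X, y, n_neurons=100):
        # Hidden layer: fixed random weights and biases, never trained.
        W = rng.normal(size=(X.shape[1], n_neurons))
        b = rng.normal(size=(1, n_neurons))
        H = np.tanh(X @ W + b)                    # non-linear random projection
        # Output layer: the best linear solution in the least-squares sense.
        B, *_ = np.linalg.lstsq(H, y, rcond=None)
        return W, b, B

    def predict_elm(X, W, b, B):
        return np.tanh(X @ W + b) @ B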

Two-Step Solution

This is the particular version of the solution that we are going to use.

Many other solution methods exist, and some are better for specific purposes, such as the iterative updates used by the PageRank web search algorithm that works with millions of inputs.

Motivation and benefits of this solution (a code sketch of the two steps follows the list):
  • The typical use case has many more training samples than hidden neurons.

  • Other cases make little sense: with the same or a larger number of hidden neurons, the model learns the training set perfectly, including its noise, which goes against the learning-theory principle of ignoring noise in the data.

  • Uses very efficient, highly-optimised compute operations (matrix multiplication and a Cholesky solver) that give a 10x-100x speedup over hand-written code, make optimal use of both CPU and memory, and are available everywhere, including mobile hardware and mobile GPUs.

  • The first step covers most of the computations but needs to be done only once. Its results are simply summed over all data chunks, which makes it very easy to add more data to an already existing model, and even to forget some data. It is also simple to accelerate on a GPU or to run in chunks for Big Data applications.

  • The second step computes the actual solution and is very fast. Only this step needs to be repeated for model tuning.

  • Because the second step is re-run from scratch every time more data is added to the model, there is no error accumulation, as there would be if the solution itself were updated incrementally. It is also very easy to re-tune the model on new data.

  • Easy to include L2 regularization, and to change the L2 regularization coefficient on a live model. Very easy to test smaller numbers of neurons on an already trained model. L1 regularization works to some extent, but it still requires the whole dataset for the best results.
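
Here is a sketch of the two steps, assuming the hidden-layer outputs H have already been computed for each chunk of data (data_chunks and all other names are illustrative, not the library API):

    import numpy as np
    from scipy.linalg import cho_factor, cho_solve

    n_neurons, n_outputs = 100, 1
    HH = np.zeros((n_neurons, n_neurons))          # H.T @ H, accumulated
    HY = np.zeros((n_neurons, n_outputs))          # H.T @ y, accumulated

    # Step 1: the expensive part, but it is just a sum over data chunks.
    for H_chunk, y_chunk in data_chunks:           # assumed iterable of (H, y) pairs
        HH += H_chunk.T @ H_chunk
        HY += H_chunk.T @ y_chunk

    # Step 2: a tiny (n_neurons x n_neurons) system, solved with Cholesky.
    alpha = 1e-3                                   # L2 regularization strength
    B = cho_solve(cho_factor(HH + alpha * np.eye(n_neurons)), HY)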

Batch Processing

Batch processing relies on Dask and out-of-core processing, where part of the data remains on SSD (yes, I deliberately ignore hard drives as an outdated technology for active working storage).
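
A sketch of the same accumulation with Dask; the real data would come from disk-backed storage, random arrays stand in for it here, and all names are illustrative:

    import dask.array as da
    import numpy as np

    # Data lives on SSD (e.g. HDF5/Zarr) and is processed chunk by chunk;
    # random data is used here only as a placeholder.
    X = da.random.random((1_000_000, 50), chunks=(10_000, 50))
    y = da.random.random((1_000_000, 1), chunks=(10_000, 1))
    W = np.random.randn(50, 100)                   # fixed random hidden weights
    b = np.random.randn(100)

    H = da.tanh(X @ W + b)                         # hidden outputs, computed lazily per chunk
    HH = (H.T @ H).compute()                       # only the small 100 x 100 result
    HY = (H.T @ y).compute()                       # and the 100 x 1 right-hand side are kept in RAM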

Forgetting Mechanism

It is the same as adding data, but with the opposite sign. The solution is re-calculated after that.
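
A sketch of forgetting a chunk of data, reusing the accumulated HH and HY matrices from the sketches above (X_forget and y_forget are assumed to be the samples being removed):

    # Subtract the contribution of the chunk to forget, using the same W, b.
    H_forget = np.tanh(X_forget @ W + b)
    HH -= H_forget.T @ H_forget
    HY -= H_forget.T @ y_forget

    # Re-run only the cheap second step to get the updated solution.
    B = cho_solve(cho_factor(HH + alpha * np.eye(n_neurons)), HY)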

Stability & Optimization

Basic tricks of linear models apply: increase the number of neurons until the model starts to overfit, then kill the overfitting with L2 regularization.
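
Since HH and HY are already accumulated, trying different L2 coefficients, or fewer neurons by slicing, repeats only the cheap second step. A sketch with a hypothetical validation_error helper:

    best_alpha, best_err = None, np.inf
    for alpha in (1e-6, 1e-4, 1e-2, 1e0, 1e2):
        B = cho_solve(cho_factor(HH + alpha * np.eye(n_neurons)), HY)
        err = validation_error(B)                  # hypothetical helper: error on held-out data
        if err < best_err:
            best_alpha, best_err = alpha, err

    # Testing fewer neurons reuses the same matrices: just slice them.
    k = 50
    B_small = cho_solve(cho_factor(HH[:k, :k] + best_alpha * np.eye(k)), HY[:k])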

Combining With Deep Learning

Add a deep learning network for feature extraction, with an ELM on top. It works, and it is fast: you keep all ELM features such as super-fast training, adding more data, or forgetting data.

You can even export a trained DL+ELM model as one large model for inference, and fine-tune it if you wish, although the gains on the test set are questionable.
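
A sketch of the combination, assuming a feature_extractor callable (for example a pretrained CNN with its classifier head removed) that turns raw inputs into feature vectors; it reuses the two-step names from above:

    # feature_extractor is assumed to return an (n_samples, n_features) array.
    F = feature_extractor(X_raw)
    H = np.tanh(F @ W + b)                         # ELM hidden layer on top of deep features

    # The usual two-step solution applies: accumulate, then solve.
    HH += H.T @ H
    HY += H.T @ y
    B = cho_solve(cho_factor(HH + alpha * np.eye(n_neurons)), HY)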

(upcoming) Deep ELM

There are multi-layer extensions of ELM that actually make sense and work well.