Hierarchical Temporal Memory, NuPIC, and Numenta’s Commendable Behavior

For my independent study I decided to research a fairly recently developed artificial intelligence concept known as Hierarchical Temporal Memory (HTM). I was delighted when Dr. Dankel agreed to oversee my project, as I greatly respect him and he has years of experience teaching artificial intelligence concepts.

The goal of my independent study was to learn about the HTM and, if time allowed, create my own implementation and test it. As I delved into the subject, however, it became apparent that there simply would not be anywhere near the necessary amount of time to create an implementation, so I decided to settle for a basic understanding.

This document contains an account of my research and present understanding of the HTM, as well as a brief history of the concept.

Concept Origins

The first public mention of the HTM I was able to find appears to be from Jeff Hawkins’ book, On Intelligence.

After leaving Palm, Inc., Hawkins continued a personal quest he began decades before: to invent a general theory that explains how the brain works. Together with Donna Dubinsky and Dileep George, he founded Numenta Incorporated in March of 2005.¹ Later on, Numenta renamed itself to Grok in order to emphasize its transition from a research-based company to a more product-oriented commercial venture. In this document, I will use the name Numenta when referring to Hawkins’ company because its use is still widespread and can be found throughout their documentation, source code, and employee email addresses.

The HTM builds upon artificial neural network (ANN) and Bayesian concepts that came before it² (although you wouldn’t know this from Numenta’s documentation). Its novelty lies in its choice of topology and the way it chooses to incorporate and interpret modern models of the neocortex.

Roadblocks Encountered During Research

I originally planned to delve much deeper into this subject than I was able to. My hope was to create my own implementation of the HTM and test it on various input data, but for various reasons, detailed below, I was forced to settle for just a basic understanding of the concept.

Limited time

Although the thought of implementing the HTM on my own might sound too ambitious for an independent study project, I thought that I might have a chance of doing it partly because I’d scheduled the course for Summer C, but mostly due to my own ignorance about the complexity that would be involved.

Had this independent study been the sole focus of my summer learning, I might have made some decent progress in that direction, especially when Numenta suddenly announced (this very summer) to open source their implementation. However, I had also signed up for two math courses that turned out to demand far more of my attention than I had anticipated (all of it). I struggled along with the math courses and ultimately decided to drop both of them halfway through Summer A. At that point, I hadn’t had a chance to spend any time on my independent study.

Dropping both math courses meant I had to arrange for another independent study course (math related) in order to fulfill my graduation requirements. Having thus obtained some free time, I immediately allocated a significant portion of it to another project.⁴

These developments, coupled with my responsibilities at home and at work, meant that I wouldn’t be able to complete my study of the HTM before the end of Summer C. Dr. Dankel was very kind and gracious enough to grant me an extension so that I could finish up this work before Fall classes began in full swing. For this, he has my sincerest gratitude and thanks.

Issues encountered with NuPIC

As mentioned, Numenta released their implementation of the HTM and CLA as an open source project, calling it the Numenta Platform for Intelligent Computing, or NuPIC for short.

As wonderful as this news was, I quickly discovered that NuPIC contained many bugs, preventing it from simply compiling on many standard system setups, including mine.

A catastrophic choice: C++ and Python

C++

I have well over a decade of experience writing software in C-based languages and consider myself to be an expert in the language and all its well known dialects. Of the “flavors of C” out there, I can say without hesitation that C++ is the absolute worst. I’d even go so far as to call C++ one of the worst programming languages ever invented by humankind, made all the worse due to its widespread use.

Linus Torvalds, mild-mannered father of three, husband of a a six-time Finnish national karate champion⁵, and author of the Linux kernel and the git version control system, had this to say about C++:

A complete treatise on the tragedy that is C++ is beyond the scope of this document (and would make for an excellent dissertation), but needless to say, I believe Numenta’s decision to use it in NuPIC is a most unfortunate one. I suspect its costs will outweigh any perceived benefits they have in mind.

Python

Prior to my involvement with NuPIC, I had not harbored any ill feelings toward Python. I later found out this was entirely due to my lack of exposure to it. I had never before worked in any serious capacity with the language, only occasionally modifying a script here and there.

On its surface, Python gives off the appearance of being an elegant—perhaps one could even say beautiful—programming language. This is due in part to the meaning it assigns to whitespace. Python code must look a certain way and follow very precise rules regarding whitespace, because tabs, spaces, etc., all have meaning in the language that can change how the program behaves.

This unorthodox decision, on the one hand, promotes a single concise and ubiquitous code style that all Python developers share. This can make it easier for programmers to read each other’s code and can also prevent potential quarrels from arising over who should be forced to change their coding style.

On the other hand, this decision comes at a price that can be experienced by anyone who has spent time sharing or copying Python code online. Web browsers, web forum software, and differences between editors, encodings, and operating systems can mangle the original formatting of the code, making it dead on arrival.

If that was Python’s only problem, Numenta’s decision to use it might not be so unfortunate. After all, Python’s syntax is fairly straightforward when compared with other languages, and might even deserve some praise.

Surprisingly, most of the issues I had with Python lay outside of the language itself, and had everything to do with its ecosystem. The majority of the problems I encountered attempting to build NuPIC and run its demos were, in fact, related to Python, not C++.

Python’s lack of a decent package manager…

To start, Python does not have a good package manager for installing modules. At the moment pip appears to be the most popular package manager, and is the one used by NuPIC. However, pip is missing features commonly found in other package managers (like an info command to lookup information about a module), and failed to install dependencies for at least one module in NuPIC.

…turns out to be a deeper problem surrounding Python

Python’s package managers cannot be fully blamed for their shortcomings and bugs. The real cause of many of the problems NuPIC had with its Python setup stem from design decisions related to how Python installs itself on systems and handles modules, as well as a long-standing schism in its versioning.

An in-depth dissection of Python’s ecosystem, folder hierarchy, and module system is also beyond the scope of this document, but I will attempt to briefly list some of the questionable design decisions in this area:

Type of file	Override option
Python modules	–install-purelib
extension modules	–install-platlib
all modules	–install-lib
scripts	–install-scripts
data	–install-data
C headers	–install-headers

Bundled dependencies

A significant portion of my time was spent learning NuPIC’s build system and getting it to first compile, and then to run the examples.

For some reason, NuPIC’s repository contained not only the source code of its dependencies, but also binaries of various libraries and executables, one for each of the two platforms it supports (darwin64 and linux64). I don’t think I’ve ever seen that before. I spent a while changing the build scripts to not rely on these packaged dependencies and instead fetch them from an online repository using a package manager.

Once I got the project to compile successfully, the next few days were spent trying to get this test program to run successfully:

UNIX convention violations and environment pollution

Notice the bizarre use of environment variables in the command above. For some inexplicable reason the project’s README instructs developers to modify their bash configuration script to export several NuPIC-related environment variables:

This also results in the alteration of the user’s PATH, PYTHONPATH, and LD_LIBRARY_PATH environment variables in a very arrogant (and simultaneously ignorant) fashion. These variables aren’t meant to be modified by a project. They exist as a convenience for users to modify the behavior of existing programs, and to customize their shell environment.

Worse, NuPIC exports several variables into the user’s environment, a practice that is generally frowned upon and referred to as “polluting” a user’s environment. There is no legitimate reason that any of this is necessary. I hope future versions of the software remove this behavior.

Hard coded path(s)

NuPIC’s README file claimed that it supported either Python 2.6 or 2.7, however I quickly discovered that this was not true. NuPIC’s htmtest failed to run for various reasons, and one of those was the fact that the project appeared to be untested with Python 2.7 and contained at least one critical hard-coded path reference to python2.6.

Instead of designing NuPIC in a generic fashion that would allow every part of it to dynamically find modules and dependencies at runtime, the project was designed to be compiled for a particular environment, making it impossible to create a portable library out of it that could be linked against and distributed with a program that made use of its features.

Alarming problem with the Apache CLA and Numenta’s commendable behavior

After fixing some of the issues I found with NuPIC’s build system, I submitted my modifications in a pull request and was asked to sign a Contributor’s License Agreement (CLA, not to be confused with the Cortical Learning Algorithm). Upon review, I noticed it contained a troubling sentence relating the granting of a patent license:

Unlike other types of literature, legal documents appear to encourage the use of run-on sentences, and therefore the reader can be forgiven if they don’t immediately see the very alarming scenario that sentence appears to allow for.

I was told that it was a word-for-word copy of the same section in the Apache CLA (v2) (a very common CLA with a long history of use), and therefore it would not be changed. However, after some additional poking, they brought up the issue with their legal team and discussed it internally. Eventually, they agreed to add a few clarifying words that would address the issue completely.

Of significance, Numenta announced the changes via their blog, and stated that they would allow existing contributors to sign the updated version. Matt Taylor, Numenta’s “Community Flag-Bearer”, explained the essence of what was clarified in the update:

I am quite grateful for Matt’s help in addressing this issue. I have no doubt that his professionalism played an important role in Numenta’s decision.

Significance for the Open Source Community

The fact that a fairly large and well known company took these steps to clarify the Apache CLA has significant consequences for the entire Open Source community.

In effect, Numenta’s actions legitimize the concerns that were raised, which sends a strong signal to every other company out there that uses the Apache CLA. It also sends a strong signal to every single developer who has ever signed a CLA that contains an identical (or similar) patent license clause.

The group most affected, however, are developers who have not yet signed an Apache-based CLA and have become aware of this issue (either through this paper or some blog post). The reason for this is that even if most companies would never abuse the CLA in the manner that the original language allows, the mere awareness of the possibility implies consent.

Why? Because if you are aware of the potential consequences that signing a legal document can have, and you still put your signature on it, then you cannot even use ignorance as a defense should the issue ever arise. That was the reason it became impossible for me to sign Numenta’s original CLA:

At the time of publishing, the Apache CLA was at version 2.0. Hopefully, the Apache Foundation can amend their CLA in a future update.

Overview of the HTM

In these sections we’ll give an overview of the essential concepts necessary to have a basic understanding of the HTM. We’ll also consider a few ways in which they compare with other types of artificial neural networks (ANNs). Some of the topics discussed in Numenta’s white paper, including reasons for various design decisions, performance considerations, etc., won’t be covered. Prior exposure to basic ANN concepts is assumed.

‘H’ is for ‘Hierarchy’

The HTM has a hierarchical topology on account of hierarchies observed in some naturally occurring neural networks, such as those observed in the brain.

A hierarchical topology is also useful because it allows for the mapping of highly detailed (and possibly noisy) data, to progressively more stable and abstract concepts in the context of limited computer memory. The graphic below was taken from Numenta’s white paper and illustrates four HTM regions stacked on top of one another in a hierarchy. The aforementioned “noisy data” (pixels from a webcam, bits and bytes from an mp3, etc.) are fed into the bottom region of the hierarchy:

Cells

An HTM can be composed of just one region, or several regions with data being fed through them. The region itself is comprised of interconnected cells that can be in one of three states:

The cells are stacked on top of each other in vertical columns forming a three dimensional grid:

Sparse Distributed Representations

Recall that the purpose of a region is to map (potentially noisy) input-data to more stable abstract concepts. These “abstract concepts” actually have very non-abstract representations, namely the on or off state of cells within the region.

HTM regions have “sparse distributed representations”. Numenta’s whitepaper defines this term as follows:

I think the paper did a fine job explaining how this works in practice with the following example:

‘T’ is for ‘Temporal’

Numenta’s white paper states that, “Time plays a crucial role in learning, inference, and prediction.” We’ll see the truth in this when we compare the HTM with purely feed-forward ANNs.

HTM networks are trained with, and form predictions about, time-varying input-data. We should have good reason to believe that the data contains some type of temporal pattern, otherwise it is simply noise (like static on the radio). Too many unrelated temporal patterns in the data-stream can also lead to poor results (students with ADHD and poor study habits know this very well).

‘M’ is for ‘Memory’

We’ve been dancing around the technical details for a while now, so let’s have a closer look at how HTMs create memories and form predictions.

Recall that each HTM region contains a three dimension grid of cells, and each cell can be either on or off (for various reasons). The cells are grouped in vertical columns, and all the cells in a column get the same feed-forward input (either directly from the data, or the output from a lower HTM region). The input, as mentioned previously, is converted to a sparse distributed representation, and then each column gets a unique subset of that input (usually overlapping with other columns, but never the exact same subset). Cells also receive lateral input from other columns of cells.

Which cells get activated within a column depends on a variety of factors, and the use of columns allows the HTM to create temporal contexts. The white paper explains:

Which cells in a column become active depends on the current activation state of those cells:

Real neural networks contain inhibitory signals as well as excitatory ones. The same concept is applied in HTM regions as part of the lateral input that cells receive from their neighboring columns:

Lateral input can inhibit as well as excite cells, and this is how HTM regions form predictions:

The influence one cell has on another is moderated by the strength of the a virtual synapse between, which can have a decimal value between 0 and 1. For additional details, please see the white paper.

Comparing Intelligences

HTM vs. Primitive feed-forward ANNs

As mentioned to previously, HTM regions are used to map noisy data to more stable concepts. Another way to state that is: regions extract temporal patterns out of their input.

In this section I will use the phrase “primitive ANN” to refer to feed-forward networks that do not have contain cycles between cells.

Primitive ANNs do not take into account time, and this severely limits the types of patterns they are capable of recognizing. They do terrible job of anticipating future events because they don’t support a contextual temporal model.

Those fluent in modern English can probably fill in the blank without much trouble, or at least are capable of narrowing down the possibilities to a handful of words.

To illustrate the difference between temporal patterns and those handled by primitive ANNs, consider how you might have reacted had I instead written:

Now it is much more difficult to predict what belongs in the blank. This is due to the lack of prior context. Human senses such as touch, vision, and hearing, absorb and react to symbols (input) in a sequential manner. The sequence these symbols present themselves in usually results in predictable patterns. We can predict the word that belongs in the first blank simply because our brain has heard it stated so many times after the words “What”, “does”, and “that”.⁹

Primitive ANNs do not use any sort of temporal context to make predictions. Instead, they treat input as chunks of raw data that is then mapped directly to an output without consideration for the order in which the data arrived. The output depends entirely on the ANN’s training data and error-correction algorithms.

Therefore, a primitive ANN processes “What does that” in a single gulp, treating it no differently than it would a single word.

Another consideration is efficiency. Numenta’s white paper points out that hierarchies result in less training time for the network “because patterns learned at each level of the hierarchy are reused when combined in novel ways at higher levels.” While I’m not 100% certain (as I did not have the time to test and compare implementations), I suspect that hierarchy accounts for only part of the improvement in efficiency. I think a hierarchy of HTMs would outperform an “equivalent” hierarchy of primitive ANNs because of the memory-efficiencies that temporal contexts provide.

HTM vs. Long short term memory

RNNs have the capability of processing and interpreting temporal sequences unlike the feed-forward networks we just discussed. They accomplish this through loop-back connections among the cells within the network, giving their memory the capacity to process arbitrary sequences of data.

Of the various types of RNNs out there, the LSTM produces some of most the impressive results. Most other types of RNNs are not able to handle long pauses between events in the input data (for example, a period of silence when someone is talking). The LSTM does not have this limitation. Felix Gers thesis explains: ¹¹

In Spatio–Temporal Memories for Machine Learning: A Long-Term Memory Organization, researchers Starzyk and He took inspiration from both the LSTM and HTM to create their own hierarchical neural net based on interconnected long-term memory (LTM) cells: ¹²

I think a comparison of these two approaches deserves more attention and is something that I’d like to research further. I am also interested in reading additional material on this topic, so please, don’t be a stranger and feel free to send relevant material my way, I’d greatly appreciate it!

Academia vs. Numenta

Surprisingly, the first major difference to jump out at me between the HTM and LSTM had nothing to do with algorithms or their respective topologies. Instead, it was the potent stench of dry, lifeless academic language wafting out of all of the documentation I could find on the LSTM that left a most profound impression on me. Indeed, the time I spent reading Ph.D. theses and papers from IEEE journals represented one of the least enjoyable parts of my research purely on account of the terrible prose. Regardless of what some might believe, Universities do not bundle academic degrees with a license to carry a rotting right-hemisphere. The guilty tend to cling to argumentum ad populum as their sole defense. In a just universe, they’d still be working through secondary school.

On the other hand, I have to commend academics for being team players, a concept that appears somewhat foreign to Numenta. Whereas all of the academic papers I read contained countless¹³ references and citations to other work in the field, the same could not be said of any of Numenta’s. None of the three Numenta papers on the HTM ¹⁴ ¹⁵ ¹⁶ contain anything resembling a references section acknowledging the authors of prior work upon which their ideas are based.

Conclusion

From my research I’ve concluded that, at least for me, the HTM and its corresponding learning algorithm(s) warrant further research, as they appear to be very useful and powerful tools for prediction, modeling real-world data, and creating artificial intelligences.

Also, software should not be patentable.¹⁸ ¹⁹ ²⁰ ²¹ ²² The practice should be abolished immediately for the good of mankind and whomsoever disagrees with me on this owes me $1 billion in royalties because I patented the concept of being an asshole.

About the Author

Greg Slepak refers to a sometimes-conscious entity whose primary public activities some recognize as ‘entrepreneurship’, ‘software engineering’, and ‘writing’. He taught himself C at age 12, and continued to learn other programming languages, with present favorites Clojure and newLISP. During his freshman year as an undergraduate at UF, he joined CIMAR and wrote the High Level Planner for UF’s entry to the 2007 DARPA Urban Challenge. In 2008 he founded Tao Effect LLC as part of an effort to make data encryption more user-friendly.

Document History

This is a living document and future updates are a significant possibility. Notifications of updates will be posted to my twitter account, and possibly to my blog as well.

Version	Date	Comment
2.0.0	September 14, 2013	Changed title of paper. Replaced “Numenta’s Unethical Behavior” with “Numenta’s Commendable Behavior” because Numenta updated the agreement. Updated many related sections.
1.0.1	August 28, 2013	Minor edits.
1.0.0	August 26, 2013	Document first published online.

Acknowledgments & References

I’m very much aware of the horrific state of this section, and hope to clean it up a bit in a future update. If you’d like to help me de-wikifiy the citations you are most welcome to contact me with links to better sources.

I would like to thank Andrea Devers for helping me edit and proofread this document, and Bob Jesse for his feedback.

https://en.wikipedia.org/w/index.php?title=Hierarchical_temporal_memory&oldid=567124697 ↩
https://en.wikipedia.org/w/index.php?title=Hierarchical_temporal_memory&oldid=567124697#Similarity_to_other_models ↩
To a Mouse by Robert Burns ↩
http://www.kinostudios.com/mandelbulb.html ↩
https://en.wikipedia.org/w/index.php?title=Linus_Torvalds&oldid=569095816 ↩
http://docs.python.org/release/2.3.4/inst/search-path.html ↩
http://docs.python.org/2/install/↩
http://docs.python.org/2/install/↩
Requiring commas and periods within quotes (when they were not originally there) is an archaic and illogical practice that appears to have been invented as a hack to get around ancient printing problems. Its use in modern English is therefore an idiotic anachronism and should be abolished. If you see anyone doing it please be sure to forward this link to them. Raising hell about the matter is entirely up to you. I personally encourage it.↩
S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.↩
Long Short-Term Memory in Recurrent Neural Networks ↩
Starzyk, He Spatio-temporal memories for machine learning: a long-term memory organization.. See page 769 in “IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 20, NO. 5, MAY 2009”.↩
Actually, there were counted.↩
HTM_CorticalLearningAlgorithms.pdf ↩
Numenta HTM Concepts.pdf ↩
Numenta HTM Learning_Algos.pdf ↩
Starzyk, He Spatio-temporal memories for machine learning: a long-term memory organization.. See page 769 in “IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 20, NO. 5, MAY 2009”.↩
http://www.forbes.com/sites/reuvencohen/2013/05/08/new-zealand-government-announces-that-software-will-no-longer-be-patentable/↩
http://www.burgess.co.nz/law/why-software-should-not-be-patentable-the-academic-approach/↩
http://endsoftpatents.org/pages/resources-for-lawyers/↩
http://www.tonymarston.net/php-mysql/software-patents-are-evil.html ↩
http://www.javaworld.com/javaworld/jw-03-2012/120316-open-sources.html ↩

Hierarchical Temporal Memory, NuPIC,
and Numenta’s Commendable Behavior

Table of Contents

Introduction