100 functions, Alan Perlis, and Big Data

Alan Perlis (1922 – 1990), widely regarded as one of the founding fathers of computer science, once said:

"It is better to have 100 functions operate on one data structure than to have 10 functions operate on 10 data structures."

This quote is from Perlis' Epigrams on Programming (1982). Even at first glance, the advice applies to many different areas of computer science.


One example is Lisp, where 100 functions can be composed in many different ways because they were all designed to work on one particular structure. A modern analogy for a more mainstream audience lies in Java: imagine writing 100 functions once against the List interface versus having to implement 10 functions separately for ArrayList, LinkedList, etc. (what a nightmare!).
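To make the composition point concrete, here is a minimal sketch in JavaScript, chosen only to match the snippet later in this post. The helper names (map, filter, take, sum, pipe) are made up for illustration and are not from any library. Every helper works on the same structure, a plain array, so any of them can be chained with any other.

```javascript
// Small, general helpers that all operate on one structure: a plain array.
const map = (fn) => (xs) => xs.map(fn);
const filter = (pred) => (xs) => xs.filter(pred);
const take = (n) => (xs) => xs.slice(0, n);
const sum = (xs) => xs.reduce((acc, x) => acc + x, 0);

// pipe applies functions left to right. The chain works because each step
// accepts the array produced by the previous step.
const pipe = (...fns) => (input) => fns.reduce((value, fn) => fn(value), input);

const sumOfFirstTwoEvenSquares = pipe(
  filter((x) => x % 2 === 0), // keep even numbers
  map((x) => x * x),          // square them
  take(2),                    // keep the first two
  sum                         // reduce the array to a single number
);

const result = sumOfFirstTwoEvenSquares([1, 2, 3, 4, 5, 6]);
console.log(result);
```

Had each helper required its own container type, every new pairing would need an adapter. With one shared structure, adding a helper makes it usable alongside all the existing ones.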

Clojure strives to realize Perlis' vision: it avoids coupling data to code by providing a large library of functions that operate on a few simple, basic data types. Clojure programs emphasize not data classes and structures but the functional code that operates on them.

Perlis' philosophy has also infiltrated frontend land and modern JavaScript. What ClojureScript's Om brought to the React landscape was the concept of centralizing all application state within a single immutable store. Every view in the user interface is simply one of the "100 functions" operating on the "one data structure", re-rendering when the data it depends on changes. You see this paradigm in the wildly popular Redux project, where actions are passed to reducers that update the centralized application state, which in turn triggers view re-renders in the user interface.

/* Prints
{
  visibilityFilter: 'SHOW_ALL',
  todos: [
    {
      text: 'Keep all state in single source of truth.',
      completed: true,
    },
    {
      text: 'Ate breakfast.',
      completed: false
    }
  ]
}
*/

All of TodoMVC's application state in one JSON.
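The action, reducer, and re-render loop described above can be sketched in a few lines. This is a hand-rolled illustration, not Redux's actual implementation: createStore here is a simplified stand-in written for this example, and the action types ADD_TODO and TOGGLE_TODO and the renderTodos view are assumptions made up for the sketch.

```javascript
// Hypothetical minimal store: one state object, changed only through a reducer.
function createStore(reducer, initialState) {
  let state = initialState;
  const listeners = [];
  return {
    getState: () => state,
    subscribe: (listener) => { listeners.push(listener); },
    dispatch: (action) => {
      state = reducer(state, action);                    // compute the next state
      listeners.forEach((listener) => listener(state));  // notify every view
    },
  };
}

// Reducer: returns a new state object instead of mutating the old one.
function todoApp(state, action) {
  switch (action.type) {
    case 'ADD_TODO':
      return { ...state, todos: [...state.todos, { text: action.text, completed: false }] };
    case 'TOGGLE_TODO':
      return {
        ...state,
        todos: state.todos.map((todo, i) =>
          i === action.index ? { ...todo, completed: !todo.completed } : todo
        ),
      };
    default:
      return state;
  }
}

const store = createStore(todoApp, { visibilityFilter: 'SHOW_ALL', todos: [] });

// A "view" is just a function of the state; here it renders to a string.
const renderTodos = (state) =>
  state.todos.map((t) => `${t.completed ? '[x]' : '[ ]'} ${t.text}`).join('\n');

const renders = [];
store.subscribe((state) => renders.push(renderTodos(state)));

const before = store.getState();
store.dispatch({ type: 'ADD_TODO', text: 'Keep all state in single source of truth.' });
store.dispatch({ type: 'TOGGLE_TODO', index: 0 });
```

Because the reducer never mutates its input, the snapshot held in before still has an empty todo list after both dispatches, and each subscribed view only ever sees complete, consistent states.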

As a company in the business of big data, we couldn't help but wonder whether we could apply Perlis' wisdom to massive datasets.

Well, we wondered, and then we did.

In our product Traintracks, all sources of data are streamed into a single immutable store. Employees, teams, or departments within an organization can create an unlimited number of virtual layers, which we call libraries, on top of our data store. In a library, anyone can use Virtual ETL to unify dirty sources of data into a clean layer of abstraction to work with. Queries on this data are accessible through automatically generated immutable APIs. Any person or department can have their very own library to customize their view of the data without affecting anyone else's.

What kind of benefits does this bring to our enterprise-scale clients?

  1. Virtual ETL. Data does not have to be physically cleaned through one-off scripts, only virtually unified, eliminating up to 90% of the manual ETL work that data scientists undertake every day.

  2. Memoization. Results of previously run queries can be saved and reused as parts of future queries involving the same data.

  3. Forever one single source of truth. No unmanageable exponential blowup of copies of data for every single use case within an enterprise.
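The second point, reusing the results of earlier queries, is easiest to see in code. The sketch below is a toy illustration under stated assumptions and says nothing about how Traintracks is implemented: createQueryEngine, filterBy, countBy, and scanCount are hypothetical names invented for this example. The key observation is that because the underlying rows are immutable, a cached result never goes stale.

```javascript
// Hypothetical sketch: cache query results keyed by a canonical query string.
function createQueryEngine(rows) {
  // Freeze a copy of the rows so the data cannot change after loading.
  const data = Object.freeze(rows.map((row) => Object.freeze({ ...row })));
  const cache = new Map();
  let scans = 0; // counts how often the data is actually scanned

  function filterBy(field, value) {
    const key = `filter:${field}=${JSON.stringify(value)}`;
    if (!cache.has(key)) {
      scans += 1;
      cache.set(key, data.filter((row) => row[field] === value));
    }
    return cache.get(key);
  }

  // A later query reuses the cached filter result as a building block.
  function countBy(field, value) {
    return filterBy(field, value).length;
  }

  return { filterBy, countBy, scanCount: () => scans };
}

const engine = createQueryEngine([
  { user: 'a', event: 'login' },
  { user: 'b', event: 'purchase' },
  { user: 'a', event: 'purchase' },
]);

const purchases = engine.filterBy('event', 'purchase');    // scans the data
const purchaseCount = engine.countBy('event', 'purchase'); // served from the cache
```

With mutable data, every cached entry would need invalidation logic. With an immutable store, the cache key alone is enough to decide whether a result can be reused.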

Here at Traintracks, we believe combining Alan Perlis' wisdom with immutability and virtualization is central to solving big data. If you agree, join us on our journey in helping enterprises convert their data into knowledge.
