Memex and what it all means ...


Memex conceptually cross-links all data into a single compressed graph of indexes. Searches against the Memex can be as simple as a keyword search or as complex as:

  1. an entire article about a domain
  2. an entire numerical vector representing an IoT sensor stream
  3. a combination of an analysis article and a numerical vector representing an IoT sensor stream
  4. combinations of sets of analysis articles and sets of numerical vectors representing all IoT sensors associated with an event, a route, or a recipe

The expectation is that the search would return similar articles, similar numerical vectors, similar sets of articles and numerical vectors, or combinations thereof, ordered by similarity.  A natural query format is supported, as if the Memex were an entire set of enterprise experts: another person could query with just a few keywords, or ask specifically what is similar to an actual textual description contained in an email or other document.

The API is not like SPARQL or other graph query languages that require the complex Resource Description Framework (RDF). An example of a "simple" SPARQL query is at this URL:

https://www.w3.org/TR/rdf-sparql-query/#WritingSimpleQueries  

The Memex query utilities include (sketched in code just after the list):

  1. the ability to find reference articles
  2. the ability to find specific numerical profiles
  3. the ability to find specific reference articles pertaining to specific numerical profiles
  4. the ability to define, then reference, article collections and corresponding collections of numerical profiles
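
To make the contrast with SPARQL concrete, here is a minimal sketch of what such a query surface could look like. Every name below (Memex, find_articles, and so on) is my own illustration - the actual API was proprietary - but the shape is the point: no schema language, just similarity-ranked lookups.

    # A minimal sketch of the four query utilities above. All names are
    # hypothetical illustrations; the actual API was proprietary.

    class Memex:
        def __init__(self):
            self.articles = {}     # article id -> set of index terms
            self.profiles = {}     # profile id -> numerical vector
            self.links = {}        # profile id -> ids of articles it pertains to
            self.collections = {}  # collection name -> (article ids, profile ids)

        def find_articles(self, terms):
            """1. Rank reference articles by keyword overlap (Jaccard) with the query."""
            q = set(terms)
            jaccard = lambda a: len(q & self.articles[a]) / (len(q | self.articles[a]) or 1)
            return sorted(self.articles, key=jaccard, reverse=True)

        def find_profiles(self, vector):
            """2. Rank numerical profiles by squared-distance closeness to a query vector."""
            dist = lambda p: sum((x - y) ** 2 for x, y in zip(self.profiles[p], vector))
            return sorted(self.profiles, key=dist)

        def find_articles_for_profile(self, vector):
            """3. Find articles pertaining to the profile most similar to the vector."""
            nearest = self.find_profiles(vector)[0]
            return self.links.get(nearest, [])

        def define_collection(self, name, article_ids, profile_ids):
            """4. Name a set of articles and corresponding profiles for later reference."""
            self.collections[name] = (list(article_ids), list(profile_ids))

A query can then be a handful of keywords or the full term set of an entire email, with results coming back ordered by similarity rather than by exact match.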

Some say that we already have such systems, but the truth is that most modern data mining / ML pipelines, at their very first step, constrain result sets and analysis by some combination of filtering, averaging, linear trending, or other reduction in order to bound the computational complexity and avoid the so-called "curse of dimensionality."

For some domains, these reductionist principles enable sufficient fidelity to indicate preferences for common consumable items like shoes, clothes, and movies.
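
To make that first reductive step concrete, here is a hypothetical example (block_average is my own name for it): a raw sensor stream block-averaged down to a few summary values before any model ever sees it, trading fidelity for tractability.

    # Hypothetical first step of a conventional pipeline: block-average a raw
    # sensor stream down to a few summary values, discarding most of the
    # signal's dimensionality before any learning happens.
    import statistics

    def block_average(samples, bucket=1000):
        """Collapse every `bucket` consecutive samples into their mean."""
        return [statistics.fmean(samples[i:i + bucket])
                for i in range(0, len(samples), bucket)]

    raw = [float(i % 97) for i in range(10_000)]    # stand-in for a raw IoT stream
    print(len(raw), "->", len(block_average(raw)))  # 10000 -> 10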

--------------------------------------------------------------------------------------------------------------------

Memex - https://www.youtube.com/results?search_query=Darpa+Memex 

As Lead Tech Evangelist within the Boeing Enterprise Strategic Growth group, I was assigned to move Memex technology into the commercial space.  Dr. Allen Adler ( http://www.boeing.com/company/bios/allen-adler.page ) wanted to know in which domains / applications Memex DID NOT WORK.  The only failed assignment was earthquake prediction - although I now think that I didn't have enough data, as there are two teams in the world that can predict 3 of 10 and 7 of 10 earthquakes over magnitude 6.5.  One commercial Memex created more than $100 million / month in a logistics setting (a publicly acknowledged case study); see other blog posts titled "Wheels".

Google / Alphabet undoubtedly has one of the best Memex implementations.  As an example, type-ahead prediction of the next words in a query is based on large co-occurrence vectors (and matrices) of all words ever used across all queries.  Google's co-occurrence query matrices are publicly available up to window level 5.
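
The mechanics of such a structure are easy to sketch, assuming a forward co-occurrence window of 5 tokens; Google's actual internal representation is, of course, not public:

    # Sketch: count word co-occurrences within a forward window of 5 tokens,
    # the kind of structure behind type-ahead prediction. (Illustrative only;
    # Google's internal representation is not public.)
    from collections import defaultdict

    def cooccurrence_counts(tokens, window=5):
        counts = defaultdict(int)
        for i, word in enumerate(tokens):
            # pair each word with the next `window` words that follow it
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                counts[(word, tokens[j])] += 1
        return counts

    tokens = "cheap flights to cheap hotels in cheap cities".split()
    counts = cooccurrence_counts(tokens)
    print(counts[("cheap", "cheap")])  # 2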

The Memex that I utilized could transform complete database content, free text from emails, technical manuals, and numerical data into millions of co-occurrence matrices stored in a complex, compressed graph of triples.  The triples were not "flat" like an RDF or Neo4j graph; instead, users could configure which items were considered entities.  Triple transitive relationships were expressed as E-A-E, where E is an Entity and A is an Attribute.  Instead of a flat-space representation, the Entity nodes were weighted and elevated as significant nodes that a customer might care about.
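
A rough sketch of that structure as just described: entities are weighted, first-class nodes, and the attribute between two entities is a whole matrix of co-occurrence values rather than a single flat predicate. The class names here are mine, for illustration, not the product's schema.

    # Sketch of an E-A-E (Entity-Attribute-Entity) triple in which entities
    # are weighted, elevated nodes and the attribute is a co-occurrence
    # matrix relating them - unlike a flat RDF or Neo4j predicate.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Entity:
        name: str
        weight: float = 1.0  # configurable significance of the node

    @dataclass
    class Attribute:
        matrix: list         # co-occurrence counts relating the two entities

    @dataclass
    class Triple:            # E-A-E
        left: Entity
        attr: Attribute
        right: Entity

    engine = Entity("engine", weight=5.0)  # an entity a customer cares about
    vibration = Entity("vibration", weight=3.0)
    link = Triple(engine, Attribute([[12, 0], [3, 7]]), vibration)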

There is a term in operations research optimization called a convex hull - hyper-planes, as discussed here: https://www.youtube.com/watch?v=ylqqa-TiDvg .  I always visualized the Entities forming edges of the hyper-planes, with the attributes as values inside a matrix that relates the two entities together.  By traversing all the entity edges, an algorithm could collect all the co-occurrence triples - E-A-E's - forming a subset of a large graph.  Therefore even a small keyword search often results in a somewhat large graph.
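
To see why, consider a sketch of that traversal, building on the illustrative Triple structure above: start from the entities matching the keywords, then walk entity-to-entity edges collecting every E-A-E triple reached within a few hops.

    # Sketch: breadth-first walk outward from keyword-matched seed entities,
    # collecting every E-A-E triple reached within `max_hops` entity edges.
    # Even one or two seeds can pull in a large subgraph this way.
    from collections import deque

    def collect_subgraph(triples, seed_names, max_hops=2):
        by_entity = {}
        for t in triples:
            by_entity.setdefault(t.left.name, []).append(t)
            by_entity.setdefault(t.right.name, []).append(t)

        collected, seen_triples = [], set()
        frontier = deque((name, 0) for name in seed_names)
        visited = set(seed_names)
        while frontier:
            name, hops = frontier.popleft()
            for t in by_entity.get(name, []):
                if id(t) not in seen_triples:
                    seen_triples.add(id(t))
                    collected.append(t)
                for neighbor in (t.left.name, t.right.name):
                    if neighbor not in visited and hops < max_hops:
                        visited.add(neighbor)
                        frontier.append((neighbor, hops + 1))
        return collected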

The next question is: how do you compare graphs?

The answer is via metrics - and these metrics are at the heart of all data science / machine learning.

Most groups filter out attributes at this point - because we have a reductionist tendency in data mining.

Other groups keep everything and continually add more, creating a larger and larger index of indexes; some people call this deep learning, while others reserve the term for groups using neural-network technology.

At any rate, similarity metrics are needed.  There are several methods in practice derived from information theory; many groups utilize kNN, deep learning, or other trade-secret formulations to determine how close one complex graph is to another.
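
For completeness, here is one plain, non-proprietary way to realize such a metric, reusing the illustrative structures above: flatten each collected subgraph into a weighted bag of entity pairs and take the cosine similarity between the bags. The real formulations are trade secrets; this only sketches the idea.

    # Sketch of a graph similarity metric: flatten each subgraph into a
    # vector keyed by (entity, entity) pairs, weighted by co-occurrence mass
    # and entity weights, then compare vectors by cosine similarity. (Real
    # systems use information-theoretic, kNN, or trade-secret metrics.)
    import math
    from collections import Counter

    def graph_vector(triples):
        vec = Counter()
        for t in triples:
            mass = sum(sum(row) for row in t.attr.matrix)
            vec[(t.left.name, t.right.name)] += mass * t.left.weight * t.right.weight
        return vec

    def graph_similarity(g1, g2):
        a, b = graph_vector(g1), graph_vector(g2)
        dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
        norm_a = math.sqrt(sum(v * v for v in a.values()))
        norm_b = math.sqrt(sum(v * v for v in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0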

----------------------------------------------------------------------------------------------------------

