Invariant with respect to time

Sometimes new ideas arrive at our doorstep. Before this idea arrived on mine, I followed the normal course for a software engineer: moving from one language to another on some vague promise that method xyz is better than method zyx.


Later you might find that these are new implementations of old ideas.  

In my case the idea was the Memex, introduced in the 1940s by Vannevar Bush, then director of the U.S. Office of Scientific Research and Development.

Memex implementations are diverse. Some are applied to a single specific domain; others require extensive programming.

An implementation that I helped evangelize cross-correlated all words, symbols, and numerical data into a compressed graph index. It could answer questions the way a human would: "What do you know about this?" or "How is this similar to other data like this?" The answers were expressed in terms of similarity, with associated explanations. Occasionally the answer was "I don't know."

The Memex implementation was an index of indexes, sometimes described as a network of networks. The API was simple: given any context xyz, give back everything related. It didn't require any pre-designated label, although if given one it could also tell you how similar one item was to another across all the attributes.
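As a rough illustration of that API, here is a minimal sketch of an associative index. It is not the implementation described in this post; the class `ToyMemex`, its method names, and the use of Jaccard overlap as the similarity measure are all assumptions chosen for brevity.

```python
from collections import defaultdict
from itertools import combinations


class ToyMemex:
    """Hypothetical associative index that counts co-occurrences between attributes."""

    def __init__(self):
        # neighbors[a][b] = number of observations in which a and b appeared together
        self.neighbors = defaultdict(lambda: defaultdict(int))

    def observe(self, attributes):
        """Record one observation (for example, one email) as a set of attributes."""
        for a, b in combinations(sorted(set(attributes)), 2):
            self.neighbors[a][b] += 1
            self.neighbors[b][a] += 1

    def related(self, context):
        """Given any context, return everything that co-occurred with it, strongest first."""
        scores = defaultdict(int)
        for item in context:
            for other, count in self.neighbors.get(item, {}).items():
                if other not in context:
                    scores[other] += count
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

    def similarity(self, a, b):
        """Jaccard overlap of the two neighbor sets (an assumed measure),
        returned together with the shared neighbors as the explanation."""
        na = set(self.neighbors.get(a, {}))
        nb = set(self.neighbors.get(b, {}))
        union = (na | nb) - {a, b}
        shared = (na & nb) - {a, b}
        score = len(shared) / len(union) if union else 0.0
        return score, sorted(shared)


memex = ToyMemex()
memex.observe(["pump", "leak", "seal", "customer:A"])
memex.observe(["pump", "vibration", "bearing", "customer:B"])
memex.observe(["valve", "leak", "seal", "customer:A"])

print(memex.related(["leak"]))            # no label needed, any context works
print(memex.similarity("pump", "valve"))  # score plus the shared attributes ("why")
print(memex.related(["unknown"]))         # empty list: the "I don't know" case
```

The point of the sketch is that nothing has to be labeled up front: any attribute can serve as the query, and the explanation is simply the set of attributes two items have in common.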

The largest "memex" held co-occurrences between:

       * all words from emails across all time related to a product
       * all customers who owned the product
       * all parts inside product
       * all part descriptions - who made them, materials, machining, dimensions, vendors
       * all line numbers, positions, drawing, technical references
       * people on each side of the "how to fix it" conversation

The tech staff called the capability "find without looking," because the entire email was the query and the results were all the similar previous service requests. A service request may contain 30 pages of back-and-forth conversation between everyone involved, and when it is resolved the "how to fix" paragraphs are at the end of the email. The capability reduced the time to find an answer from days to minutes.

At one point this capability was utilized to move the needle by $100 million in a month.

Technically, the implementation mapped all the observations into millions of matrices that represented the large graph. Graph traversal provided the links between all the attributes. Conceptually the graph wasn't flat; it was partitioned by attribute sets deemed important from a business-case perspective. You could look at how the attributes were mapped into memory, for example all the words against all the customers. The words numbered between 200,000 and 400,000, the parts 40 million, and the customers several thousand.

Boeing customers are whoever currently owns a plane built by either Boeing or McDonnell Douglas. Small airlines are often bought by regional groups, and regional groups are bought out by first-tier groups. When planes are no longer fit for commercial passenger traffic they are sold to transport operators or other groups such as the BLM, the Forest Service, or third-world nations.

The words were mapped via NER (Named Entity Recognition) streams into types such as nouns and verbs (as provided by WordNet); a word could be mapped to more than one type. Ontological reasoners don't work in practice because humans map inconsistently. Machine learning groups learn bias from confirmation bias and from this type of mapping.
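To make the idea of a graph partitioned into matrices by attribute type a little more concrete, here is a small sketch. It assumes one sparse co-occurrence matrix per pair of attribute types (words against customers, customers against parts, and so on); the class `PartitionedCooccurrence` and its layout are hypothetical and far simpler than the system described above.

```python
from collections import defaultdict


class PartitionedCooccurrence:
    """Hypothetical sketch: one sparse count matrix per pair of attribute types."""

    def __init__(self):
        # (type_a, type_b) -> {(value_a, value_b): count}, with type_a < type_b
        self.partitions = defaultdict(lambda: defaultdict(int))

    def observe(self, typed_attributes):
        """typed_attributes is an iterable of (type, value) pairs from one observation."""
        items = sorted(set(typed_attributes))  # sorting makes the partition key canonical
        for i, (type_a, value_a) in enumerate(items):
            for type_b, value_b in items[i + 1:]:
                if type_a == type_b:
                    continue  # this sketch keeps only cross-type partitions
                self.partitions[(type_a, type_b)][(value_a, value_b)] += 1

    def matrix(self, row_type, col_type):
        """Return {(row_value, col_value): count} for one partition, in the order requested."""
        if row_type <= col_type:
            return dict(self.partitions.get((row_type, col_type), {}))
        flipped = self.partitions.get((col_type, row_type), {})
        return {(row, col): count for (col, row), count in flipped.items()}


index = PartitionedCooccurrence()
index.observe([("word", "leak"), ("word", "seal"), ("customer", "A"), ("part", "123")])
index.observe([("word", "leak"), ("customer", "A"), ("part", "456")])

# "All the words against all the customers" is then a single partition lookup.
print(index.matrix("word", "customer"))
```

A word that maps to more than one type could be observed once per type, for example as both `("noun", "seal")` and `("verb", "seal")`.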

The queries against the index select a bitmap based on the context and use it as the starting point for a graph traversal. The bitmap selector is multidimensional, reaching into the graph in parallel to gather all the attributes. The similarity results can actually tell you "why," because they contain output links from the network of networks.
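The select-then-traverse pattern can be sketched in a few lines. This is an illustration under stated assumptions, not the actual query engine: the class `BitmapIndex` is hypothetical, it uses plain Python integers as bitmaps with one bit per record, it selects records by OR-ing the bitmaps of the context attributes, and it walks the selection sequentially rather than in parallel.

```python
from collections import defaultdict


class BitmapIndex:
    """Hypothetical sketch: one bitmap (a Python int) per attribute, one bit per record."""

    def __init__(self):
        self.records = []                # record id -> set of attributes
        self.bitmaps = defaultdict(int)  # attribute -> bitmap of records containing it

    def add(self, attributes):
        """Store one record and set its bit in the bitmap of each of its attributes."""
        record_id = len(self.records)
        self.records.append(set(attributes))
        for attr in attributes:
            self.bitmaps[attr] |= 1 << record_id
        return record_id

    def select(self, context):
        """OR the bitmaps of the context attributes: every record sharing something with it."""
        bitmap = 0
        for attr in context:
            bitmap |= self.bitmaps.get(attr, 0)
        return bitmap

    def traverse(self, context):
        """Start from the selected records and gather the attributes linked to them.

        Returns {attribute: [record ids]}; the record ids are the links that
        explain why each attribute was returned.
        """
        context = set(context)
        bitmap = self.select(context)
        links = defaultdict(list)
        record_id = 0
        while bitmap:
            if bitmap & 1:
                for attr in self.records[record_id] - context:
                    links[attr].append(record_id)
            bitmap >>= 1
            record_id += 1
        return dict(links)


index = BitmapIndex()
index.add(["leak", "seal", "part:123"])
index.add(["vibration", "bearing", "part:456"])
index.add(["leak", "gasket", "part:123"])

print(bin(index.select(["leak"])))  # bits set for the first and third records
print(index.traverse(["leak"]))     # linked attributes with the records that justify them
```

Because every returned attribute carries the records that connect it to the query, the result can be traced back to the original observations.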

Neural networks dominate the ML communities. Unfortunately, many of these infrastructures don't provide end-to-end traceability or explanations for their classifications. In a few more iterations this will likely emerge as a requirement for industrial use.


      


