Posts

Elasticsearch (Partial Memex Example) using CV-19 terms of interest

Covid-19 pharma terms of interest (originally asked about in the May 2020 time frame): cd4, cd8; IgG, IgA; nsp3, nsp4; orf3a, orf8. Elasticsearch can be configured to keep all words contained within unstructured text documents available for aggregation, using a fielddata: true mapping on the text field. At both index time and query time, standard English words are recognized. Alternate analyzers can be built to meet domain needs. Ontological data such as LOINC can be used inside Elasticsearch to provide domain-specific synonym lists. Typical analysis is driven by keyword matches. The picture below indicates a set of matches across the keyword sets within COVID-19 related abstracts. The matching abstracts can be reviewed sequentially for mat...
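As a minimal sketch of the configuration described above, the following Python builds an index body with a fielddata-enabled text field and a custom synonym analyzer, plus a keyword query over the term sets. The field name "abstract", the analyzer and filter names, and the sample synonym rule are illustrative assumptions, not taken from the original setup or from LOINC.

```python
import json

# Term sets of interest from the post, lowercased to match analyzed tokens.
TERM_SETS = [["cd4", "cd8"], ["igg", "iga"], ["nsp3", "nsp4"], ["orf3a", "orf8"]]

def build_index_body(synonyms):
    """Index settings and mappings; all names here are hypothetical."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Synonym rules would come from a domain ontology.
                    "domain_synonyms": {"type": "synonym", "synonyms": synonyms}
                },
                "analyzer": {
                    "domain_english": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "domain_synonyms"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                "abstract": {
                    "type": "text",
                    "analyzer": "domain_english",
                    # Makes the indexed terms usable in aggregations.
                    "fielddata": True,
                }
            }
        },
    }

def build_term_query(term_sets, field="abstract"):
    """One 'terms' clause per term set; a document matches if any set hits."""
    should = [{"terms": {field: terms}} for terms in term_sets]
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}

if __name__ == "__main__":
    body = build_index_body(["igg, immunoglobulin g"])  # illustrative rule only
    print(json.dumps(body, indent=2))
    print(json.dumps(build_term_query(TERM_SETS), indent=2))
```

The two dictionaries can be sent as JSON to the create-index and search endpoints. Note that fielddata on text fields is held in memory, so it is usually enabled only on the fields that need it.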

PubMed Comments - Similarity of Published Articles

Researchers need to find other published material that provides supporting evidence for an internal hypothesis, expressed in the specific functional terms under study. A research team will have a specific term set. Some terms will match and some won't, and it is those detailed differences that help their endeavor. PubMed is a great resource, built over the years, that returns matches to keyword searches. In many domains, groups collaborate on naming conventions, which are sometimes captured in ontological formats. MeSH - https://meshb.nlm.nih.gov/search - is one of many standards the medical community uses for proper terminology when reporting or publishing articles. LOINC - https://loinc.org/ - is another mapping standard. HL7 -  And other datasets like: searchable databases - CPT, Rx, ICD9; access to therapeutic-specific databases; data aggregation from EMR, claims, Rx data as listed by AdviseClinical LLC -  They are associates of Dr Galpin ...
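The idea that both the matching and the non-matching terms carry information can be sketched in a few lines of Python. This is an illustrative assumption about how a team's term set might be compared against an article's terms; the Jaccard score and the sample terms are not from the original post.

```python
def compare_terms(team_terms, article_terms):
    """Return (matched, missing, jaccard) for a team term set vs. an article."""
    team = {t.lower() for t in team_terms}
    article = {t.lower() for t in article_terms}
    matched = team & article          # terms the article shares with the team
    missing = team - article          # team terms the article does not mention
    union = team | article
    jaccard = len(matched) / len(union) if union else 0.0
    return sorted(matched), sorted(missing), jaccard

if __name__ == "__main__":
    # Hypothetical term sets, for illustration only.
    team = ["CD4", "CD8", "IgG", "nsp3"]
    article = ["cd4", "igg", "orf8"]
    matched, missing, score = compare_terms(team, article)
    print(matched)  # ['cd4', 'igg']
    print(missing)  # ['cd8', 'nsp3']
    print(score)    # 0.4
```

Reporting the missing terms alongside the score keeps the "detailed differences" visible instead of collapsing them into a single number.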

Memex - Pandemic Dataset Example

New Covid-19 related publications become available every day, thankfully. The latest *.tar.gz files get unpacked onto the file system. The metadata.csv file is 51 MB in size; it contains all the publication abstracts. Each record contains more than 1000 fields. The abstract text is one field. Inside this file are references to the original PubMed articles. There are approximately 270K publications in this data set. The number of fields per record is more than 1500 in one example, shown below: In this example the user searched for the author "Korth"; the above record also indicates "liver" as a significant finding. Domain experts across many domains have good personal memories; the word "liver" is one word of interest out of many that could be of interest. Domain experts look for word combinations co-occurring within the same abstract or text. In this data set they may also look for new publications, or novel findings or findings similar to a hypo...
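A minimal sketch of the co-occurrence search described above, using only the Python standard library. The column names "title", "authors" and "abstract" and the three sample rows are assumptions made up for illustration; the real metadata.csv may use different headers.

```python
import csv
import io
import re

def find_cooccurrences(csv_file, terms, text_field="abstract"):
    """Return rows whose text field contains every one of the given words."""
    wanted = [t.lower() for t in terms]
    hits = []
    for row in csv.DictReader(csv_file):
        text = (row.get(text_field) or "").lower()
        words = set(re.findall(r"[a-z0-9]+", text))  # crude word tokenizer
        if all(t in words for t in wanted):
            hits.append(row)
    return hits

# Made-up sample rows standing in for metadata.csv.
SAMPLE = """title,authors,abstract
Paper A,"Korth, J","Liver injury was observed in patients."
Paper B,"Smith, A","Lung findings without hepatic involvement."
Paper C,"Korth, J","Kidney and liver outcomes after infection."
"""

if __name__ == "__main__":
    for row in find_cooccurrences(io.StringIO(SAMPLE), ["liver", "kidney"]):
        print(row["title"], "|", row["authors"])
```

For the real file, `open("metadata.csv", newline="", encoding="utf-8")` could be passed in place of the `io.StringIO` sample.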

Julia Lang - Enabling Mathematical Modeling

 In 2016, I started using Julia Lang based on its open source availability and the extensive published conference videos posted on YouTube.  I liked Julia for several reasons: (1) linking to C, C++, FORTRAN and Python libraries, (2) GPU interface support, (3) REPL, (4) parallel compute interfaces and (5) differential equations. Differential equations are at the core of most domains, enabling compact mathematical equations that relate subsystems to each other.  There are many programming languages used for a variety of tasks including web services, databases, operating systems, distributed systems, file systems, etc. There are very few programming languages dedicated to easing scientific and engineering modeling and exploration. Ease here means that entities such as matrices and vectors are first-class programming constructs. Famous proprietary examples include Mathematica, Macsyma and Matlab. Julia provides a shareable, non-proprietary implementation: https://juliapackages.com/p/diffeqtut...

QEEG - Quantitative EEG - 3-sigma reliable mental health diagnostic FDA-approved method; original patent in public domain

High Level Summary  QEEG practitioners indicate medical assessment: QEEG Normative DB  - QEEG Technical Paper  -  Brain Center - University of Houston  ; U of Washington    UW2   UW3 Software -  Commercial - Thatcher  - Biof Open Source - EEGLAB  , S-LORETA 1) QEEG as practiced by a psychiatrist provides an objective means to determine drug and dosage, as opposed to the current practice of prescribing random drugs based on marketing claims with no means to check for objective improvements. 2) QEEG as practiced for assessment of TBI or mTBI is more accurate than MRI; the brain can swell for up to two days after an incident.  MRIs are often taken just after a head injury incident; EEG changes within 20 minutes of the incident. In military TBI, the IED blast injures the closest personnel.  The question is how to determine, in a field environment, whether the concussive force affected others within the team.  Fortunately, portable EEG devices with Q...

Memex and what it all means ...

Memex conceptually cross-links all data into a single compressed graph of indexes. Searches against the Memex can be as simple as a keyword search or as complex as: an entire article about a domain; an entire numerical vector representing an IoT sensor stream; a combination of an analysis article and a numerical vector representing an IoT sensor stream; or combinations of sets of analysis articles and sets of numerical vectors representing all IoT sensors associated with an event, a route or a recipe. The expectation is that the search returns similar articles, similar numerical vectors, similar sets of articles and numerical vectors, or combinations thereof, ordered by similarity.  A natural query format is supported, as if the Memex were an entire set of enterprise experts: another person could query simply with a few keywords, or specifically ask what is similar to an actual textual description contained in an email or other document. The API is not like RDF or SPARQL o...
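To make "ordered by similarity" concrete, here is a small Python sketch that ranks stored numerical vectors against a query vector. Cosine similarity is an assumed metric chosen for illustration; the post does not say which similarity measure the Memex uses, and the stream names and vectors are made up.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_by_similarity(query, items):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine(query, vec)) for name, vec in items.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

if __name__ == "__main__":
    # Hypothetical sensor-stream vectors.
    stored = {"stream_a": [1.0, 0.0], "stream_b": [1.0, 1.0], "stream_c": [0.0, 1.0]}
    for name, score in rank_by_similarity([1.0, 0.0], stored):
        print(f"{name}: {score:.3f}")
```

The same ranking step applies to text once articles are turned into vectors, for example term-frequency vectors, which is one way a single query interface could cover both articles and sensor streams.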

Agile Test, Moving V&V Back Upstream Process

Systems Modeling, Design, Development, Integration, Test, Validation and Certification are the waterfall process stages.  Unfortunately, it is common practice to be optimistic about the schedule and correspondingly underestimate project complexity.  Design and development groups push their schedules out, which shortens the time left for integration, test, validation and certification.   1) Zachman Framework 2) Digital Twin / MBSE / SysML 3) Brainstorming Session - Dr. Bahill 4) Periodic non-advocate tech review - Dr. Bahill 5) Risk Management - order by risk 6) DARPA Style Teaming - small cross-functional teams 7) Teaching alternate analytic tool sets 8) Preserving Context / Correlated Index / Associative Memory / Similarity Metrics Zachman Framework: When interacting with Senior IPT Leads and their initial models and designs, I would advocate and demonstrate the benefits of using the Zachman Framework, a method to summarize the system state across all necessary functional elements. Dig...