Ethersource technology

Gavagai has developed Ethersource based on decades of research effort in computational and computable semantics. Ethersource is designed to emulate some of the key characteristics of human information processing in a computationally efficient way.

At its core, Ethersource computes and tracks relations between terms and symbols in streaming language data. By design, Ethersource has inherent advantages over traditional approaches, both over statistical language models and traditional vector space models and over knowledge-based systems. We welcome comparisons with any contender!
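To make "tracking relations between terms in streaming data" concrete, here is a minimal sketch. It assumes the simplest possible notion of a relation: how often two terms occur within a few tokens of each other. The class name `CooccurrenceTracker` and its `window` parameter are hypothetical illustrations, not part of Ethersource.

```python
from collections import defaultdict, deque


class CooccurrenceTracker:
    """Counts how often terms occur near each other in a token stream.

    Hypothetical illustration only; not Ethersource's implementation.
    """

    def __init__(self, window=2):
        self.window = window  # how many preceding tokens a new token is paired with
        self.counts = defaultdict(lambda: defaultdict(int))  # term -> term -> count
        self.recent = deque(maxlen=window)  # the last `window` tokens seen

    def observe(self, token):
        """Update counts for one incoming token; no batch retraining needed."""
        for previous in self.recent:
            self.counts[token][previous] += 1
            self.counts[previous][token] += 1
        self.recent.append(token)

    def relation(self, a, b):
        """Return the co-occurrence count of a and b (0 if never seen together)."""
        return self.counts.get(a, {}).get(b, 0)


tracker = CooccurrenceTracker(window=2)
for token in "the cat sat on the mat".split():
    tracker.observe(token)

print(tracker.relation("cat", "sat"))  # 1
print(tracker.relation("the", "sat"))  # 2
print(tracker.relation("cat", "mat"))  # 0
```

Because each token is processed as it arrives, the counts are always up to date with the stream; nothing has to be recomputed from scratch.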

Meaning: From strings to concepts

Cutting-edge research in representation of meaning

Ethersource is the result of over ten years of world-leading research in computational semantics and knowledge representation. It builds on several internationally acknowledged scientific breakthroughs and incorporates further groundbreaking proprietary development. (Refer to our publication list if you are curious about our starting points and results!) We are happy to hold forth at length on the meaning of Meaning, but the practical upshot is this: when applied to real tasks, our model learns real concepts from the symbols in a string sequence. This means that, for example, tracking mentions of a concept of interest does not require exact string matching to achieve coverage; the system generalises from individual mentions to the conceptual level!
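A minimal sketch of how a distributional model can generalise from strings to concepts, assuming that terms used in similar contexts express related concepts. Each term gets a vector of context counts, and a query term is expanded to every term whose vector is close to it by cosine similarity. The functions, the `threshold` value and the toy corpus are hypothetical; this is not Ethersource's actual model.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """Map each term to a Counter of terms seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, term in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vectors[term][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def expand_query(term, vectors, threshold=0.8):
    """Return the term plus every other term whose contexts are similar enough."""
    target = vectors.get(term, Counter())
    return {term} | {t for t, vec in vectors.items() if t != term and cosine(target, vec) >= threshold}


corpus = [
    "the quarterback threw a long pass",
    "the qb threw a long pass",
    "the quarterback threw a touchdown pass",
    "the qb threw a touchdown pass",
    "fans cheered in the stadium",
]
vectors = context_vectors(corpus)
concept = expand_query("quarterback", vectors)
print(sorted(concept))  # ['qb', 'quarterback']

# Documents mentioning any term in the expanded concept count as matches,
# even if they never contain the exact string "quarterback".
documents = ["Our QB was sacked twice", "The weather was lovely"]
print([d for d in documents if concept & set(d.lower().split())])  # ['Our QB was sacked twice']
```

The threshold trades coverage against precision: lowering it to 0.6 in this toy corpus would also pull in "threw", which shares contexts with "quarterback" without naming the same concept.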

We continuously evaluate Ethersource on several scientifically recognized semantic benchmark tests in a number of languages, including the TOEFL and ESL synonym tests for English. A blog post gives a head-to-head comparison of TOEFL scores between Ethersource and several other computational approaches that make similar claims.
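Synonym tests of this kind are multiple-choice: for each target word, pick the closest synonym among a set of alternatives (four per item in the TOEFL test). Below is a minimal scoring sketch, assuming the model exposes one vector per word and answers by choosing the alternative with the highest cosine similarity to the target. The toy vectors and items are invented for illustration; they are neither real test items nor Ethersource results.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def answer(target, alternatives, vectors):
    """Pick the alternative whose vector is most similar to the target's."""
    return max(alternatives, key=lambda alt: cosine(vectors[target], vectors[alt]))


def synonym_test_accuracy(items, vectors):
    """Fraction of (target, alternatives, gold) items answered correctly."""
    correct = sum(answer(target, alts, vectors) == gold for target, alts, gold in items)
    return correct / len(items)


# Invented toy vectors and items, for illustration only.
vectors = {
    "enormous": [0.9, 0.1, 0.0],
    "huge": [0.8, 0.2, 0.1],
    "tiny": [-0.7, 0.1, 0.2],
    "quick": [0.0, 0.9, 0.1],
    "fast": [0.1, 0.8, 0.0],
    "slow": [0.0, -0.8, 0.3],
    "red": [0.1, 0.0, 0.9],
}
items = [
    ("enormous", ["tiny", "huge", "red", "slow"], "huge"),
    ("quick", ["slow", "red", "fast", "tiny"], "fast"),
]
print(synonym_test_accuracy(items, vectors))  # 1.0
```

A reported score is then simply this accuracy over the full item set.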

Learning

Ethersource is inherently and constantly learning.

Any information processing system (and in particular any model of meaning) designed to handle live, dynamic language data must be able to update its representations continuously and instantaneously. Ethersource models meaning as it emerges from actual, current language use; it neither presumes nor relies on external resources. Language is in a constant state of flux, and so is Ethersource.
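One way a model can keep pace with language in flux is to update on every incoming text and let older evidence fade. The sketch below uses exponential decay of pairwise association weights and treats each text as a single context window; the technique, the class name `DecayingAssociations` and the `decay` value are our own assumptions for illustration, not a description of how Ethersource updates.

```python
from collections import defaultdict


class DecayingAssociations:
    """Pairwise association weights updated per text, with exponential decay.

    Assumed technique for illustration; not Ethersource's update rule.
    """

    def __init__(self, decay=0.5):
        self.decay = decay  # fraction of old evidence kept at each update (assumed value)
        self.strength = defaultdict(float)  # (term_a, term_b), sorted -> weight

    def update(self, text):
        """Fade all existing evidence, then add this text's co-occurrences."""
        for pair in self.strength:
            self.strength[pair] *= self.decay
        terms = sorted(set(text.lower().split()))  # the whole text is one context window
        for i, a in enumerate(terms):
            for b in terms[i + 1:]:
                self.strength[(a, b)] += 1.0  # a < b, so each pair has one key

    def association(self, a, b):
        """Current association weight between two terms (0.0 if never seen)."""
        return self.strength.get(tuple(sorted((a, b))), 0.0)


model = DecayingAssociations(decay=0.5)
model.update("the bird sang a tweet")
model.update("she posted a tweet")
model.update("he posted a tweet online")

print(model.association("tweet", "bird"))    # 0.25
print(model.association("tweet", "posted"))  # 1.5
```

Decaying every stored pair on each update costs time proportional to the number of pairs; a larger system would more likely store a timestamp per pair and apply the decay lazily when the pair is read.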

Completeness

Ethersource models the entire signal.

Unlike most systems, Ethersource is not based on sampling and (as noted above) does not depend on external lists of especially meaningful or useful terms or concepts. Ethersource reads everything, and can be used to model anything that has been mentioned, without retraining. If your interests shift, you will not need a new version, new training data, or new human assessors or editors to realign the system: you simply ask Ethersource what it has read about the new concept of interest, and you are good to go!

Scalability

Realistic-scale coverage and real-time analysis of human language information streams are a competitive must.

Any realistic information processing system must be ready to cope with vast and rapidly growing information streams. Ethersource is designed specifically to scale to information streams of any size. Based on neurophysiologically plausible models of information processing, Ethersource uses a fixed-size memory model that does not grow as the data grows.
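Random Indexing is one well-known technique with a fixed-size representation. Each term is assigned a sparse random index vector of fixed dimensionality, and a term's context vector accumulates the index vectors of its neighbours. The sketch below illustrates that property under our own parameter choices (`dim`, `nonzeros`, `window`, `seed`); we are not claiming that Ethersource is implemented this way.

```python
import random


class RandomIndexing:
    """Minimal Random Indexing sketch (parameter values are illustrative).

    Each term gets a sparse ternary index vector with `nonzeros` random +1/-1
    entries; a term's context vector is the sum of its neighbours' index vectors.
    """

    def __init__(self, dim=1000, nonzeros=8, window=2, seed=0):
        self.dim, self.nonzeros, self.window = dim, nonzeros, window
        self.rng = random.Random(seed)
        self.index = {}    # term -> {position: +1 or -1}
        self.context = {}  # term -> list of `dim` floats

    def _index_vector(self, term):
        # Created once, on first sight of a term; never resized afterwards.
        if term not in self.index:
            positions = self.rng.sample(range(self.dim), self.nonzeros)
            self.index[term] = {p: self.rng.choice((1, -1)) for p in positions}
        return self.index[term]

    def train(self, tokens):
        """Add the neighbours' index vectors to each token's context vector."""
        for i, term in enumerate(tokens):
            vector = self.context.setdefault(term, [0.0] * self.dim)
            for j in range(max(0, i - self.window), min(len(tokens), i + self.window + 1)):
                if j != i:
                    for position, sign in self._index_vector(tokens[j]).items():
                        vector[position] += sign


ri = RandomIndexing(dim=1000)
ri.train("a x b".split())
ri.train("a y b".split())
# x and y occurred in identical contexts, so their context vectors are identical.
print(ri.context["x"] == ri.context["y"])  # True
print({len(v) for v in ri.context.values()})  # {1000}
```

Each context vector has `dim` entries no matter how much text is read. The number of vectors in this sketch still grows with the vocabulary, whereas a raw term-by-term co-occurrence matrix grows in both dimensions.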

Multilinguality

Ethersource is inherently language agnostic.

Focussing exclusively on a few resource-rich languages is not a sustainable strategy in an increasingly multilingual world; it is arguably insufficient today, and will certainly be insufficient in the near future. Unlike most other text analysis systems on the market today, Ethersource can handle any language (and, by extension, any sequential symbol system). Ethersource is designed to model what all languages have in common rather than what makes them different from each other: the statistical regularities Ethersource exploits are consistent with current linguistic theory and sensitive to the general properties of natural language.

Ethersource currently performs targeted processing on a range of typologically diverse languages, including English, Swedish, Chinese, Arabic, Russian, and Hindi. New languages can be added with minimal effort and without changing the system.
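Nothing in a window-based co-occurrence count depends on what the symbols are, which is one reason such methods transfer across languages and other symbol systems. A minimal sketch, assuming whitespace tokens for English and one symbol per character for Chinese and for a DNA string (a simplifying assumption); the function name `window_pairs` is hypothetical.

```python
from collections import Counter


def window_pairs(symbols, window=1):
    """Count unordered pairs of symbols occurring within `window` positions.

    Works on any sequence of hashable symbols: words, characters, codes.
    """
    pairs = Counter()
    for i, a in enumerate(symbols):
        for b in symbols[i + 1 : i + 1 + window]:
            pairs[tuple(sorted((a, b)))] += 1  # sort so (a, b) and (b, a) coincide
    return pairs


english = window_pairs("the cat sat on the mat".split())
chinese = window_pairs(list("我爱北京"))  # one symbol per character (simplifying assumption)
dna = window_pairs(list("ACGTAC"))

print(english[("cat", "the")])  # 1
print(chinese[tuple(sorted(("北", "京")))])  # 1
print(dna[("A", "C")])  # 2
```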

Robustness

Ethersource is built to last.

Real language data is not lean, clean and neat. The language we see today, especially in social media, differs from the language of traditional grammars. New text with new conventions, misspellings, non-standard usage and code switching poses new challenges for text processing tools. Any model that presumes stability, order and consistency will break down when exposed to actual language use. Ethersource is built on the presumption that "language is in order as it is", and is designed to cope with, and thrive on, variability, noise and inconsistency. Our blog post on the various ways of referring to a quarterback shows some examples of what we mean.

Implementation

Ethersource was developed with Java Enterprise Edition (JEE), the industry standard for enterprise Java computing. Using JEE, Gavagai is able to deliver a secure, robust, and scalable multi-platform application with a high degree of development productivity. Ethersource is deployed in a high performance, virtualized, cloud ready, enterprise production environment.