This is our blog. We write about cool experiments, interesting case studies, open talks and technicalities.


Gavagai offices in Palo Alto

Gavagai now has an office in downtown Palo Alto, in the Nordic Innovation House at 470 Ramona Street, where you are welcome to book a meeting with us!


AI & Automation in the Insights Industry – efficiency gains in excess of 99%

Every day, customer interactions (B2B and B2C) generate massive amounts of unstructured text data all over the world: chats, emails, online mentions, customer support tickets, answers to open-ended survey questions, customer reviews. Volumes will grow even faster with increased usage of chat bots and Voice-to-Text. The Gavagai Explorer helps you perform advanced analysis, with automated thematic clustering and by scoring themes against multi-dimensional sentiments, to get indisputably robust, consistent, and valuable insights from text data, in more than 40 languages. The Gavagai Explorer allows you to perform instant qual-to-quant conversion and analysis of unstructured text data without any prior…


CGI to reveal new AI solution with technology from SAS and Gavagai

CGI will present a revolutionary meaning-based end-to-end solution in 45 languages with the latest language AI technology from SAS and Gavagai. Regional pre-launch May 18th in Stockholm. Expect a truly transformative shift toward unsupervised, self-learning, meaning-based solutions that are built on real AI and thrive on the extreme variability in everyday language usage. Stay tuned!


Gavagai at CLEF 2015

We presented a short paper at the 6th CLEF 2015 Conference and Labs of the Evaluation Forum in Toulouse on the deliberations from a workshop on Evaluating Learning Language Models we held last fall with generous support from ELIAS. The presentation raised a fair bit of interest and several requests for a continuation workshop, and we are now motivated to continue by actually implementing the evaluation metrics suggested at the workshop.


Greek elections restarted

The Greek political scene is in full swing preparing for new elections on June 17, little more than a month after the previous elections in May failed to provide a useful basis for forming an executive cabinet. The blogsite Politik i Grekland has published some measurements we made on the relative stature in Greek-language social media for the eleven main parties campaigning for seats in the parliament. Their blog post is in Swedish but the main observation is that left-wing party Syriza claims most of the attention – positive, negative, and worried alike – and that the traditional labour party…


Everyday racism in the Swedish blogosphere

We use Ethersource to monitor usage of racist terminology in the Swedish blogosphere. We find that one of the largest demographic groups to use such terminology is young female bloggers. We demonstrate how we are able to cluster and profile users of racist terminology. One of the many benefits of Ethersource is that it is not limited to the standard positive/neutral/negative sentiment palette, but that it can be used to analyze and monitor any type of textually manifested phenomenon. Previous examples in this blog include artist popularity, flu trend, aversive language, and positivity vs headache. In this post, we report…


Reputation Mining, May 26 – Istanbul, Turkey

Gavagai’s Fredrik Olsson is in Istanbul today presenting a paper on “Technical Requirements For Knowledge Representation For Reputation Mining On A Realistic Scale” at the LREC 2012 Workshop on Language Engineering for Online Reputation Management.


Measuring the popularity of the contestants in the Eurovision Song Contest using Twitter

In this post, we confirm that Loreen is well placed to win the popular vote in the Eurovision Song Contest final 2012. We use Twitter to measure the popularity of the contestants in ESC 2012. When scaling with Twitter penetration, Sweden gets the highest relative popularity score. This is in line with current betting odds, which unanimously rank Sweden as the most likely winner. Gavagai has previously made accurate forecasts of the distribution of the popular vote in the national ESC final. We have previously shown in this blog that Ethersource monitoring of on-line sentiment can predict the popular vote…


Presentation at “Reality Check: Big Data” – April 26, Moderna Museet, Stockholm

On April 26 at Moderna Museet in Stockholm, Gavagai’s Fredrik Olsson will present Gavagai with a booth at the Reality Check: Big Data conference arranged by Dataföreningen Kompetens and IDG.


Weak signal synonym detection (in Swedish)

As we have previously discussed on this blog, Ethersource constantly and continuously learns new terminology by reading what is written on the Internet. As an example of how Ethersource picks up even weak linguistic signals, we noticed recently that Ethersource suggested the word “tutilurfräs” as a very positive Swedish term. None of us had ever encountered the term “tutilurfräs” before. We looked up the source of this linguistic invention, and found that it originates from a tweet by Swedish punk icon Kajsa Grytt, where she writes that: Å så Pelle!! Å så Hives! Vilket tutilurfräs!! Jag tycker de är genialiska.…


Gavagai’s Fredrik Olsson talks about Big Data

Gavagai’s Fredrik Olsson talks about Big Data in an interview on Twingly’s blog. As Fredrik puts it: “The biggest challenge with Big Data is to stop focusing on Big Data.”


Intelligent Business, April 19 – Grand Hôtel, Stockholm

On April 19 at the Grand Hôtel in Stockholm, Gavagai’s Jussi Karlgren will give a talk on what sort of information flows and new opportunities Big Data analysis will afford businesses. This will be a presentation to the Intelligent Business Convention, with an audience of CIOs and information professionals. He intends to ask the audience how they are prepared to adjust their business practices to fit the new information flows we can expect.


Webbdagarna March 22-23 – Stockholm Waterfront

Gavagai’s Jussi Karlgren will, together with Flemming Bagger, Nordic Segment Leader, Big Data Solutions and Information Governance at IBM, be giving some brief pointers on business opportunities and technical requirements to meet the big data challenge for the Webbdagarna event on March 22-23 in Stockholm.


A Minute-by-minute Popularity Contest – Loreen versus Danny

Despite the fact that the Swedish part of the Eurovision Song Contest final was broadcast live, as a TV viewer it was impossible to get a sense of just how popular the artists were at a given point in time. Having access to Ethersource made sifting out meaningful blog posts and Tweets in real-time a breeze! Below are two graphs outlining, minute-by-minute, the popularity of the two top contestants as expressed in Swedish on-line social media for the day of the final (click the image for a larger version). Note that the popularity score of Loreen’s reaches higher during her…


Fabulous Fest Forecast by Gavagai

Sweden’s contribution to the Eurovision Song Contest this year was decided in yesterday’s final with ten contestants. The winner of this year’s Swedish music fest Melodifestivalen is Loreen, with the song “Euphoria”, which landed almost 700,000 call-in votes from the at-home TV audience. Using the Ethersource technology, Gavagai followed the on-line sentiment towards all contestants throughout the lead-up to the event. We are pleased to note that Gavagai’s forecast of the results based on expressions of appreciation in blog posts and tweets which was published in the paper edition of Svenska Dagbladet (SvD) in the morning prior to the event –…


SWIRL 2012: Strategic Workshop on Information Retrieval in Lorne

Together with forty or so of my most valued and esteemed jet-lagged colleagues and friends, I attended SWIRL 2012, “the occasional talkshop on the future of information retrieval”, hosted by RMIT in Lorne, near Melbourne, earlier this month. The broadly stated topic for this gathering was to formulate the most fruitful reasonably long range research questions for information retrieval as an academic research field. There were keynote addresses and break-out groups followed by other break-out groups followed by collective authoring sessions to compose a comprehensive consensus view of where we are and where we should be headed. For those already…


On-line Activities Indicate Increasing Flu Trend.

Swedish bloggers and tweeters are increasingly chattering about the seasonal influenza (also covered in an earlier blog post). The trend of the flu signals captured by the Ethersource barometer is clearly on the rise. This should come as no surprise since we are, in fact, looking at the seasonal flu. The interesting thing here is how well the barometer reflects what is reported by the Swedish Institute for Communicable Disease Control (SMI) in their weekly reports. Those reports are based on input from sentinels and laboratories, and by necessity, they lag behind in time: the current report is for the period…


Artist Lars Vilks Attacked. Again.

At 6:45 pm, less than a minute after the news broke on Twitter, Ethersource picked up the first aversive signal relating to tonight’s attack on Lars Vilks. (We’ve covered him previously on this blog). This time, elements in the audience threw eggs at him during an evening lecture in Karlstad. Of the major news outlets, the branch of Swedish Radio located in Karlstad was the quickest in publishing the news, putting it on their national web site at 7:43 pm (SR). The other players were roughly 30 to 45 minutes behind (DN, SVD, SVT). Having Ethersource doing real-time attitudinal analysis…


Hyperdimensionality, semantic singularity, and concentration of distances

This post digs a bit deeper into Ethersource. We discuss the problems of distance concentration and semantic singularity. We argue that Ethersource is not susceptible to these problems. As we have previously discussed in this blog, the number of unique words in social media grows at a rate that far exceeds what we are normally used to when working with collections of more traditional texts. To recapitulate, the lexical variation and growth in New Text is simply astounding; there is a constant and continuous influx of new tokens. We have also previously discussed how Ethersource is designed to handle such…
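To make the term concrete, here is a minimal sketch of distance concentration in general. It says nothing about how Ethersource works: it assumes uniformly random points and Euclidean distance, and the function name `relative_contrast` is ours, chosen for illustration. The relative contrast is the gap between the farthest and nearest neighbour of a query point, divided by the nearest distance; as dimensionality grows, that gap tends to shrink, so "near" and "far" become harder to tell apart.

```python
import math
import random


def relative_contrast(dim: int, n_points: int = 200, seed: int = 0) -> float:
    """Return (farthest - nearest) / nearest distance from a random query point.

    Illustrative assumption: points are drawn uniformly from the unit cube.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    nearest = min(distances)
    farthest = max(distances)
    return (farthest - nearest) / nearest


# Compare the contrast at a few dimensionalities.
contrasts = {dim: relative_contrast(dim) for dim in (2, 10, 100, 1000)}
for dim, contrast in contrasts.items():
    print(f"dim={dim:>4}  relative contrast={contrast:.2f}")
```

With this kind of data the printed contrast should drop sharply from the low-dimensional to the high-dimensional cases.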


Tebow, Tebowed, Tebowing: Spelling Variants and Associations

The Wall Street Journal recently ran a piece on the countless ways to spell Tebow. The article reports on spelling variants such as “Teebow”, “Teeeebow”, and “Teeebowww”, all of which are easily recognized using regular expressions. Nevertheless, this is a nice example of how the productivity of the language use of Internet users may pose challenges for keyword-based systems. Ethersource does not use regular expressions to handle this type of variation. On the contrary, it learns terminological variation continuously by observing language use. This means that Ethersource will not only find the type of variants reported in the WSJ article,…
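For readers unfamiliar with the regular-expression approach mentioned above, a minimal sketch follows. The pattern and the helper name `is_tebow_variant` are our own illustration, not something taken from the WSJ article or from Ethersource. It accepts "tebow" with any of its letters repeated, and nothing else.

```python
import re

# Hypothetical pattern: each letter of "tebow" repeated one or more times.
ELONGATED_TEBOW = re.compile(r"t+e+b+o+w+", re.IGNORECASE)


def is_tebow_variant(token: str) -> bool:
    """Return True if the whole token is "tebow" with letters possibly repeated."""
    return ELONGATED_TEBOW.fullmatch(token) is not None


candidates = ["Tebow", "Teebow", "Teeeebow", "Teeebowww", "Tebowing"]
matches = [token for token in candidates if is_tebow_variant(token)]
print(matches)
```

A pattern like this only covers the variation its author anticipated: letter repetition is handled, but a form such as "Tebowing" is rejected until someone extends the pattern by hand.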


Positiveness Correlates with Holidays, Headache Correlates with New Year's Day

We’ve previously seen that the aggregated overall positiveness of Swedes is cyclical on a weekly basis. Swedes love their days off. We’re now happy to confirm what we’ve all suspected for a long time: during Christmas and New Year we all excel in positive thinking! Additionally, the image below reveals that, for some reason, Swedes appear to be very concerned with headaches on the day after the New Year festivities.


Iowa and social media sentiment

We must confess we were a bit wary of extending social media-based prediction into the minds of Iowans gathering in caucus halls around their state to select their favourite presidential candidate. Iowan politics is famously local: our measurements are global. As it turns out we were fairly good at picking out what matters. The results gave Mitt Romney, Ron Paul and Rick Santorum more or less equal votes, with others – Newt Gingrich, Michele Bachmann, Rick Perry, Jon Huntsman – trailing far behind. Our measurements of social media in the last few days showed that the three most…


GOP Hopefuls in Social Media

The blogsite has published some measurements we made on the relative stature in social media for the main Republican party presidential candidates. Their blog post is in Swedish but the main observations are: Ron Paul has gained a massive boost in mentions lately and is now the most talked about candidate. (This is likely to be a partial effect of the general libertarian and counterestablishmentarian bias of the blogosphere). Michele Bachmann is now the candidate viewed with the most skepticism. (This is likely to be an effect of her recently expressed views on vaccination, which run counter to many…


The Advantage of Ethersource on the TOEFL Synonym Test Compared to other Methods

This post compares the performance of various semantic algorithms. Ethersource solves a synonym test with 62% correct answers, while the best runner-up only reaches 52%. The results demonstrate the advantage of Ethersource over other relevant methods. As part of our internal system performance monitoring, we continuously evaluate Ethersource using a number of standardized benchmark tests. One such test is the synonym part of the TOEFL (Test of English as a Foreign Language). This multiple-choice vocabulary test measures the ability of the subject (in our case, Ethersource) to identify which of four alternatives is the correct synonym to a given target…
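As a rough sketch of how a multiple-choice synonym test of this kind can be scored against a word-vector model: pick the alternative whose vector is closest to the target's, then count correct picks. Everything below is an assumption for illustration: the toy three-dimensional vectors, the two questions, and the use of cosine similarity are ours, and none of it is Ethersource's actual representation or the real TOEFL data.

```python
import math

# Toy word vectors, invented for this example only.
VECTORS = {
    "enormous": [0.9, 0.1, 0.0],
    "huge": [0.8, 0.2, 0.1],
    "tiny": [-0.7, 0.3, 0.1],
    "rapid": [0.1, 0.8, 0.3],
    "quick": [0.0, 0.9, 0.2],
    "green": [0.1, 0.0, 0.9],
}

# Each question: (target word, four alternatives, correct answer).
QUESTIONS = [
    ("enormous", ["tiny", "huge", "quick", "green"], "huge"),
    ("rapid", ["green", "huge", "quick", "tiny"], "quick"),
]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pick_synonym(target, alternatives, vectors):
    """Return the alternative most similar to the target word."""
    return max(alternatives, key=lambda word: cosine(vectors[target], vectors[word]))


correct = sum(
    pick_synonym(target, alternatives, VECTORS) == answer
    for target, alternatives, answer in QUESTIONS
)
accuracy = correct / len(QUESTIONS)
print(f"accuracy: {accuracy:.0%}")
```

A percentage like the ones quoted above is this accuracy computed over the full set of test questions.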


Real-time Syndromic Surveillance of Social Media for Disease Symptoms related to Seasonal Influenza

We do real-time monitoring of social media for disease symptoms. There is still no evidence of an outbreak of the seasonal flu in Sweden. We observe, however, an increasing trend in the intensity of symptoms. The inevitable influenza season will soon come knocking on our doors. How do we know when it has started, and how do we know just how severe it is? To this end, there are on-line tools for syndromic surveillance, aiding individual medical practitioners and national disease control centers alike to combat the spread of influenza. Internationally, perhaps the most well-known monitoring service is Google Flu…