This is our blog. We write about cool experiments, interesting case studies, open talks and technicalities.


E!SAVANT – € 1,5 m grant to AI for Influencers

During the last decade, publishers have lost more than two-thirds of their advertising revenue to Facebook and Google. Savant will help publishers regain the initiative by offering them an attractive new ad format to present to their customers.

Social SenseMaking service assesses causes and effects of web chatter

The SenseMaking service can detect spikes of discussion on certain topics in online chatter and social media. It breaks them down into subtopics and detects sets of similar discussion spikes by using temporal topic similarity graph analysis. Further, it provides an analysis of the underlying social networks that produced the events, such as news websites, politicians, blogs, web trolls, and so on, depending on the issue. A typical use case of the SenseMaking service is the prediction of the stickiness and the spread patterns of a given topic, such as a newly launched ad by a company or a newly…
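As a rough illustration of the spike-detection step only, here is a minimal sketch that flags days whose mention count is far above the trailing average. It is not the SenseMaking implementation, which the post does not describe; the function name `detect_spikes`, the seven-day window, the threshold of three standard deviations, and the sample counts are all assumptions made for the example.

```python
from statistics import mean, stdev

def detect_spikes(counts, window=7, threshold=3.0):
    """Return indices whose count is more than `threshold` standard
    deviations above the mean of the preceding `window` values."""
    spikes = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: treat any increase as a spike (an assumption).
            if counts[i] > mu:
                spikes.append(i)
            continue
        if (counts[i] - mu) / sigma > threshold:
            spikes.append(i)
    return spikes

# Hypothetical daily mention counts for one topic.
daily_mentions = [10, 12, 11, 9, 10, 13, 11, 12, 95, 14]
print(detect_spikes(daily_mentions))
```

A trailing window keeps the baseline local, so a topic that is always busy is not flagged merely for being busy. Breaking spikes into subtopics and comparing them over time would need considerably more machinery than this.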


Gavagai talks automation in Finance Industry at Middle East Investment Summit

The Middle East Investment Summit in Dubai is a leading industry conference. This year, an entire day of the conference is dedicated to Data & Analytics and Gavagai is honored to host. Lars Hamberg from Gavagai talks about automation in the banking industry and will moderate sessions and panels with IBM Watson and other interesting participants.  

What makes airline passengers happy?

A few of the high-profile carriers that are currently scoring well may be dangerously relaxed about their service and may soon fall from grace, and a few contenders are poised to take their place. It is possible that observed trends in perceived service may be the best predictors for relative changes in passenger satisfaction, overall rating, loyalty and so on and so forth. These are some of the generalizations emerging from analyzing just the qualitative data, in the form of 20000 passenger reviews, in free-text, covering 22 carriers: Air China, Air France, All Nippon Airways, American Airlines, British Airways, Cathay Pacific…


Findings from 9800 answers to questions regarding gender equality across 7 countries

Saudis and Emiratis are more in favour of discussing financial issues with their spouse than Brazilians, Colombians, Mexicans and Swedes. Saudis and Emiratis are more positive toward paternity leave than Russians, Brazilians and Swedes. Saudis are more likely than Russians to encourage daughters who want to give their career top priority. In the West, there are many preconceived notions about Arab culture and these new insights provide some interesting perspectives. Here is a Gender Equality Study we carried out in Saudi Arabia, United Arab Emirates, Russia, Sweden, Colombia, Mexico and Brazil (The study was commissioned by SI, a Swedish…


Take the labor out of qualitative

Meta4 Insight® is an online survey platform that automates the collection of in-depth qualitative data from hundreds of respondents to uncover their deep-seated thoughts and feelings. The platform features Gavagai’s built-in advanced text analytics to enable a quantitative assessment of salient themes and automatically generates shareable outputs, visualized with insightful image clouds. Here is a recorded webinar that describes the use of Meta4 Insight in a case. View case study  


The extraordinary productivity of foul language – Do you and your text analytics solution know these bad words?

By looking into the extraordinary productivity of foul language, this post showcases the ability of Gavagai’s semantic memories to automatically learn and relate terms in a vocabulary. If you are sensitive to swearing and cursing, you should stop reading now! Foul language, profanity, expletives, and bad words. The creativity of the human mind when it comes to inventing impolite, rude or offensive language is simply amazing. But regardless of how productive a single human being might be, she will still never be able to come up with all the variants of a given bad language concept used throughout an…


What is an efficient way to analyze answers to open-ended survey questions using language technology?

There are challenges to analyzing free-text answers. In the following discussion I will assume that the purpose of the analysis is to achieve an understanding of the themes being discussed and the relative strengths of these themes as well as to get accurate quantification of the numbers of respondents and percentages for each theme. Knowing about multiword expressions. Important concepts in text often consist of more than one word, for example: “San Francisco”, “no-fly zone”, “give me five”, or “kick the bucket”. An automated tool for analysis of answers to open-ended survey questions needs to understand such multiword expressions or the…
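To make the multiword-expression point concrete, here is a minimal sketch of a tokenizer that keeps known expressions together using greedy longest-match lookup. The lexicon `MWE_LEXICON` and the function `tokenize_with_mwes` are hypothetical and exist only for this example; the post does not say how any particular tool acquires or matches such expressions, and the sketch ignores punctuation handling.

```python
# Hypothetical lexicon of multiword expressions, taken from the examples in
# the text. A real tool would learn or load a far larger one.
MWE_LEXICON = {
    ("san", "francisco"),
    ("no-fly", "zone"),
    ("give", "me", "five"),
    ("kick", "the", "bucket"),
}
MAX_LEN = max(len(expression) for expression in MWE_LEXICON)

def tokenize_with_mwes(text):
    """Split on whitespace, merging any run of words found in the lexicon."""
    words = text.lower().split()
    tokens = []
    i = 0
    while i < len(words):
        # Try the longest candidate first so that longer expressions win.
        for n in range(min(MAX_LEN, len(words) - i), 1, -1):
            candidate = tuple(words[i:i + n])
            if candidate in MWE_LEXICON:
                tokens.append(" ".join(candidate))
                i += n
                break
        else:
            # No multiword expression starts here; keep the single word.
            tokens.append(words[i])
            i += 1
    return tokens

print(tokenize_with_mwes("I moved to San Francisco last year"))
```

Counting themes over these merged tokens avoids treating “San” and “Francisco” as two unrelated concepts, which is the kind of distortion the paragraph above warns about.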


Poor Panama – The Central American state’s name is heavily associated with the events at Mossack Fonseca

Poor Panama. Since the investigation around the Panama Papers was made public earlier this month, mentions of the Republic of Panama in online media have been heavily associated with negative connotations such as “tax evasions”, “shell companies”, and “leaked documents”. Although intended more as a way of inspecting the state of the semantic memories, the Gavagai Living Lexicon can also serve as a probe into the state-of-mind of the online media. As illustrated in the screenshot of the Lexicon below, “Panama” has, as of this writing, an unfortunate relation to “Mossack Fonseca” (click the image for a larger version). How will…


Making sense of 14793 answers to the question: “What do you most wish for the coming year?”

In order to better understand their customers’ thoughts and wishes for the coming year, AMF – a limited liability life insurance company that is owned equally by the Swedish Trade Union Confederation (LO) and the Confederation of Swedish Enterprise (Svenskt Näringsliv) – sent out a survey to more than 100 000 senior citizens. The survey included the open-ended question: What do you most wish for the coming year? Read the case study on how Gavagai Explorer was used to make sense of the 14793 answers to that question here.


Understanding the Whats and Whys of a Net Promoter study at scale

What concerns Detractors? What makes Promoters promoters? In this case study, we use Gavagai Explorer to acquire insight into the answers to an open-ended follow-up question in a Net Promoter Score survey of the Swedish Telecom business with 2535 respondents. Net Promoter Score (NPS) is a metric for measuring the loyalty of a company’s customers based on their feedback. NPS is widely used across all kinds of industries, for instance in the travel and hotel business, software services, and telecommunications. Typically, NPS surveys are conducted continuously, so as to assess the performance of an organization over time. Each survey may…
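For readers unfamiliar with the metric, here is a minimal sketch of how an NPS figure is conventionally computed from 0 to 10 ratings: respondents answering 9 or 10 count as Promoters, 0 to 6 as Detractors, and the score is the percentage of Promoters minus the percentage of Detractors. The function name `net_promoter_score` and the sample ratings are assumptions for illustration and are not data from the case study.

```python
def net_promoter_score(ratings):
    """Compute NPS from ratings on the conventional 0-10 scale."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    promoters = sum(1 for r in ratings if r >= 9)   # ratings of 9 or 10
    detractors = sum(1 for r in ratings if r <= 6)  # ratings of 0 through 6
    # Passives (7 or 8) affect the score only through the denominator.
    return 100.0 * (promoters - detractors) / len(ratings)

# Hypothetical survey responses.
sample = [10, 9, 9, 8, 7, 6, 3, 10, 5, 9]
print(net_promoter_score(sample))
```

The score says how many Detractors and Promoters there are, but not why they answered as they did, which is what the open-ended follow-up question is for.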


All your topics are belong to us

Jodel is undoubtedly one of the most interesting online communities at the moment. For those of you who have not already become addicted to the Jodel app, Jodel is an anonymous and localized community where users post and react to real-time updates (both text and photos) about anything and everything that is going on in the local community right now. Although the intended target audience is primarily university students, the appeal of anonymous real-time localized information has spread well beyond the student community. We at Gavagai love interesting text data, and Jodel seems like a veritable goldmine of information to…


Leveraging Text Intelligence to understand Drivers of Customer (Un)Happiness

Imagine that you are responsible for a product. Over a period of time, your customers have provided feedback in the form of ratings and written reviews. The figure below shows how the reviews are distributed across the different ratings. Now, from the figure it appears that most of your customers are happy, that is, have rated the product 4 or 5 on a 1 – 5 scale. It is also clear that a substantial number of customers are not happy at all and have thus rated the product 1 or 2. In between these extremes, there are a number of…


Business bingo – Is your text analytics system up-to-date with current affairs?

In my role as Chief Data Officer at Gavagai, I meet with lots of leads, clients, and data providers. Many of our conversations are carried out in English, and as a non-native speaker, I sometimes find the choice of wording peculiar, and at times slightly amusing. Touch base, reach out, back-to-back, and help me understand, to name but a few. In the game of buzzword bingo, players tick off pre-defined buzzwords available on a bingo-like board. But what to enter as buzzwords? How would you recognize such a word? In my view, many of the business terms I’ve encountered would qualify as…


What do Czech, Hebrew and Italian have in common?

Answer: they have just been added to the Gavagai Living Lexicon, which is an unsupervised semantic memory that continuously learns language by reading large amounts of online news and social media. You can think of the lexicon as a brain in silico (or, equivalently, as a piece of artificial intelligence) that tirelessly reads online media and learns how terms are related to each other. As of today (2016-02-16), the Gavagai Living Lexicon contains the following 20 languages: Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Hebrew, Hungarian, Italian, Latvian, Lithuanian, Norwegian, Polish, Portuguese, Romanian, Russian, Spanish, Swedish …and more is…


Gavagai and the most recent Greek election

As we have done previously, we again followed Greek editorial and social media in the days preceding last week’s parliamentary election in Greece. The opinion polls published in the weeks before suggested a neck-and-neck race between the incumbent socialist Syriza party and the main contender, the conservative Nea Demokratia. Our friend Haralampos Karatzas took our numbers for analysis on his blog on Greek politics (in Swedish). They showed a very different picture: Syriza garnered more attention. As previously, Syriza, as a controversial party, was also the focus of stronger expressions of sentiment than any other party. Last time around, we…


Pulled pork is now officially mainstream cuisine

Pulled pork, which until recently was the most archetypical hipster meal component in Stockholm, has now, according to our live lexicon, become part of the Swedish mainstream. Its closest neighbours in Swedish word space are less than edgy.


Social Media Syndromic Surveillance

The Public Health Agency of Sweden has an initiative called Hälsorapport, which is part of the European system Influenzanet, whose overall goal is to “monitor the activity of influenza-like-illness (ILI) with the aid of volunteers via the internet.” The goal of Hälsorapport is similarly to monitor the spreading of diseases in Sweden and to inform the general public, the health care system, and other government agencies about the current health status of Sweden. The monitoring is done by eliciting weekly reports from volunteers regarding their general health status, and in particular regarding any symptoms they might have. According to the website,…


Gamergate tweets and sentiment analysis

The heated #GamerGate debate raging in social media in recent weeks has now emerged into editorial media as well. A dive into the actual data done by Brandwatch and published by Newsweek last week found that indeed more of the material published on Twitter under that hashtag is vitriolic and confrontational rather than a discussion on media ethics. Newsweek writes: “If GamerGate is about ethics among journalists, why is the female developer receiving 14 times as many outraged tweets as the male journalist? … The discrepancies there seem to suggest GamerGaters cares less about ethics and more about harassing women.”…


Gavagai on the September 14 Riksdag Elections in Sweden

We at Gavagai have been tracking political opinion in Sweden for a while now. We have some observations we can share with you today, just before the parliamentary election to the Swedish Riksdag. The major contenders in the election are the incumbent cabinet coalition of four liberal and conservative parties (“alliansen”: M, C, FP, KD) and the four opposition parties, three of which are left-leaning (“de rödgröna”: S, MP, V) and have to some extent aligned their political goals and ambitions, and finally one party (SD) with a populist and xenophobic agenda which none of the other parties wish to…


Tomorrow’s election in the US

Yes: we, as many others, have followed the US elections in the social media. There are many measurements of social media mentions out there, some thorough, some others little more than simple counting. (The fundamentals of the actual issues, polls, and electoral mechanisms are best summarized by Peter Norvig.) Ethersource has been reading social media posts on the main US presidential candidates for the past year or so. Based on this reading, our analysis is that Obama will stay in the White House. … which appears to be in agreement with what most bookies, pundits, and polls predict today. As…


What Ethersource has Learned About Al-Qaeda in the Past Few Days

This post gives examples of Ethersource’s learning capabilities. It gives examples of automatically learned topics and senses of the use of the term Al-Qaeda in English social media. Ethersource is continuously exposed to massive text streams. On a given day, it sees millions of blog posts, tweets, and forum posts. And it learns. It gobbles up information much the same way a human picks up new ways of using new language constructs. Ethersource learns how the terms it reads are related to each other. It learns about topicality, and it learns about the different senses of the terms. As an…


Tiny Needle in Big Data

Weak signal emission, detection, retrieval and analysis. We are repeatedly asked about the predictive powers of Ethersource and we need to underline that Ethersource has no predictive power per se. The reason Ethersource can estimate – or forecast – the percentages of public votes in a television contest or the outcome of a national election with some accuracy is simply that Ethersource reads and understands massive amounts of data. This post will focus on something slightly different, namely the ability to find, understand and analyse one or a few tiny pieces of crucial data in massive amounts of data. It…


Miserable Monday and the Effect of Vacation in Swedish Social Media

Recently, we found out that Miserable Monday might not be anything but a myth. As avid fans of the idea of a complete banishment of Mondays, it will take more than a couple of news articles to convince us. Luckily, Ethersource is more than ready to clear up any doubts. For some time, we have been monitoring the Swedish domain of social media, and how people are feeling when talking about themselves. The curves have been steadily working their ups and downs. However, these past few months we have been noticing a very curious occurrence. First, let’s take a look at…


The Severity of the Assange Affair Reaches Year High

Yesterday marked a year high in the number of people airing their concerns regarding Assange in terms of aggression, either toward Assange himself, the Swedish judicial system, or the possible intervention of the UK Government in order to extradite Assange to Sweden. The graph below illustrates that the steep rise in volume during the past 24 to 36 hours dwarfs most of the previous online activity. Although Assange is more or less inseparable from Wikileaks in that he is heavily associated with the organization, at the moment, the public’s concern clearly lies with Assange himself. …