The “WHENS” and “WHERES” of Human Activities

Let’s think about what determines which information is useful to us at any given moment. Many factors come into play, such as our location (work, airport, etc.), the current date and time (e.g., Christmas, late at night), the people accompanying us (friends, colleagues, …), or the activity we are engaged in (e.g., driving, filling out a tax return form). Next, let’s guess which information is essential for any system that is supposed to serve us the right piece of information at just the right time. Correct! These contextual cues, and a good understanding of the relationships between them, are the alpha and omega of any truly useful AI assistant, expert or recommender system, or contextual search engine in general.

Sample of human activities extracted from Twitter demonstrating the dependencies between activities, time and selected locations (school, airport, various locations).

Context of Human Activity

A lot has been written about time, location, and even social context. In this publication, we focus on human activity and its relation to the time and location dimensions. We break the problem down into three tasks:

  • (1) obtaining a large set of open-domain human activities,
  • (2) extracting the spatiotemporal context of these activities at the moment they were performed,
  • (3) predicting activity given user context and vice versa.

Related Publication

Understanding Context for Tasks and Activities
J. R. Benetka, J. Krumm (Microsoft), and P. N. Bennett (Microsoft).
In: ACM SIGIR Conference on Human Information Interaction and Retrieval (CHIIR ’19), March 2019.
info / paper (PDF) / slides

Activity Extraction

Many prior works address the first task by using manually curated lists of activities. The shortcoming of such an approach is that it hardly covers the wide spectrum of one’s potential activities. A more scalable way is to apply natural language processing (NLP) methods to a suitable textual corpus. Verbs and verb phrases are, by definition, the sentence constituents that introduce an action (e.g., feed, go to), while nouns and noun phrases typically fulfil the role of the verbs’ arguments (e.g., ducks, popular café). We can therefore create a simple grammar that isolates verb+noun pairs describing an activity (e.g., feed ducks, go to popular café). This is one of those situations where a straightforward solution takes us a long way, in this case capturing thousands of diverse human activities.

To extract activity descriptors from free text, we isolate the linguistically natural structure of verb+noun pairs (alternatively verb phrase+noun phrase, or combination) using a syntactic parser and a part-of-speech grammar.
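The verb+noun grammar described above can be sketched over tokens that have already been part-of-speech tagged. This is only an illustration, not the grammar used in the paper: the function name `extract_activities`, the universal-style tag names (VERB, NOUN, ADJ, ADP, PRT, DET), and the hand-tagged sample sentence are all assumptions, and a real pipeline would obtain the tags from a syntactic parser or tagger.

```python
from typing import List, Tuple

Tagged = Tuple[str, str]  # (token, POS tag); tag names are an assumption

NOUN_PHRASE_TAGS = {"ADJ", "NOUN"}


def extract_activities(tagged: List[Tagged]) -> List[str]:
    """Return verb (+ particle) + noun-phrase descriptors from one tagged sentence."""
    activities = []
    i, n = 0, len(tagged)
    while i < n:
        if tagged[i][1] != "VERB":
            i += 1
            continue
        words = [tagged[i][0]]
        j = i + 1
        # Optional preposition or particle right after the verb, as in "go to".
        if j < n and tagged[j][1] in {"ADP", "PRT"}:
            words.append(tagged[j][0])
            j += 1
        # Determiners are skipped so that "a popular café" becomes "popular café".
        while j < n and tagged[j][1] == "DET":
            j += 1
        # Collect a run of adjectives and nouns as the candidate noun phrase.
        np_start = j
        while j < n and tagged[j][1] in NOUN_PHRASE_TAGS:
            j += 1
        # Trim trailing adjectives so the phrase ends in a noun.
        end = j
        while end > np_start and tagged[end - 1][1] != "NOUN":
            end -= 1
        if end > np_start:
            words.extend(token for token, _ in tagged[np_start:end])
            activities.append(" ".join(words).lower())
            i = end
        else:
            # Verb without a noun argument: not an activity descriptor here.
            i += 1
    return activities


# Hand-tagged sample sentence (hypothetical input, not taken from the dataset).
sample = [
    ("We", "PRON"), ("feed", "VERB"), ("ducks", "NOUN"), ("and", "CONJ"),
    ("go", "VERB"), ("to", "ADP"), ("a", "DET"), ("popular", "ADJ"),
    ("café", "NOUN"),
]
activities = extract_activities(sample)
```

For the sample sentence, the sketch yields the two descriptors used as examples in the text, "feed ducks" and "go to popular café". Verbs that are not followed by a noun phrase are simply skipped.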

Twitter as Self-reporting Platform

The next step is to choose a suitable data source, ideally one that carries evidence about people’s activities along with information about their location and time. Social networks make a perfect candidate: they can be seen as large crowdsourcing platforms with the potential to reveal the global picture of human activity. Twitter especially stands out for a number of reasons: 1) people use it to advertise their doings [1], 2) most of the posts are ‘now’-oriented, and 3) tweets come with a timestamp and geolocation (raw or via Foursquare).
A seemingly prohibitive disadvantage of using Twitter as a source of evidence is the inherent bias [2] of its content. Fortunately, as we demonstrate in the paper (see the figure below), self-censorship affects only which activities people decide to share, not the contextual details of the ones they do post about. This finding allows us to create trustworthy spatiotemporal profiles of the activities that we do capture.

Normalized temporal profiles of selected activities in the manually collected ground truth dataset ATUS (red) and in Twitter (blue), together with their divergence. The high correlation supports the credibility of Twitter as a source of self-reported activities.
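A comparison of this kind can be sketched as follows: raw counts per time bin are normalized into a profile, and two profiles are then compared by correlation and by a divergence measure. The source does not say which measures the paper uses, so the Pearson correlation and the Jensen-Shannon divergence here are assumptions, as are all function names and the toy counts.

```python
import math
from typing import List, Sequence


def normalize(counts: Sequence[float]) -> List[float]:
    """Turn raw counts per time bin into a profile that sums to 1."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("profile has no observations")
    return [c / total for c in counts]


def pearson(p: Sequence[float], q: Sequence[float]) -> float:
    """Pearson correlation between two profiles of equal length."""
    n = len(p)
    mean_p, mean_q = sum(p) / n, sum(q) / n
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    norm_p = math.sqrt(sum((a - mean_p) ** 2 for a in p))
    norm_q = math.sqrt(sum((b - mean_q) ** 2 for b in q))
    if norm_p == 0 or norm_q == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (norm_p * norm_q)


def _kl(p: Sequence[float], q: Sequence[float]) -> float:
    # Kullback-Leibler divergence in bits; bins with zero mass in p contribute 0.
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint profiles."""
    mixture = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * _kl(p, mixture) + 0.5 * _kl(q, mixture)


# Toy counts for four time bins (hypothetical, not taken from ATUS or Twitter).
survey_profile = normalize([5, 40, 30, 25])
tweet_profile = normalize([10, 70, 65, 55])
correlation = pearson(survey_profile, tweet_profile)
divergence = js_divergence(survey_profile, tweet_profile)
```

Normalizing first matters because the two sources differ greatly in volume. Only the shape of the profile over time is compared, not the absolute number of mentions.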

Extraction outcomes

Shortage of data is the least of the problems when working with Twitter. After analyzing Foursquare-linked tweets from the US region for over a year, we extracted more than 100,000 distinct activity descriptors at many levels of granularity (from general notions such as ‘thinking’ to very specific endeavours like ‘practicing egg drop soup delivery skill’). Moreover, by piggybacking on the Foursquare categorization of places, we could model location in terms of types (e.g., gym, airport) rather than coordinates in space.

Most frequent activities

The notorious bias of Twitter is evident in the distribution of activities rather than in their mere presence. Most commonly, people mention activities related to travel/transportation, eating & drinking, or entertainment. The diversity, however, seems to be endless. Check out the ranked list below:

Probabilistic activity models

The extracted instances of activities and their spatiotemporal patterns provide fertile ground for models that treat human activity as a contextual feature. In the paper, we go on to build models that predict a person’s activity given her context and vice versa.

The plot below illustrates the probabilistic distribution of locations where people “drop off a kid” throughout the day.

Relative location probabilities changing in time for ‘dropping off kid’ activity as returned by our model.
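The idea of predicting in both directions can be illustrated with simple relative frequencies over (activity, location type, hour) observations. This is a minimal counting sketch under stated assumptions, not the model from the paper: the function names, the observation format, and the toy data are all hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Observation = Tuple[str, str, int]  # (activity, location type, hour of day)


def _to_distribution(counts: Counter) -> Dict[str, float]:
    # Relative frequencies; an empty dict means the context was never observed.
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()} if total else {}


def location_given_activity(
    observations: Sequence[Observation], activity: str, hour: int
) -> Dict[str, float]:
    """Estimate P(location type | activity, hour) from counts."""
    counts = Counter(
        loc for act, loc, h in observations if act == activity and h == hour
    )
    return _to_distribution(counts)


def activity_given_context(
    observations: Sequence[Observation], location: str, hour: int
) -> Dict[str, float]:
    """Estimate P(activity | location type, hour) from counts."""
    counts = Counter(
        act for act, loc, h in observations if loc == location and h == hour
    )
    return _to_distribution(counts)


# Toy observations (hypothetical, not extracted from real tweets).
toy = [
    ("drop off kid", "school", 8),
    ("drop off kid", "school", 8),
    ("drop off kid", "school", 8),
    ("drop off kid", "daycare", 8),
    ("drop off kid", "airport", 18),
    ("attend class", "school", 8),
]
where_at_8 = location_given_activity(toy, "drop off kid", 8)
what_at_school = activity_given_context(toy, "school", 8)
```

Because locations are category labels rather than coordinates, both conditional tables stay small. With real data, raw relative frequencies would likely need smoothing for rarely observed contexts.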

Insights & Takeaways

  • Social networks can be used as a source of thousands of open-domain activities.
  • By focusing on ongoing activities extracted from Twitter, we can reliably profile their spatiotemporal patterns.
  • A clever combination of multiple data sources (i.e., Twitter and Foursquare) allowed us to model locations as categories rather than coordinates, which resulted in a drastic dimensionality reduction.
  • People DO TWEET about their activities FROM OUTER SPACE!


[1] M. Naaman et al. Is it really about me? Message content in social awareness streams. In: Proceedings of CSCW. ACM, 2010.
[2] A. E. Marwick and D. Boyd. I tweet honestly, I tweet passionately: Twitter users, context collapse, and the imagined audience. New Media & Society, 2010.