Volunteered Geographic Information (VGI) in the form of actively and passively generated spatial content offers
great potential to understand people's activities, emotional perceptions and mobility behavior. Realizing this
potential requires methods that take into account the specific properties of such data, for example its
heterogeneity, subjectivity and spatial resolution, but also its temporal relevance and bias. The aim of the project
was to develop visual methods for analyzing human behavior from location-based social media and movement data.
Theoretical and conceptual frameworks
The facet framework allows characterization and comparison of collective reactions based on the following dimensions:
spatial, temporal, social, thematic and interlinkage (Dunkel et al. 2018).
A behavioral model was introduced that summarizes people’s reactions under the influence of one or more events
(Burghardt et al., in press). In addition, influencing factors are described using a context model, which makes it
possible to analyze visitation and mobility patterns with regard to spatial, temporal and thematic-attribute changes.
In analyzing behaviors, we aim at identifying general or repeated patterns, but what do we mean by “patterns”? In
our theoretical model for pattern discovery, we proposed a formal definition of the concept of pattern in data,
explained how patterns are formed by relationships between data items, discussed different types of patterns,
described operations that can be done with patterns that have been discovered, and interpreted the established
principles of visual representation of data from the perspective of enabling correct and effective pattern discovery
(Andrienko et al. 2021).
We performed an exploratory empirical study of how people identify and interpret data patterns in complex
cartographic representations of spatial distributions and how they involve these patterns in reasoning and knowledge
building. Eye tracking and voice recording were used to capture this process (Andrienko et al. 2022). We considered
several existing theoretical models of visually supported reasoning and knowledge building and found that none of
them taken alone can adequately describe the processes we observed, but a combination of three particular models,
including the pattern discovery model, may provide sufficient expressive power.
Additional Resources:
LBSN Structure: a common, language-independent, privacy-aware and cross-network social media data scheme,
implementing the four facets of the conceptual framework (Dunkel et al. 2018)
Bibliography: Theoretical and conceptual frameworks
Dunkel, A.; Andrienko, G.; Andrienko, N.; Burghardt, D.; Hauthal, E. and Purves, R. (2018). A conceptual
framework for studying collective reactions to events in location-based social media. International Journal
of Geographical Information Science, 33:4, 780-804. https://doi.org/10.1080/13658816.2018.1546390
Burghardt, D.; Dunkel, A.; Hauthal, E.; Shirato, G.; Andrienko, N.; Andrienko, G.; Hartmann, M. and Purves,
R. (in press). Extraction and Visually Driven Analysis of VGI for Understanding People’s Behavior in
Relation to Multi-Faceted Context. Springer, 241-270.
Andrienko, N., Andrienko, G., Miksch, S., Schumann, H., & Wrobel, S. (2021). A theoretical model for pattern
discovery in visual analytics. Visual Informatics, 5(1), 23–42. https://doi.org/10.1016/j.visinf.2020.12.002
Andrienko, N., G. Andrienko, S. Chen, and B. Fisher. 2022. “Seeking Patterns of Visual Pattern Discovery for
Knowledge Building.” Computer Graphics Forum 41 (6): 124–48. https://doi.org/10.1111/cgf.14515.
Löchner, M.; Dunkel, A. and Burghardt, D. (2018). A privacy-aware model to process data from location-based
social media. VGI Geovisual Analytics Workshop, colocated with BDVA 2018. Konstanz, Germany, 19 Oct 2018.
In: Burghardt, D.; Chen, S.; Andrienko, G.; Andrienko, N.; Purves, R. and Diehl, A. (eds.), VGI Geovisual
Analytics Workshop. http://nbn-resolving.de/urn:nbn:de:bsz:352-2-1tc0wl382uqkr0
Generic Methods
Representativity and bias in location based social media
Geosocial media is increasingly recognized as an important resource, for example to support the analysis of
visitation patterns, the assessment of collective values, or the improvement of human well-being through fair and
equitable design of public green spaces. To this end, analysts must first assess "what" is collectively valued,
"where", by "whom" and "when", to understand the how and why of human behavior. However, the reproducibility of human
behavior research is often impaired by several biases affecting geosocial media. VGI and geosocial media are often
noisy, of limited representativeness, difficult to sample fully, and often shared through incompletely documented
and opaque application programming interfaces (APIs). This means that samples, populations, and the phenomena being
observed often change between studies. For this reason, we sought to develop a robust and transferable ‘workflow
template’ for assessing human activities and subjective landscape values.
In a study by Dunkel et al. (2023a), we explicitly limited the initial set of collected data with a narrow thematic
filter: worldwide reactions to sunset and sunrise. This allowed us to compare parameter effects in isolation, test
the robustness of existing measures and identify opportunities for improvement. Our results show that it is possible
to disconnect the study of landscape preference from overall visitation frequencies, a common bias that analysts
encounter in VGI and geosocial media analysis.
Code: Signed chi implementation (Python)

    import math

    import geopandas as gp
    import numpy as np

    DOF = 1
    CHI_CRIT_VAL = 3.84
    CHI_COLUMN = "usercount_est"


    def calc_norm(
            grid_expected: gp.GeoDataFrame,
            grid_observed: gp.GeoDataFrame,
            chi_column: str = CHI_COLUMN) -> float:
        """Fetch the number of data points for the observed and
        expected dataset by the relevant column
        and calculate the normalisation value
        """
        v_expected = grid_expected[chi_column].sum()
        v_observed = grid_observed[chi_column].sum()
        norm_val = (v_expected / v_observed)
        return norm_val


    def chi_calc(
            x_observed: float, x_expected: float,
            x_normalized: float) -> float:
        """Apply chi calculation based on observed (normalized)
        and expected value
        """
        value_observed_normalised = x_observed * x_normalized
        a = value_observed_normalised - x_expected
        b = math.sqrt(x_expected)
        # avoid division by zero for bins without expected values
        chi_value = a / b if b else 0
        return chi_value


    def apply_chi_calc(
            grid: gp.GeoDataFrame, norm_val: float,
            chi_column: str = CHI_COLUMN,
            chi_crit_val: float = CHI_CRIT_VAL) -> gp.GeoDataFrame:
        """Calculate chi-values based on two GeoDataFrames
        (expected and observed values)
        and return new grid with results
        """
        grid['chi_value'] = grid.apply(
            lambda x: chi_calc(
                x[chi_column], x[f'{chi_column}_expected'], norm_val),
            axis=1)
        # add significance column, default False
        grid['significant'] = False
        # calculate significance for both negative and positive chi_values
        grid.loc[np.abs(grid['chi_value']) > chi_crit_val, 'significant'] = True
        return grid

By using the signed chi square test (with respect to the sample topic of reactions to the sunset and sunrise), we
can identify collectively important places and areas, independent of overall user frequencies. The illustrated
process can be seen as a blueprint, offering a workflow that can be adapted and transferred to other contexts
beyond reactions to the sunset and sunrise. To this end, the code for data processing and creation of figures is
fully provided in several notebooks shared in a separate data repository (Dunkel et al. 2023b). Furthermore, the use
of abstracted, estimated, non-personal data based on HyperLogLog demonstrates a practically viable solution,
supporting a shift towards privacy-preserving and ethically aware data analytics in research on human preferences.
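The calculation in the code listing above can be written compactly as follows. The symbols are introduced here for
illustration only and are not taken from the publication: o_i and e_i denote the observed and expected values of the
chi column in grid cell i, and n is the normalisation value returned by calc_norm.

```latex
n = \frac{\sum_i e_i}{\sum_i o_i}, \qquad
\chi_i = \frac{o_i \, n - e_i}{\sqrt{e_i}} \quad (\chi_i = 0 \text{ if } e_i = 0), \qquad
\text{significant}_i \iff |\chi_i| > 3.84
```

A positive value marks a cell that is over-represented in the observed data relative to the expectation, a negative
value an under-represented one. The threshold 3.84 is the CHI_CRIT_VAL constant from the listing.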
Bibliography: Representativity and bias in location based social media
Dunkel, Alexander, Maximilian C. Hartmann, Eva Hauthal, Dirk Burghardt, and Ross S. Purves. 2023a. “From
Sunrise to Sunset: Exploring Landscape Preference through Global Reactions to Ephemeral Events Captured in
Georeferenced Social Media.” PLoS ONE 17 (1). https://doi.org/10.1371/journal.pone.0280423.
Dunkel, Alexander, Maximilian C. Hartmann, Eva Hauthal, Dirk Burghardt, and Ross S. Purves. 2023b.
“Supplementary Materials for the Publication ‘From Sunrise to Sunset: Exploring Landscape Preference through
Global Reactions to Ephemeral Events Captured in Georeferenced Social Media.’” OpARA.
https://doi.org/10.25532/OPARA-200.
Methods for comparative analyses
Time series shapes, Shirato et al. (2023)
Tactical Analysis in Football, Andrienko et al. (2021)
Identifying, exploring, and interpreting time series shapes in multivariate time intervals
To analyze a behavior unfolding over a long time period, we may need to divide it into parts called episodes. We
developed an approach to analyzing episodes of behaviors described by multivariate numeric time series data
(Shirato et al. 2023). It involves recognition of predefined types of patterns in the temporal variation of the
individual variables within episodes and visually supported discovery of more complex patterns made by temporal
relationships between the simple patterns.
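As a rough illustration of what recognizing a predefined pattern type in one variable of an episode can look like,
the following sketch labels a value sequence by comparing its first, middle and last values. The five labels, the
tolerance and the function names are assumptions made for this example. They are not the pattern types or the
method of Shirato et al. (2023).

```python
from typing import Sequence


def trend(start: float, end: float, tolerance: float) -> int:
    """Return +1 for a rise, -1 for a fall, 0 if the change is within tolerance."""
    delta = end - start
    if delta > tolerance:
        return 1
    if delta < -tolerance:
        return -1
    return 0


def classify_shape(values: Sequence[float], tolerance: float = 0.1) -> str:
    """Assign one of five hypothetical shape labels to a single-variable episode."""
    if len(values) < 3:
        raise ValueError("an episode needs at least three values")
    mid = len(values) // 2
    first_half = trend(values[0], values[mid], tolerance)
    second_half = trend(values[mid], values[-1], tolerance)
    if first_half > 0 and second_half < 0:
        return "peak"
    if first_half < 0 and second_half > 0:
        return "trough"
    overall = first_half + second_half
    if overall > 0:
        return "increase"
    if overall < 0:
        return "decrease"
    return "constant"


def classify_episode(episode: dict, tolerance: float = 0.1) -> dict:
    """Label every variable of a multivariate episode (variable name -> values)."""
    return {name: classify_shape(vals, tolerance) for name, vals in episode.items()}
```

The per-variable labels produced this way are the kind of simple patterns whose temporal relationships can then be
explored visually.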
Constructing Spaces and Times for comparative analysis
We developed a generic visual analytics framework for identifying, exploring, and comparing patterns of collective
movement in different classes of situations (Andrienko et al. 2021). It includes a combination of visual
query techniques for flexible selection of episodes of situation development, a method for dynamic aggregation of
data from selected groups of episodes, and a data structure for representing the aggregates that enables their
exploration and use in further analysis. The approach was tested in application to tracking data from football
games. It enabled detection and interpretation of interesting general patterns of team behaviors and revealing
behavior differences between classes of game situations.
Comparison for Multi-item Data Streams
For comparing data streams involving multiple items (e.g., words in texts, actors or action types in action
sequences, visited places in itineraries, etc.), we propose Co-Bridges (Chen et al. 2021), a visual design that uses
river and bridge metaphors, where two sides of a river represent data streams, and bridges connecting temporally or
sequentially aligned segments of streams are used to show commonalities and differences between the segments.
Bibliography
Gota Shirato, Natalia Andrienko, Gennady Andrienko (2023). Identifying, exploring, and interpreting time
series shapes in multivariate time intervals. Visual Informatics, 2023.
https://doi.org/10.1016/j.visinf.2023.01.001
Gennady Andrienko, Natalia Andrienko, Gabriel Anzer, Pascal Bauer, Guido Budziak, Georg Fuchs, Dirk
Hecker, Hendrik Weber, and Stefan Wrobel (2021). Constructing Spaces and Times for Tactical Analysis in
Football. IEEE Transactions on Visualization and Computer Graphics, 2021, vol. 27(4), pp. 2280-2297.
https://doi.org/10.1109/TVCG.2019.2952129
S. Chen, N. Andrienko, G. Andrienko, J. Li and X. Yuan, "Co-Bridges: Pair-wise Visual Connection and
Comparison for Multi-item Data Streams," in IEEE Transactions on Visualization and Computer Graphics, vol. 27,
no. 2, pp. 1612-1622, Feb. 2021, doi: 10.1109/TVCG.2020.3030411.
Sentiment, emotion and activity analysis
Since geosocial media are used to state opinions, express emotions, or document
experiences, they contain a lot of subjective information. Such subjective
phenomena are usually recognized via natural language processing, which is by
now quite sophisticated but can hardly recognize irony or sarcasm, for example,
and is often limited to one or a few languages. Promising results have
been achieved in this context with emojis, which have become extremely popular in
geosocial media and are available in steadily growing numbers.
Hauthal et al. (2021) used emojis to investigate subjectivity and proposed the measure of typicality. Typicality is
a relative measure specifically tailored for geosocial media that determines how typical a particular object of
interest (e.g., emoji or hashtag) is within a sub-dataset compared to the total dataset. Sub-datasets may be formed
spatially, temporally, thematically, etc. Typicality is calculated as the normalized difference of two relative
frequencies and returns a positive (= typical) or negative (= atypical) value.
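The following minimal sketch shows one way such a measure can be computed. It assumes that the difference between
the relative frequency in the sub-dataset and in the total dataset is normalized by the relative frequency in the
total dataset. The text above does not spell out the normalization, so the exact form used by Hauthal et al. (2021)
may differ, and the function and parameter names are illustrative.

```python
def relative_frequency(count: int, total: int) -> float:
    """Share of posts (or items) that contain the object of interest."""
    return count / total if total else 0.0


def typicality(sub_count: int, sub_total: int,
               all_count: int, all_total: int) -> float:
    """Normalized difference of two relative frequencies.

    sub_count / sub_total: occurrences and size of the sub-dataset
    all_count / all_total: occurrences and size of the total dataset
    Assumption: normalization by the relative frequency in the total dataset.
    """
    f_sub = relative_frequency(sub_count, sub_total)
    f_all = relative_frequency(all_count, all_total)
    if f_all == 0:
        # the object never occurs in the total dataset, so nothing can be said
        return 0.0
    return (f_sub - f_all) / f_all
```

Under this assumption, an emoji used in 30 of 100 posts of a sub-dataset but only in 100 of 1,000 posts overall
receives a positive value (typical), whereas one used in only 1 of the 100 posts receives a negative value
(atypical).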
Typicality was used to identify emojis in the previously mentioned global Instagram dataset that provide information
about the context of the user while observing the event. On the one hand, these emojis deliver information about
activities performed, and on the other hand about perceived landscape features in the immediate surroundings. It
was found that emojis provide more detailed information in this regard than the hashtags contained in the same
dataset. Moreover, location-specific emojis were identified, which are chosen depending on the location and match
the features of the physical environment, as shown by matching them with geographic attributes. This suggests that
emojis are not randomly chosen, but provide insights not only into the user's situational context, but also into
their perception and thus appreciation of certain aspects of the environment.
Bibliography
Hauthal, E., Dunkel, A., & Burghardt, D. (2021). Emojis as contextual indicants in location-based
social media posts. ISPRS International Journal of Geo-Information, 10(6).
https://doi.org/10.3390/ijgi10060407
Hauthal, E.; Burghardt, D.; Fish, C. and Griffin, A. (2020). Sentiment Analysis. International
Encyclopedia of Human Geography (2nd Edition), 169-177, https://doi.org/10.1016/B978-0-08-102295-5.10593-1
Stahl Olafsson, Anton, Ross S. Purves, Flurina M. Wartmann, Maria Garcia-Martin, Nora Fagerholm, Mario
Torralba, Christian Albert, et al. 2022. “Comparing Landscape Value Patterns between Participatory Mapping
and Geolocated Social Media Content across Europe.” Landscape and Urban Planning 226 (October): 104511.
https://doi.org/10.1016/j.landurbplan.2022.104511.
Koblet, Olga, and Ross S. Purves. 2020. “From Online Texts to Landscape Character Assessment: Collecting and
Analysing First-Person Landscape Perception Computationally.” Landscape and Urban Planning 197 (May):
103757. https://doi.org/10.1016/j.landurbplan.2020.103757.
Activity analysis for landscape and urban planning
Even though individual people perceive landscapes and their attributed values differently, there are landscapes
that the majority of people perceive as scenic and beautiful. These prominent landscapes (e.g. Preikestolen in
Norway or Wildkirchli in Switzerland) are often depicted by characteristic motif images, which are clusters of
images all taken from a similar viewpoint and angle. Which landscapes become popular has long been driven by the
propagation of landscape or nature appreciation through travel guides or art from the Romantic era, which
popularized a selective subset of landscapes; this is thus not a new phenomenon. Today, tourism agencies and other
influencers (e.g. celebrities, companies, movies, songs) can shape landscapes through social media promotion by
planting seed images that people will try to recreate, thereby forming new motifs.
Privacy-Aware Visualization of VGI to Analyze Spatial Activity, Dunkel et al. (2020)
Analyzing Flickr images to identify popular viewpoints
By reaching millions of people and potentially influencing their future visiting plans, this social media induced
tourism can have drastic physical consequences on the local environment, infrastructure and people.
In a paper by Hartmann et al. (2022), we created an operationalizable conceptual model of motifs that is able to
identify, extract and monitor landscapes prone to such pressure based on geotagged social media data. More
specifically, the proposed pipeline leverages Creative Commons Flickr images from the YFCC100M dataset within the
European Natura 2000 protected areas, which represent a network of breeding and resting sites within important
landscapes for rare and threatened species. Analysis of the motifs revealed that 65% depict cultural elements such
as castles and bridges, whereas the remaining 35% contain natural features that were biased towards coastal elements
like cliffs. Ultimately, the early detection of emerging motifs and their monitoring allows the identification of
locations subject to increased pressure, which enables managers to explore why sites are being visited and to take
timely and appropriate actions (e.g. allocation of infrastructure such as toilets and rubbish disposals or visitor
routing).
Privacy-Aware Visualization of VGI to Analyze Spatial Activity
In recent years, user privacy has become an increasingly important consideration. Potential conflicts often emerge
from the fact that VGI can be re-used in contexts not originally considered by volunteers. Addressing these privacy
conflicts is particularly problematic in natural resource management, where visualizations are often explorative,
with multifaceted and sometimes initially unknown sets of analysis outcomes. In a paper by Dunkel et al. (2020), we
present an integrated and component-based approach to privacy-aware visualization of VGI, specifically suited for
application to natural resource management. As a key component, HyperLogLog (HLL), a data abstraction format, is
used to allow estimation of results instead of exact measurements.
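To make the idea of estimating instead of measuring more concrete, here is a small, self-contained sketch of a
HyperLogLog counter. It is a textbook-style illustration, not the implementation used in the project. The class
name, the SHA-1 based hashing, the default precision and the user identifiers in the usage note are all assumptions
chosen for this example.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch for approximate distinct counts."""

    def __init__(self, precision: int = 12):
        if not 7 <= precision <= 16:
            raise ValueError("precision must be between 7 and 16")
        self.p = precision
        self.m = 1 << precision          # number of registers
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        """Fold one item (e.g. a user id) into the sketch; the id is not stored."""
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")          # 64-bit hash value
        idx = h >> (64 - self.p)                       # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        # position of the leftmost 1-bit in the remaining bits (1-based)
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def union(self, other: "HyperLogLog") -> "HyperLogLog":
        """Merge two sketches; equivalent to sketching the union of both sets."""
        if self.p != other.p:
            raise ValueError("precision mismatch")
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def count(self) -> float:
        """Estimate the number of distinct items added so far."""
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # small-range correction (linear counting)
            return self.m * math.log(self.m / zeros)
        return raw
```

For instance, adding the hypothetical identifiers "user-0" to "user-999" to one sketch and "user-500" to
"user-1499" to another, and then calling union(...).count(), yields an estimate close to the 1,500 distinct users
without either sketch retaining the identifiers themselves. This is the property that makes per-grid-cell
aggregates such as estimated user counts possible without storing personal data.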
Identifying tranquil areas is important for landscape planning and policy-making. Research has demonstrated
discrepancies between modelled potential tranquil areas and where people experience tranquillity based on field
surveys. Because surveys are resource-intensive, user-generated text data offers potential for extracting where
people experience tranquillity. In a study by Wartmann et al. (2021), we explore and model the relationship between
landscape ecological measures and experienced tranquillity extracted from user-generated text descriptions.
Evaluation of potential keywords yielded six keywords associated with experienced tranquillity, resulting in 15,350
extracted tranquillity descriptions. The two most common land cover classes associated with tranquillity were
arable and horticulture, and improved grassland, followed by urban and suburban.
In the logistic regression model across all land cover classes, freshwater, elevation and naturalness were positive
predictors of tranquillity. Built-up area was a negative predictor. Descriptions of tranquillity were most similar
between improved grassland and arable and horticulture, and most dissimilar between arable and horticulture and
urban. This study highlights the potential of applying natural language
processing to extract experienced tranquillity from text, and demonstrates links between landscape ecological
measures and tranquillity as a perceived landscape quality.
Indicators of cultural ecosystem services
In our increasingly urbanized world, the cultural ecosystem services (CES) provided by urban nature play a crucial
role in enabling and maintaining the well-being of urban dwellers. Despite the increased number of studies
leveraging geosocial media data for more efficient and socio-cultural-oriented CES assessment, the high complexity
and costs associated with existing methods such as manual or automated image classification hinder their application
in urban planning and ecosystems management. A study by Gugulica and Burghardt (2023) introduces a novel method that
draws on the semantic similarity between word2vec word embeddings to classify large volumes of geosocial media
textual metadata and quantify indicators of CES use. We demonstrated the applicability of our approach by
quantifying spatial patterns of aesthetic appreciation and wildlife recreation in the green spaces of the city of
Dresden based on the classification of >50,000 geotagged Instagram and Flickr posts. Moreover, we analyzed and
mapped semantic patterns embedded in geosocial media and gained essential insights that can contribute toward a
context-dependent assessment of CES use, which in turn can help inform decision making for more sustainable planning
and management of urban ecosystems. The performance evaluation of the classification supports the validity of the
proposed unsupervised text classification approach as a practical, reliable, and more efficient alternative to
laborious and expensive annotation efforts required by manual or supervised classification methods.
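The core idea of classifying text by semantic similarity between word embeddings can be sketched as follows. This
is a simplified illustration, not the pipeline of the study: the three-dimensional toy vectors, the seed words, the
class labels and the similarity threshold are invented for the example, and a real application would load trained
word2vec vectors instead.

```python
import math
from typing import Dict, List, Optional, Sequence

# Toy embeddings (assumption): real word2vec vectors have hundreds of dimensions.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "sunset": [0.9, 0.1, 0.0],
    "beautiful": [0.8, 0.2, 0.1],
    "view": [0.85, 0.1, 0.05],
    "bird": [0.1, 0.9, 0.1],
    "squirrel": [0.05, 0.95, 0.0],
    "wildlife": [0.1, 0.9, 0.05],
    "parking": [0.0, 0.1, 0.9],
}

# Hypothetical seed words describing each CES indicator class.
CLASS_SEEDS: Dict[str, List[str]] = {
    "aesthetic appreciation": ["beautiful", "view"],
    "wildlife recreation": ["wildlife", "bird"],
}


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(words: Sequence[str],
                embeddings: Dict[str, List[float]]) -> Optional[List[float]]:
    """Average the embeddings of all known words; None if no word is known."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def classify_post(tokens: Sequence[str],
                  class_seeds: Dict[str, List[str]] = CLASS_SEEDS,
                  embeddings: Dict[str, List[float]] = TOY_EMBEDDINGS,
                  threshold: float = 0.5) -> Optional[str]:
    """Return the most similar class, or None if no class reaches the threshold."""
    post_vec = mean_vector(tokens, embeddings)
    if post_vec is None:
        return None
    best_label, best_score = None, threshold
    for label, seeds in class_seeds.items():
        seed_vec = mean_vector(seeds, embeddings)
        if seed_vec is None:
            continue
        score = cosine_similarity(post_vec, seed_vec)
        if score >= best_score:
            best_label, best_score = label, score
    return best_label
```

Because no labelled training posts are needed, only seed words and pretrained embeddings, a scheme of this kind
avoids the manual annotation effort that supervised classification requires.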
Social Media Images for Urban Bicycle Infrastructure Planning
Object Detection for Urban Bicycle Infrastructure Planning, Knura et al. (2021)
Beyond descriptive textual information and emojis, the image content of geosocial media posts can also be analyzed
directly. As an application for urban bicycle infrastructure planning, an object recognition algorithm based on
convolutional neural networks was used to identify bicycles and potential parking spaces. The research and
development work was carried out as a cooperation of a Young Research Group within the framework of the priority
program VGIscience (Knura et al. 2021). The research on object recognition was carried out in the COVMAP project,
while the processing of social media data and the development of methods for visual analysis were realized by the
projects EVA-VGI and TOVIP.
Bibliography
Dunkel, Alexander. 2021. “Tag Maps in Der Landschaftsplanung.” In Handbuch Methoden Visueller Kommunikation
in Der Räumlichen Planung, edited by Diedrich Bruns, Boris Stemmer, Daniel Münderlein, and Simone Theile,
137–66. Wiesbaden: Springer Fachmedien Wiesbaden. https://doi.org/10.1007/978-3-658-29862-3_8.
Dunkel, Alexander, Marc Löchner, and Dirk Burghardt. 2020. “Privacy-Aware Visualization of Volunteered
Geographic Information (VGI) to Analyze Spatial Activity: A Benchmark Implementation.” ISPRS International
Journal of Geo-Information 9 (10): 607. https://doi.org/10.3390/ijgi9100607.
Hartmann, M. C., Koblet, O., Baer, M. F., & Purves, R. S. (2022). Automated motif
identification: Analysing Flickr images to identify popular viewpoints in Europe’s protected areas.
Journal of Outdoor Recreation and Tourism, 37 (January). https://doi.org/10.1016/j.jort.2021.100479
Wartmann, Flurina M., Olga Koblet, and Ross S. Purves. 2021. “Assessing Experienced Tranquillity through
Natural Language Processing and Landscape Ecology Measures.” Landscape Ecology 36 (8): 2347–65.
https://doi.org/10.1007/s10980-020-01181-8.
Gugulica, M. & Burghardt, D. (2023). Mapping indicators of cultural ecosystem services use
in urban green spaces based on text classification of geosocial media data. Ecosystem Services, Volume 60,
https://doi.org/10.1016/j.ecoser.2022.101508
Wartmann, Flurina M., and Ross S. Purves. 2018. “Investigating Sense of Place as a Cultural Ecosystem
Service in Different Landscapes through the Lens of Language.” Landscape and Urban Planning 175 (July):
169–83. https://doi.org/10.1016/j.landurbplan.2018.03.021.
Knura, Martin, Florian Kluger, Moris Zahtila, Jochen Schiewe, Bodo Rosenhahn, and Dirk Burghardt. 2021.
“Using Object Detection on Social Media Images for Urban Bicycle Infrastructure Planning: A Case Study of
Dresden.” ISPRS International Journal of Geo-Information 10 (11): 733. https://doi.org/10.3390/ijgi10110733.
Zahtila, M., Knura, M. Visualizing Point Density on Geometry Objects: Application in an
Urban Area Using Social Media VGI. KN J. Cartogr. Geogr. Inf. 72, 187–200 (2022).
https://doi.org/10.1007/s42489-022-00113-7
We explored the potential of topic modelling as a tool for analyzing episodes of behaviors described by multivariate
time series data (Shirato et al. 2021). The basic idea is to represent data variation by symbolic tokens and treat
episodes as pseudo-texts to which topic modelling methods can be applied. We tested this idea on data describing
collective movements in episodes from football games. The results showed good potential of the approach.
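The step of turning an episode into a pseudo-text can be sketched as follows. The symbol alphabet (u for up, d for
down, s for steady), the tolerance and the variable names in the usage note are assumptions made for this
illustration. The encoding actually used by Shirato et al. (2021) is not described here and may differ.

```python
from typing import Dict, List, Sequence


def symbolize(values: Sequence[float], tolerance: float = 0.1) -> List[str]:
    """Encode each step between consecutive values as 'u', 'd' or 's'."""
    symbols = []
    for prev, curr in zip(values, values[1:]):
        delta = curr - prev
        if delta > tolerance:
            symbols.append("u")
        elif delta < -tolerance:
            symbols.append("d")
        else:
            symbols.append("s")
    return symbols


def episode_to_pseudo_text(episode: Dict[str, Sequence[float]],
                           tolerance: float = 0.1) -> List[str]:
    """Turn a multivariate episode into a list of tokens such as 'width_u'."""
    tokens = []
    for variable, values in episode.items():
        for symbol in symbolize(values, tolerance):
            tokens.append(f"{variable}_{symbol}")
    return tokens
```

An episode with the hypothetical variables "width" and "depth" then becomes a token list like a short document,
and a collection of such lists can be passed to any standard topic modelling method in place of word lists.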
Exploring the potential of Twitter to understand traffic events
Detecting traffic events and their locations is important for an effective transportation management system and
better urban policy making. Traffic events include traffic accidents, congestion and parking issues, to name a
few. Currently, traffic events are detected through static sensors, e.g., CCTV cameras and loop detectors. However,
these have limited spatial coverage and high maintenance costs, especially in developing regions. We investigated
whether Twitter, a social media platform, can be useful to understand urban traffic events from tweets in India
(Das et al. 2020). The results show that an SVM-based model performs best in detecting traffic-related tweets. For
extracting location information, a hybrid georeferencing model consisting of a supervised learning algorithm and a
number of spatial rules outperforms other models. The results suggest that people in India, especially in Greater
Mumbai, often share traffic information along with location mentions, which can be used to complement existing
physical transport infrastructure in a cost-effective manner to manage transport services in the urban environment.
Bibliography
Shirato, G., Andrienko, N., Andrienko, G. (2021). What are the topics in football?
Extracting time-series topics from game episodes. IEEE VIS 2021
Das, Rahul Deb, and Ross S. Purves. 2020. “Exploring the Potential of Twitter to Understand Traffic Events
and Their Locations in Greater Mumbai, India.” IEEE Transactions on Intelligent Transportation Systems 21
(12): 5213–22. https://doi.org/10.1109/TITS.2019.2950782.
Supporting comparative visual analytics for political science research
Analyzing and Visualizing Emotional Reactions Expressed by Emojis, Hauthal et al. (2019)
Hauthal et al. (2019) used a Twitter dataset to investigate reactions
to the political event Brexit in terms of opinions and emotions using emojis in two different approaches. In the
first approach, emojis and hashtags were combined. Hashtags, established in political campaigns before the
referendum, indicate which sub-topic of the overall Brexit debate is addressed in a tweet, i.e. leave or remain. A
spatial comparison of the analysis results with the actual referendum results on NUTS1 level (the highest level in
the hierarchical classification used to clearly identify and classify the spatial reference units of official
statistics in the Member States of the European Union) showed a higher consistency than a pure hashtag-based
consideration without including emojis.
Migration analysis
Focusing on human migration in Sîrbu et al. (2020), we consider three stages of migration: the journey, the stay,
and the return. For each stage, we discuss the traditional and novel sources and types of data that can be used in
analysis, paying particular attention to the opportunities created by big data and challenges involved in their
analysis.
Bibliography
Hauthal, E.; Burghardt, D.; Dunkel, A. Analyzing and Visualizing Emotional Reactions
Expressed by Emojis in Location-Based Social Media. ISPRS Int. J. Geo-Inf. 2019, 8, 113.
https://doi.org/10.3390/ijgi8030113
Alina Sîrbu, Gennady Andrienko, Natalia Andrienko, Chiara Boldrini, Marco Conti, Fosca
Giannotti, Riccardo Guidotti, Simone Bertoli, Jisu Kim, Cristina Ioana Muntean, Luca Pappalardo, Andrea
Passarella, Dino Pedreschi, Laura Pollacci, Francesca Pratesi & Rajesh Sharma (2020). Human migration: the
big data perspective. International Journal of Data Science and Analytics, 2020, vol. 11(4), pp. 341-360.
https://doi.org/10.1007/s41060-020-00213-5