Collection, usage and privacy of mobility data in the enterprise and public administrations (2024)

Alexandra Kapp

Abstract

Human mobility data is a crucial resource for urban mobility management, but it does not come without personal reference. The implementation of security measures such as anonymization is thus needed to protect individuals’ privacy. Often, a trade-off arises as such techniques potentially decrease the utility of the data and limit its use. While much research on anonymization techniques exists, there is little information on the actual implementations by practitioners, especially outside the big tech context. Within our study, we conducted expert interviews to gain insights into practices in the field. We categorize purposes, data sources, analysis, and modeling tasks to provide a profound understanding of the context such data is used in. We survey privacy-enhancing methods in use, which generally do not comply with state-of-the-art standards of differential privacy. We lay the groundwork for practice-oriented research by identifying the privacy needs of practitioners and extracting relevant mobility characteristics for future standardized evaluations of privacy-enhancing methods.

1 Introduction

Nowadays smartphones are used daily for a variety of functions, from phone calls through navigation to renting an e-scooter via an app. The usage of these applications produces data about the locations and movements of individuals, so-called human mobility data, which can be a great resource to optimize services but also for a multitude of diverse tasks such as traffic planning [1] or epidemiological research [2]. As such data entails highly personal information, it falls under the European General Data Protection Regulation (GDPR), which restricts companies from freely using such collected data for arbitrary purposes. While the analysis of human mobility data offers great potential, it can be assumed that not all desirable use cases are implemented due to uncertainty regarding privacy regulations. A recent study [3] concludes that 46% of German companies refrain from innovations because of ambiguities in the interpretation of the GDPR. For example, 31% claimed to not have implemented new technologies based on Big Data or Artificial Intelligence because of it, and 41% stated that they were unable to set up data pools or share data with business partners.

Anonymization of data can be used as a measure to enhance customers’ privacy and simplify data usage for companies, as GDPR principles no longer apply once data is considered anonymous (Recital 26 GDPR). (“Anonymization” is a misleading term, as it suggests that data becomes fully anonymous; numerous examples of successful re-identification of individuals in “anonymized” data suggest otherwise, e.g., [4], [5], [6].) Therefore, one option to make use of data more confidently is the implementation of privacy-enhancing methods that sufficiently guarantee privacy. However, anonymization of mobility data is a difficult task since people’s movements follow predictable patterns [7] that allow easy re-identification. Individuals have successfully been re-identified from “anonymized” taxi data [5], out of highly aggregated mobile phone data [6], or the aggregated count of customers per station [8]. This already illustrates that procedures that guarantee sufficient anonymization from a legal point of view are partly considered insecure within the privacy community. While big tech companies such as Google, Apple, or Microsoft put effort into adopting state-of-the-art privacy concepts like local differential privacy [9], it is doubtful that these are widely used outside the big tech industry [10].

Making mobility data available in a privacy-sensitive manner is a complex and multi-faceted problem. There is typically a trade-off between utilizing data and protecting privacy, and the legal and technical assessments of the anonymity of the data may differ. It is not trivial to gain in-depth insights on data practices in the field as companies rarely share detailed information on data usage. Even companies such as the cellular network operator Telefónica that claim to use sophisticated anonymization techniques [11] do not share details about their methods.

We aim to understand which privacy-enhancing methods for human mobility data are already in use by practitioners and which privacy needs are still present. Thus, a profound understanding of real-life practices in the work with respective data is necessary, as the suitability of privacy methods depends on the context they are applied in. For example, if the goal is to release reports with aggregated statistics to third parties, one could add noise to the aggregates as a comparatively simple method that likely provides reliable results. On the other hand, training a next-location prediction algorithm requires fine-granular data input and therefore other appropriate privacy-enhancing methods are needed.
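The noise-on-aggregates option mentioned above can be sketched with the classic Laplace mechanism from differential privacy. The function name and the example counts below are illustrative assumptions, not taken from the interviews:

```python
import numpy as np

def laplace_noisy_counts(counts, epsilon=1.0, sensitivity=1):
    """Add Laplace noise with scale sensitivity/epsilon to aggregate counts.

    Assumes each individual contributes at most `sensitivity` to every count,
    which is the standard setting of the epsilon-DP Laplace mechanism.
    """
    rng = np.random.default_rng()
    scale = sensitivity / epsilon
    noisy = np.asarray(counts, dtype=float) + rng.laplace(0.0, scale, size=len(counts))
    # Round and clip so that released counts stay non-negative integers.
    return np.clip(np.round(noisy), 0, None).astype(int)

# Example: hourly routing-query counts for part of a day (made-up numbers).
hourly_queries = [120, 95, 80, 210, 340, 290]
print(laplace_noisy_counts(hourly_queries, epsilon=0.5))
```

A smaller epsilon means more noise and stronger privacy; the right value is a policy decision, not a technical one.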

As shown in Figure 1, we presume that mobility data serves as a data source to conduct analysis and modeling tasks which are means to achieve certain purposes. For example, data from a public transport routing app (data source) is used to aggregate the number of routing queries for each hour of the day (data analysis) to optimize the operating hours of the public transport lines (purpose). With expert interviews, we aim to gain insights into these respective categories. In addition, we survey privacy measures that are already in use.

Academic research evaluates proposed privacy methods with similarity measures which quantify the resemblance of analysis outputs with and without privacy enhancement. The more similar the two outputs remain, the higher the utility of the privacy measure is rated. With this work, we further want to provide a link between real-life practices and academic research by extracting the core mobility characteristics entailed in practitioners’ use cases so that relevant similarity measures can be identified. For example, analyzing the top 10 most used docking stations of shared bicycles entails at its core the same characteristic of interest as determining traffic volume on street segments: the spatial distribution of records. A standardized set of similarity measures matched to such characteristics would not only enable easier comparison between different privacy-enhancing approaches but also simplify practitioners’ assessment of suitable methods for their use cases. As similarity measures currently vary strongly within the literature, it is difficult to compare different approaches; we hereby want to lay the groundwork for future standardization of such similarity measures.
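One plausible similarity measure for the "spatial distribution of records" characteristic is the Jensen-Shannon distance between normalized visit counts per location bin. This is a sketch under the assumption of discrete location bins, not a measure prescribed by the study:

```python
import numpy as np

def jensen_shannon_distance(p, q):
    """Jensen-Shannon distance (log base 2) between two discrete distributions.

    p and q could be the normalized visit counts per location (e.g., docking
    stations or street segments) before and after privacy enhancement.
    Returns 0 for identical distributions and 1 for disjoint ones.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        # Kullback-Leibler divergence, skipping zero-probability bins.
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return float(np.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m)))

original = [500, 300, 150, 50]   # visits per station, raw data
protected = [480, 310, 160, 55]  # after a privacy-enhancing method
print(jensen_shannon_distance(original, protected))
```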

In summary, our contribution consists of the following:
(1) We provide a profound insight into real-life practices stated in expert interviews by employees from companies and public administrations in Germany working with human mobility data.
(2) We deduce core mobility characteristics as groundwork for the categorization and standardization of similarity measures.
(3) We identify privacy needs of practitioners.

This paper is organized as follows. In Section 2, we give an overview of related work on human mobility data. Section 3 describes our methodology for the data collection, processing, and evaluation of the expert interviews. In Section 4 the evaluation of the qualitative data is presented. Section 5 provides implications for privacy needs and similarity measures deduced from the interviews. Finally, the results are summarized and discussed in Section 6.

2 Background

Different techniques to collect human mobility data are well documented in the literature, see e.g., [12, 13, 14]. This includes data from surveys, mobile phone data in the format of call detail records, GPS tracking devices (usually smartphones) that produce spatially and temporally fine-granular data, and locations users post on social media. Some surveys also name WiFi positioning systems [15]. The overviews of data sources mostly focus on openly available data sets or such that have been used for academic research. One can easily imagine what kind of human mobility data companies could potentially collect with different techniques. Fiore et al. [16] name five examples of sources for micro-trajectory data: location-based services, like Google Maps, record the GPS position while the app is running; cellular network operators collect call detail records; municipalities collect MAC addresses via Wi-Fi probe messages of nearby smartphones; car navigation systems record the GPS data of the navigation device; banks register the shops their customers pay at. While all these are valid examples of potentially used datasets, to the best of our knowledge there are no systematic overviews of mobility data actually used by companies and the ways this data is handled.

Data on human mobility is a highly desired resource for various purposes. For example, the epidemic spreading of diseases is being studied [2], most recently during the COVID-19 pandemic [17]. There is also growing research focusing on deep learning approaches to predict the next location of a person [13], for instance, to predict locations of affected people during disasters like earthquakes [18]. For an overview of various machine learning applications that human mobility data is used for, see [15]. Dedicated research also focuses on visual explorations of mobility data [19], and more and more interactive tools (for example kepler.gl and unfolded.ai) enable users to visualize large-scale fine-granular mobility data. These examples are only a few of various use cases that build on human mobility data, though this does not necessarily reflect applications in enterprise settings.

There is plenty of research dealing with privacy-enhancing methods for mobility data; for a detailed overview of methods for trajectory micro-data see Fiore et al. [16]. To give a few examples: there are simple approaches like the reduction of the granularity of coordinates [20] or reducing the sampling interval [21]. More advanced methods aim to provide indistinguishability between individuals within a dataset, like k-anonymity [22], or provide uninformativeness with the guarantee of differential privacy, e.g., [23, 24, 25, 26].
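The simplest of these approaches, reducing the granularity of coordinates, can be illustrated in a few lines. The function and its parameters are hypothetical, and, as the re-identification examples above show, such coarsening alone offers no formal privacy guarantee:

```python
def coarsen_coordinate(lat, lon, decimals=2):
    """Reduce coordinate precision as a simple (and weak) privacy measure.

    Rounding to 2 decimal places snaps points to a grid of roughly 1.1 km
    in the latitude direction. This alone does NOT provide formal guarantees
    such as differential privacy; it only lowers spatial granularity.
    """
    return round(lat, decimals), round(lon, decimals)

# A point in Berlin, coarsened to ~1 km granularity.
print(coarsen_coordinate(52.520008, 13.404954))  # → (52.52, 13.4)
```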

While privacy researchers consider differential privacy the de-facto standard, there is little information on the adoption of such methods in the field. Garfinkel et al. [27] point out that the deployment of differential privacy comes with challenges and requires skilled staff. Calacci et al. [28] state that risk and utility are often evaluated without context, which is vital for a proper assessment. They analyze the public and market utility as well as the risks associated with different levels of granularity of mobility data, thereby only considering coarsening and aggregation as privacy enhancement, which they say is still most commonly used in practice. De Montjoye et al. [29] also criticize the insufficient implementation of privacy measures for mobile phone data and propose four different approaches for practical implementations in real-life scenarios. While both Calacci et al. and de Montjoye et al. assume reasonable scenarios, we aim to collect empirical data on the context of mobility data usage.

Privacy-enhancing methods reduce the information content and thus there is a common perception of an associated reduction in the utility of the data. This is true for many use cases: for example, when a public transport company wants to analyze the typical distance their customers are willing to walk to a stop, the utility is likely decreased when the exact locations are obfuscated with noise or aggregated to larger grid cells. Other use cases are less impacted by such measures, for example, those that are based on highly aggregated data such as the evaluation of customer numbers over time of a new bike-sharing system. Thus, it is vital to understand the analysis purposes and methods that are applied in practice to evaluate the trade-off between utility and privacy when privacy-enhancing methods are applied. Similarity measures are commonly used in research to quantify the utility, though there are no standard measures for privacy-enhancing methods applied to mobility data [16]. In addition to a (potential) impact on utility, other effects of privacy-enhancing methods also ought to be considered in practice; e.g., research about medical data shows that users are more willing to share data when they trust that their privacy is preserved [30].

3 Methodology

In July and August 2021 we conducted a total of 13 semi-structured expert interviews that lasted on average about one hour, with a range between 30 minutes and 1.5 hours. The interviews covered questions on mobility data sources, including their origin, structure, and personal reference. Further questions dealt with data analysis and modeling techniques, their purposes, as well as the impact they have on the companies’ actions. Additionally, we asked about analyses planned for the future, those that have not been conducted due to (legal) restrictions or obstacles, and data protection and anonymization practices. Questions were asked about how long data is stored, in which format, and whether anonymization techniques are applied. Questions on data security, the legal basis, and user communication were included; however, they are not further evaluated within this work. See Appendix A for the full interview guide.

3.1 Participant recruitment and moderation

All interviewees were employees in leading positions of German organizations working with human mobility data. One organization was represented with two interviewees from different departments, thus resulting in twelve different organizations of the following types: public administrations, public transport companies, a mobility platform (part of a public transport company), a mobility service provider, an automobile manufacturer, a location-based service app, a sensor company providing sensors for people counts, and market research companies. The location-based service app was still in the state of a startup and did not work with any real customer data yet, but the participant could report on planned data usage. Also, one public administration only recently started with a dedicated team to work with human mobility data and one person from a public transport company reported mostly from their current build-up process. The rest of the interviewees had multiple years of experience with mobility data within their field and company. All participants were in the positions of founders, CEOs, or team or project leads of relevant divisions. See Table 1 for an overview of all participants, which also introduces the participants’ IDs that will be used in Section 4.

ID  | Organization type          | Job title
P1  | public administration      | mobility manager
P2  | public administration      | manager in traffic mgmt.
P3  | public administration      | head of data science
P4  | public transport company   | team lead AI systems
P5  | public transport company   | product owner analytics
P6  | public transport company   | team lead offer planning
P7  | mobility platform          | project lead
P8  | mobility service provider  | managing director
P9  | automobile manufacturer    | head of analytics
P10 | location-based service app | CEO
P11 | sensor company             | head of technical division
P12 | market research company    | CEO
P13 | market research company    | managing director

We recruited participants through contacts of our research group network and by sending email invitations to relevant company representatives (see Appendix B for the email invitation text). We used purposeful sampling [31], a method where a variety of relevant cases is sought to be included by specifically targeting a selection of such differing subjects, i.e., a variety of different types of organizations working with mobility data.

All interviews were held in German and conducted remotely using a video conferencing tool. The interviews were recorded and transcribed with the aid of transcription software. The automatically created transcripts were proofread and corrected by the interviewer.

3.2 Analysis and coding

A qualitative content analysis with an inductive approach [32] was conducted to analyze the interview transcripts. All questions within the interview guide were constructed to fit into one of the following groups (see Appendix A), which match the scheme in Figure 1:
(1) purposes,
(2) data sources,
(3) data analysis and modeling (initially: methods),
and (4) privacy (the code on user communication and legal aspects was not used for the evaluation).
All relevant parts of the transcripts were extracted into a table format and categorized into one of the four codes. Each group was then evaluated separately. During the iterative coding phase each transcript chunk was coded on a fine-granular level first, then the codes were grouped into broader categories. The coding process was conducted by the interviewer herself and the coding iterations were discussed with and reviewed by one further person; the revisions served as the base for further refinements. For the analysis of purposes, we aimed to find common motivations and themes, which are printed in bold type in Section 4.1. Data sources were categorized based on technical similarities and are listed in Table 2. The major distinction between applications was the computation of statistical aggregations (see all final codes in Table 3) and the application of mathematical models (see all final codes in Table 4). Themes for the motivation of anonymization were derived (see final themes printed in bold type in Section 4.4); applied and planned privacy-enhancing methods were collected (see final codes in Table 5).

3.3 Research ethics and anonymization

During the recruitment, participants received information about the purpose of the study, and an informed consent document (see Appendix C) was signed before the interview. The interviewer also informed participants verbally about the purpose of the research and the audio recording. Additionally, they were guaranteed careful handling of the shared information such that neither the person nor the company can be identified. To further stress that no sensitive information about the person or company would be revealed, this information was repeated within the verbal introduction at the beginning of the interview (see Appendix A). After the transcription, all audio files were deleted and the names of participants and their employers were removed from the transcripts. The transcripts were then stored in encrypted form. See Appendix D for the full study procedure with respect to ethical considerations.

3.4 Limitations

Participants were only recruited from companies based in Germany. Thus, results transfer only to a limited extent to companies in other countries. As the GDPR was of special interest, the results can most easily be transferred to other EU countries. We have made our best effort to recruit diverse organizations, though we cannot claim to have included all types of organizations working with human mobility data. Our main research focus is directed at companies working on urban mobility topics, therefore we recruited our participants accordingly. Still, we are aware that companies from other contexts also work with such data, for example, location-based service apps like fitness apps, restaurant recommendations, or dating apps, to only name a few.

4 Findings

Participants are referred to by their IDs P1-P13 as assigned in Table 1.

4.1 Purposes for data collection and analyses

The primary purpose for collecting personal data is usually the operation of a service. As P8 stated, they cannot provide route suggestions if they don’t know where the customer wants to go. However, this evaluation focuses on determining themes for data analysis and modeling purposes beyond the operation of applications.

Several experts mentioned that data is used for demand-driven offers to customers. For example, P8 (mobility service provider) said they position their vehicles close to the predicted demand and plan their fleet size accordingly. They also optimize routing algorithms for ride-hailing applications based on customer data. P6 (public transport company) stated that they not only do mid- to long-term offer planning but also adapted their schedules within a few weeks to better suit the changing needs during the COVID-19 pandemic. They added that ticket options and pricing are also part of the long-term offer planning that relies on customer mobility data. Quality management was mentioned by P8, for example by comparing the actual and predicted waiting time of taxi customers. P9 (automobile manufacturer) named the need for data for autonomous driving.

Data is not only used internally but also to provide information to customers: P6 said they use historic data to predict future passenger loads and display such information in their routing application. P9 stated that the display of real-time traffic can help car drivers avoid traffic jams.

Insights from aggregated data were stated to be used for marketing. Personalized advertising has only been named by one expert as a potential option that is not intended to be pursued.

Various experts named reports of aggregated statistics for the monitoring of KPIs, internal knowledge, and strategic planning. Not only to obtain new insights but also to verify gut feelings, as one expert said: “Every [manager] knows […] the customer behavior very well. […] [They] have a feeling, an experience, but it’s better to really see it in black and white” [translated from German].

P6 and P3 (both public administration) stated to use data for city and traffic planning, e.g., to plan bicycle infrastructure. They also envisioned other fields of application for the future, e.g., to compute emissions produced within a city by the transport sector or to create plans for emergency or catastrophe situations.

The provision of data to third parties was mentioned in various forms: P3 mentioned their efforts of providing as much data as possible as open data for transparent policymaking. Some interviewees said they are obliged to provide data to other parties; for example, public transport companies need to report aggregated statistics to the public administration. One expert also said they are considering selling anonymized data in the future. All data the public administration has access to can potentially be subject to parliamentary inquiries. Two experts also reported on the use of data as evidence in court.

4.2 Data sources and responsibilities

There are two types of data that practitioners work with: data collected by themselves and external data, such as open data, bought data, or data provided by contractual partners. The question of origin plays a major role in terms of who needs to have the technical and legal competencies on the protection of privacy: if data is gathered through external sources, the responsibility is seen with the providing entity. The provider needs the competencies to apply adequate privacy measures, while the party receiving data is (at most) interested in the high-level information on whether the data is GDPR compliant.

As P11 (sensor company) stated: “[…] We have such a certificate for our solution. Well, what we attach to the tender and show: Okay, look at our solution, it works. But it is also compliant with the General Data Protection Regulation. […] Otherwise, we wouldn’t be able to offer the solution in the market in Germany or anywhere else or in Europe […]” [translated from German]. While P13 (market research company) said: “There I notice the tendency that the industry clients, so to speak, they like to play down the data protection requirements a bit in order to get this data more quickly, so as not to make things so difficult for themselves” [translated from German].

Table 2 summarizes the categorized data sources named by the interviewees, which are already being used or desired to be used in the future.

Type | Provision | User (among interviewees) | Available format
surveys | third-party research institutes | public administrations (P1-P3), public transport companies (P4-P6) | aggregated and anonymized data
stationary sensor data | sensor companies maintain sensors and preprocess data | public administration (P3), public transport company (P6) | preprocessed and anonymized data
routing queries | app operator | mobility service provider (P8), public transport companies (P5, P6), mobility platform (P7) | data users are also app controllers, thus access to data in any (legally permitted) way
transaction data | app operator | mobility service provider (P8), mobility platform (P7) | data users are also app controllers, thus access to data in any (legally permitted) way
GPS tracking | controller of tracking device applications (e.g., smartphone apps, vehicles equipped with GPS trackers) | mobility service provider (P8), mobility platform (P7), public transport company (P6), public administration (P3), market research company (P12), automobile manufacturer (P9), location-based service app (P10) | heterogeneous: if data users are also app controllers, they have access to data in any (legally permitted) way; third parties can only access aggregated and anonymized data
mobile phone data | cellular network providers | public administration (P3), public transport company (P6) | aggregated and anonymized origin-destination matrices
Large-scale household surveys are a traditional mobility data source that all interviewed public administrations and public transport companies stated to rely on. The experts mentioned additional custom studies on smaller scales that are commissioned by public administrations or companies. They all agreed that surveys are commonly conducted by third-party research facilities who are responsible for the data privacy concept, while only aggregated and anonymized data is made available to third parties.

Unlike other forms of mobility data where the person carries the tracking device, stationary sensors are positioned statically and people passing the sensor are registered. As the sensor provider (P11) explained, there are specialized companies that install and run sensors, provide software, make the sensor signals human-readable, and take care of anonymization measures if the data contains personally identifiable attributes. Technical variations that interviewees stated include pressure sensors within the road surface (e.g., to measure traffic volume), infrared sensors (e.g., to count entering and exiting public transport passengers), camera-based sensors (e.g., to count people within a room), or sensors based on WiFi technology that allow the tracking of MAC addresses of mobile devices across multiple sensors. Only camera- and WiFi-based sensors were seen as potentially critical in terms of privacy.

Routing applications provide information on the optimal route and potential alternatives, based on a provided start and end location and time. Routing queries made within such apps precede many actual trips and can be considered a proxy for mobility data. P5 and P6 from public transport companies reported that they collect such data with their own routing applications and use it for analytical purposes, e.g., for passenger load forecasts. As app operators, they both stated to have raw data access which is restricted by technical and organizational measures. Usually, query data is not stored with any user identification, which limits the personal reference. However, since people tend to query routes to sensitive and personal locations, like home or work, privacy concerns could be raised, as P6 also mentioned.

Apps that allow the booking of mobility services produce mobility-related transaction data. Transaction data includes the exact start and destination location as well as time, price, and user information. This data is primarily needed to handle the booking transaction with the payment, as P8 said, but is also used for aggregated statistics. P2 from a public administration reported obtaining aggregated statistics of such transaction data from a partnering bike-sharing provider. Unlike routing queries, transaction data includes information on actually performed trips with precise time and place information and a linked user record. It is therefore highly personal information.

GPS tracking data is diverse due to different types of devices and applications. Some experts stated to collect GPS data themselves, either via their smartphone applications or GPS-equipped vehicles. For example, the market research company (P12) offers an app that constantly tracks participants during the collection phase of a study. GPS data is also acquired from third parties: P3 (public administration) reported that they considered using aggregated data about street-level speed and traffic volume that had been collected with an app for cyclists, and P6 (public transport) reported planning a market research study which includes GPS tracking.

Mobile phone data is collected by cellular network providers. P6 explained that they buy such data from a service provider who gets anonymized data from cellular network providers. The service provider then processes it into usable formats such as origin-destination matrices and redistributes it.

4.3 Data analysis and modeling techniques

Methods stated by the experts can mainly be assigned to one of two groups: statistical aggregations and mathematical models. While statistical aggregations provide descriptive analytics of the data, mathematical models, i.e., machine learning models and traffic models, allow tasks such as classification, prediction, or simulation.

All stated statistical aggregations are presented in Table 3. They are grouped according to shared underlying characteristics, which are generic attributes of mobility data independent of the specific context.

Statistical aggregations | Mobility characteristic
trip counts, customer counts, returning customers | record counts
total passengers over time, bike rentals over time | temporal distribution of records
people count per location/sensor, top 20 shared mobility stations, sold tickets at a station, passengers entering and exiting a station, number of transits per station, traffic volume, occupancy rate in a place/public transport line, real-time traffic information | spatial distribution of records
all aggregations for spatial distributions disaggregated by certain time windows | spatial and temporal distribution of records
mobility demand by OD relations, round trips of shared bikes (i.e., same start and end station) | distribution of OD counts
relation of public transport share compared to other modes by OD relation | modal split per OD pair
average trip lengths (evaluated in research studies) | trip length
dedicated analysis of trip chains (evaluated in research studies) | travel patterns
daily driven distances (car), temporal changes in daily distances (e.g., to see trends during the COVID-19 pandemic or holidays) | daily range
modal split (evaluated in research studies) | modal split
trips conducted with multiple traffic modes (e.g., bike & ride) (evaluated in research studies) | inter-modality of trips
proportion of people who use more than one traffic mode (evaluated in research studies) | multi-modality of people
average speed per street segment (bicycle and car) | speed
waiting times at traffic lights, customer time spent in stores | time allocation
customer groups (e.g., x% of customers visiting store A also visit store B) | correlation between visits of different locations

On the highest level, almost all experts aggregate data to record counts like the total number of trips or customers. On a more fine-granular level, different experts are interested in spatial (and temporal) distributions that quantify people at certain locations (and times), e.g., public transport companies (P6) are interested in the number of customers entering and exiting stations (at different times of day) and the number of tickets sold per station, while the mobility service provider (P8) wants to know where (and when) their services are mostly used.

Origin-destination matrices are used by public transport companies (P6, P13) to gain insights into mobility demand; further disaggregated by modes of transport, they reveal shortcomings in public transport infrastructure. According to P13, surveys and market research studies evaluate average trip lengths, trip chains, the modal split, as well as the share of inter-modal trips (i.e., multiple modes of transport are mixed within a trip) and multi-modal people (i.e., a person uses multiple traffic modes for different trips). P3 (public administration) stated to be interested in data on speed and traffic volumes of bicycles per road segment. Additionally, they were interested in waiting times at traffic lights. P11 (sensor company) stated that clients are interested in identifying customer groups based on their visited locations. Additional performance indicators such as the number of unique customers per location, the number of returning customers per location, and the time spent at different stores are determined.
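The "distribution of OD counts" characteristic boils down to counting trips per origin-destination pair. A minimal sketch with made-up zone labels, which also covers the round-trip case mentioned for shared bikes:

```python
from collections import Counter

# Hypothetical trip records: (origin zone, destination zone) per trip.
trips = [
    ("A", "B"), ("A", "B"), ("B", "A"),
    ("A", "C"), ("C", "C"),  # ("C", "C") is a round trip
]

# An origin-destination (OD) matrix is simply the trip count per OD pair.
od_matrix = Counter(trips)
print(od_matrix[("A", "B")])  # → 2

# Round trips (same start and end station), as analyzed for shared bikes:
round_trips = sum(n for (o, d), n in od_matrix.items() if o == d)
print(round_trips)  # → 1
```

In practice the zones would be stations or traffic analysis zones, and such a matrix would typically be aggregated and anonymized before being shared, as the interviewees describe for mobile phone data.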

Next to statistical aggregations, experts (plan to) use data for different mathematical models, listed in Table 4. Due to the nature of such models, underlying characteristics cannot be determined in the same manner as before. P8 (mobility service provider) uses demand prediction models and optimizes routings of ride-hailing services to optimally group users. P11 (sensor provider) explores the prediction of people counts, though they do not see any demand for such features among their customers. P12 (market research company) reported using classifiers that detect the mode of transport based on continuous smartphone GPS tracking and further sensor data. They also experimented with activity recognition algorithms which are supposed to recognize the purpose of a visit, such as "at home" or "waiting". Public administrations (P2, P3) and public transport companies (P6) reported using traffic models, commonly 4-step traffic models (travel demand models that forecast traffic in four steps: (1) trip generation, (2) trip distribution, (3) mode choice, (4) route choice [33]), which take a variety of data sources as input, such as population density, modal split, and origin-destination matrices, to simulate different scenarios and forecast traffic. According to P2 and P3, agent-based models are also in the planning; these require user trajectories of an entire day to properly take trip chains into account. Predicting the next probable location of a user (next-location prediction) was in a proof-of-concept stage at the location-based service app (P10). They also planned on implementing an algorithm to cluster customers' mobility behavior.

in use | in planning
4-step traffic models | next-location prediction
occupancy prediction | activity recognition
mode detection | clustering of mobility patterns
routing optimization | agent-based models
demand prediction |
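As a rough illustration of the planned next-location prediction, a minimal first-order Markov model predicts the most frequent successor of the current location. This is a deliberately simplified stand-in for whatever method practitioners might actually deploy, with hypothetical toy trajectories:

```python
from collections import Counter, defaultdict

def train_markov(trajectories):
    """Count transitions between consecutive locations (first-order Markov)."""
    transitions = defaultdict(Counter)
    for traj in trajectories:
        for current, nxt in zip(traj, traj[1:]):
            transitions[current][nxt] += 1
    return transitions

def predict_next(transitions, location):
    """Return the most frequently observed successor, or None if unseen."""
    if location not in transitions:
        return None
    return transitions[location].most_common(1)[0][0]

# Toy trajectories of semantically labeled visits (hypothetical data).
trajectories = [
    ["home", "work", "gym", "home"],
    ["home", "work", "home"],
    ["home", "cafe", "work", "home"],
]
model = train_markov(trajectories)
print(predict_next(model, "work"))  # 'home' (observed twice, vs. 'gym' once)
```

Such models make the privacy tension concrete: training requires linked sequences of individual visits, not aggregates.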

4.4 Privacy

We found major differences regarding the engagement of the interviewees with privacy measures. We hypothesize that there is a difference between participants' organizations that collect data themselves and those that obtain it from third parties. For example, experts from public administrations (almost) exclusively work with third-party human mobility data; therefore, they did not report any need to implement anonymization methods themselves. Still, privacy is an important topic for them, as data protection authorities strictly check any personal data that is used by public administrations.

All interviewees applying anonymization methods to their data named one of two reasons: (1) for purposes outside of the scope the user consented to; (2) to make the data available to third parties. Different experts reported that they struggle to pursue all their use cases due to GDPR. They said that personal data cannot be used for any arbitrary purpose, even if it might serve the customers' interests. Therefore, anonymization techniques can help to remove the personal reference and enable the use for additional analyses. As one interviewee explained: "There is a source layer […] [with] GPS in full resolution and whatnot. This is normally not usable at all for analysts like me and after processing [and anonymizing] it is moved to the secondary assets. The primary assets for the primary use case are then deleted" [translated from German].

Interviewees with a business model based on providing data to third parties, such as market research companies or the sensor provider, have a high interest in applying privacy measures, as compliance with GDPR is a major criterion to acquire clients. Accordingly, they seemed to have the highest expertise in privacy-enhancing methods. Table 5 shows an overview of privacy-enhancing methods that were stated by the experts.

Privacy measure / guarantee | Context
removal of personal attributes | storage of recorded GPS locations without any customer information
pseudonymization | sensor company pseudonymizes MAC addresses recorded with WiFi sensors
aggregation | (1) aggregate data from surveys/studies as reports; (2) dashboard with aggregated bike sharing data provided to public administration; (3) internal knowledge sharing of insights based on statistics
indistinguishability | (1) market research company (P12) provides origin-destination information only for connections above a certain threshold; (2) mobile phone data is provided only for connections above a certain threshold; (3) State Office of Statistics provides spatially aggregated data only for cell counts > 5; (4) automobile manufacturer (P9) only includes POIs in analyses that exceed a certain user count threshold
coarsening | (1) heatmaps instead of maps with single points are used to visualize study results; (2) P9 rounds coordinates to three decimal places for the analysis of POIs
cropping of trajectories | (1) P9 crops trajectories for the analysis of frequently used road segments; (2) P3 names cropping as a known best practice and role model for a potential future release of anonymized open data
noise | P12 uses different anonymization techniques depending on the analysis; adding noise is mentioned as one option
synthetization | P12 investigated synthetization options but evaluated the utility as not sufficient for their sample sizes
differential privacy | P12 tests differential privacy methods to exempt data from being strictly bound to study purposes
de-centralized data processing | P12 envisions running certain algorithms (e.g., mode detection) directly on user devices in the future, for privacy reasons but also for faster processing capabilities

P4, P9 and P11 stated to remove personal information that is not needed for analyses, such as name, phone number, or MAC address. Some experts claimed to remove the user identification entirely while others still retained the link between different user records but used pseudonymization methods on the user ID. Data aggregation is not only a method for analytical purposes but also a measure of anonymization, as stated by P2, P11 and P12. P2, P6, P12 and P9 reported that data is restricted such that location counts of spatially aggregated data need to surpass a certain threshold to be accessible, thereby providing indistinguishability. Two of them received such restricted data from third parties while two implemented such measures themselves. P9 reported working with a reduction of granularity of coordinates and timestamps (coarsening) for start and end locations. P12 explained that map views would only show heatmaps instead of exact points as a visual implementation of reduced granularity. Cropping of the beginning and end of a trajectory was reported by P9 for fine-granular GPS trajectory data, and P12 also stated to add noise to the data.
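Three of the measures reported here (pseudonymization, coarsening, and minimum-count thresholds) can be sketched in a few lines. The key, identifiers, and cell names are illustrative assumptions, not the interviewees' actual implementations:

```python
import hashlib

# Pseudonymization: replace a direct identifier (e.g., a MAC address) with a
# keyed hash. The key is a placeholder; in practice it would be kept secret
# and possibly rotated.
SECRET_KEY = b"replace-with-secret"

def pseudonymize(identifier: str) -> str:
    return hashlib.sha256(SECRET_KEY + identifier.encode()).hexdigest()[:16]

# Coarsening: round coordinates to three decimal places (roughly 100 m),
# as reported for start and end locations.
def coarsen(lat: float, lon: float, decimals: int = 3):
    return round(lat, decimals), round(lon, decimals)

# Indistinguishability via minimum counts: suppress aggregated cells whose
# count does not exceed a threshold (cf. the "cell counts > 5" rule).
def suppress_small_cells(counts: dict, threshold: int = 5) -> dict:
    return {cell: n for cell, n in counts.items() if n > threshold}

print(pseudonymize("AA:BB:CC:DD:EE:FF"))                  # stable pseudonym
print(coarsen(52.520008, 13.404954))                      # (52.52, 13.405)
print(suppress_small_cells({"cell_a": 12, "cell_b": 3}))  # {'cell_a': 12}
```

Note that pseudonymization keeps records linkable (same input, same pseudonym), which is exactly the property the experts described when retaining user-level links.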

Advanced methods like synthesizing data (on the basis of raw data, a new synthetic dataset is created that, depending on the algorithm used, maintains certain statistical distributions of the original dataset), methods that implement differential privacy, and de-centralized data processing have only been named by one expert (P12), as methods that are being tested within the organization for potential future usage.
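For context, the simplest building block of differential privacy is the Laplace mechanism applied to a count query. This is a generic textbook sketch, not P12's actual implementation:

```python
import math
import random

def dp_count(true_count: int, epsilon: float) -> float:
    """Release a count via the Laplace mechanism.

    A counting query has sensitivity 1 (adding or removing one person changes
    the result by at most 1), so Laplace noise with scale 1/epsilon yields
    epsilon-differential privacy for this single query.
    """
    # Inverse-transform sampling of Laplace(0, 1/epsilon) noise.
    u = random.random() - 0.5
    noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

random.seed(0)  # fixed seed for a reproducible demonstration only
noisy = dp_count(1000, epsilon=1.0)
print(noisy)  # roughly 1000; the exact value depends on the noise draw
```

The utility cost is visible directly: smaller epsilon (stronger privacy) means larger noise scale and thus noisier counts.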

5 Practical implications

5.1 Privacy needs of practitioners

Based on the interviews, we can identify different privacy needs of practitioners.

A common scenario is the compilation of pre-defined aggregated statistics. While the experts did not see privacy needs in addition to aggregation, privacy research suggests otherwise [6, 8]. Since many analyses in different contexts are based on similar characteristics (see Table 3), a set of proven privacy-enhancing methods and tools for standard analyses could be helpful.

However, not all useful analyses are known in advance: data is used in exploratory scenarios and new use cases arise. As one expert said: "We repeatedly have questions [that could be answered with the survey data]. But for data protection reasons it was promised that the data will be deleted at the end of last year" [translated from German]. Data release is another relevant scenario: data used for decision processes of public administrations is desired to be published as open data. Also, agent-based traffic models are desired to be used, but they need single user trajectories as input, which are usually not shared by data providers. Data synthetization techniques could be a viable privacy enhancement where data remains in the original format and can be used for arbitrary purposes and without time restrictions, though it should be noted that synthetization techniques maintain only certain statistical properties, depending on the specific algorithm. There is an increasing amount of research on the synthetization of mobility data, but these methods are far from established and practically proven. They need to be evaluated carefully and, if applied, limitations of the utility need to be well communicated.

The operation of applications based on machine learning models like mode detection, next-location prediction, or activity recognition needs fine-granular input data which cannot be obfuscated or aggregated in advance. Differentially private adaptations of machine learning algorithms can be used to limit the impact of single users on the model and thereby the potential privacy breaches. Also, de-centralized approaches, like federated learning, could be considered to prevent centralized storage of personal data.
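The idea of limiting a single user's impact on a model can be sketched with a DP-SGD-style aggregation step: clip each user's (here scalar, for brevity) gradient contribution, then add noise to the sum. This is a schematic illustration only, not a full differentially private training procedure with privacy accounting:

```python
import random

def private_gradient_step(per_user_grads, clip_norm=1.0, noise_std=0.5, seed=None):
    """Aggregate per-user gradients with clipping and Gaussian noise.

    Clipping bounds each user's influence to clip_norm; the added noise masks
    any remaining individual contribution. Scalar gradients keep it short.
    """
    rng = random.Random(seed)
    clipped = [max(-clip_norm, min(clip_norm, g)) for g in per_user_grads]
    noisy_sum = sum(clipped) + rng.gauss(0.0, noise_std * clip_norm)
    return noisy_sum / len(per_user_grads)

grads = [0.2, 5.0, -3.0]  # hypothetical per-user gradient contributions
# With noise disabled, the clipped mean is (0.2 + 1.0 - 1.0) / 3:
print(private_gradient_step(grads, noise_std=0.0))
```

Even without the noise, clipping alone already prevents one outlier user (here the 5.0 contribution) from dominating the update.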

The lack of expertise to assess which anonymization techniques are sufficient causes uncertainty and lengthy processes. As one participant said: "[…] it is not so easy to find expertise that covers both technical know-how on data level and can serve the legal perspective as well. […] If someone says I want to do this, but the data must be anonymized for that, we have to involve a lot of other people who tell us how to do it and who can also somehow give the okay for it to be really legally secure" [translated from German]. Concrete recommendations for action would provide guidance for faster processes and implementations.

Finally, it should be noted that there is a need for easy-to-use tools that can also be implemented by organizations that do not have the resources or expertise for employees with dedicated skills in privacy methods. The more accessible such methods are, the more likely the gap between research and practice is to shrink.

5.2 Similarity measures

Utility losses due to privacy-enhancing methods are quantified with similarity measures, as shown in Figure 1. (There is no standard name for such measures; different publications also use the following terms: measure (or metric) of utility, evaluation, resemblance, dissimilarity, quality, accuracy, information loss, or utility loss.) They determine how much a characteristic, e.g., the spatial distribution, of privacy-enhanced data resembles the output generated with the raw data. As researchers evaluate their proposed privacy methods on varying similarity measures, results are hard to compare amongst them. Similarity measures might address different characteristics or even different nuances of a characteristic; for example, the spatial distribution can be captured by quantifying how many of the top 50 locations are identified correctly or by comparing the distribution of location visits with the Jensen-Shannon divergence. Depending on the use case, different characteristics need to be maintained by privacy-enhancing methods, thus different similarity measures are relevant.
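The two example similarity measures named here can be made concrete. The location-visit distributions below are toy assumptions:

```python
import math

def jensen_shannon_divergence(p, q):
    """JSD between two discrete distributions (base-2 logs, result in [0, 1])."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def top_k_overlap(visits_raw, visits_private, k=50):
    """Share of the raw data's top-k locations recovered in the private data."""
    def top(visits):
        return {loc for loc, _ in sorted(visits.items(), key=lambda x: -x[1])[:k]}
    raw, priv = top(visits_raw), top(visits_private)
    return len(raw & priv) / len(raw)

# Toy location-visit distributions before and after privacy enhancement.
print(jensen_shannon_divergence([0.6, 0.3, 0.1], [0.5, 0.35, 0.15]))  # small value

raw  = {"a": 10, "b": 8, "c": 5, "d": 1}
priv = {"a": 9, "b": 7, "e": 6, "c": 2}
print(top_k_overlap(raw, priv, k=3))  # 2 of the top 3 locations recovered
```

Both measures target the spatial distribution, yet they can rank two privacy-enhancing methods differently, which is exactly why a standardized, diverse set of measures is needed.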

With Section 4.3 we want to provide guidance on relevant characteristics for a future categorization and standardization of such similarity measures. While the definition of characteristics is fairly straightforward for statistical aggregations, suitable measures for mathematical models are more difficult to derive. Either a more profound understanding of such models is needed to derive respective characteristics, or privacy methods need to be evaluated directly on accuracy measures of downstream tasks, e.g., correctly detected traffic modes by a mode detection algorithm with and without privacy enhancement.

5.3 Recommendations

In summary, we can derive the following recommendations:

  • To provide practitioners with guidance and clarity on the use of state-of-the-art privacy-enhancing methods for mobility data, an easily accessible framework could be useful which compiles practical real-world use cases and suggests adequate privacy methods. The handout for companies published by Germany's digital association Bitkom on "Anonymization and pseudonymization of data for machine learning projects" is an illustrative example of such a publication on a related topic [34].

  • A provision of easy-to-use tools for privacy-enhancing methods will enable organizations without the expertise and resources to implement state-of-the-art methods. Such tools could provide a compiled report of typical mobility analyses or the generation of synthetic data. A project like the Synthetic Data Vault (SDV) [35], which is an overall system for synthetic data models, benchmarks, and metrics, could be extended for mobility data or serve as an example for a similar approach.

  • A set of standardized similarity measures and downstream tasks would facilitate the comparison of different privacy-enhancing methods and enable practitioners to choose the most suitable method for their use case. The SDV package includes model-agnostic metrics which could again serve as an example or be extended with mobility-data-specific metrics.

  • GDPR certificates for privacy-enhancing technologies could accelerate the processes withinorganizations and provide security for decision makers.

6 Discussion

Movement data undoubtedly holds great potential for commercial as well as scientific analyses. However, the highly individual patterns in the data, which make it so interesting, mean that anonymization is hardly possible without utility losses. The high legal attention to the processing of such data leads to frequently encountered challenges in practice, which motivated us to take a detailed look at the data used in organizations and the analysis and anonymization methods that are being applied. We conducted 13 interviews with employees of German companies and public administrations working with human mobility data. Even though many assumptions are made concerning the practical use of such data, to the best of our knowledge, this is the first systematic study to evaluate such sources, usage, and privacy measures in enterprises. We grouped and listed our results to provide an overview of real-world practices with such data and identified different scenarios of privacy needs of practitioners. Thereby, these insights can be used as a basis for future research on practice-oriented privacy-enhancing techniques and tools that help to close the gap between research and practice.

The interview evaluation shows a detailed breakdown of data sources in use, including their origin and available formats. This information can guide future privacy research regarding target groups and use cases for proposed methods.

Compliance with GDPR is a major concern stated by many experts; legal requirements are almost exclusively the trigger for instituting privacy measures. This is in accordance with Beringer et al.'s [36] findings, who see a need for a regulatory framework for usable privacy and security and conclude that business interests are mainly directed at collecting as much data as possible. Though, much uncertainty remains about possible techniques, their implications on utility, and tools to implement those in practice. Expertise in anonymization techniques varies strongly among organizations and largely depends on whether data is gathered and used by the organizations themselves, provided to third parties, or only received from data providers. While academic research has accepted differential privacy as the de-facto standard, it is not yet implemented in practice, if known at all. One expert also stated that they neither have the time nor the expertise to implement advanced methods. Providing easy-to-use tools to simplify the implementation of privacy-enhancing methods is thus a necessary step to increase the usage of such methods. Especially companies that have no dedicated business case of providing anonymized data usually lack such resources.

To increase the accessibility of methods that state-of-the-art research suggests, the utility for the actual data analysis purposes of practitioners needs to be ensured. Therefore, we see the need for a diverse palette of standardized similarity measures that cover different kinds of use cases. Proposed privacy-enhancing methods use varying similarity measures concerning different mobility characteristics. This makes the comparison and interpretation of the utility across different methods burdensome. We hope that our research provides a more comprehensive overview of the practical context of mobility data use cases and relevant mobility characteristics that help to develop a set of diverse similarity measures reflecting actual practitioners' needs.

Acknowledgements

This work is part of the FreeMove project. I acknowledge the financial support by the Federal Ministry of Education and Research of Germany in the framework of the FreeMove project.

I hereby thank Helena Mihaljević for the constructive feedback and the other project members for their valuable input.

References

  • [1]D.Naboulsi, M.Fiore, S.Ribot, and R.Stanica, “Large-Scale Mobile TrafficAnalysis: A Survey,” IEEE Commun. Surv. Tutor., vol.18, no.1,pp. 124–161, 2016.
  • [2]S.Lai, A.Farnham, N.W. Ruktanonchai, and A.J. Tatem, “Measuring mobility,disease connectivity and individual risk: A review of using mobile phone dataand mHealth for travel medicine,” J. Travel Med., vol.26, no.3,p. taz019, Mar. 2019.
  • [3]B.Research, “DS-GVO und Corona–Datenschutzherausforderungen für die Wirtschaft,”https://www.bitkom.org/sites/default/files/2020-09/bitkom-charts-pk-privacy-29-09-2020.pdf,Sep. 2020.
  • [4]P.Ohm, “Broken Promises of Privacy: Responding to theSurprising Failure of Anonymization,” UCLA Law Rev., vol.57,p. 1701, Aug. 2009.
  • [5]M.Douriez, H.Doraiswamy, J.Freire, and C.T. Silva, “Anonymizing NYC TaxiData: Does It Matter?” in 2016 IEEE Int. Conf. DataScience and Adv. Analytics (DSAA).Montreal, QC, Canada: IEEE, 17, pp. 140–148.
  • [6]F.Xu, Z.Tu, Y.Li, P.Zhang, X.Fu, and D.Jin, “Trajectory Recovery FromAsh: User Privacy Is NOT Preserved in Aggregated Mobility Data,”Proc. 26th Int. Conf. on World Wide Web, pp. 1241–1250, Apr. 2017.
  • [7]M.C. Gonzalez, C.Hidalgo, and A.-L. Barabasi, “Understanding IndividualHuman Mobility Patterns,” Nature, vol. 453, pp. 779–82, Jul. 2008.
  • [8]A.Pyrgelis, C.Troncoso, and E.D. Cristofaro, “What Does The Crowd SayAbout You? Evaluating Aggregation-based Location Privacy,” Proc.Privacy Enhancing Technologies, vol. 2017, no.4, pp. 156–176, Oct. 2017.
  • [9]G.Cormode, S.Jha, T.Kulkarni, N.Li, D.Srivastava, and T.Wang, “Privacyat Scale: Local Differential Privacy in Practice,” inProc. 2018 Int. Conf. Manage. Data.Houston TX USA: ACM, May 2018, pp. 1655–1658.
  • [10]A.Hopkins and S.Booth, “Machine Learning Practices Outside Big Tech:How Resource Constraints Challenge Responsible Development,” inProc. 2021 AAAI/ACM Conf. AI, Ethics, and Society,ser. AIES ’21.New York, NY,USA: Association for Computing Machinery, Jul. 2021, pp. 134–145.
  • [11]Telefónica, “Unser Anonymisierungsverfahren in drei Schritten,”https://www.telefonica.de/analytics/anonymisierungsverfahren-in-drei-schritten.html.
  • [12]H.Barbosa, M.Barthelemy, G.Ghoshal, C.R. James, M.Lenormand, T.Louail,R.Menezes, J.J. Ramasco, F.Simini, and M.Tomasini, “Human mobility:Models and applications,” Phys. Rep., vol. 734, pp. 1–74, Mar.2018.
  • [13]M.Luca, G.Barlacchi, B.Lepri, and L.Pappalardo, “A survey on deep learningfor human mobility,” ACM Comput. Surv., vol.55, no.1, Nov. 2021.
  • [14]J.Wang, X.Kong, F.Xia, and L.Sun, “Urban Human Mobility: Data-DrivenModeling and Prediction,” ACM SIGKDD Explorations Newsletter,vol.21, no.1, pp. 1–19, May 2019.
  • [15]E.Toch, B.Lerner, E.BenZion, and I.Ben-Gal, “Analyzing large-scalehuman mobility data: A survey of machine learning methods and applications,”Knowl. Inf. Syst., vol.58, pp. 501–523, Mar. 2019.
  • [16]M.Fiore, P.Katsikouli, E.Zavou, M.Cunche, F.Fessant, D.L. Hello,U.Aivodji, B.Olivier, T.Quertier, and R.Stanica, “Privacy in trajectorymicro-data publishing: A survey,” Trans. Data Privacy, vol.13,p.91, 2020.
  • [17]F.Schlosser, B.F. Maier, O.Jack, D.Hinrichs, A.Zachariae, andD.Brockmann, “COVID-19 lockdown induces disease-mitigating structuralchanges in mobility networks,” Proc. Natl. Acad. Sci., vol. 117,no.52, pp. 32 883–32 890, Dec. 2020.
  • [18]X.Lu, L.Bengtsson, and P.Holme, “Predictability of population displacementafter the 2010 Haiti earthquake,” Proc. Natl. Acad. Sci., vol.109, no.29, pp. 11 576–11 581, Jul. 2012.
  • [19]G.Andrienko, N.Andrienko, P.Bak, D.Keim, and S.Wrobel, VisualAnalytics of Movement.Berlin Heidelberg: Springer-Verlag, 2013.
  • [20]M.Gruteser and D.Grunwald, “Anonymous Usage of Location-Based ServicesThrough Spatial and Temporal Cloaking,” in Proc. 1st Int. Conf.Mobile Systems, Applications and Services, ser. MobiSys ’03.New York, NY, USA: Association forComputing Machinery, May 2003, pp. 31–42.
  • [21]B.Hoh, M.Gruteser, H.Xiong, and A.Alrabady, “Enhancing Security andPrivacy in Traffic-Monitoring Systems,” IEEE PervasiveComput., vol.5, no.4, pp. 38–46, Nov. 2006.
  • [22]S.Bennati and A.Kovacevic, “Privacy metrics for trajectory data based onk-anonymity, l-diversity and t-closeness,” arXiv:2011.09218 [cs],Nov. 2020.
  • [23]M.E. Gursoy, L.Liu, S.Truex, and L.Yu, “Differentially Private andUtility Preserving Publication of Trajectory Data,” IEEETrans. Mob. Comput., vol.18, no.10, pp. 2315–2329, Oct. 2019.
  • [24]G.Acs and C.Castelluccia, “A case study: Privacy preserving release ofspatio-temporal density in paris,” in Proc. 20th ACM SIGKDD Int.Conf. Knowledge Discovery and Data Mining, ser. KDD ’14.New York, NY, USA: Association for ComputingMachinery, Aug. 2014, pp. 1679–1688.
  • [25]H.To, K.Nguyen, and C.Shahabi, “Differentially private publication oflocation entropy,” in Proc. 24th ACM SIGSPATIAL Int. Conf.Advances in Geographic Inf. Systems, ser. SIGSPACIAL ’16.Burlingame, California: Association forComputing Machinery, Oct. 2016, pp. 1–10.
  • [26]H.Roy, M.Kantarcioglu, and L.Sweeney, “Practical Differentially PrivateModeling of Human Movement Data,” in Data and ApplicationsSecurity and Privacy XXX, S.Ranise and V.Swarup, Eds., vol.9766.Cham: Springer Int.Publishing, 2016, pp. 170–178.
  • [27]S.L. Garfinkel, J.M. Abowd, and S.Powazek, “Issues Encountered DeployingDifferential Privacy,” Proc. 2018 Workshop Privacy in the ElectronicSociety, pp. 133–137, Jan. 2018.
  • [28]D.Calacci, A.Berke, K.Larson, and A.Pentland, “The tradeoff between theutility and risk of location data and implications for public good,”arXiv:1905.09350 [cs, math], Dec. 2019.
  • [29]Y.-A. de Montjoye et al., “On the privacy-conscientious use of mobile phonedata,” Scientific Data, vol.5, no.1, p. 180286, Dec. 2018.
  • [30]S.Kalkman, J.van Delden, A.Banerjee, B.Tyl, M.Mostert, and G.van Thiel,“Patients’ and public views and attitudes towards the sharing of health datafor research: A narrative review of the empirical evidence,” J. Med.Ethics, vol.48, no.1, pp. 3–13, Jan. 2022.
  • [31]L.A. Palinkas, S.M. Horwitz, C.A. Green, J.P. Wisdom, N.Duan, andK.Hoagwood, “Purposeful sampling for qualitative data collection andanalysis in mixed method implementation research,” Adm. Policy Ment.Health, vol.42, no.5, pp. 533–544, Sep. 2015.
  • [32]U.Kuckartz, “Qualitative text analysis: A systematic approach,” inCompendium for Early Career Researchers in Mathematics Education,G.Kaiser and N.Presmeg, Eds.Cham:Springer Int. Publishing, 2019, pp. 181–197.
  • [33]M.G. McNally, “The Four Step Model,” in Handbook of TransportModelling, 2nded.Bingley:Emerald Group Publishing Limited, Sep. 2007, pp. 35–53.
  • [34]P.Aichroth et al., “Anonymisierung und Pseudonymisierung von Daten fürProjekte des maschinellen Lernens,” Bitkom, Tech. Rep., 2020.
  • [35]N.Patki, R.Wedge, and K.Veeramachaneni, “The synthetic data vault,” in2016 IEEE Int. Conf. Data Science and Adv. Analytics (DSAA),Oct. 2016, pp. 399–410.
  • [36]B.Beringer, P.O. Uphaus, and H.Rau, “Usable Privacy in Location-BasedServices: The Need for a Regulatory Framework,” inComputational Science and Its Applications –ICCSA 2021, ser. Lecture Notes in Computer Science,O.Gervasi et al., Ed.Cham:Springer Int. Publishing, 2021, pp. 571–579.

Appendix

A Interview guide [translated from German]

Code assignment of questions

purposes
data sources
methods (renamed to: data analysis and modeling)
privacy

initial codes that were not used for further evaluation: user communication and legal

Welcome

  • Give an introduction to the research project.

  • State objective of the interview: “The interviews are a first step within our research project to capture the status quo of privacy of mobility data in practice: what data is available in the first place, how is it stored and analyzed. Therefore, in this interview, I would like to learn more from you about the three topics: data collection, data use, and data storage. I will guide you through the interview based on various questions about these blocks, but this does not have to follow strict protocol - I welcome input that you find relevant beyond the questions. In our project, data protection plays an important role - an often difficult and definitely sensitive topic. I would therefore like to emphasize again in advance that we do not want to imply any lack of measures with the questions or put you under any pressure to justify them. All questions that go into detail about data protection measures are solely intended to gain a better understanding of current practices so that we can bring our research closer to reality. No results will be published that could be construed negatively towards your company in any way. The results will of course be anonymized, i.e. no names of companies will be mentioned. The same applies to the use of the data: we want to understand on a general level for which purposes data is needed. No information will be published about your specific use cases that could reveal potential business secrets.”

  • Ensure consent form has been signed.

  • Confirm verbally that the consent to audio record the interview has been given.

- - - - - - - Start audio recording - - - - - - -

Interview questions

Examples in italics can be used by the interviewer to clarify the question.

General

  • What is the product / service of your company?

  • How many employees are there in your company?

  • What is your position in the company?

Data collection: What personal mobility data do you collect?

  • Which personal mobility data do you collect yourself as a company?

  • Which data do you purchase or get from third parties?

  • If there are multiple data sets:

    • Which of these data sets is the most relevant (the most challenging from a privacy perspective) to your work / is used the most?
      (Focus on this data set for the rest of the questions)

  • What does the data look like in detail?

    • What geolocation technology is used to collect the data? (e.g., GPS, CDR, WiFi sensors)

    • How temporally and spatially granular is the data?

    • Is there additional information about the collected locations? (e.g., semantic information about the locations, such as home, workplace, or restaurant)

    • How long is the average duration of a trajectory?

  • Personal reference of the data

    • About which persons is data collected? (e.g., all customers, app users, people passing a sensor, …).

    • Over what period of time is mobility data available about a person? (e.g., anonymization after x days? New user ID every x days?)

    • What other data is known about the user? (e.g., demographic data, place of residence, purchase information, subscriptions / contracts)

Data use: How will the data be used to gain insights for your purposes?

  • For what purposes is the data used?
    (Question to get started on data use: reporting, optimizing pricing models, advertising, etc.? First ask in general terms, then ask in more detail for specific analyses that are used.)

  • What types of analyses or modeling are performed? With what goal? (Depending on the answer, ask further in detail.)

    • Aggregate statistics

    • Detailed analysis of individual areas or users

    • Models, for prediction or classification

  • At what frequency are the analyses conducted? (e.g., regular reports, real-time, one-time analyses)

  • How have the analyses evolved over time? (e.g., have more been added steadily / become more complex, have different ones been tried and discarded)

  • What role does exploration of data, without specific prior targeting, play in your work?

  • Which explorations are carried out here?
    Ask for a specific example: what did the last exploration look like? What data, what analyses?

  • Is there additional data that you combine with yours? (e.g., purchased data, open data)

    • If so, which ones and how?

  • What impact do the insights from the data have on your actions? (e.g., positioning of mobility hubs, fare design, personalized advertising)

  • What further analysis or modeling is planned for the future?

    • What insights do you expect to gain from these analyses?

    • What would be the potential impact of the findings?

  • What further analyses or modeling would you do, assuming there were no hurdles? (e.g., amount of data, legal restrictions, computing capacity, or similar)

    • What insights would you hope to gain from these analyses?

    • What would be the implications?

    • What hurdles prevent these analyses from being conducted?

  • Are there privacy measures being applied to the analysis? (e.g., limit on queries, only certain queries, synthetic data generation, k-anonymity)

    • If so, by whom were these initiated?

  • What technical or legal constraints do you have on data use?

  • Are there any analyses or modeling that you have not done before due to privacy concerns? Which ones?

Data storage: How is the data stored?

  • How long is the data stored?

  • In which format is the data stored?
    (e.g., database, single files)

  • Is the data being stored anonymously?

    • If yes, how?

  • Who has access to the data?
    (e.g., individuals, specific departments, the whole company)

  • How is this access documented and controlled?

  • Is the data passed on to third party data service providers?

    • If so, in what form?

  • Are there restrictions on access?
    (e.g., are only certain queries possible? Is access to raw data possible?)

  • Are there data security measures that are taken in data storage?

  • What other technical or legal restrictions do you enforce regarding data storage?

User communication

  • Are individuals informed about data collection or processing?

    • If yes, how?

  • On what legal basis is the data collected or processed? (e.g., consent, contract, legitimate interests, legal basis)

B Recruitment email [translated from German]

Dear NAME,

Within the framework of the BMBF-funded research project freeMove, we are working on a data protection-compliant use of personal mobility data. As an employee of COMPANY NAME, you are cordially invited to actively participate in our research project in the form of an expert interview.

With the help of these interviews, we would like to gain a better understanding of the use of personal mobility data in practice. Accordingly, we would like to learn more from you about your daily work at COMPANY NAME. This knowledge will feed into the transdisciplinary research on privacy-compliant processing of mobility data. The aim of the research project is to develop practical and legally compliant recommendations for action that simplify work with personal mobility data and make it faster and more transparent. The content of the interviews will be used for research purposes and will only be published after strict anonymization.

CONTACT PERSON NAME is your contact person for scheduling an interview.

DETAILS TO SCHEDULE A MEETING

More information about the research project can be found on our website www.freemove.space and in the attached PDF document. If you have general questions about the project process, goals, and initial project results, you can contact CONTACT PERSON EMAIL. If you are unable to participate in the interview yourself, we would also be pleased if you could forward this invitation to your colleagues.

With kind regards,
The freeMove Team

C Text informed consent form [translated from German]

Research project: FreeMove
Performing institution: HTW Berlin
Interviewer: Alexandra Kapp
Interviewee: xxx
Interview date: xx.xx.2021

The BMBF-funded transdisciplinary project FreeMove explores privacy-friendly collection and analysis ofmobility data. The aim of the project is to develop recommendations for action for the handling of personalmobility data.

As part of the scientific research project, the Department of Computer Science at the Hochschule für Technik und Wirtschaft Berlin (HTW Berlin) will conduct expert interviews with employees from administration and business. The purpose of the interview is to gain a sound understanding of the real-world handling and use of personal mobility data.

Personal data is processed, such as the name and employer of the interviewee, as well as other concrete information revealed by the interviewees during the interview.

To facilitate the use of the study results and to verify or correct the notes written down by the interviewer, the interviews are recorded. In this process, the voice of the interviewee will be stored for the duration of the transcription process, but will be deleted no later than December 31, 2021.

The transcription will be supported by transcription software called 'Trint'. In this process, data may be transferred to the UK, as 'Trint' is based in the United Kingdom (UK). Should data be transferred to the UK, this will be done on the basis of the European Commission's adequacy decision of 28 June 2021, which recognizes the UK as a third country with an adequate level of protection. Further information can be found in the privacy policy of 'Trint', available at https://trint.com/privacy-policy.

The scientific analysis of the interview is carried out exclusively by the staff of the FreeMove research project. All employees who have access to the interview texts are obliged to maintain data secrecy.

All results will be published exclusively anonymously and without any possible conclusions about individual companies, organizations, or persons.

Under the above-mentioned conditions, I agree to participate in the interview as part of the FreeMove scientific research project and consent to the recording, transcription, anonymization, and analysis for the above-mentioned purpose. I also agree that my data may be processed using the software 'Trint' to facilitate the transcription process and that data may be transferred to the UK.

My participation in the interview and my hereby given consent to the processing of my personal data are voluntary.

I am entitled at any time to request HTW Berlin to provide me with comprehensive information about the data stored about me. I may at any time request HTW Berlin to correct, delete, block, or transfer individual personal data, as well as to restrict processing. In addition, I can exercise my right to object at any time without giving reasons and modify or completely revoke the granted declaration of consent with effect for the future. For this purpose, an e-mail to Alexandra Kapp (alexandra.kapp@htw-berlin.de) is sufficient. I will not suffer any disadvantages as a result of refusal or revocation.

I hereby confirm that I have been informed in detail about the aim and the course of the research projectand about my rights.

DATE, SIGNATURE INTERVIEWER

DATE, SIGNATURE INTERVIEWEE

D Study procedure with respect to ethical considerations

  1. Recruitment: email with detailed information about the research objective

  2. Interview

    • Get signed informed consent form, which informs about the research objective, audio recording, transcription software, anonymization, and analysis of the interview

    • Provide verbal information about the research objective and preservation of anonymity at the beginning of the interview

    • Get additional verbal consent on audio recording

    • Start audio recording and interview

  3. Transcription

    • Transcribe interviews with transcription software (explicitly stated in consent form)

    • Proof-reading by interviewer

  4. Deletion of audio recordings

  5. Anonymization: removal of participants' names and company names from transcripts

  6. Data storage

    • Storage of printed consent forms in a secured location of the research institution, separated from transcripts

    • Encrypted storage of transcripts

  7. Data evaluation
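The anonymization step above (removal of participant and company names from transcripts) can be sketched as a simple replacement pass over the transcript text. This is only an illustrative sketch, not the project's actual tooling; the function name, placeholder token, and name list are assumptions for the example.

```python
import re

def anonymize_transcript(text: str, names: list[str], placeholder: str = "[REDACTED]") -> str:
    """Replace each known name (whole words, case-insensitive) with a placeholder."""
    for name in names:
        # re.escape guards against names containing regex metacharacters
        text = re.sub(rf"\b{re.escape(name)}\b", placeholder, text, flags=re.IGNORECASE)
    return text

transcript = "Ms. Kapp from ExampleCorp said that ExampleCorp stores raw GPS traces."
print(anonymize_transcript(transcript, ["Kapp", "ExampleCorp"]))
# -> Ms. [REDACTED] from [REDACTED] said that [REDACTED] stores raw GPS traces.
```

In practice, a manual proof-reading pass (as in the transcription step) would still be needed, since indirect identifiers such as job titles or locations are not caught by a fixed name list.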


References
