Data collected by Google cars

Tuesday, April 27, 2010 | 1:01 PM

[Editor's note, 5/14/10: This post contains incorrect information about our WiFi data collection (see * below). We have posted a clarification and update about our process on the Official Google Blog.]

Over the weekend, there was a lot of talk about exactly what information Google Street View cars collect as they drive our streets. While we have talked about the collection of WiFi data a number of times before--and there have been stories published in the press--we thought a refresher FAQ pulling everything together in one place would be useful. This post also addresses concerns raised by data protection authorities in Germany.

What information are your cars collecting?
We collect the following information--photos, local WiFi network data and 3-D building imagery. This information enables us to build new services and improve existing ones. Many other companies have been collecting data just like this for as long as, if not longer than, Google.

  • Photos: so that we can build Street View, our 360 degree street level maps. Photos like these are also being taken by TeleAtlas and NavTeq for Bing maps. In addition, we use this imagery to improve the quality of our maps, for example by using shop, street and traffic signs to refine our local business listings and travel directions;
  • WiFi network information: which we use to improve location-based services like search and maps. Organizations like the German Fraunhofer Institute and Skyhook already collect this information globally;
  • and 3-D building imagery: we collect 3D geometry data with low power lasers (similar to those used in retail scanners) which help us improve our maps. NavTeq also collects this information in partnership with Bing, as does TeleAtlas.

What do you mean when you talk about WiFi network information?
WiFi networks broadcast information that identifies the network and how that network operates. That includes SSID data (i.e. the network name) and MAC address (a unique number given to a device like a WiFi router).

Networks also send information to other computers that are using the network, called payload data, but Google does not collect or store payload data.*

But doesn’t this information identify people?
MAC addresses are a simple hardware ID assigned by the manufacturer. And SSIDs are often just the name of the router manufacturer or ISP with numbers and letters added, though some people do also personalize them.

However, we do not collect any information about householders, and we cannot identify an individual from the location data Google collects via its Street View cars.
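
To illustrate why a MAC address points to a manufacturer rather than a person: under the IEEE convention, the first three octets are a prefix assigned to the manufacturer (the OUI) and the last three are assigned per device. A minimal Python sketch follows; `split_mac` is a hypothetical helper name used only for illustration, not part of any Google system.

```python
from typing import Tuple


def split_mac(mac: str) -> Tuple[str, str]:
    """Split a MAC address into (manufacturer prefix, device-specific part)."""
    # Accept both "00-1A-2B-..." and "00:1a:2b:..." spellings.
    octets = mac.lower().replace("-", ":").split(":")
    if len(octets) != 6 or any(len(o) != 2 for o in octets):
        raise ValueError(f"not a MAC address: {mac!r}")
    int("".join(octets), 16)  # raises ValueError on non-hex characters
    return ":".join(octets[:3]), ":".join(octets[3:])


prefix, device = split_mac("00-1A-2B-3C-4D-5E")
print(prefix, device)
```

Neither half encodes a name or a street address; any link to a place only comes from recording where the signal was observed.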

Is it, as the German DPA states, illegal to collect WiFi network information?
We do not believe it is illegal--this is all publicly broadcast information which is accessible to anyone with a WiFi-enabled device. Companies like Skyhook have been collecting this data across Europe for longer than Google, as have organizations like the German Fraunhofer Institute.

Why did you not tell the DPAs that you were collecting WiFi network information?
Given it was unrelated to Street View, that it is accessible to any WiFi-enabled device and that other companies already collect it, we did not think it was necessary. However, it’s clear with hindsight that greater transparency would have been better.

Why is Google collecting this data?
The data which we collect is used to improve Google’s location-based services, as well as services provided by the Google Geo Location API. For example, users of Google Maps for Mobile can turn on “My Location” to identify their approximate location based on cell towers and WiFi access points which are visible to their device. Similarly, users of sites like Twitter can use location-based services to add a geo location to give greater context to their messages.

Can this data be used by third parties?
Yes--but the only data which Google discloses to third parties through our Geo Location API is a triangulated geo code, which is an approximate location of the user’s device derived from all location data known about that point. At no point does Google publicly disclose MAC addresses from its database (in contrast with some other providers in Germany and elsewhere).

Do you publish this information?
No.

But wouldn’t GPS enable you to do all this without collecting the additional data?
Yes--but it can be much slower or not available (e.g. when there is no view of the sky; when blocked by tall buildings). Plus many devices don’t have GPS enabled. GPS is also expensive in terms of battery consumption, so another reason to use WiFi location versus GPS is to conserve energy.

How does this location database work?
Google location based services using WiFi access point data work as follows:
  • The user’s device sends a request to the Google location server with a list of MAC addresses which are currently visible to the device;
  • The location server compares the MAC addresses seen by the user’s device with its list of known MAC addresses, and identifies associated geocoded locations (i.e. latitude / longitude);
  • The location server then uses the geocoded locations associated with the visible MAC addresses to triangulate the approximate location of the user;
  • and this approximate location is geocoded and sent back to the user’s device.
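
The four steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not Google's actual implementation: the database contents, the name `estimate_location`, and the use of a plain average of the matched positions (a stand-in for whatever position calculation the real server performs) are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

LatLon = Tuple[float, float]

# Hypothetical server-side database: access point MAC -> surveyed position.
KNOWN_ACCESS_POINTS: Dict[str, LatLon] = {
    "00:11:22:33:44:55": (52.5200, 13.4050),
    "66:77:88:99:aa:bb": (52.5204, 13.4060),
    "cc:dd:ee:ff:00:11": (52.5196, 13.4058),
}


def estimate_location(
    visible_macs: List[str],
    known: Dict[str, LatLon] = KNOWN_ACCESS_POINTS,
) -> Optional[LatLon]:
    """Return an approximate (lat, lon) for a device, or None if no MAC is known."""
    # Step 2: keep only the MAC addresses the server already has positions for.
    matches = [known[m.lower()] for m in visible_macs if m.lower() in known]
    if not matches:
        return None
    # Step 3 (simplified assumption): average the matched positions.
    lat = sum(p[0] for p in matches) / len(matches)
    lon = sum(p[1] for p in matches) / len(matches)
    # Step 4: this pair is what would be sent back to the device.
    return (lat, lon)


# Step 1: the device reports the MAC addresses it can currently see.
print(estimate_location(["00:11:22:33:44:55", "66:77:88:99:AA:BB", "de:ad:be:ef:00:00"]))
```

Note that the response contains only a coordinate pair; the MAC addresses themselves never leave the server in this sketch, which mirrors the disclosure described earlier in the post.
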
How do your cars collect this WiFi data?
Visibly attached to the top of the car is a commercially available radio antenna. This antenna receives publicly broadcast WiFi radio signals within range of the vehicle. The equipment within the car operates passively, receiving signals broadcast to it but not actively seeking or initiating a communication with the access point.

Why didn’t you let the German DPA see the car?
We offered to let them examine it last year--it is totally untrue to say we would not let them see the car. They are still welcome to do so.

How do you collect 3-D building imagery?
We collect 3D geometry data with low power lasers (similar to those used in retail scanners).

Is this safe?
Yes.

You can also read the WiFi submission we made today to several national data protection authorities.

Posted by Peter Fleischer, Global Privacy Counsel

** Added additional sentence to first bullet point.


42 comments:

tonfa said...

We do not believe it is illegal--this is all publicly broadcast information which is accessible to anyone with a WiFi-enabled device.

If you truly believe this information is not identifying, then you should release the data (this will enable others to use it, and people won't have to rely on Google's server to do geolocation, which is a privacy concern too).

JSG said...

Are you serious?

People are bothered by Google collecting data and keeping it private. Any WLAN gadget can do no more than ask "I'm near the router with MAC xyz, where am I?" - and people cry.

You suggest that Google should make the data public. That would be a serious privacy concern.

A German politician just said privacy would start at 2m; as Street View is higher, it should be forbidden. Well, hopefully being taller than 2m and windows in trucks and buses above 2m won't be forbidden as well...

tonfa said...

@JSG:

What is better (I'm only talking about BSSID/ESSID)?
1) Google doesn't collect any data
2) Google collects data and keeps it private
3) Google collects data and makes it public

In my opinion, 2 is the least interesting option: it still leaves Google able to hold potentially private data, and because it gives them a monopoly on geolocation it allows them to gather even more information, especially if identifying information is sent during geolocation queries (in Android, Chromium, etc.).

(and it gives them a competitive advantage, since they are able to use geolocation queries to refine their data: when people have fixed IPs, with the help of previous geolocation queries they can find out the precise location of an IP)

And keep in mind people already collect this data, and sell it. So if you're concerned about this being available, it's already the case, except it costs money.

Scott Cleland said...

I am flattered to see that you all are reading and responding in part to my PrecursorBlog.com posts, in particular, my post:
http://precursorblog.com/content/questions-google-its-latest-act-privacide-part-xxi-privacy-vs-publicacy-series
Google would engender much more trust if it did not sneak around mass violating people's privacy, but was upfront about what Google wanted to do with people's private information in advance, and gave people and authorities time and opportunity to give input, before their privacy was irreparably compromised.
Your post would not have been necessary if Google had a track record of respecting privacy and being forthright with users; unfortunately Google's track record is the opposite.
Almost all of Google's problems are self-inflicted, where Google presumes it always knows best and operates accordingly, hardly ever asking for permission/authorization for things users and governments believe they should be asked about.
Moreover, Google's response/defense that others do these same things may be irrelevant because Google has uniquely claimed to always operate under a "don't be evil" corporate standard of behavior... Thus it's not about others' standards, it's about whether Google lives up to its own lofty representations Google deliberately set high to engender trust...
This post was a step in the right direction. Every real journey begins with a single step... let's see if Google continues to be more responsible, transparent, and accountable... I genuinely hope so.
Scott Cleland, Publisher of GoogleMonitor.com, and Googleopoly.net

God Is Dead? said...

I wonder if this is going on in the United States?

I hope not.

Benjamin said...

@tonfa:

You are contradicting yourself when you say that collecting MAC address data "gives them [Google] a monopoly on geolocation" and "people already collect this data". Clearly Google does not have a monopoly on geolocation. Other companies do that same thing.

Of course it would give them a competitive advantage. If not, why would they do this? They are a company, after all. Again, other companies already do the exact same thing.

As for the data already being available for a price, that's prohibitive in itself. Probably only larger companies interested in that data would pay for it, and it wouldn't be available to just anyone who is interested.

In my opinion, Google collecting this data is a good thing. They can offer up better services and many times for free. Google Maps is free. They offer up the use of their maps API for free, and if you're writing mobile software you can tap into this database for free. Why would I want to store and manage this data myself? Who knows what it would take to process and store the data? Server farms? Data warehouses?

Google is doing us a favor. Think about that.

Mark said...

My Wifi SSID contains my last name, but is usually hidden. Therefore I neither want it to be catalogued nor provided to unknown end-users for whatever reasons, of course. Do you respect that? Do you take precautions to not harvest personalized Wifi data?
(The same question goes to your competitors.)

Vincent said...

The fact that other services do it too, is a cheap excuse imho, as doing what others do still doesn't make it right.

I've read a great proposal in an article last week, that Google should handle the information they have about people like banks handle the money of people..

Everybody should have access to what's being collected (and I mean 'access', not just a display page like 'dashboard') and have every possibility to restrict and customize their personal settings..

I use lots of Google services myself and am not at all one of the haters who think the internet and everything with it is evil.

But transparency will still make this whole thing a lot easier and people who know what they get are happier people..

tonfa said...

@God Is Dead:

Sure, this is going on in the US as well. Privacy is usually better protected in Europe, where it comes before the freedom of speech (probably due to historical reasons, 2nd world war and Stasi for example).

@Benjamin:

Does anyone know how Skyhook does it? Do they collect the data themselves? Or do they buy it from local companies in each country?

There's still one difference between Skyhook and Google: Skyhook only provides that service, while Google uses geolocation as part of other end-user products, at least in Chrome and Android. In those products I don't think the user is aware he is sending data to Google's server (knowing what is known and stored by Google on geolocation queries would be nice).

@Mark

I don't think they actually need the ESSID, the MAC (BSSID) should be sufficient and is usually not identifying (I don't think vendors can track where each wifi router went).

christian said...

I am not sure I understand why you collect SSIDs if you have the MAC address. SSIDs are totally irrelevant, could change at any time and potentially contain personal data (like your name).

The other thing I am wondering is whether you collect "hidden" WLAN SSIDs. They can usually not be seen but we know some tools make them visible. Is such hidden access point information collected? That would seem unfair since the owner marked an interest to not be visible.

mandavi said...

Is there a possibility to have a certain SSID deleted? I included my email in it and don't want anyone to be able to connect my email address to where I live.

Comp said...

@Mark and @mandavi

If you are so worried about your names, email addresses, etc., why would you put them on something anyone driving by with an iPhone will see?

If you don't want people knowing your private information, stop advertising it. Your SSID only needs to be unique enough so YOU know where to connect to.

@tonfa and @christian

ESSID is a far safer piece of information to reveal. If people are shown MAC addresses and given the way that ethernet works, people could then start listening for packets destined for that MAC address. With that you can relate data packets to physical addresses without even being in the area. The SSID is only useful if you're within 1/4 mile of the hotspot.

DooMMasteR said...

@Mark: if it is hidden they can only catch your MAC and that is it
as the MAC does not reveal your identity there is no harm done

but as some challenges do reveal your network's name, you should maybe consider changing it

Guillermo Lo Coco said...

I can do the same.

Why German politics are so st....d ?

Why should Google release this information when you can do the same by yourself?

If you want Google's SSID information, pay for it! Or take a car and pay a driver to scan the whole city. The result will be the same.

Best regards.

ronfar said...

I'm amazed at some of the posts here...

If you are concerned about your WLAN SSID, you are either not hiding it, you are using its standard name, or putting things in the name which should not be there. As I've just read, what is an email address doing in an SSID? Don't put it there in the first place!

I think Google is really doing us all a favor by offering maps and other services for free. They cannot do that without collecting data, lots of it. I've always said, as long as Google is not bought by a bank, I will not be concerned. Banks, governments, even ISP's (all your Internet traffic passes through your ISP!), they all have more personal info about you than Google.

And get real. It's 2010. Information cannot be hidden anymore. Even 30 years ago, when electronic phone exchanges made their appearance, just picking up the phone -that's without even dialing a number-, even that was logged somewhere. How many people know that many ISDN phones can be activated remotely to listen to what's happening in the room where the phone is located???

And you're concerned about SSID names being collected by Google??? A bit of common sense, please.

bobzilla said...

Netstumbling has been around for almost a decade now...wigle.net

bboissin said...

I'm talking about the BSSID: http://en.wikipedia.org/wiki/Service_set_(802.11_network)#Basic_service_set_identifier_.28BSSID.29

I don't think base stations often travel, and the user doesn't put information in it (while he might do it in the ESSID), and contrary to the ESSID it is supposed to be unique.

jazzsp8 said...

I don't really see the problem at all, if you've a bit of common sense then none of this will affect you.

If you're broadcasting a signal with your WiFi then you should be aware that anyone who walks near your house with an enabled phone in their pocket can pick up on your device, therefore putting any kind of personal information in that broadcast is kinda dumb.

Most of the people that are running around screaming about privacy and violations thereof aren't using common sense and really need to start educating themselves about the digital age.

I sincerely hope nobody would (and doubt anybody would) leave their wallet outside on the driveway with a post-it note attached with the relevant PIN numbers inside for the credit cards; as far as I'm concerned this is pretty much the same thing, don't give anyone walking by the opportunity to get in your stuff by making it blatantly obvious how to get it.

mandavi said...

@Comp sure, in a city full of crime one could say, if you are worried to leave your apartment, just don't - what is pizza delivery for? fact is, i had reasons to put my email in the SSID (give the possibility to others to share my connection) and i don't worry about the guy with his i-phone, i worry about data mining. now a company comes and wants to take this possibility from me and others. with what right?

jcollar said...

@mandavi

Sharing your connection is no reason to use your e-mail address as your SSID. Do you mean your WEP or WPA key? Those keys cannot be collected by a scan.

Any company has a right to take any data which you freely publicize. Is the postal service not allowed to know your house number even though it is posted on the front of your home?

bboissin said...

@jcollar

you might want people to be able to mail you, and ask you for the key (and to this end you can put your mail in your ESSID).

J.delanoy said...

And if you care that anyone walking by can see your email address, just create a new Gmail account and set it to forward all mail to your real account, then put the new gmail address as your SSID.

Gmail's spam filter will nuke anyone who tries to take advantage of your address, and your real email address will remain hidden. I use this strategy all the time when I have a reason to publicly post my email address somewhere.

Derek Kerton, The Kerton Group said...

Oh Noeth!

I hath learned that one italian by the name of Marco Polo is sailing his ship around the world, visiting foreign lands, and mapping out the shores.

Some of those shores are OURS. People, rise against the tyranny of gathering information for personal gain, sharing among interested parties, and the abdication of our privacy to keep our shorelines obscure!

And lo, if it is not far more evil! If we put a big sign at our port with the name of the port, and advertising the availability of some form of service in its sweet, safe harbours, this Marco Polo, cartographer from Hades, will take a note of that name and offering, and include it in his ship's logs!! The collection of such private data is a sin before our lord. Consider our knickers to be in a painful twist!

Some unwise people, in the messages above, have even put private data on these ocean-facing signs. But they only intended those for their own consumption - how could they have expected someone else's ship to sail by the coast, and invade the sanctity of that most private information which they had broadcast out to sea?

Written in the year of our lord, 2010.

PS: Google, please stop re-enforcing the mistake that "triangulation" is at all at play here. There are no angles used in the position calculations. It is "Tri-lateration", where relative distances are used.

D. Reed Hall said...

As I read these comments, I am stunned by people's ignorance of the word "public". If you are in a "public" place, I have the right to photograph you. If you are in a public place, I have the right to record what you say and do. The airwaves are public property. It has long been understood that cell phone conversations captured by devices available at Radio Shack are permissible. So what is the problem with Google paying their employees or third party employees to collect MAC addresses over "public" airwaves? If confidentiality is required, there are ways to encrypt data. But, for most of us, we are content to broadcast our cell phone conversations and our computer data over the "public" airwaves. And, if a company, any company, wants to pay people to go around and collect the data floating in the public airwaves, then they have the right to do so.
David Hall
A civil libertarian who understands the use of the word "public".

bboissin said...

@D. Reed Hall

And that is where the European versions of public and privacy really conflict. Even if something is public, even if you can take a picture of me, you don't have the right to publish it.

And if instead of just taking a picture, it's thousands of pictures, with a lot of people, then you're building a digital file with personally identifying data, and that's even more protected and you'll have to ask for permission from a data privacy authority (e.g. CNIL in France).

Zelden said...

Well, I read most of this. What I have to say is they are going to do it anyway, and even if there is a law against it, they have plenty of lawyers who can get around it.

Igor said...

@christian [April 28, 2010 9:38 AM]:

I believe Google collects SSIDs for the same reason redundancies are generally used in digital technology: error correction.

The WiFi topology is in constant change: people move in and out, buy new routers, and change their SSIDs. You need some redundancy to offset that.

For example, if you bought a new router, you'd get a new MAC address. But, if you already have a couple of computers with set up WiFi connectivity, you'd probably give your new router the same SSID the old one had, right? That's why SSID might be interesting for Google...

steeleweed said...

I am shocked (but not surprised) at the irrational mental processes by which people conclude that public data is somehow private. Should I sue everybody in NYC for seeing me as I walk down the street? For hearing me as I speak? In fact, if all of you reading this will be so kind as to identify yourselves, you can expect a letter from my lawyer.

Steve said...

@ Peter Fleischer

Please don't use the abbreviation DPA for the German Datenschutzbeauftragten (data protection authorities). In Germany the three letters DPA stand for Deutsche Presse-Agentur (German News Agency). The usage of DPA for the "Datenschutzbeauftragten" might be confusing.

bboissin said...

@steeleweed

So you don't see the difference between walking somewhere, and posting a public comment on a global media?

And yes, personality rights in Europe protect people differently than in the US, and they are not likely to change...

mandavi said...

@bboissin thanks for answering correctly
@J.delanoy if google would have cared to announce in advance what they are planning to do - yes, i would have done so... now that they announce it when they finished already it seems a little late for that :(

Alexandru said...

I agree the payload data may be too much. Imagine Google also wiretapping your phone conversations while taking images for Google HomeView.

But otherwise this is a great free service for any potential homebuyers.

Aatch said...

I'm not sure what the issue is here... While there is a difference between private data and public data, and a difference between data collection and observation, what Google has done is not particularly bad. If you put personally identifying information in a publicly broadcast SSID, then that information is free for anyone to use.

You can't stop people from using your email address, or.. you know what, i'm not sure what you can do with just a last name...

Anyway, the point is that people knowing your email address and abusing that fact is the same as the issue of somebody ordering pizza to your house because they know your address. In fact it's less of an issue because you can easily hide and change your email address, you can't hide where you live...

Renegade said...

Hey!

Would all of you please stop arguing and discussing these issues!?

It is really time to get the new StreetView features online and running!

I'm tired of just reading about it and never being able to use it!

Cheers

Onno said...

Enough with Google. The payload thing IMHO is a serious privacy violation and I hope the EU will take legal actions against Google for this. "Do no evil", riight.
Google -once a respected company driving innovation and providing a revolutionary search- has dropped to zero on my list.
Google, you're evil. I hate you more than Apple now.

Dumbass said...

Oops. "Huh, where do these gigabytes of private emails come from?"
Yeah. I once stole a car, spied my neighbors, murdered a stranger - but I didn't notice! Shit happens, nevermind, it was only by accident...

The fact that you have methods implemented in your code that spies on wifi-connections speaks for itself...

noor iddeen said...

If you guys are really so worried about your privacy, you should stop using email from any provider, as they all have access to the emails you have sent or received, and therefore they collect private data... I'm very much with Google; they are bringing our world many cool technologies, mostly for free.

bboissin said...

@noor iddeen:

well, usually mail hosters provide terms of service for their customers, and they respect privacy laws...

The problem is not about collecting private data, but collecting them without authorization and/or safeguards.

p3n73s7 said...

This is an extremely stupid discussion, talking about collecting the ESSID, which is available to anyone passing by. As long as Google doesn't collect any payload data it is absolutely legal, and they are doing a great service to you for free. So imho we should respect that, and yes, there is no privacy...

ThaNerd said...

I have a problem with this. I'm using a mi-fi (a 3G modem coupled with a wifi access point), and apparently crossed one of those Google cars when riding in town, about 10 kilometers away from home. Now, whenever I'm using my mifi device and it is the only wifi network in sight for my iPod, the Google Maps application considers I am where the Google car actually detected my mifi... So if I'm 100km away from that spot and have no other wifi AP in sight, the maps application "moves me" 100km away. I'd like to have a publicly available form where I could ask Google to actually blacklist my mifi's MAC address... Is there such a thing?

Linker said...

wow i didn't know that Google cars collect so much data, this is very interesting and I wonder how the data is used