
Showing posts with label internet.

Monday, September 03, 2018

PanOpticon in my Pocket: 0.35GB/month of surveillance, no charge!

Your location is monitored roughly every 10 minutes, if not more often, thanks to your phone. There are multiple methods: GPS, Wi-Fi connections, cell-tower pings, even Bluetooth. This data is stored indefinitely and is available to certain people for analysis. Technically the data is anonymous, but it is easy to connect your geolocation data to your real-world identity -- the data shows where you sleep at night (home address) and where you work during the day. It can be cross-referenced with cookies placed on your browser by ad networks, so your online activities (purchases, web browsing, social media) can be linked to your spatial-temporal movements.

Some quantities which can be easily calculated using this data: How many people visited a specific Toyota dealership last month? How many times did someone test drive a car? Who were those people who test drove a car? How many people stopped / started a typical 9-5 job commute pattern? (BLS only dreams of knowing this number.) What was the occupancy of a specific hotel or rental property last month? How many people were on the 1:30 PM flight from LAX to LaGuardia last Friday? Who were they? ...

Of course, absolute numbers may be noisy, but diffs from month to month or year to year, with reasonable normalization / averaging, can yield insights at the micro, macro, and individual firm level.
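As a toy illustration of how easily home and work addresses fall out of raw pings, here is a minimal sketch. The data and the hour-of-day heuristics are hypothetical; a real analysis would use proper spatial clustering rather than grid rounding.

```python
from collections import Counter
from datetime import datetime

def infer_home_work(pings):
    """pings: list of (iso_timestamp, lat, lon).
    Home = modal location during night hours; work = modal weekday-daytime
    location. Coordinates are rounded to ~100 m so nearby pings cluster."""
    night, day = Counter(), Counter()
    for ts, lat, lon in pings:
        t = datetime.fromisoformat(ts)
        cell = (round(lat, 3), round(lon, 3))  # ~100 m grid cell
        if 1 <= t.hour <= 5:
            night[cell] += 1
        elif t.weekday() < 5 and 10 <= t.hour <= 16:
            day[cell] += 1
    home = night.most_common(1)[0][0] if night else None
    work = day.most_common(1)[0][0] if day else None
    return home, work

# Hypothetical day of pings: nights near one point, weekday midday near another
pings = (
    [("2018-09-03T02:%02d:00" % m, 44.5646, -123.2620) for m in range(0, 50, 10)]
    + [("2018-09-03T12:%02d:00" % m, 44.5638, -123.2794) for m in range(0, 50, 10)]
)
print(infer_home_work(pings))  # ((44.565, -123.262), (44.564, -123.279))
```

With a month of real pings, the same few lines recover a home and work address; everything else (commute pattern, dealership visits, flights) is variations on this join of time and place.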

If your quant team is not looking at this data, it should be ;-)

Google Data Collection
Professor Douglas C. Schmidt, Vanderbilt University
August 15, 2018

... Both Android and Chrome send data to Google even in the absence of any user interaction. Our experiments show that a dormant, stationary Android phone (with Chrome active in the background) communicated location information to Google 340 times during a 24-hour period, or at an average of 14 data communications per hour. In fact, location information constituted 35% of all the data samples sent to Google. In contrast, a similar experiment showed that on an iOS Apple device with Safari (where neither Android nor Chrome were used), Google could not collect any appreciable data (location or otherwise) in the absence of a user interaction with the device.

e. After a user starts interacting with an Android phone (e.g. moves around, visits webpages, uses apps), passive communications to Google server domains increase significantly, even in cases where the user did not use any prominent Google applications (i.e. no Google Search, no YouTube, no Gmail, and no Google Maps). This increase is driven largely by data activity from Google’s publisher and advertiser products (e.g. Google Analytics, DoubleClick, AdWords). Such data constituted 46% of all requests to Google servers from the Android phone. Google collected location at a 1.4x higher rate compared to the stationary phone experiment with no user interaction. Magnitude-wise, Google’s servers communicated 11.6 MB of data per day (or 0.35 GB/month) with the Android device. This experiment suggests that even if a user does not interact with any key Google applications, Google is still able to collect considerable information through its advertiser and publisher products.

f. While using an iOS device, if a user decides to forgo the use of any Google product (i.e. no Android, no Chrome, no Google applications), and visits only non-Google webpages, the number of times data is communicated to Google servers still remains surprisingly high. This communication is driven purely by advertiser/publisher services. The number of times such Google services are called from an iOS device is similar to an Android device. In this experiment, the total magnitude of data communicated to Google servers from an iOS device is found to be approximately half of that from the Android device.

g. Advertising identifiers (which are purportedly “user anonymous” and collect activity data on apps and 3rd-party webpage visits) can get connected with a user’s Google identity. This happens via passing of device-level identification information to Google servers by an Android device. Likewise, the DoubleClick cookie ID (which tracks a user’s activity on the 3rd-party webpages) is another purportedly “user anonymous” identifier that Google can connect to a user’s Google Account if a user accesses a Google application in the same browser in which a 3rd-party webpage was previously accessed. Overall, our findings indicate that Google has the ability to connect the anonymous data collected through passive means with the personal information of the user.
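To make that join concrete, here is a deliberately simplified sketch of how an "anonymous" cookie becomes identifiable once it co-occurs with a signed-in request in the same browser. Every cookie ID, URL, and account below is invented; the real mechanics involve server-side logs, not a dictionary lookup.

```python
# Third-party page loads observed under an "anonymous" advertising cookie
third_party_hits = [
    ("cookie123", "shoes-blog.example/reviews"),
    ("cookie123", "clinic.example/symptoms"),
    ("cookie456", "news.example/politics"),
]

# Later, the same browser (same cookie) makes a signed-in request to a
# first-party property, tying the cookie to an account
signin_events = {"cookie123": "alice@example.com"}

# The join: once one signed-in request exists, the whole browsing
# history under that cookie is attributable to the account
linked = [(signin_events.get(cookie), url) for cookie, url in third_party_hits]
for who, url in linked:
    print(who or "<still anonymous>", "visited", url)
```

One signed-in page load is enough to retroactively de-anonymize every hit recorded under that cookie, which is exactly the point the report makes.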

Wednesday, August 30, 2017

Normies Lament


This interview with the Irish Times (not Ezra Klein) is much better than the one I originally linked to below.



###############

Ezra Klein talks to Angela Nagle. It's still normie normative, but Nagle has at least done some homework.

Click the link below to hear the podcast.
From 4Chan to Charlottesville: where the alt-right came from, and where it's going

Angela Nagle spent the better part of the past decade in the darkest corners of the internet, learning how online subcultures emerge and thrive on forums like 4chan and Tumblr.

The result is her fantastic new book, Kill All Normies: Online Culture Wars from 4chan and Tumblr to Trump and the Alt-Right, a comprehensive exploration of the origins of our current political moment.

We talk about the origins of the alt-right, and how the movement morphed from transgressive aesthetics on the internet to the violence in Charlottesville, but we also discuss PC culture on the left, demographic change in America, and the toxicity of online politics in general. Nagle is particularly interested in how the left's policing of language radicalizes its victims and creates space for alt-right groups to find eager recruits, and so we dive deep into that.

Books:

Civilization and Its Discontents by Sigmund Freud

This Is Why We Can't Have Nice Things: Mapping the Relationship between Online Trolling and Mainstream Culture by Whitney Phillips

The Net Delusion: The Dark Side of Internet Freedom by Evgeny Morozov

Monday, September 05, 2016

A secret map of the world (Venkatesh Rao / Ribbonfarm)

This is Venkatesh Rao's conceptual map of the world (as seen from Silicon Valley / the internet). Details in the video and this blog post.



In case you can't make out all the features on the map, here is a hi-res version. See also this other map.

Some places of note:

Isle of Deep Learning
Isle of Physics
Moldbug's Lair
Alt-Right Hills
Dark Enlightenment Volcano
Paleo Crossing
Satoshi Mines
Secret Cloud Empire of Amazon
Fjords of Sisu
Algomonopolia (Google, Facebook, ...)
a16z Unicorn Hunting Ground
Lean Startup Town
SJW Cathedral
Manosphere Tar Pit
Global Bro-Science Laboratory
NSA
Academia
Efficient Market Temple
Graveyard of Boomer Dreams
Ghost of Industrial Past

If these memes are unfamiliar, you need to spend more time on the internet or in the bay area :-)


Sunday, August 07, 2016

Podcast: Clay Shirky on tech and the internet in China

Highly recommended. Unfortunately I can't embed the podcast here so you'll have to click through.
In this episode of Sinica, Clay Shirky, the author of Here Comes Everybody who has written about the internet and its effects on society since the 1990s, joins Kaiser and Jeremy to discuss the strengths and weaknesses of China’s tech industry and the extraordinary advances the nation has made in the online world.

The hour-long conversation delves into the details and big-picture phenomena driving the globe’s largest internet market, and includes an analysis of Xiaomi’s innovation, the struggles that successful Chinese companies face when taking their brands abroad and the nation’s robust ecommerce offerings.

Clay has written numerous books, including Little Rice: Smartphones, Xiaomi, and the Chinese Dream in addition to the aforementioned Here Comes Everybody: The Power of Organizing Without Organizations. He is also a Shanghai-based associate professor with New York University’s Arthur L. Carter Journalism Institute and the school’s Interactive Telecommunications Program.

Related: NYTimes video explaining WeChat. (The future is here, it's just not evenly distributed.)

Sunday, April 14, 2013

Why blog? A professor responds

A colleague responds to my earlier post Blogging professors, on how universities might encourage more faculty blogging.

What I had in mind was a university-wide platform that would aggregate the output of participating faculty. This kind of branded expert channel might have a place amid the economic collapse in journalism we are currently experiencing. If Huffington Post is worth $315 million (OK, not really, just another dumb move by AOL), what might a platform showcasing 100 clever faculty from a major research university be worth? 100 bloggers (say, each posting once every 10 days or so = 10 new posts per day) out of 2000 MSU faculty doesn't sound too crazy, does it?
Hi Steve,

I liked reading your "Blogging Professors" post, since I've thought several times, "Should I write a blog?" But I've also thought, "Why does anyone bother to write a blog?" The reasons to write are, as you note, to propagate one's "fabulous ideas and opinions worthy of wider attention and discussion" and to create dialogs and conversations. My own reasons not to write have been (1) that it would take time, and I have too little time as it is, and (2) that I doubt I'd be likely to make even the slightest ripple in the vast pool of the internet.

Reason (1) is, I'm sure, obvious. It's hard to find "work time" between experiments, meetings, classes, seminars, journal clubs, staring at data, writing analysis code, talking to students, planning classes, teachings classes, reading papers, reading books, and probably several other things I'm forgetting. And "free time" has its own constraints, and any new activities would have to compete with things I'm very fond of, like wandering the public library with the kids, or playing games with them in random taquerias, or painting pictures myself (which, sadly, has been steadily dwindling in frequency).

Of course, I'm sure most commenters will point out that it's all a matter of incentives: I have no incentive, as a faculty member, to blog. This is true, but not very explanatory in itself. We all do plenty of things that don't have concrete incentives. This past week, I've spent about two hours reviewing a paper. Next week I'll spend at least half an hour with a postdoc (not from my lab) starting a faculty position (elsewhere) giving advice on grants. Later this term, I'll probably put a lot of work into a talk on [ geeky science topic involving microscopy; unspecified to preserve anonymity ] for a journal club I don't usually attend -- it's a fascinating topic I've gotten increasingly involved with. I certainly don't get any reward from the University (or even the department) for doing these sorts of things. So why do them? In all these cases, there's some combination of reciprocity (I publish articles in journal X, so I should review papers for journal X), or personal interactions (I like to have conversations with colleagues), or both. Is any of this the case for blogging?

I'd guess -- though I have no data on this -- that most blogs, especially new ones, have very little readership. Certainly one often stumbles on blogs with a total absence of comments. (Not that blog comments in general are often worth reading…) And even if posts are read, is there likely to be much interaction or dialog, compared to the other activities noted above?

As you note, one way out of this would be group blogs, which might expand readership and reduce writing effort. Another would be if the university actively promoted blogs. (I'm constantly amazed at how little work the university puts into describing to the public what faculty do, and how ineptly what little they do is done.)

And, of course, another solution is to simply look at blogging as a way of recording and refining one's thoughts -- regardless of whether they're read or not. I've toyed with this; maybe I'll take it up…
I certainly view blogging as a means of recording and organizing my thoughts. Sometimes I get really thoughtful and insightful feedback in the comments (although sometimes not). There's also the pleasure of self-expression! As James Salter wrote:
There comes a time when you realize that everything is a dream, and only those things preserved in writing have any possibility of being real.

Wednesday, August 22, 2012

Beating down hash functions

The state of the art in GPU- and statistics-enhanced password cracking. Crackers beating down information entropy just like in the old days at Bletchley Park! (Trivia question: what are "bans" and "cribs"? Answers)
Ars Technica: ... An even more powerful technique is a hybrid attack. It combines a word list, like the one used by Redman, with rules to greatly expand the number of passwords those lists can crack. Rather than brute-forcing the five letters in Julia1984, hackers simply compile a list of first names for every single Facebook user and add them to a medium-sized dictionary of, say, 100 million words. While the attack requires more combinations than the mask attack above—specifically about 1 trillion (100 million × 10⁴) possible strings—it's still a manageable number that takes only about two minutes using the same AMD 7970 card. The payoff, however, is more than worth the additional effort, since it will quickly crack Christopher2000, thomas1964, and scores of others. 
"The hybrid is my favorite attack," said Atom, the pseudonymous developer of Hashcat, whose team won this year's Crack Me if You Can contest at Defcon. "It's the most efficient. If I get a new hash list, let's say 500,000 hashes, I can crack 50 percent just with hybrid." 
With half the passwords in a given breach recovered, cracking experts like Atom can use Passpal and other programs to isolate patterns that are unique to the website from which they came. They then write new rules to crack the remaining unknown passwords. More often than not, however, no amount of sophistication and high-end hardware is enough to quickly crack some hashes exposed in a server breach. To ensure they keep up with changing password choices, crackers will regularly brute-force crack some percentage of the unknown passwords, even when they contain as many as nine or more characters. 
"It's very expensive, but you do it to improve your model and keep up with passwords people are choosing," said Moxie Marlinspike, another cracking expert. "Then, given that knowledge, you can go back and build rules and word lists to effectively crack lists without having to brute force all of them. When you feed your successes back into your process, you just keep learning more and more and more and it does snowball."
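A toy version of the hybrid idea is easy to sketch. The wordlist, rule set, and use of unsalted MD5 below are illustrative stand-ins, not Hashcat's actual implementation; real attacks use far larger lists and GPU kernels.

```python
import hashlib
from itertools import product

# Tiny stand-ins for a real name list and rule set
names = ["julia", "thomas", "christopher"]
suffixes = [str(y) for y in range(1960, 2021)] + [""]  # append a year, or nothing

def hybrid_candidates():
    """Wordlist x rules: each name, with and without capitalization, plus suffix."""
    for name, suf in product(names, suffixes):
        yield name + suf
        yield name.capitalize() + suf

# Recover an unsalted MD5 hash by trying every candidate
target = hashlib.md5(b"Christopher2000").hexdigest()
found = next(c for c in hybrid_candidates()
             if hashlib.md5(c.encode()).hexdigest() == target)
print(found)  # Christopher2000
```

The search space here is 3 names × 62 suffixes × 2 case rules -- a few hundred candidates. Scale the lists up by a factor of a million each and you have the roughly 10¹² combinations the article describes, which a 2012-era GPU could exhaust in minutes.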

Monday, March 12, 2012

Back in the day: startup CEO

I recently came across the audio from a talk I gave at Def Con 9, July 2001 in Las Vegas: itunes; Def Con (couldn't get the video to work). Very interesting to hear myself forecast the future over a decade ago.

I wonder how many people have spoken at Def Con and also given technical briefings in the bowels of Langley ;-)

When I visited China after starting (and exiting) SafeWeb, I thought I might have an unpleasant surprise waiting for me when entering the country. But luckily this cloak and dagger stuff is overblown.

SafeWeb's Triangle Boy: IP spoofing and strong encryption in service of a free Internet

SafeWeb is an encrypted (SSL) anonymous proxy service, used approximately 100 million times per month by hundreds of thousands of people worldwide. Triangle Boy is an Open Source program that lets volunteers turn their PCs into entry points into the SafeWeb network, thereby foiling censorship in countries like China and Iran. Triangle Boy uses IP spoofing and innovative packet routing to minimize the load on volunteer machines. I discuss SafeWeb's goals and technologies, its involvement with the CIA through In-Q-Tel (the agency's venture fund) and the Internet as a catalyst for social transformation in China.

Stephen Hsu is the CEO and co-founder of SafeWeb.

Wednesday, September 28, 2011

Amazon Silk



I wonder what Apple's response will be to this. Perhaps we'll see a "split-browser" update of (mobile) Safari soon. On the desktop I switched over to Chrome 1-2 years ago because it feels faster and it runs Google apps flawlessly. If Silk tries to do things too aggressively it might break a few applications or web pages (very tough to QA stuff like that). But probably there are speedups (e.g., smart pre-caching of popular content) that can be achieved without risk of breaking functionality and which can be exploited within a more conservative approach. Users will probably be forgiving because it's running on a $199 device with a 7" screen (Amazon Fire). The Silk team blog is here.

Saturday, April 09, 2011

Update on NYTimes paywall

I posted before on miserly strategies to avoid buying a NYTimes subscription. It now appears to me their paywall is even wimpier than I had originally suspected. When I wrote the earlier post I hadn't yet experienced the paywall (either it wasn't on or I hadn't reached my limit of free articles for the month; I suspect the former). Having played around with it a bit, I've found the following.

If I try to read, for example, the Sidney Lumet obituary (btw, I highly recommend Dog Day Afternoon :-), the browser url bar shows the following when the subscription page has finally loaded:

http://www.nytimes.com/2011/04/10/movies/sidney-lumet-director-of-american-classics-dies-at-86.html?hp&gwh=A8B811D09B0F452C9A3F74E25512D060

If I eliminate all the cruft after "html" so that the url bar reads

http://www.nytimes.com/2011/04/10/movies/sidney-lumet-director-of-american-classics-dies-at-86.html

then reloading lets me read the article for free. This has worked for every article I've tried -- probably 20 or so by now.
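The trick amounts to dropping the query string (and fragment). A few standard-library lines do it programmatically; the function name is my own, not an existing tool:

```python
from urllib.parse import urlsplit, urlunsplit

def strip_cruft(url):
    """Drop the query string and fragment, keeping scheme/host/path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

url = ("http://www.nytimes.com/2011/04/10/movies/"
       "sidney-lumet-director-of-american-classics-dies-at-86.html"
       "?hp&gwh=A8B811D09B0F452C9A3F74E25512D060")
print(strip_cruft(url))
```

This prints the clean .html URL, which (as of this writing) loads without hitting the paywall.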

Tuesday, March 29, 2011

Misers' methods for reading the NYTimes

Some people have pointed out to me that I am the cheapest (as in most miserly) person they know in my net worth category. I plead guilty.

The Times wants to charge me $35/month for unlimited digital access (that means on multiple devices, like mobile, tablet, computer). Now, I'm all for supporting journalism, and the Times in particular, but it seems kind of high to me. Let's see how it all works out for the Grey Lady. Perhaps a micropayment scheme would be better? (Has Google rolled their version out yet?)

Apparently they won't limit access to articles reached via link (i.e., from blogs, Twitter, search engines; see below for more details). This is strategic: they want their articles to be read, and to be influential, so they don't want to frustrate potential readers who arrive via search or social network.

Therefore, I think you can just type the following into Google to get (free) access to daily NYTimes content (up to 5 articles per day; see note at bottom):

site:nytimes.com < today's date > < keywords >

i.e.,

site:nytimes.com march 29 2011 japan reactor

or

site:nytimes.com 2011/03/29 japan reactor
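Assembling that search URL programmatically is trivial; the helper below is a hypothetical sketch, not an existing tool:

```python
from datetime import date
from urllib.parse import quote_plus

def nyt_search_query(keywords, day=None):
    """Build a Google search URL for the site: query described above."""
    day = day or date.today()
    terms = "site:nytimes.com {} {}".format(day.strftime("%Y/%m/%d"), keywords)
    return "https://www.google.com/search?q=" + quote_plus(terms)

print(nyt_search_query("japan reactor", date(2011, 3, 29)))
```

Wrap that in a page of HTML with today's headlines as keywords and you have the little app imagined below.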

Soon someone will write a little web or mobile app to do exactly this kind of thing, mashing a nice graphical display with links that connect via Google or Twitter or whatever. Hmm ...

Here is a Twitter feed someone has already put up for this purpose. See also links in comments below.



*** It looks like search engine links are only good for 5 articles a day:

9. Can I still access NYTimes.com articles through Facebook, Twitter, search engines or my blog?

Yes. We encourage links from Facebook, Twitter, search engines, blogs and social media. When you visit NYTimes.com through a link from one of these channels, that article (or video, slide show, etc.) will count toward your monthly limit of 20 free articles, but you will still be able to view it even if you've already read your 20 free articles.

Like other external links, links from search engine results will count toward your monthly limit. If you have reached your monthly limit, you'll have a daily limit of 5 free articles through a given search engine. This limit applies to the majority of search engines.

Tuesday, February 08, 2011

You say you want a revolution

An interview with the Google exec whose Facebook page helped trigger the demonstrations in Egypt. Finally all of those naive and idealistic predictions about the power of the internet are coming true :-)

NYTimes: ... some new demonstrators said they had joined the protests after watching an emotional television interview on Monday night with Wael Ghonim, a Google marketing executive who was snatched off the street nearly two weeks ago, for his role in helping to organize the revolt as the administrator of a popular Facebook page.

One protester in Tahrir Square on Tuesday, Ahmed Meyer El Shamy, an executive with an international pharmaceutical company, told The Times, “many, many people” had resolved to join the demonstration “because of what they saw on TV last night.”

During that interview, Mr. Ghonim acknowledged that he had been the anonymous administrator of the Facebook page We Are All Khaled Said, dedicated to the memory of a 28-year-old Egyptian man beaten to death by the police in Alexandria on June 6, 2010, which helped spark the protests.

More video at the NYTimes link above. (Sorry, I just realized the version below doesn't have subtitles. Unless you speak Arabic you have to click through to the Times; the last video shows Ghonim's emotional reaction when shown pictures of protestors who died.)

Wednesday, January 20, 2010

Aurora uses Chinese error-checking algorithm?

See Operation Aurora: Clues in the Code.

... "Operation Aurora" is the latest in a series of attacks originating out of Mainland China. Previous attacks have been known as – "GhostNet" and "Titan Rain." Operation Aurora takes its name directly from the hackers this time – the name was coined after virus analysts found unique strings in some of the malware involved in the attack. These strings are debug symbol file paths in source code that has apparently been custom-written for these attacks.

... The compiler often offers other clues to a malware sample’s origin. For instance, if the binary uses a PE resource section, the resource’s headers will often provide a language code. The Hydraq component does use a resource section, but in this case, the author was careful to either compile the code on an English-language system, or they edited the language code in the binary after-the-fact. So outside of the fact that PRC IP addresses have been used as control servers in the attacks, there is no "hard evidence" of involvement of the PRC or any agents thereof.

There is one interesting clue in the Hydraq binary that points back to mainland China, however. While analyzing the samples, I noticed a CRC (cyclic redundancy check) algorithm that seemed somewhat unusual. CRCs are used to check for errors that might have been introduced into stored or transferred data. There are many different CRC algorithms and implementations of those algorithms, but this is one I had not previously seen in any of my reverse-engineering efforts.

... The CRC algorithm used in Hydraq uses a table of only 16 constants; basically a truncated version of the typical 256-value table. By decompiling the algorithm and searching the Internet for source code with similar constants, operations and a 16-value CRC table size, I was able to locate one instance of source code that fully matched the structural code implementation in Hydraq and also produced the same output when given the same input ...

... This source code was created to implement a 16-bit CRC algorithm compatible with the implementation known as "CRC-16 XMODEM", while requiring only a 16-value CRC table. It is actually a clever optimization of the standard CRC-16 reference code that allows the CRC-16 algorithm to be used in applications where memory is at a premium, such as hobby microcontrollers. Because the author used the C "int" type to store the CRC value, the number of bits in the output is dependent on the platform on which the code is compiled. In the case of Hydraq, which is a 32-bit Windows DLL, this CRC-16 implementation actually outputs a 32-bit value, which makes it compatible with neither existing CRC-16 nor CRC-32 implementations.

Perhaps the most interesting aspect of this source code sample is that it is of Chinese origin, released as part of a Chinese-language paper on optimizing CRC algorithms for use in microcontrollers. The full paper was published in simplified Chinese characters, and all existing references and publications of the sample source code seem to be exclusively on Chinese websites. This CRC-16 implementation seems to be virtually unknown outside of China, as shown by a Google search for one of the key variables, "crc_ta[16]". At the time of this writing, almost every page with meaningful content concerning the algorithm is Chinese ...
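To see how a 16-entry table suffices, here is a reconstruction of the technique in Python -- the nibble-at-a-time CRC-16/XMODEM optimization the paper describes, not the actual Hydraq code:

```python
POLY = 0x1021  # CRC-16/XMODEM polynomial

def make_table():
    """Build the 16-entry table: the CRC update for each possible 4-bit value,
    computed the same way as the usual 256-entry table but for nibbles."""
    table = []
    for nibble in range(16):
        crc = nibble << 12
        for _ in range(4):
            crc = ((crc << 1) ^ POLY if crc & 0x8000 else crc << 1) & 0xFFFF
        table.append(crc)
    return table

CRC_TA = make_table()  # plays the role of the paper's crc_ta[16]

def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM, processing one nibble per table lookup."""
    crc = 0
    for byte in data:
        for nibble in (byte >> 4, byte & 0x0F):
            crc = ((crc << 4) & 0xFFFF) ^ CRC_TA[((crc >> 12) ^ nibble) & 0x0F]
    return crc

print(hex(crc16_xmodem(b"123456789")))  # 0x31c3, the standard check value
```

The & 0xFFFF masks are what keep this version a true 16-bit CRC. The analyst notes that Hydraq's author stored the value in a C int without truncating it, so the malware's output widens to 32 bits and matches neither standard CRC-16 nor CRC-32 -- which is precisely what made the implementation distinctive enough to trace.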

Thursday, January 14, 2010

"Aurora" doesn't sound very Chinese

McAfee dissects the exploit used against Google and other companies operating in China. Given the "social vector" used for initial penetration -- sending emails that appear to come from close associates -- there is an obvious motivation for these hackers to get access to gmail accounts. What's the evidence that this had anything to do with the Chinese government? The McAfee report is careful not to speculate.

If I were a Chinese hacker, wouldn't the filepaths on my development machine have non-English (Unicode) characters? I'm sure some readers of this blog would know -- if you develop software in a Chinese-language environment, do you use English words or Chinese characters for path and directory names?

Of course, it's possible the attackers just bought the malware from a black hat developer somewhere or have deliberately obfuscated the origin of their code. We need some more forensic information...

McAfee Security Insights Blog: ... the intruders gained access to an organization by sending a tailored attack to one or a few targeted individuals. We suspect these individuals were targeted because they likely had access to valuable intellectual property. These attacks will look like they come from a trusted source, leading the target to fall for the trap and clicking a link or file. That’s when the exploitation takes place, using the vulnerability in Microsoft’s Internet Explorer.

Once the malware is downloaded and installed, it opens a back door that allows the attacker to perform reconnaissance and gain complete control over the compromised system. The attacker can now identify high value targets and start to siphon off valuable data from the company. ...

Operation “Aurora”

I am sure you are wondering about the name “Aurora.” Based on our analysis, “Aurora” was part of the filepath on the attacker’s machine that was included in two of the malware binaries that we have confirmed are associated with the attack. That filepath is typically inserted by code compilers to indicate where debug symbols and source code are located on the machine of the developer. We believe the name was the internal name the attacker(s) gave to this operation. ...

Google dead in China?




WSJ reports on the decision process at Google. There are still a number of open questions:

1. What were Google's prospects in China? Are they really hopelessly behind Baidu? I've seen market share estimates ranging from 15-30% (no agreement even on the sign of the derivative!), and also the claim that the most sophisticated users (i.e., the ones with the most disposable income in the long run) tended to use Google. Perhaps no reason to throw in the towel -- but then why did Kai Fu Lee resign in September? Was it just the opportunity to run his own investment fund? (Here is an earlier post on Baidu, with a talk given by founder Robin Li.)

2. How serious is the state-sponsored security threat to companies operating in China? Did this play a big role in Google's decision? Coordinated attacks by state-run intelligence are significantly harder to deal with than ordinary hackers or even corporate espionage. An intelligence agency only has to turn a few key employees to get at important source code that necessarily would have to be available to researchers and operations people at Google China. It would be difficult to justify the risks of operating in an environment that hostile. (Needless to say, it would be long-run detrimental for China to create an environment that hostile to foreign companies.) On the other hand, snooping around for information about a few email users is hardly a threat of the same proportions.


WSJ: Google Inc.'s startling threat to withdraw from China was an intensely personal decision, drawing its celebrated founders and other top executives into a debate over the right way to confront the issues of censorship and cyber security.

The blog post Tuesday that revealed Google's very public response to what it called a "highly sophisticated and targeted attack on our corporate infrastructure originating from China" was crafted over a period of weeks, with heavy involvement from Google's co-founders, Larry Page and Sergey Brin.

For the two men, China has always been a sensitive topic. Mr. Brin has long confided in friends and Google colleagues of his ambivalence in doing business in China, noting that his early childhood in Russia exacerbated the moral dilemma of cooperating with government censorship, people who have spoken to him said. Over the years, Mr. Brin has served as Google's unofficial corporate conscience, the protector of its motto "Don't be Evil."

The investigation into the cyber intrusion began weeks ago, although how Google detected it remains unclear. As Google employees gathered more evidence they believed linked the attack to China and Chinese authorities, Chief Executive Eric Schmidt, along with Messrs. Page and Brin, began discussing how they should respond, entering into an intense debate over whether it was better to stay in China and do what they can to change the regime from within, or whether to leave, according to people familiar with the discussions. A Google spokesman said Messrs. Page, Brin and Schmidt wouldn't comment.

Mr. Schmidt made the argument he long has, according to these people, namely that it is moral to do business in China in an effort to try to open up the regime. Mr. Brin strenuously argued the other side, namely that the company had done enough trying and that it could no longer justify censoring its search results.

How the debate ultimately resolved itself remains unclear. The three ultimately agreed they should disclose the attack publicly, trying to break with what they saw as a conspiratorial culture of companies keeping silent about attacks of this nature, according to one person familiar with the matter.

Soon, Google's vice president of public policy and communications, Rachel Whetstone, began crafting and revising a number of versions of a possible statement the company planned to release publicly, these people said, sharing it with the three.

The top three agreed that in addition to discussing the attack, the blog post should contain some language about human rights, the strongest statement of which is a clause in the penultimate paragraph of the post.

The section said they had reached the decision to re-evaluate their business in China after considering the attacks "combined with the attempts over the past year to further limit free speech on the web."

Concerned about potential retribution against Google employees in China, the founders and their advisors agreed to include a line saying that the move was "driven by our executives in the United States, without the knowledge or involvement of our employees in China."

Some expressions of support for Google's position flowed in from around the world, including from consumers in China as well as some U.S. companies—including rival Yahoo Inc.—and politicians. Secretary of State Hillary Clinton Tuesday issued a statement saying Google's allegations "raise very serious concerns and questions," and that "we look to the Chinese government for an explanation."

Odds are high Google could be left largely on its own in taking concrete steps to confront the Chinese government. Veteran observers of trade between the countries suggest that Google, and the U.S. generally, has very little leverage to press China to back down on Internet censorship or other issues.

Besides the Google.cn Web site, Google has a range of other business initiatives and partnerships in China that could be affected by its decision. By snubbing Chinese authorities so publicly, the company risks government retaliation against itself or its partners. The decision also affects local competitors who could benefit from any retreat. Shares of Google's biggest Chinese rival, Baidu Inc., surged following the news.

Google's blog post Tuesday said cyber-attacks on its infrastructure resulted in "the theft of intellectual property," stating that it found evidence to suggest that a primary goal of the attackers was accessing the Gmail accounts of Chinese human-rights activists.

Again, snooping on activists and stealing core IP are two very different activities. Which was more disturbing to Google? Previous discussion here.

“Don’t Be Evil” always did sound a bit to me like tikkun olam, or repairing the world (see this profile of Brin and Page). Not sure whether CEO Schmidt is down with that ;-)

Wednesday, January 13, 2010

What's up with Google and China?

Here is Google's official statement that caused all the commotion. (Note it appeared on a blog :-)

Some good comments at TechCrunch:

1. Google’s business was not doing well in China. Does anyone really think Google would be doing this if it had top market share in the country? For one thing, I’d guess that would open them up to shareholder lawsuits. Google is a for-profit, publicly-held company at the end of the day. When I met with Google’s former head of China Kai-fu Lee in Beijing last October, he noted that one reason he left Google was that it was clear the company was never going to substantially increase its market share or beat Baidu. Google has clearly decided doing business in China isn’t worth it, and are turning what would be a negative into a marketing positive for its business in the rest of the world.

2. Google is ready to burn bridges. This is not how negotiations are done in China, and Google has done well enough there to know that. You don’t get results by pressuring the government in a public, English-language blog post. If Google were indeed still working with the government this letter would not have been posted because it has likely slammed every door shut, as a long-time entrepreneur in China Marc van der Chijs and many others said on Twitter. This was a scorched earth move, aimed at buying Google some good will in the rest of the world; Chinese customers and staff were essentially just thrown under the bus.

Actually, recent reports estimate their 2009 search market share at around 30 percent, which is nothing to sneeze at. They could have had a good business in China, although I agree with Kai-fu Lee that the government would never let them dominate the market there the way they do in the rest of the world.

The hacking used trojans injected via a zero-day vulnerability in Adobe (PDF file attachments) [Edit: or was it an IE browser problem? And were the hackers really Chinese -- why codename Aurora?]. The claim that these attacks on multiple companies were coordinated by Chinese intelligence services is plausible but far from proven.

It's important to emphasize that the Chinese government is not monolithic. The parts of the government concerned with economic growth and technology development will be asking some hard questions of the intelligence apparatus about this. No economic planner wants high tech companies like Google or Adobe to stop operating in China as a consequence of security risks.

Thursday, January 07, 2010

Wikipedia: emergent phenomenon?

Is Wikipedia a magical aggregator and filter of expertise from millions of different contributors? Or is it more like traditional encyclopedia projects, with a thousand or so core Wikipedians doing most of the work? The distribution of edits (a typical power law) supports the latter interpretation, but a detailed analysis of particular articles shows that important knowledge is injected by individuals who are not part of the core group.

Aaron Swartz: I first met Jimbo Wales, the face of Wikipedia, when he came to speak at Stanford. Wales told us about Wikipedia’s history, technology, and culture, but one thing he said stands out. “The idea that a lot of people have of Wikipedia,” he noted, “is that it’s some emergent phenomenon — the wisdom of mobs, swarm intelligence, that sort of thing — thousands and thousands of individual users each adding a little bit of content and out of this emerges a coherent body of work.”† But, he insisted, the truth was rather different: Wikipedia was actually written by “a community … a dedicated group of a few hundred volunteers” where “I know all of them and they all know each other”. Really, “it’s much like any traditional organization.”

The difference, of course, is crucial. Not just for the public, who wants to know how a grand thing like Wikipedia actually gets written, but also for Wales, who wants to know how to run the site. “For me this is really important, because I spend a lot of time listening to those four or five hundred and if … those people were just a bunch of people talking … maybe I can just safely ignore them when setting policy” and instead worry about “the million people writing a sentence each”.

So did the Gang of 500 actually write Wikipedia? Wales decided to run a simple study to find out: he counted who made the most edits to the site. “I expected to find something like an 80-20 rule: 80% of the work being done by 20% of the users, just because that seems to come up a lot. But it’s actually much, much tighter than that: it turns out over 50% of all the edits are done by just .7% of the users … 524 people. … And in fact the most active 2%, which is 1400 people, have done 73.4% of all the edits.” The remaining 25% of edits, he said, were from “people who [are] contributing … a minor change of a fact or a minor spelling fix … or something like that.” ...

[But what if we analyze the amount of text contributed by each person, not just the number of edits? See original for analysis of edit patterns of specific articles, including amount of text added.]

... When you put it all together, the story becomes clear: an outsider makes one edit to add a chunk of information, then insiders make several edits tweaking and reformatting it. In addition, insiders rack up thousands of edits doing things like changing the name of a category across the entire site — the kind of thing only insiders deeply care about. As a result, insiders account for the vast majority of the edits. But it's the outsiders who provide nearly all of the content.

And when you think about it, this makes perfect sense. Writing an encyclopedia is hard. To do anywhere near a decent job, you have to know a great deal of information about an incredibly wide variety of subjects. Writing so much text is difficult, but doing all the background research seems impossible.

On the other hand, everyone has a bunch of obscure things that, for one reason or another, they’ve come to know well. So they share them, clicking the edit link and adding a paragraph or two to Wikipedia. At the same time, a small number of people have become particularly involved in Wikipedia itself, learning its policies and special syntax, and spending their time tweaking the contributions of everybody else.
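The edit-concentration figures Wales quotes (over 50% of edits from 0.7% of users, 73.4% from the top 2%) are what you'd expect from a power-law distribution of activity. A minimal sketch, assuming user ranked i makes edits proportional to 1/i^alpha (a Zipf-like model; the user count and alpha below are illustrative, not measurements from Wikipedia):

```python
def top_share(n_users: int, alpha: float, top_frac: float) -> float:
    """Fraction of all edits made by the top `top_frac` of users,
    assuming the user at rank i makes edits proportional to 1/i**alpha."""
    edits = [1.0 / (i ** alpha) for i in range(1, n_users + 1)]
    k = max(1, int(n_users * top_frac))
    return sum(edits[:k]) / sum(edits)

if __name__ == "__main__":
    # With a hypothetical 75,000 editors and alpha = 1, the top 0.7% and
    # top 2% capture the majority of edits -- qualitatively matching
    # Wales's numbers, even under this crude model.
    for frac in (0.007, 0.02):
        print(f"top {frac:.1%} of users: {top_share(75_000, 1.0, frac):.1%} of edits")
```

Note this only reproduces the edit-count concentration; as Swartz argues above, counting bytes of text contributed instead of edits would tell a very different story.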

Monday, June 16, 2008

Neal Stephenson on wiring the world

I thought I'd share a link to this somewhat obscure WIRED article written in 1996 by science fiction author Neal Stephenson, about the laying of transcontinental fiber. (Warning: the article is very, very long.) Ever wonder how, exactly, your packets get to that web server in Japan or India? I found it quite inspiring at the time.

Mother Earth Mother Board

The hacker tourist ventures forth across the wide and wondrous meatspace of three continents, chronicling the laying of the longest wire on Earth.

By Neal Stephenson

Many of the themes from the WIRED article also appear in my favorite Stephenson novel, Cryptonomicon (Google Books version). I have a particularly soft spot for the novel, since its plot parallels some spooky, crypto aspects of my own startup experience.


Friday, March 04, 2005

Times on China Internet censorship

The Shanghai bureau chief called me about this in January after Zhao Ziyang died. I'm actually quoted in the article.

The PRC government is expending a lot of resources on this, and is in many ways quite successful. But, around the edges, there is no stopping the flow of information. While there is no effective political organization in China beyond the government, ordinary people (or, at least, the few hundred million people with direct or indirect access to the Internet) have greater and greater access to uncensored information.

Fairly soon the expectations of the average person in China for democracy and personal freedom will be no different than in other parts of the world. There will be a consensus view that it is "normal" for the government to implement democratic reforms, if only in a gradual way.

Expectations for better governance are increasing everywhere (well, perhaps not in the US ;-) As shown in Georgia, Ukraine and Lebanon, fewer and fewer soldiers are willing to shoot peaceful demonstrators in support of an unpopular government, and the demonstrators know this. Perhaps satellite TV deserves as much credit for this as the Internet, but both are playing an important role.

An interesting (and optimistic) quote from the article: "All of the big mistakes made in China since 1949 have had to do with a lack of information," said Guo Liang, an Internet expert at the Chinese Academy of Social Sciences in Beijing. "Lower levels of government have come to understand this, and I believe that since the SARS epidemic, upper levels may be beginning to understand this, too."
