Friday, June 20, 2014

The World = Dhaka, Bangladesh?

In January, I wrote about ZappyLab’s painful experience with accidentally acquiring fake “likes” on Facebook (ZappyLab is a life sciences software startup that I co-founded). It led me to investigate “likes” of various startups and pharmaceuticals, and I admit I felt much better when I realized that Pfizer, Novartis, and Sanofi seem to have fallen into the same trap.

Facebook has for years denied that fake likes are a problem for advertisers. In 2012, Facebook responded to a BBC investigation:
“We’ve not seen evidence of a significant problem,” said a spokesman. “Neither has it been raised by the many advertisers who are enjoying positive results from using Facebook. All of these companies have access to Facebook’s analytics which allow them to see the identities of people who have liked their pages, yet this has not been flagged as an issue [my emphasis]. A very small percentage of users do open accounts using pseudonyms but this is against our rules and we use automated systems as well as user reports to help us detect them.”

My post inspired Derek Muller to make a brilliant video on this topic. Facebook's response to the video?
Fake likes don’t help us. For the last two years, we have focused on proving that our ads drive business results and we have even updated our ads to focus more on driving business objectives. Those kinds of real-world results would not be possible with fake likes. In addition, we are continually improving the systems we have to monitor and remove fake likes from the system.
Just to be clear, he created a low quality Page about something a lot of people like – cats. He spent $10 and got 150 people who liked cats to like the Page. They may also like a lot of other Pages which does not mean that they are not real people – lots of real people like lots of things [my emphasis].

And a few days ago I noticed that many universities seem to have accumulated fake “likes” as well. I blogged about it, Business Insider covered it, and Facebook replied with:

There are many ways people find Pages to like on Facebook – from someone navigating directly to the Page to like it, to people seeing advertising campaigns, or from Pages appearing in Pages You May Like. Some Pages, including universities with an international reputation, often receive a large number of likes from people around the world and have fans that are dispersed geographically and demographically. These pages are often also featured in Pages You May Like and receive likes from people who aspire to attend or visit the university.

Since I singled out Harvard in my post, Harvard also commented:

“Harvard is an internationally recognized institution with students, faculty, alumni, and other followers around the world,” a spokesperson said in a statement. “Global interest in Harvard is validated by engagement across all our platforms. Social media is among the many tools we use to connect with the Harvard community and with many others interested in the teaching, learning, and research at Harvard.”

So the Facebook claim is “these are legitimate likes”. If you are a global brand, your “likes” will be global. If you advertise without geo-restrictions, your likes will come from all over the world. Fair enough. Makes sense.

The puzzling part is that the most popular city for the fans of Harvard, Oxford, and Cambridge is Dhaka, Bangladesh. You might even say this is slightly suspicious since Dhaka is known as the “hub of click-farms” that produce the fake likes. But hey, coincidences happen, and I suppose many in Bangladesh may "aspire to attend or visit" Harvard.

Three well-known universities are adored by the residents of Dhaka. Fine. Anyone else? Well, it turns out the following global brands also just happen to have their most loyal following in Dhaka, Bangladesh:

If this has nothing to do with click-farms, then the residents of Dhaka, Bangladesh must be the most loving people on the planet - loving everyone and everything, en masse. The above does not fit well with Facebook's position that famous brands "often receive a large number of likes from people around the world and have fans that are dispersed geographically and demographically." Where is the dispersion? Well, perhaps I am cherry-picking too much. Perhaps other famous pages have likes from everywhere? Not so much.

Jakarta, Indonesia: Citi, Intel, Shell
Bangkok, Thailand: CNN, Dell, Honda, KFC, Levis

I am not sure if this is what Facebook means by "dispersion". Since "Egypt, India, the Philippines, Pakistan, Bangladesh, Indonesia, Nepal and Sri Lanka, [are] all countries where click farms are common," the "likes" seem to simply be dispersed between the click-farm locations (with some curious clustering by industry). Is it possible that this overlap is due to chance? Certainly possible. Likely? I don't think so. Since Facebook has lots of data scientists, I'll let them figure out the p-value for the probability that the above is just a coincidence.

Tuesday, June 17, 2014

Facebook's Most Popular University

I am not a fan of rankings. ZappyLab built PubChase because of our intense dislike of the impact factor (a metric commonly misused and abused in academia to judge the quality of research based on the journal where it is published instead of the quality of the article itself). Two years ago, I put a lot of effort into Thoughts on Choosing a College to argue that undergraduate college rankings are mostly meaningless and a terrible metric for parents and students in choosing the right institution.

So, it is not surprising that the recent report "Where are the most popular universities on Facebook?" did not impress me much. It is a toxic combination of rankings and Facebook popularity, particularly in light of my editorial on fake Facebook "likes" that plague companies [1]. I found it amusing that a Malaysian university took pride in "beating Cambridge" in terms of the number of "likes" and touted it in the press and on social media, with ambitious plans to double their total FB likes.

However, the real story here is not Limkokwing University but schools like Harvard, Yale, Oxford, and Cambridge. I found it peculiar how erratic and unpredictable the top 15 list was. Why weren't Princeton and Berkeley in that list? This inspired me to look up the sources of the "likes" for the different top colleges. Turns out that, much like life science companies and pharmaceuticals, many colleges are paying Facebook to acquire fake likes.

Very easy to tell which ones fell victim to this by looking at the number and source of the "likes"; they all match the location of the school for the universities with fewer than 300,000 "likes". But the schools that made it to the "most popular" list have "likes" from Dhaka, Bangladesh (Cambridge, Oxford, UofPeople, Harvard) and Addis Abeba, Ethiopia (Yale) [2].

The most stunning example here is Harvard with 3.3 million "likes". Probably about three million of these are fakes. I just hope they did not pay for this, like we did on PubChase [3]. If the cost per "like" is similar to ours ($50-$100 per thousand), Harvard might have paid Facebook between $150,000 and $300,000 for fake likes [4].
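The back-of-the-envelope arithmetic behind that range can be sketched in a few lines of Python. This is only an illustration: the three million figure is the rough guess from the paragraph above, not a measured number, and the function name is made up for this example.

```python
def estimated_spend(likes, cost_per_thousand):
    """Estimated ad spend in dollars for a given number of likes."""
    return likes / 1000 * cost_per_thousand


# Assumption: roughly three million of Harvard's 3.3 million likes are fake.
fake_likes = 3_000_000

# Cost range taken from the PubChase experience: $50-$100 per thousand likes.
low = estimated_spend(fake_likes, 50)
high = estimated_spend(fake_likes, 100)

print(f"${low:,.0f} to ${high:,.0f}")
```

Changing `fake_likes` or the per-thousand cost shows how sensitive the estimate is to those two guesses.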

It is pretty clear by now that the problem with the purchase of fake likes, directly from Facebook, is pervasive [1]. I just don't know the scale of this. If startups, corporations, and universities are paying for this, how much exactly is Facebook earning from the fakes? And could this revenue be the reason why Facebook consistently denies the existence of this problem and makes it impossible to delete the fake likes once you purchase them?


1. Business Insider has great coverage of the fake "likes". "Facebook Advertisers Complain Of A Wave Of Fake Likes Rendering Their Pages Useless".

2. Below are the numbers for the above graph.
University            Total Likes   Location
                                    Princeton, NJ
UC Berkeley           241,00        Berkeley, CA
Caltech               83,000        Pasadena, CA
Columbia              139,000       New York, NY
ICL                   66,000        London, UK
Univ. of Chicago      110,000       Chicago, IL
Cornell               171,000       Ithaca, NY
Univ. of Toronto      175,000       Toronto, Canada
Univ. of Singapore    103,000       Singapore, Singapore
Univ. of Cambridge    751,000       Dhaka, Bangladesh
Univ. of Oxford       1,500,000     Dhaka, Bangladesh
Univ. of the People   1,214,000     Dhaka, Bangladesh
Harvard               3,300,000     Dhaka, Bangladesh
Yale                  877,000       Addis Abeba, Ethiopia

3. This example makes it clear how a page can suddenly be flooded with fake likes, completely beyond the control of the owners.

4. While the fake "likes" might have flooded Harvard and Yale organically, the following posts suggest that Harvard is very proud and cares a lot about the "number of likes".

This quote is particularly revealing:
"Harvard still has a ways to go to match the current Facebook page Leader—Texas Hold'em Poker has more than 53 million likes—but we are at least more popular online than our neighbors in New Haven. Only 49,171 people have liked Yale University's official Facebook page."
So Yale had to catch up and grow its "likes" total from "only 49,171" to the almost million from Ethiopia that it has now. If I had to bet on it, I'd wager that both Harvard and Yale stumbled into the same problem as we did with PubChase, paying to promote their pages and getting empty fakes in return.


Wednesday, June 11, 2014

What's the shape of a gene?

One of the rewards we promised in our Kickstarter campaign is a dozen "Lenny's Gene-shaped Black Russian Espresso Cookies." When we added the reward, I had an exon-intron-exon shape in mind.

Well, after extensive experimenting, this structure clearly won't hold up during shipping. It got me to ponder what exactly a gene looks like. With all of my bioinformatics and comp. bio. training, I keep thinking of genome browsers. But what does a gene look like deep inside the nucleus? How do others imagine gene shapes?

I considered putting ZappyLab on hold to do a 5-year postdoc and figure out the shape of a gene, but Alexei (co-founder) and our board of directors protested. Then I remembered the gene-looping papers from the lab of Nick Proudfoot. Looked up his papers, and voilà! Problem solved.
Gene looping model, figure 6D (Tan-Wong SM, et al., Science, 2012)
Please do let me know if you have other visions of gene shapes!

The "gene-shaped" part of the reward promise is clearly complicated. However, I should also disclose that "Lenny's..." should be "Lenny's and Eeva's." My older daughter is very excited about this project and is helping with the hand-crafted, small-batch reward baking.

Monday, June 9, 2014

The Open Access and Science Visionaries

Dylan Tweney has just written a terrific piece at VentureBeat, "Watch this multi-billion-dollar industry evaporate overnight," on the current and future state of the academic publishing industry. He perfectly captures the scam that the subscription publishers are running and is spot-on regarding the abyss into which their profits will plunge in the future. Mr. Tweney clearly highlights the three fundamental components that are key in revamping the publishing industry – shift to open access, post-publication peer review, and altmetrics (referred to as “imprimatur of publishing” in the article, but more broadly, the effort to measure the quality of the research based on the article itself, rather than where it is published).

The only thing missing from this article is the list of people who have been at the forefront of this effort. Strangely, is profiled as the major force reshaping the publishing landscape. I don't know a single person in the open access movement who has ever mentioned or ResearchGate as a serious part of the fight against the 300-year-old publishing paradigm. So, in the spirit of post-publication review and discussion, mentioned in this article, I would like to create the list of true luminaries in this movement.

I will start it off with a subset of the visionaries and tireless advocates who have devoted themselves to fixing this broken system over the past 10, 15, and even 20 years. More importantly, I hope that the community will contribute and help to curate this list, so we can give credit to those who deserve it.

The Visionary Veterans

The late Jean-Claude Bradley is the father of the Open Science Movement. See here and here for tributes to him.

Pat Brown and Michael Eisen are the other co-founders of PLOS. They have been working since 2000 on pushing towards open access. What many forget is that PLOS is not just a publisher. It is an organization of dedicated people whose passion is the open and free dissemination of research. The amount of work that PLOS does to advocate and push open science via innovation in publishing, outreach, legislation, and mandates from funding agencies is simply staggering.

Pete Binfield helped to turn PLOS One into the most prolific biomedical journal and is one of the early altmetrics pioneers. Moreover, he is a co-founder of PeerJ, a new open access journal whose ultimate goal is to be a place where research is both free to read and free to publish.

The late Frederick Friend, as the head librarian of University College London, was an influential and dedicated promoter of open access for nearly two decades.

Paul Ginsparg single-handedly transformed publishing in physics by launching the world's most famous preprint server in 1991. The biomedical research community has been trying to emulate this preprint success and to shift the publishing culture in this direction for over two decades now.

Stevan Harnad authored the "Subversive Proposal" in 1994, calling for self-archiving of research literature. Just to list his contributions over the last two decades to open access would take a separate blog post.

Heather Joseph has been the executive director of Scholarly Publishing and Academic Resources Coalition (SPARC) since 2005. In that role, she has been incredibly effective in pushing the White House, NIH, and Congress for legislative change to promote OA.

David Lipman has been the director of the National Center for Biotechnology Information since 1989. He was honored by the White House as a "Champion of Change" for open science in 2013. It is not possible to list all of the projects that David has overseen in his 25 years at NCBI, but his lead on the creation of the PubMed Central database to facilitate free and open access to government-funded biomedical research must be highlighted.

Peter Murray-Rust could be on this list just for the quote "Open Access saves lives". But like everyone else listed here, he has been aggressively pushing open data and open science for a decade or more.

Cameron Neylon is currently the advocacy director of PLOS, but he has been one of the most vocal supporters of open science and open access for about a decade, and is also one of the earliest advocates of the altmetrics movement.

Henry Rzepa saw the potential for open data via the internet starting in the late 1980s and has been pushing the boundaries ever since.

Stuart Shieber is the co-director, along with Peter Suber, of the Harvard Office for Scholarly Communication. He was the main push for open access at Harvard and is responsible for the university's widely-copied 2008 self-archiving mandate.

Peter Suber wrote the book on Open Access. In all senses (the book is actually called "Open Access" and is appropriately available openly for free). He gave up a tenured professor position in order to focus all of his energy on pushing for open access. Peter also wrote the draft of the 2001 Budapest Open Access Initiative, a pivotal moment in open access history.

Vitek Tracz established the first open access publishing empire, BioMed Central, fourteen years ago. Right now, he is investing heavily in another bold experiment – F1000 Research, an open access journal that has open post-publication peer reviews and versioning of articles.

Harold Varmus began to push for open access in the 1990s, when he was the director of NIH. He helped to start PubMed Central and played an active role in the founding of the Public Library of Science (PLOS).

Community-Suggested Leaders

Matthew Cockerill co-founded BioMed Central with Vitek Tracz in 1999 and was its managing director through 2013. He clearly did an amazing job, as BMC continues as a successful OA journal under Springer and has proven to all of the critics that open access publishing can be profitable/sustainable (this interview with him is fascinating).
*suggested by Casey Bergman

Richard Poynder is a journalist who has been thoughtfully and thoroughly covering the open access movement for a decade. Stevan Harnad wrote about Richard in 2009, "The OA movement is fortunate indeed to have Richard Poynder as its chronicler, conscience, and gadfly laureate." (After Wikipedia, most of my links above point to Richard's website.)
*suggested by Jason Noble

John Willinsky founded the Public Knowledge Project in 1998, dedicated to promoting open access and to improving scholarly publishing and communication.

1. I need to add Heather Joseph, Melissa Hagemann, Leslie Chan, Mark Patterson - will do so over the next few days. (This project has taken me 8 hours already, and I have to take a sick daughter to the doctor, but I want to publish immediately to begin collecting the crowdsourced entries.)

2. I know that my list is woefully incomplete. Please e-mail me at "lenny at zappylab dot com" for additions and corrections. Please comment directly below this post to add more people. I am simply seeding this list.

3. I was co-advised by Michael Eisen during my PhD and am clearly biased towards PLOS; I am also a biologist, and so my view of the publishing world and innovation in it is mostly limited to biomedical research.