Matthew Salganik: The Open Review of Bit by Bit, Part 3—Increased access to knowledge

This is the third post in a three part series about the Open Review of Bit by Bit: Social Research in the Digital Age. This post describes how Open Review led to increased access to knowledge. In particular, I’ll provide information about the general readership patterns, and I’ll specifically focus on readership of the versions of the manuscript that were machine translated into more than 100 languages. The other posts in this series describe how Open Review led to a better book and higher sales.

Readership

During the Open Review period, people from all over the world were able to read Bit by Bit before it was even published. The map at the top of the page shows the locations of readers around the world.

In total, we had 23,514 sessions and 79,426 page views from 15,650 users. Also, unlike annotations, which decreased over time, there was a relatively constant level of traffic, averaging about 500 sessions per week.

How did these sessions begin? The most common channels were direct navigation, followed by organic search. Only about 20% of the traffic came from referrals (following links) and social media.

What devices were people using? About 30% of sessions were on mobile phones. Therefore, responsive design is important to ensure access.  

In fact, mobile was more common for users from developing countries. For example, in the US, there were about 6 desktop sessions for every 1 mobile session. In India, however, there were about 3.5 mobile sessions for every desktop session. Also, there were more mobile sessions from India than from the US. Here are the top 10 country-platform combinations.

Machine Translations

In addition to posting the book in English, we also machine translated the book into more than 100 languages using Google Translate. Of course, Google Translate is not perfect, but reading a bad translation might be better than no translation at all. And because Google Translate is getting better quickly, a few years from now machine translation might be a viable approach for many languages.
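For authors curious about the mechanics, here is a minimal sketch of how manuscript text could be machine translated with the Google Cloud Translation client library. The target languages and the translate_chunk helper are illustrative; this is not the exact pipeline we used to build the site.

    # Minimal sketch: translate a chunk of manuscript HTML with the Google
    # Cloud Translation API (v2 client library). Illustrative only.
    from google.cloud import translate_v2 as translate

    client = translate.Client()  # assumes GOOGLE_APPLICATION_CREDENTIALS is set

    def translate_chunk(html_text, target_language):
        """Translate one chunk of HTML, keeping the markup intact."""
        result = client.translate(
            html_text,
            target_language=target_language,
            format_="html",  # preserve tags so the page layout survives
        )
        return result["translatedText"]

    paragraph = "<p>The digital age creates new opportunities for social research.</p>"
    for lang in ["bn", "es", "zh-CN"]:  # Bengali, Spanish, Simplified Chinese
        print(lang, translate_chunk(paragraph, lang))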

So, did these machine translations get used? No and yes. In terms of page views, no language other than English accounted for more than 2% of the total. Taken on its own, that seems to argue against the value of machine translation. On the other hand, if you add up the page views across all languages other than English, it becomes a sizable number: the non-English pages led to a roughly 20% increase in page views (from 65,428 English page views to 79,426 total).

If you are considering Open Review of your manuscript, you might be wondering whether machine translation was worth it. There were two main costs: adjusting the website to handle multiple languages and the money we had to pay Google for the translations. Now that we've open sourced our code, you won't need to worry about the fixed cost of website design. But we did pay approximately $3,000 USD to Google for translations in August 2016 (I expect that the cost of machine translation will come down). The benefits are harder to pin down. I don't know whether people actually learned anything from these machine translations, and I don't think they did much to support the other goals of Open Review: better books and higher sales. But it certainly captured people's attention when I said that the book was available in 100 languages, and it showed a commitment to access. Future authors and publishers will have to decide what makes sense in their case, but as machine translation continues to improve, I'm optimistic that multiple languages will be part of the Open Review process in some way in the future.

This post is the third post in a three part series about the Open Review of Bit by Bit: Social Research in the Digital Age. The other posts in this series describe how Open Review led to a better book and higher sales.

You can put your own manuscript through Open Review using the Open Review Toolkit, either by downloading the open-source code or hiring one of the preferred partners. The Open Review Toolkit is supported by a grant from the Alfred P. Sloan Foundation.

Matthew J. Salganik is professor of sociology at Princeton University, where he is also affiliated with the Center for Information Technology Policy and the Center for Statistics and Machine Learning. His research has been funded by Microsoft, Facebook, and Google, and has been featured on NPR and in such publications as the New Yorker, the New York Times, and the Wall Street Journal.

Matthew Salganik: The Open Review of Bit by Bit, Part 2—Higher sales

This post is the second in a three part series about the Open Review of Bit by Bit: Social Research in the Digital Age. This post describes how Open Review led to higher sales. The other posts in this series describe how Open Review led to a better book and increased access to knowledge.

Before talking about sales in more detail, I think I should start by acknowledging that it is a bit unusual for authors to talk about this stuff. But sales are an important part of the Open Review process because of one simple and inescapable fact: publishers need revenue. My editor is amazing, and she's spent a lot of time making Bit by Bit better, as have her colleagues who do production and design. These people need to be paid salaries, and those salaries have to come from somewhere. If you want to work with a publisher—even a non-profit publisher—then you have to be sensitive to the fact that they need revenue to be sustainable. Fortunately, in addition to better books and increased access to knowledge, Open Review also helps sell books. So for the rest of this post, I'm going to provide a purely economic assessment of the Open Review process.

One of the first questions that some people ask about Open Review is: “Aren’t you giving your book away for free?”  And the answer is definitely no. Open Review is free like Google is free.

Notice that Google makes a lot of money without ever charging you anything. That’s because you are giving Google something valuable, your data and your attention. Then, Google monetizes what you provide them. Open Review is the same.

In addition to improving the manuscript, which should lead to more sales, there are three main ways that Open Review increases sales: collecting email addresses, providing market intelligence, and promoting course adoptions.

Email addresses

After discussions with my editor, we decided that the main business metric during the Open Review of Bit by Bit was collecting email addresses of people who wanted to be notified when the book was complete. These addresses are valuable to the publisher because they can form the basis of a successful launch for the book. 

How did we collect email addresses? Simple: we just asked people, like this:

During the Open Review process we collected 340 unique, valid email addresses. Aside from a spike at the beginning, these arrived at a pace of about one per day, with no sign of slowing down.

Who are these people? One quick way to summarize them is to look at the email endings (.com, .edu, .jp, etc.). Based on this data, it seems that Open Review helped us collect email addresses from people all over the world.

Another way to summarize the types of people who provided their email address is to look at the email suffixes (everything that comes after @). This shows, for example, which schools and companies are most represented.
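For anyone who wants to produce the same kind of summaries, here is a small sketch of how the endings and suffixes could be tabulated from a list of sign-up addresses; the example addresses are made up.

    # Sketch: summarize sign-up emails by ending (.com, .edu, .jp, ...) and by
    # suffix (everything after the @). The example addresses are made up.
    from collections import Counter

    emails = [
        "reader@cs.example.edu",
        "student@university.ac.jp",
        "analyst@bigcorp.com",
    ]

    endings = Counter(e.rsplit(".", 1)[-1].lower() for e in emails)
    suffixes = Counter(e.split("@", 1)[-1].lower() for e in emails)

    print(endings.most_common())   # e.g., [('edu', 1), ('jp', 1), ('com', 1)]
    print(suffixes.most_common())  # e.g., [('cs.example.edu', 1), ...]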

Just collecting 340 email addresses was enough to significantly increase sales of Bit by Bit. And, in future Open Review projects, authors and publishers can get better at collecting email addresses. Just as Amazon is constantly running experiments to get you to buy more stuff, and the New York Times is running experiments to get you to click on more headlines, we were running experiments to collect more addresses. And unlike the experiments by Amazon and the New York Times, our experiments were overseen by Princeton's Human Subjects Institutional Review Board.

We tried six different ways to collect email addresses, and then we let Google Analytics use a multi-armed bandit approach to find the best one. Here's how they compared:

These differences are not huge, but they illustrate that Open Review websites can use the same kind of conversion optimization techniques that are common on modern commercial websites. And I'm confident that future Open Review projects could achieve an even higher rate of email sign-ups with additional design improvements and experimentation.
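Google Analytics handled the bandit logic for us, but for readers who want to see what a multi-armed bandit approach looks like, here is a minimal Thompson sampling sketch for choosing among six sign-up prompts. The conversion rates are invented for illustration, and this is not the algorithm Google Analytics runs internally.

    # Sketch of Thompson sampling over six sign-up prompts. Conversion rates
    # are invented; this illustrates the general bandit idea only.
    import random

    # For each variant: [number of sign-ups, number of times shown]
    variants = {f"prompt_{i}": [0, 0] for i in range(1, 7)}

    def choose_variant():
        """Sample a plausible conversion rate per variant; show the best draw."""
        draws = {
            name: random.betavariate(signups + 1, shown - signups + 1)
            for name, (signups, shown) in variants.items()
        }
        return max(draws, key=draws.get)

    def record_outcome(name, signed_up):
        variants[name][0] += int(signed_up)
        variants[name][1] += 1

    # Simulated traffic in which prompt_3 secretly converts best.
    true_rates = {name: 0.010 for name in variants}
    true_rates["prompt_3"] = 0.015
    for _ in range(10_000):
        v = choose_variant()
        record_outcome(v, random.random() < true_rates[v])

    print(variants)  # the best prompt should accumulate the most impressions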

Market intelligence

In addition to collecting email addresses, the Open Review process also provided market intelligence that helped tailor the marketing of the book. For example, using a tool called Google Webmaster Tools, you can see which parts of your book are being linked to:

From this information, we learned that in addition to the book itself, people were most interested in the Open Review process and the chapter on Ethics. Then, when we were developing marketing copy for the book, we tried to emphasize this chapter.

Using Google Webmaster Tools, you can also see which search terms are leading people to your book. In my case, 9 of the top 10 terms are not in English (in fact, 48 of the top 50 terms are not in English). This is because of the machine translation process, which I discuss more in the post on increased access to knowledge. I was hoping that we would receive more organic search traffic in English, but as I learned during this project, it is very hard to show up in the top 10 organic search results for most keywords.
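If you want to pull the same search-term report programmatically, here is a rough sketch using the Google Search Console (formerly Webmaster Tools) API; the site URL, date range, and service account key file are placeholders.

    # Rough sketch: top search queries for the Open Review site from the
    # Google Search Console API. Site URL and key file are placeholders.
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    creds = service_account.Credentials.from_service_account_file(
        "service-account.json",
        scopes=["https://www.googleapis.com/auth/webmasters.readonly"],
    )
    service = build("webmasters", "v3", credentials=creds)

    response = service.searchanalytics().query(
        siteUrl="https://www.example-openreview-site.org/",
        body={
            "startDate": "2016-09-01",
            "endDate": "2016-12-31",
            "dimensions": ["query"],
            "rowLimit": 50,
        },
    ).execute()

    for row in response.get("rows", []):
        print(row["keys"][0], row["clicks"], row["impressions"])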

In case you are curious, গবেষণা নকশা means “research design” in Bengali (Bangla).  

A final way that this market intelligence was helpful was in selling foreign rights to the book. For example, I provided this map of global traffic to representatives from Princeton University Press before they went to the London Book Fair to sell the foreign rights to Bit by Bit. The traffic showed, in a very concrete way, that there was interest in the book outside of the United States.

Course adoptions

Finally, in addition to email addresses to help launch the book and market intelligence, Open Review accelerates course adoptions. My understanding is that there is typically a slow ramp-up in course adoptions over a period of several years. But that slow ramp-up would be problematic for my book, which is freshest right when published and will gradually go stale over time. Given that the lifespan of this edition is limited, early course adoptions are key, and Open Review helped with that. I know of about 10 courses (list here) that adopted the book, in whole or in part, during the Open Review process. This helped prime the pump for course adoptions when the book went on sale.

In this post, I've tried to describe the business case for Open Review, and I've shown how Open Review can help with collecting email addresses, gathering market intelligence, and speeding course adoptions. I think that, purely in economic terms, Open Review makes sense for publishers and authors of some books. If more people explore and develop Open Review as a model, I expect that these economic benefits will increase. Further, this simple economic analysis does not count the benefits that come from better books and increased access to knowledge, two things that both authors and publishers value.

This post is the second in a three part series about the Open Review of Bit by Bit. You can also read more about how the Open Review of Bit by Bit led to a better book and increased access to knowledge. And, you can put your own manuscript through Open Review using the Open Review Toolkit, either by downloading the open-source code or hiring one of the preferred partners. The Open Review Toolkit is supported by a grant from the Alfred P. Sloan Foundation.

Matthew J. Salganik is professor of sociology at Princeton University, where he is also affiliated with the Center for Information Technology Policy and the Center for Statistics and Machine Learning. His research has been funded by Microsoft, Facebook, and Google, and has been featured on NPR and in such publications as the New Yorker, the New York Times, and the Wall Street Journal.

Matthew Salganik: The Open Review of Bit by Bit, Part 1—Better books

My new book Bit by Bit: Social Research in the Digital Age is for social scientists who want to do more data science, data scientists who want to do more social science, and anyone interested in the combination of these two fields. The central premise of Bit by Bit is that the digital age creates new opportunities for social research. As I was writing Bit by Bit, I also began thinking about how the digital age creates new opportunities for academic authors and publishers. The more I thought about it, the more it seemed that we could publish academic books in a more modern way by adopting some of the same techniques that I was writing about. I knew that I wanted Bit by Bit to be published in this new way, so I created a process called Open Review that has three goals: better books, higher sales, and increased access to knowledge. Then, much as doctors used to test new vaccines on themselves, I tested Open Review on my own book.

This post is the first in a three part series about the Open Review of Bit by Bit. I will describe how Open Review led to a better book. After I explain the mechanics of Open Review, I’ll focus on three ways that Open Review led to a better book: annotations, implicit feedback, and psychological effects. The other posts in this series describe how Open Review led to higher sales and increased access to knowledge.

How Open Review works

When I submitted my manuscript for peer review, I also created a website that hosted the manuscript for a parallel Open Review. During Open Review, anyone in the world could come and read the book and annotate it using hypothes.is, an open source annotation system. Here’s a picture of what it looked like to participants.

In addition to collecting annotations, the Open Review website also collected all kinds of other information. Once the peer review process was complete, I used the information from the peer review and the Open Review to improve the manuscript.

In the rest of this post, I'll describe how the Open Review of Bit by Bit helped improve the book, focusing on three things: annotations, implicit feedback, and psychological effects.

Annotations

The most direct way that Open Review produced better books is through annotations. Readers used hypothes.is, an open source annotation system, to leave annotations like those shown in the image at the top of this post.
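As a rough sketch of how such annotations can be pulled for analysis, the public hypothes.is search API can be queried page by page; the URL below is a placeholder for one section of the Open Review site.

    # Sketch: fetch public annotations for one page from the hypothes.is
    # search API, then count annotations per user. The page URL is a placeholder.
    from collections import Counter
    import requests

    resp = requests.get(
        "https://api.hypothes.is/api/search",
        params={"uri": "https://www.example-openreview-site.org/chapter-1.html",
                "limit": 200},
        timeout=30,
    )
    data = resp.json()

    by_user = Counter(row["user"] for row in data.get("rows", []))
    print(data.get("total", 0), "annotations on this page")
    print(by_user.most_common(10))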

During the Open Review period, 31 people contributed 495 annotations. These annotations were extremely helpful, and they led to many improvements in Bit by Bit. People often ask how these annotations compare to peer review, and I think it is best to think of them as complementary. The peer review was done by experts, and the feedback that I received often pushed me to write a slightly different book. The Open Review, on the other hand, was done by a mix of experts and novices, and the feedback was more focused on helping me write the book that I was trying to write. A further difference is the granularity of the feedback. During peer review, the feedback often involved removing or adding entire chapters, whereas during Open Review the annotations were often focused on improving specific sentences.

The most common annotations were related to clunky writing. For example, an annotation by differentgranite urged me to avoid unnecessarily switching between “golf club” and “driver.” Likewise, an annotation by fasiha pointed out that I was using “call data” and “call logs” in a way that was confusing. Many, many small changes like these helped improve the manuscript.

In addition to helping with writing, some annotations showed me that I had skipped a step in my argument. For example, an annotation by kerrymcc pointed out that when I was writing about asking people questions, I skipped qualitative interviews and jumped right to surveys. In the revised manuscript, I've added a paragraph that explains this distinction and why I focus on surveys.

The changes described above might have come from a copy editor (although my copy editor was much more focused on grammar than writing). But some of the annotations during Open Review could not have come from any copy editor. For example, an annotation by jugander pointed me to a paper I had not seen that was a wonderful illustration of a concept I was trying to explain. Similarly, an annotation by pkrafft pointed out a very subtle problem in the way that I was describing the Computer Fraud and Abuse Act. These annotations both came from people with deep expertise in computational social science, and they helped improve the intellectual content of the book.

A skeptic might read these examples and not be very impressed. It is certainly true that the Open Review process did not lead to massive changes to the book. But these examples—and dozens of others—are small improvements that I did make. Overall, I think these many small improvements added up to a major improvement.

Here are a few graphs summarizing the annotations.

Annotations by person: Most annotations were submitted by a small number of people.

Annotations by date: Most annotations were submitted relatively early in the process. The spike in late November occurred when a single person read the entire manuscript and made many helpful annotations.

Annotations by chapter: Chapters later in the book received fewer annotations, but the ethics chapter was somewhat of an exception.

Annotations by URL: Here are the 20 sections of the book that received the most annotations. In this case, I don't see a clear pattern, but this might be helpful information for other projects.

One last thing to keep in mind about these annotations is that they underestimate the amount of feedback that I received, because they only count annotations that arrived through the Open Review website. In fact, when people heard about Open Review, they sometimes invited me to give a talk or asked for a pdf of the manuscript on which they could comment. Basically, the Open Review website is a big sign that says "I want feedback," and that feedback comes in a variety of forms in addition to the annotations.

One challenge with the annotations is that they come in continuously, but I tended to make my revisions in chunks. Therefore, there was often a long lag between when the annotation was made and when I responded. I think that participants in the Open Review process might have been more engaged if I had responded more quickly. I hope that future Open Review authors can figure out a better workflow for responding to and incorporating annotations into the manuscript.

Implicit feedback

In addition to the annotations, the second way that Open Review can lead to better books is through implicit feedback. That is, readers were voting with their clicks about which parts of the book were interesting or boring. These "reader analytics" are apparently a hot thing in the commercial book publishing world. To be honest, this feedback proved less helpful than I had hoped, but that might be because I didn't have a good dashboard in place. Here are five elements that I'd recommend for an Open Review dashboard, all of which are possible with Google Analytics (a rough sketch of pulling the first three follows the list):

  • Which parts of the book are being read the most?
  • What are the main entry pages?
  • What are the main exit pages?
  • What pages have the highest completion rate (based on scroll depth)?
  • What pages have the lowest completion rate (based on scroll depth)?
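Here is a rough sketch of how the first three elements could be pulled with the Google Analytics Reporting API (v4, Universal Analytics era); the view ID and service account key file are placeholders, and scroll-depth completion would additionally require custom events, which are not shown.

    # Rough sketch: pageviews, entrances, and exits per page from the Google
    # Analytics Reporting API v4. View ID and key file are placeholders.
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    creds = service_account.Credentials.from_service_account_file(
        "service-account.json",
        scopes=["https://www.googleapis.com/auth/analytics.readonly"],
    )
    analytics = build("analyticsreporting", "v4", credentials=creds)

    report = analytics.reports().batchGet(
        body={
            "reportRequests": [{
                "viewId": "123456789",  # placeholder view ID
                "dateRanges": [{"startDate": "2016-09-01", "endDate": "2016-12-31"}],
                "dimensions": [{"name": "ga:pagePath"}],
                "metrics": [
                    {"expression": "ga:pageviews"},
                    {"expression": "ga:entrances"},
                    {"expression": "ga:exits"},
                ],
                "orderBys": [{"fieldName": "ga:pageviews", "sortOrder": "DESCENDING"}],
                "pageSize": 20,
            }]
        }
    ).execute()

    for row in report["reports"][0]["data"].get("rows", []):
        print(row["dimensions"][0], row["metrics"][0]["values"])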

Psychological effects

There is one last way that Open Review led to a better book: it made me more energized to make revisions. To be honest, for me, writing Bit by Bit was frustrating and exhausting. It was a huge struggle to get to the point where the manuscript was ready for peer review and Open Review. Then, after receiving the feedback from peer review, I needed to revise the manuscript. Without the Open Review process—which I found exciting and rejuvenating—I'm not sure I would have had the mental energy that was needed to make those revisions.

In conclusion, Open Review definitely helped make Bit by Bit better, and there are many ways that Open Review could be improved.

I want to say again that I’m grateful to everyone that contributed to the Open Review process:

benzevenbergen, bp3, cfelton, chase171, banivos, DBLarremore, differentgranite, dmerson, dmf, efosse, fasiha, huntr, jboy, jeschonnek.1, jtorous, jugander, kerrymcc, leohavemann, LMZ, Nick_Adams, nicolemarwell, nir, person, pkrafft, rchew, sculliwag, sjk, Stephen_L_Morgan, toz, vnemana

You can also read more about how the Open Review of Bit by Bit led to higher sales and increased access to knowledge. And, you can put your own manuscript through Open Review using the Open Review Toolkit, either by downloading the open-source code or hiring one of the preferred partners. The Open Review Toolkit is supported by a grant from the Alfred P. Sloan Foundation.

Matthew J. Salganik is professor of sociology at Princeton University, where he is also affiliated with the Center for Information Technology Policy and the Center for Statistics and Machine Learning. His research has been funded by Microsoft, Facebook, and Google, and has been featured on NPR and in such publications as the New Yorker, the New York Times, and the Wall Street Journal.

Matthew Salganik: Invisibilia, the Fragile Families Challenge, and Bit by Bit

This week’s episode of Invisibilia featured my research on the Fragile Families Challenge. The Challenge is a scientific mass collaboration that combines predictive modeling, causal inference, and in-depth interviews to yield insights that can improve the lives of disadvantaged children in the United States. Like many research projects, the Fragile Families Challenge emerged from a complex mix of inspirations. But, for me personally, a big part of the Fragile Families Challenge grew out of writing my new book Bit by Bit: Social Research in the Digital Age. In this post, I’ll describe how Bit by Bit helped give birth to the Fragile Families Challenge.

Bit by Bit is about social research in the age of big data. It is for social scientists who want to do more data science, data scientists who want to do more social science, and anyone interested in the combination of these two fields. Rather than being organized around specific data sources or machine learning methods, Bit by Bit progresses through four broad research designs: observing behavior, asking questions, running experiments, and creating mass collaboration. Each of these approaches requires a different relationship between researchers and participants, and each enables us to learn different things.

As I was working on Bit by Bit, many people seemed genuinely excited about most of the book—except the chapter on mass collaboration. When I talked about this chapter with colleagues and friends, I was often greeted with skepticism (or worse). Many of them felt that mass collaboration simply had no place in social research. In fact, at my book manuscript workshop—which was made up of people that I deeply respected—the general consensus seemed to be that I should drop this chapter from Bit by Bit.  But I felt strongly that it should be included, in part because it enabled researchers to do new and different kinds of things. The more time I spent defending the idea of mass collaboration for social research, the more I became convinced that it was really interesting, important, and exciting. So, once I finished up the manuscript for Bit by Bit, I set my sights on designing the mass collaboration that became the Fragile Families Challenge.

The Fragile Families Challenge, described in more detail at the project website and blog, should be seen as part of the larger landscape of mass collaboration research. Perhaps the most well known example of a mass collaboration solving a big intellectual problem is Wikipedia, where a mass collaboration of volunteers created a fantastic encyclopedia that is available to everyone.

Collaboration in research is nothing new, of course. What is new, however, is that the digital age enables collaboration with a much larger and more diverse set of people: the billions of people around the world with Internet access. I expect that these new mass collaborations will yield amazing results not just because of the number of people involved but also because of their diverse skills and perspectives. How can we incorporate everyone with an Internet connection into our research process? What could you do with 100 research assistants? What about 100,000 skilled collaborators?

As I write in Bit by Bit, I think it is helpful to roughly distinguish between three types of mass collaboration projects: human computation, open call, and distributed data collection. Human computation projects are ideally suited for easy-task-big-scale problems, such as labeling a million images. These are projects that in the past might have been performed by undergraduate research assistants. Contributions to human computation projects don't require specialized skills, and the final output is typically an average of all of the contributions. A classic example of a human computation project is Galaxy Zoo, where a hundred thousand volunteers helped astronomers classify a million galaxies. Open call projects, on the other hand, are more suited for problems where you are looking for novel answers to clearly formulated questions. In the past, these are projects that might have involved asking colleagues. Contributions to open call projects come from people who may have specialized skills, and the final output is usually the best contribution. A classic example of an open call is the Netflix Prize, where thousands of scientists and hackers worked to develop new algorithms to predict customers' ratings of movies. Finally, distributed data collection projects are ideally suited for large-scale data collection. These are projects that in the past might have been performed by undergraduate research assistants or survey research companies. Contributions to distributed data collection projects typically come from people who have access to locations that researchers do not, and the final product is a simple collection of the contributions. A classic example of a distributed data collection is eBird, in which hundreds of thousands of volunteers contribute reports about birds they see.

Given this way of organizing things, you can think of the Fragile Families Challenge as an open call project, and when designing the Challenge, I drew inspiration from the other open call projects that I wrote about, such as the Netflix Prize, Foldit, and Peer-to-Patent.

If you’d like to learn more about how mass collaboration can be used in social research, I’d recommend reading Chapter 5 of Bit by Bit or watching this talk I gave at Stanford in the Human-Computer Interaction Seminar. If you’d like to learn more about the Fragile Families Challenge, which is ongoing, I’d recommend our project website and blog.  Finally, if you are interested in social science in the age of big data, I’d recommend reading all of Bit by Bit: Social Research in the Digital Age.

Matthew J. Salganik is professor of sociology at Princeton University, where he is also affiliated with the Center for Information Technology Policy and the Center for Statistics and Machine Learning. His research has been funded by Microsoft, Facebook, and Google, and has been featured on NPR and in such publications as the New Yorker, the New York Times, and the Wall Street Journal.