Rachel Schutt (A.B. ’97) commutes by subway from the Upper West Side of Manhattan to her office in the News Corporation building, just around the corner from Times Square. She heads through lobby security and up the elevator to the eighth floor, passing 21st Century Fox at one end of the building and crossing the hall toward her News Corp office. Near the entrance are walls covered by giant television screens and slanting shelves stacked with the day’s newspapers published by News Corp’s nine companies: the New York Post, the Wall Street Journal, the Times of London, the Daily Telegraph, and others.
As chief data scientist, Schutt has been defining her role ever since she arrived at the company two years ago. Simply put, her job is to answer this question: How can data help shape the future of news?
“For me, it’s about building a sustainable business model for journalism,” says Schutt, who believes strongly in journalism as a social good. “But it’s been very heavily disrupted by the Internet. So how do we get revenue in order to pay journalists so that we can keep newspapers going and stories coming?”
The answer involves big data. Think of the 15 million unique visitors to the Wall Street Journal website each day, lingering for x minutes, clicking on y and z related articles, each click carrying a timestamp and origin. That’s the kind of high-volume, fast-accumulating, miscellaneous information that Schutt has to sift through. Add that to yet more data on News Corp’s other online publications and, say, stats about the print subscriptions, and you start to understand the skill and creativity required to make sense of it all.
Schutt uses those data to inform and build best practices for the business and the newsroom. She searches for patterns amid the data points, “attempting to create order out of chaos,” as she puts it. Based on patterns of user behavior, Schutt can automate strategies to retain newspaper subscribers and generate ad revenue. She can use statistics to predict whether visitors are likely to renew their subscriptions; if not, the marketing team creates messaging and incentives to keep readers coming back. That’s the business side.
“But we’re also interested in the design element,” she continues. “On the newsroom side, there are examples where the data science teams help the newsroom tell stories with data. To me, that’s where the future is, in terms of creative opportunities.” For instance: producing interactive infographics, or quickly processing the text in 30,000 emails released by former Secretary of State and presidential hopeful Hillary Clinton to see whether the emails contain a newsworthy pattern.
“Any of the world’s data becomes something we can work on,” Schutt says. “There are so many new kinds of data, which we can use to find stories anywhere in the world.”
Schutt’s winding career path—from honors math in LSA to News Corp in NYC—took her through a couple of master’s degrees and a Ph.D.; included stints as a consultant, high school teacher, and statistician at Google; and eventually led her to publish a seminal textbook on the emerging field of data science.
The book, called Doing Data Science, grew out of a course she taught at Columbia University. “Teaching the course was an effort to explore this new thing for myself—it was a vehicle to research this new area called big data,” Schutt says. “So I did it at night, while I was working at Google. It felt like I was onto something.”
Schutt, her co-author Cathy O’Neil, and students in the course sensed the opportunity to influence future conversations about working with big data. They thought deeply about not just the technical, math-laden aspects of the field, but also the human side of the numbers, the values and ethics required to do data science right. Throughout the book, Schutt and O’Neil pose thought experiments that reflect some of these ethical considerations. “In a statistics or computer science class, ethics doesn’t normally come up. Just doing that probably was an innovation,” Schutt says. “The philosophical and human element, I’ve always felt, is completely missing in other books in similar fields. And it still is.”