Some View From Hadley

  1. Pragmatically, if you’re a data scientist, learning the basics of SQL is really important.
  1. You should also have a minimal reading knowledge of R and Python, because so many data science teams use both .
  1. Then I think you’re better off specializing in one of these two and getting really good at it, rather than spreading yourself too thin and being mediocre at several languages.

The second book that you chose is The Algorithm Design Manual . Algorithms are a big part of computer science knowledge. Can you explain why you think it’s still important to learn about them, when most of them have been implemented for data scientists by people like you?

To me, this book is an illustration of the power of names. Today, in the era of Google, if you know the name of something, you can find out about it with a simple search. But if you don’t know of what you’re looking for, it suddenly becomes much harder to find it. Having in the back of your head the names of common algorithms that help you solve problems is really powerful. When you identify a new problem, it helps you to come up with ideas, for example to use breadth-first search, or a binary tree, etc.

Knowing how to write clearly helps you to write code clearly, and also helps you writing good documentation and explain the intent of what you’re doing. Even very good code will only ever tell you how something has been implemented; it won’t tell you why a particular technique has been chosen. Writing well and describing things well is very valuable to a good programmer, and even more to a data scientist. It doesn’t matter how wonderful your data analysis is, if you can’t explain to somebody else what you’ve done, why it makes sense, and what to take away from it.