IS 407 Introduction to Data Science

This course introduces students to data science approaches that have emerged from recent advances in programming and computing technology. Students will learn to collect and use data from a variety of sources, including the web, within a modern paradigm of statistical inference and visualization. The course is taught primarily in the programming language R, but also uses HTML, regular expressions, basic Unix tools, XML, and SQL. It also covers supervised and unsupervised statistical learning techniques made possible by recent advances in computing power.

Learning objectives

  • Work at a Unix prompt
  • Use the programming language R to process, visualize, and persist large data sets
  • Use database technologies including SQL