IS 407 Introduction to Data Science

This course introduces students to data science approaches that have emerged from recent advances in programming and computing technology. Students will learn to collect and use data from a variety of sources, including the web, within a modern paradigm of statistical inference and visualization. The course is based on the programming language R but also uses HTML, regular expressions, basic Unix tools, XML, and SQL. It also covers supervised and unsupervised statistical learning techniques made possible by recent advances in computing power.

Previously IS 457.

Learning objectives

  • Work at a Unix prompt
  • Use the Python programming language to process, visualize, and persist large data sets
  • Use database technologies including SQL

Recent syllabus

Textbooks and Course Materials

Available from the Illini Union Bookstore (IUB).