Introduction to Data Mining

Next Offering

Start Date: February 26, 2020
End Date:  April 15, 2020

 

Introduction to Data Mining is one of five non-credit courses in the Certification in Practice of Data Analytics (CPDA) program. This course can be taken individually, or as one of four courses required to receive the CPDA certificate of completion.

Introduction to Data Mining is the second course in the CPDA program sequence. After learning how to analyze data statistically, students learn how to sort through large datasets to identify trends, patterns, and relationships, discover previously unknown insights, and leverage them in business operations. The course is delivered in a 100% distance-learning format and includes instructional material equivalent to a one-semester-credit-hour class.


Course Description

This course is an introduction to data mining fundamentals and algorithms. Students will develop an appreciation for data preparation and transformation, gain an understanding of the data requirements of the various algorithms, and learn when each algorithm is appropriate. This is a project-based course in which students take their own business problem through a data mining methodology. Specific topics include Distance/Similarity Measurement; Anomaly Detection; and Association, Classification, Clustering, and Pattern Algorithms.

4 CEUs are granted upon successful completion of the course.


You will learn to:

  1. Understand the data mining methodology and why it is important to data science.
  2. Describe how analytics and data can solve business problems and plan an applicable project.
  3. Use data preparation techniques on analytic data sets to gain an understanding of the data and prepare it for use in modeling.
  4. Articulate the differences among the most common models and when to use each.
  5. Objectively evaluate model results and how they were achieved.
  6. Present project results and determine an effective model deployment plan.


Prerequisites 

College-level coursework in statistics is required. If you are pursuing the CPDA certificate, you must complete Introductory Statistics for Data Analytics before taking this course. Please contact the program with questions or for clarification.

This class requires you to use the statistical software package R (The R Project for Statistical Computing; http://www.r-project.org/). R is available as free software.

Students must complete this free training before starting the Data Mining class.

  • Hands-on tutorials are available in the Swirl system, which you can learn about at http://swirlstats.com/. In particular, “R Programming: The basics of programming in R” is an appropriate first tutorial for students who have never used R.
  • An easier-to-use interface to R is available in the RStudio software package, which runs on Windows, Mac, and Linux and can be downloaded for free from http://rstudio.org.
  • From the CRAN archive at https://cran.r-project.org, you can download R for Windows, Mac, and Linux.
  • An in-depth "Introduction to R" training manual is available at http://cran.r-project.org/doc/manuals/R-intro.pdf


Students are also required to learn Structured Query Language (SQL) before starting this course. Free online training that will prepare you for this course can be found at: 



Expected Time Commitment to Complete this Course

Each course is equivalent to a one-semester-credit-hour class and therefore consists of approximately 40 hours of class time, including 12-13 hours of recorded faculty lectures and 23-24 hours of additional course work. Each course is seven weeks long, so each week has about 5.7 hours of combined class time (40 hrs / 7 weeks). The average student should allow a 2:1 study-to-class-time ratio, which means planning two hours of study for each hour of class time. This equates to 11-12 hours each week to complete all course work (5.7 hrs × 2 = 11-12 hrs). Adjust the ratio up or down based on your own strengths and experience.
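The weekly figures can be reproduced with a short calculation. The sketch below is only an illustration: it uses the 40 class hours, seven weeks, and 2:1 ratio stated above, while the function and variable names are assumptions introduced for this example. Change the ratio to see how your own estimate moves.

```python
# Figures come from the course description; all names here are illustrative.
TOTAL_CLASS_HOURS = 40   # approximate class time per course
COURSE_WEEKS = 7         # length of each course
STUDY_RATIO = 2.0        # suggested study hours per class hour (2:1)


def weekly_hours(total_class_hours, weeks, study_ratio):
    """Return (class hours per week, study hours per week)."""
    class_per_week = total_class_hours / weeks
    study_per_week = class_per_week * study_ratio
    return class_per_week, study_per_week


class_per_week, study_per_week = weekly_hours(
    TOTAL_CLASS_HOURS, COURSE_WEEKS, STUDY_RATIO
)
print(f"Class time per week: {class_per_week:.1f} hours")  # 5.7
print(f"Study time per week: {study_per_week:.1f} hours")  # 11.4
```

A student who expects to need more time could, for example, pass a ratio of 2.5 instead of 2.0.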

Cancellations and Refunds

A full refund, minus a $50 administrative fee, will be made if the cancellation is received at least three weeks before the start of the course. No refunds are issued within three weeks of the course start date.

Course Offering Dates

Each course offering in this program is faculty led and therefore has a specific start date and end date. Students must complete each course within that time frame. Access to the online course and materials is removed when the course ends.