Data Mining

Data Mining Course

Next Offering

Start Date: October 14, 2019
End Date:  December 2, 2019

Data Mining is one of five non-credit courses in the Certification in Practice of Data Analytics (CPDA) program. The course is delivered in a 100% distance-learning format and includes instructional material equivalent to a one-semester-credit-hour class.

This course can be taken individually or as one of four courses required to receive the CPDA certificate of completion. Participants are required to take Introductory Statistics for Data Analytics first, followed by Data Mining. Practical Application to Advanced Analytics, Machine Learning, or Visualization Analytics and Sensemaking can follow in any order.


Course Description

This course is an introduction to data mining fundamentals and algorithms. Students will develop an appreciation for data preparation and transformation, gain an understanding of the data requirements of the various algorithms, and learn when each algorithm is appropriate. Specific topics include Distance/Similarity Measurement; Anomaly Detection; and Association, Classification, Clustering, and Pattern Algorithms.

4 CEUs are granted upon successful completion of the course.

You will learn to:
  1. Describe and critically assess the importance of a methodology and data preparation.
  2. Evaluate, identify, and process different measures of similarity and basics of visualization (do's and don'ts).
  3. Identify and evaluate various classification approaches and clustering techniques, articulate how they work, and determine when to use them.
  4. Utilize association analysis, anomaly detection, and linear regression, articulate how they work, and determine when to use them.


College-level coursework in statistics is required. If you are pursuing the CPDA Certification and do not already have that background, you are required to complete Introductory Statistics for Data Analytics before taking this course. Please contact the program with questions or for clarification.

This class requires you to use the statistical software package called R (The R Project for Statistical Computing). This software package is available as free software.

  • You can download R for Windows, Mac, and Linux from the CRAN archive.
  • An in-depth introduction to R is available online.
  • Hands-on tutorials are available in the Swirl system. In particular, “R Programming: The basics of programming in R” is an appropriate first tutorial for students who have never used R.
  • An easier-to-use interface to R is available in the software package RStudio. This package is available for Windows, Mac, and Linux and can be downloaded for free.

Students are also required to learn Structured Query Language (SQL). Free online training that will prepare you for this course is available.



Expected Time Commitment to Complete this Course

Each course is equivalent to a one-semester-credit-hour class. Each course therefore consists of approximately 40 hours of class time that includes 12-13 hours of recorded faculty lectures and 23-24 hours of additional course work. Each course is seven weeks long, so each week has about 5.7 hours of combined class time (40 hrs / 7 weeks). The average student should allow a 2:1 study-to-class-time ratio, which means planning two hours of study for each hour of class time. This equates to 11-12 hours each week to complete all course work (5.7 hrs × 2 = 11-12 hrs). Increase or decrease the ratio based on your own strengths and experience.
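The weekly estimate above can be reproduced, or adjusted for a different study ratio, with a short calculation. This is only an illustrative sketch: the 40-hour and seven-week figures come from the paragraph above, while the function name weekly_hours and its study_ratio parameter are made up for this example.

```python
# Figures taken from the time-commitment paragraph above.
TOTAL_CLASS_HOURS = 40
COURSE_WEEKS = 7


def weekly_hours(study_ratio: float = 2.0) -> tuple[float, float]:
    """Return (class hours per week, study hours per week).

    study_ratio is the study-to-class-time ratio; 2.0 is the 2:1
    ratio suggested for the average student.
    """
    class_per_week = TOTAL_CLASS_HOURS / COURSE_WEEKS
    study_per_week = class_per_week * study_ratio
    return class_per_week, study_per_week


class_hrs, study_hrs = weekly_hours()
print(f"Class time per week: {class_hrs:.1f} hours")  # 5.7
print(f"Study time per week: {study_hrs:.1f} hours")  # 11.4
```

Passing a different ratio, for example weekly_hours(1.5), shows how the weekly estimate changes for a student who expects to need less study time.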

Cancellations and Refunds

A full refund, minus a $50 administrative fee, will be made if the cancellation is received at least three weeks prior to the start of the course. No refunds are given for cancellations received within three weeks of the course start date.

Course Offering Dates

Each course offering in this program is faculty led and therefore operates with a specific start date and end date. Students must complete each course within that time frame. Access to the online course and materials is removed when the course ends.