Start Date: February 27, 2019
End Date: April 17, 2019
Data Mining is one of five non-credit courses in the Certification in Practice of Data Analytics (CPDA) program. The course is delivered in 100% distance learning format and includes instructional material equivalent to a one semester credit hour class.
This course can be taken individually, or as one of four courses required to receive the CPDA certificate of completion. Participants are expected to take Foundations of Statistics first, followed by Data Mining. Practical Application to Advanced Analytics, Machine Learning, or Visualization Analytics and Sensemaking can follow in any order.
This course is an introduction to data mining fundamentals and algorithms. Students will develop an appreciation for data preparation and transformation, gain an understanding of the data requirements of the various algorithms, and learn when each algorithm is appropriate to use. Specific topics include distance/similarity measurement; anomaly detection; and association, classification, clustering, and pattern algorithms.
4 CEUs are granted upon successful completion of the course.
You will learn to:
- Describe and critically assess the importance of a methodology and data preparation.
- Evaluate, identify, and process different measures of similarity and basics of visualization (do's and don'ts).
- Identify and evaluate various classification approaches and clustering techniques, articulate how they work, and determine when to use them.
- Utilize association analysis, anomaly detection, and linear regression, articulate how they work, and determine when to use them.
College-level coursework in statistics is required. If you are pursuing the CPDA Certification and do not already have that background, you are expected to complete Foundations of Statistics before taking this course. Please contact the program with questions or for clarification.
This class requires you to use the statistical software package called R (The R Project for Statistical Computing, http://www.r-project.org/). This software package is available as Free Software.
- From the CRAN archive at https://cran.r-project.org, you can download R for Windows, Mac, and Linux.
- An in-depth introduction to R is available at http://cran.r-project.org/doc/manuals/R-intro.pdf
- Hands-on tutorials are available in the Swirl system, which you can learn about at http://swirlstats.com/. In particular, “R Programming: The basics of programming in R” is an appropriate first tutorial for students who have never used R.
- An easier-to-use interface to R is available in the software package RStudio. This package is available for Windows, Mac, and Linux and can be downloaded for free from http://rstudio.org.
Students are also required to learn Structured Query Language (SQL). Free online training that will prepare you for this course can be found at:
Expected Time Commitment to Complete this Course
Each course is equivalent to a one semester credit hour class. Each class therefore consists of approximately 40 hours of class time, including 12-13 hours of recorded faculty lectures and 23-24 hours of additional course work. Each course is seven weeks long, so each week involves about 5.7 hours of combined class time (40 hrs / 7 weeks). The average student should allow a 2:1 study-to-class-time ratio to complete the course, that is, two hours of study for each hour of class time. This equates to 11-12 hours each week to complete all course work (5.7 hrs × 2 ≈ 11.4 hrs). Depending on your own strengths and experience, you may need to increase or decrease this ratio.
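If you would like to adapt this estimate to your own pace, the arithmetic above can be sketched in a few lines of Python. This is only an illustrative calculation using the figures quoted in this section; the function name and its parameters are made up for this example and are not part of the course materials.

```python
# Weekly time estimate based on the figures quoted in this section.
# The names below are illustrative assumptions, not course-provided tools.
TOTAL_CLASS_HOURS = 40  # approximate class time per course
COURSE_WEEKS = 7        # length of each course in weeks
STUDY_RATIO = 2         # suggested study hours per hour of class time


def weekly_hours(total_class_hours=TOTAL_CLASS_HOURS,
                 weeks=COURSE_WEEKS,
                 study_ratio=STUDY_RATIO):
    """Return (class hours per week, study hours per week)."""
    class_per_week = total_class_hours / weeks
    study_per_week = class_per_week * study_ratio
    return class_per_week, study_per_week


class_per_week, study_per_week = weekly_hours()
print(f"Class time per week: {class_per_week:.1f} hours")
print(f"Study time per week: {study_per_week:.1f} hours")
```

Changing the `study_ratio` argument (for example to 1.5 or 3) shows how the weekly estimate shifts if you expect to need less or more study time than the average student.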
Cancellations and Refunds
A full refund, minus a $50 administrative fee, will be made if the cancellation is received at least three weeks prior to the start of the course. No refunds will be made for cancellations received within three weeks of the course start date.
Course Offering Dates
Each course offering in this program is faculty led and therefore operates with a specific start date and end date. Students must complete each course during that time frame. Access to the online course and materials is removed when the course ends.