# MAT5153

## Mathematical Foundations of Data Analytics MDC4153/MAT5153

Catalog entry

Prerequisite: MAT2243 Applied Linear Algebra or (MAT2233 Linear Algebra and MAT2214/MAT2213 Calculus III).

Content: Data Analytics refers to classic data analysis plus all other areas that support it. This immersive Data Analytics course equips students with the essential mathematical skills and knowledge required to analyze, visualize, and interpret complex datasets. Students will be exposed to the entire life cycle of data analysis. Throughout the course, participants will explore basic operations in scripting languages, delve into advanced visualization techniques, and investigate linear discriminants, generalized regressions, time series analysis, non-linear discriminants, and clustering. Students will program essential algorithms themselves, instead of relying on toolboxes, to explore the discrete Fourier transform, generalized regressions, clustering algorithms, and artificial neural networks. This course uses generative AI tools to aid in data analysis.
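As an illustration of programming an algorithm directly rather than calling a toolbox, a naive discrete Fourier transform fits in a few lines. This is a minimal sketch in Python (the choice of language is an assumption; the catalog does not prescribe one) that evaluates $X_k = \sum_{n=0}^{N-1} x_n e^{-2\pi i k n / N}$ in $O(N^2)$ time; the function name `dft` is hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (hypothetical helper).

    X[k] = sum_{n=0}^{N-1} x[n] * exp(-2*pi*i*k*n/N)
    """
    n_samples = len(x)
    result = []
    for k in range(n_samples):
        # Project the signal onto the k-th complex sinusoid.
        total = 0j
        for n, value in enumerate(x):
            angle = -2.0 * math.pi * k * n / n_samples
            total += value * cmath.exp(1j * angle)
        result.append(total)
    return result


if __name__ == "__main__":
    # A cosine completing 2 cycles over 8 samples puts its energy in
    # bin k = 2 and in the mirrored negative-frequency bin k = 6.
    signal = [math.cos(2 * math.pi * 2 * n / 8) for n in range(8)]
    for k, coeff in enumerate(dft(signal)):
        print(k, round(abs(coeff), 6))
```

Writing the double loop by hand makes the quadratic cost visible, which motivates studying the fast Fourier transform afterwards.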

Furthermore, the course will provide an understanding of relational databases and their integration with programming environments, as well as guidance on creating effective data analysis plans. Emphasis will be placed on solution architecture, reproducibility, configuration management, and generating standardized reports.
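To make "integration with programming environments" concrete, the sketch below uses Python's built-in `sqlite3` module to create a table, insert rows with parameterized queries, and let the database perform an aggregation. The table name `measurements`, its columns, and the rows are hypothetical toy values chosen for illustration, not course data; SQLite is assumed only because it ships with Python.

```python
import sqlite3


def build_example_db():
    """Create an in-memory SQLite database with a small hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE measurements ("
        " id INTEGER PRIMARY KEY,"
        " station TEXT NOT NULL,"
        " value REAL NOT NULL)"
    )
    # Toy rows for illustration only. The ? placeholders let the driver
    # bind values safely instead of concatenating SQL strings by hand.
    rows = [("A", 1.5), ("A", 2.5), ("B", 4.0)]
    conn.executemany(
        "INSERT INTO measurements (station, value) VALUES (?, ?)", rows
    )
    conn.commit()
    return conn


def mean_by_station(conn):
    """Push the aggregation into SQL and return {station: mean value}."""
    cursor = conn.execute(
        "SELECT station, AVG(value) FROM measurements"
        " GROUP BY station ORDER BY station"
    )
    return {station: avg for station, avg in cursor.fetchall()}


if __name__ == "__main__":
    conn = build_example_db()
    print(mean_by_station(conn))
    conn.close()
```

Keeping the aggregation in SQL rather than in a Python loop is one of the trade-offs the week 8 outcome ("when to use relational databases") points toward.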

By the end of the course, students will have a strong foundation in data analytics, allowing them to transform raw data into valuable insights for decision-making.

## Course Content

| Week | Topic | Prerequisites | Student learning outcomes (SLOs) |
|---|---|---|---|
| 1 | Description of the course project. | None. | Understanding of government databases. Ability to conduct basic data exploration. Ability to identify questions answerable with the data available for a specific problem. |
| 2 | Scripts vs. compiled code. | Previous exposure to computer programming in any language. | Basic numeric operations in scripting vs. compiled code. Clarity about the differences between interpreted and compiled code, and how they affect data analysis. Setting up environments. |
| 3 | Ethics in data analysis. | None. | Identification of biases introduced during data collection, storage, analysis, and access. |
| 4-5 | Linear discriminants. | Linear Algebra and Calculus I. | Ability to minimize an equation involving matrices and vectors. Mastery of Principal Component Analysis (PCA), Fisher's linear discriminant, and multiple discriminant analysis. Mastery of multilinear operations in scripting and compiled languages. Understanding of the balance between computational performance and development time. |
| 6 | Visualization (basic and advanced). | Scripts vs. compiled code (week 2). | Understanding of different families of visualization techniques. Ability to create Circos plots. |
| 7 | Generalized regressions. | Linear discriminants (weeks 4-5). | Understanding of mathematical approaches that produce an infinite family of regressions for the purpose of data smoothing. |
| 8 | Relational databases. | Scripts vs. compiled code (week 2). | Ability to create, access, and use relational databases from within programming environments. Understanding of when to use relational databases. |
| 9 | Clustering. | Generalized regressions (week 7). | Ability to create basic clusters using multiple definitions of distance. |
| 10 | Solution architecture & reproducibility. | Scripts vs. compiled code (week 2). | Capacity to design a complex data analysis solution that guarantees reproducibility, interoperability, and maintainability. |
| 11 | Non-linear discriminants (i.e., artificial neural networks). | Clustering (week 9). | Capability to program a fully connected feed-forward artificial neural network from scratch. Understanding of the effects of different activation functions. |
| 12 | Configuration management. | Scripts vs. compiled code (week 2). | Ability to program in collaborative, multi-layered environments. Capacity to resolve code conflicts, create code branches, and effectively propagate code changes across multiple environments such as development, test, and production. |
| 13 | Data analysis plans & standardized reports. | Scripts vs. compiled code (week 2). | Skill in breaking a data analysis problem into multiple interconnected components and then producing automated reports targeting specific audiences. |
| 14 | Project presentations. | Entire course. | Exposure to presenting results in front of an audience of experts. |
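As a hedged illustration of the weeks 4-5 outcome on Fisher's linear discriminant, the following is a minimal two-class sketch in plain Python. For two classes, Fisher's criterion $J(\mathbf{w}) = \frac{\mathbf{w}^\top S_B \mathbf{w}}{\mathbf{w}^\top S_W \mathbf{w}}$ is maximized by $\mathbf{w} \propto S_W^{-1}(\mathbf{m}_a - \mathbf{m}_b)$, where $S_W$ is the within-class scatter matrix. The sketch assumes 2-D features so that the $2 \times 2$ matrix $S_W$ can be inverted in closed form; all function names are hypothetical and the points are toy values.

```python
def mean(points):
    """Component-wise mean of a list of 2-D points."""
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]


def scatter(points, m):
    """2x2 scatter matrix: sum over points of (x - m)(x - m)^T."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - m[0], p[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_direction(class_a, class_b):
    """Return w proportional to S_W^{-1} (m_a - m_b) for 2-D data."""
    m_a, m_b = mean(class_a), mean(class_b)
    s_a, s_b = scatter(class_a, m_a), scatter(class_b, m_b)
    # Within-class scatter is the sum of the per-class scatter matrices.
    s_w = [[s_a[i][j] + s_b[i][j] for j in range(2)] for i in range(2)]
    det = s_w[0][0] * s_w[1][1] - s_w[0][1] * s_w[1][0]
    if abs(det) < 1e-12:
        raise ValueError("within-class scatter matrix is singular")
    # Closed-form inverse of a 2x2 matrix.
    inv = [[s_w[1][1] / det, -s_w[0][1] / det],
           [-s_w[1][0] / det, s_w[0][0] / det]]
    diff = [m_a[0] - m_b[0], m_a[1] - m_b[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


def project(w, p):
    """Scalar projection of point p onto direction w."""
    return w[0] * p[0] + w[1] * p[1]


if __name__ == "__main__":
    # Toy data for illustration only.
    class_a = [(2, 2), (3, 3), (2, 3), (3, 2)]
    class_b = [(6, 6), (7, 7), (6, 7), (7, 6)]
    w = fisher_direction(class_a, class_b)
    print("w =", w)
    print("class A:", [project(w, p) for p in class_a])
    print("class B:", [project(w, p) for p in class_b])
```

Projecting onto $\mathbf{w}$ reduces each point to a scalar, and a threshold on that scalar separates the classes. For higher-dimensional data the same formula applies, with a general linear solve replacing the closed-form $2 \times 2$ inverse.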