Difference between revisions of "MDC5153"

From Department of Mathematics at UTSA
 
(5 intermediate revisions by the same user not shown)
Line 1: Line 1:
=Data Analytics MDC4153/MDC5153=
+
 
 +
== Mathematical Foundations of Data Analytics MDC4153/MDC5153 ==
  
 
'''Catalog entry'''
 
'''Catalog entry'''
  
''Prerequisite'': [[MAT2243]] Applied Linear Algebra OR [[MAT2233]] Linear Algebra + [[MAT2214]]/[[MAT2213]] Calculus III.
+
''Prerequisite'': [[MAT2253]] Applied Linear Algebra or ([[MAT2233]] Linear Algebra and [[MAT2214]]/[[MAT2213]] Calculus III).
  
''Content'': This immersive Data Analytics course equips students with the essential skills and knowledge required to analyze, visualize, and interpret complex datasets. Students will learn the importance of ethics in data analysis and how to set up a suitable environment for efficient processing. Throughout the course, participants will explore basic operations in Python, delve into advanced visualization techniques, and investigate linear and non-linear discriminants, such as artificial neural networks.
+
''Content'': Data Analytics refers to classic data analysis plus all other areas that support it. This immersive Data Analytics course equips students with the essential mathematical skills and knowledge required to analyze, visualize, and interpret complex datasets. Students will be exposed to the entire life cycle of data analysis. Throughout the course, participants will explore basic operations in scripting languages, delve into advanced visualization techniques, and investigate linear discriminants, generalized regressions, time series analysis, non-linear discriminants, and clustering. Students will program essential algorithms, instead of using toolboxes, to explore the discrete Fourier transform, generalized regressions, clustering algorithms, and artificial neural networks.
  
Furthermore, the course will provide an understanding of relational databases and their integration with programming environments, as well as guidance on creating effective data analysis plans. Emphasis will be placed on solution architecture, reproducibility, configuration management, and generating standardized reports. By the end of the course, students will have a strong foundation in data analytics, allowing them to transform raw data into valuable insights for decision-making.
+
This course uses generative AI tools to aid in data analysis. Furthermore, the course will provide an understanding of relational databases and their integration with programming environments, as well as guidance on creating effective data analysis plans. Emphasis will be placed on solution architecture, reproducibility, configuration management, and generating standardized reports.  
 +
 
 +
By the end of the course, students will have a strong foundation in data analytics, allowing them to transform raw data into valuable insights for decision-making.
  
 
== Course Content==
 
== Course Content==
Line 13: Line 16:
 
! Week !! Source !! Topic !! Prerequisites !! SLOs
 
! Week !! Source !! Topic !! Prerequisites !! SLOs
 
|-
 
|-
| 1 ||  || Description of the course project. || ||  
+
| 1 ||  || Description of the course project. || None. || Understanding of government databases. Conduct basic data exploration. Identify questions answerable with data available for a specific problem.
|-
 
| 2 ||  || Ethics in data analysis. ||  ||
 
 
|-
 
|-
| 3 ||  || Environment setup. || ||  
+
| 2 ||  || Scripts vs. compiled code. || Previous exposure to computer programming in any language. || Basic numeric operations in scripting vs. compiled code. Clarity about the differences between interpreted and compiled code, and how they affect data analysis. Setting up environments.
 
|-
 
|-
| 4 ||  || Basic operations in python. || ||  
+
| 3 ||  || Ethics in data analysis. || None. || Identification of biases introduced during data collection, storage, analysis, and access.
 
|-
 
|-
| 5 ||  || Visualization (basic and advanced). || ||  
+
| 4 - 5 ||  || Linear discriminants || Linear Algebra and Calculus I || Ability to minimize an equation involving matrices and vectors. Mastery of Principal Component Analysis (PCA), Fisher's linear discriminant, and multiple discriminant analysis. Mastery in multi-linear operations in scripting and compiled languages. Understanding of the balance between computational performance and development time.
 
|-
 
|-
| 6 ||  || Linear discriminants & regressions. ||  ||  
+
| 6 ||  || Visualization (basic and advanced). ||  Scripts vs. compiled code (week 2) || Understanding of different families of visualization techniques. Ability to create Circos plots.
 
|-
 
|-
| 7 ||  || Relational databases. || ||  
+
| 7 ||  || Generalized regressions || Linear discriminants (weeks 4-5) || Understanding of mathematical approaches to produce an infinite family of regressions for the purpose of data smoothing.
 
|-
 
|-
| 8 ||  || Relational databases from python. ||  ||
+
| 8 ||  || Relational databases || Scripts vs. compiled code (week 2) || Ability to create, access, and use relational databases from within programming environments. Understanding of when to use relational databases.
 
|-
 
|-
| 9 ||  || Non-linear discriminants (i.e. artificial neural networks). ||  ||  
+
| 9 ||  || Clustering || Generalized regressions (week 7) || Ability to create basic clusters using multiple definitions of distance.
 
|-
 
|-
| 10 ||  || Data analysis plans. || ||  
+
| 10 ||  || Solution architecture & reproducibility. || Scripts vs. compiled code (week 2) || Capacity to design a complex data analysis solution that guarantees reproducibility, interoperability, and maintainability.
 
|-
 
|-
| 11 ||  || Solution architecture & reproducibility. || ||  
+
| 11 ||  || Non-linear discriminants (i.e. artificial neural networks). || Clustering (week 9) || Capability to program a fully-connected feed-forward artificial neural network from scratch. Understanding of the effect of multiple activation functions.
 
|-
 
|-
| 12 ||  || Management of the configuration. || ||  
+
| 12 ||  || Management of the configuration. || Scripts vs. compiled code (week 2) || Ability to program in collaborative multi-layered environments. Capacity to resolve conflicts in code, create code branches, and effectively propagate code changes across multiple environments such as development, test, and production.
 
|-
 
|-
| 13 ||  || Standardized reports.  || ||  
+
| 13 ||  || Data analysis plans & standardized reports. || Scripts vs. compiled code (week 2) || Dexterity to break a data analysis problem into multiple interconnected components, and then produce automated reports targeting specific audiences.
 
|-
 
|-
| 14 ||  || Project presentations
+
| 14 ||  || Project presentations || Entire course || Exposure to presentation of results in front of an audience of experts.
 
|}
 
|}

Latest revision as of 11:17, 30 June 2023

Mathematical Foundations of Data Analytics MDC4153/MDC5153

Catalog entry

Prerequisite: MAT2253 Applied Linear Algebra or (MAT2233 Linear Algebra and MAT2214/MAT2213 Calculus III).

Content: Data Analytics refers to classic data analysis plus all other areas that support it. This immersive Data Analytics course equips students with the essential mathematical skills and knowledge required to analyze, visualize, and interpret complex datasets. Students will be exposed to the entire life cycle of data analysis. Throughout the course, participants will explore basic operations in scripting languages, delve into advanced visualization techniques, and investigate linear discriminants, generalized regressions, time series analysis, non-linear discriminants, and clustering. Students will program essential algorithms, instead of using toolboxes, to explore the discrete Fourier transform, generalized regressions, clustering algorithms, and artificial neural networks.

This course uses generative AI tools to aid in data analysis. Furthermore, the course will provide an understanding of relational databases and their integration with programming environments, as well as guidance on creating effective data analysis plans. Emphasis will be placed on solution architecture, reproducibility, configuration management, and generating standardized reports.

By the end of the course, students will have a strong foundation in data analytics, allowing them to transform raw data into valuable insights for decision-making.
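As an illustration of what programming an algorithm "instead of using toolboxes" might look like, the following is a minimal sketch of a naive discrete Fourier transform. It is not taken from the course materials: the choice of Python and the function names `dft` and `inverse_dft` are assumptions made for this example, and the quadratic-time formulation is used only because it mirrors the textbook definition.

```python
import cmath


def dft(signal):
    """Naive O(n^2) discrete Fourier transform (hypothetical helper).

    Computes X[k] = sum_t x[t] * exp(-2*pi*i*k*t/n) directly from the definition.
    """
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def inverse_dft(spectrum):
    """Inverse of dft(): x[t] = (1/n) * sum_k X[k] * exp(+2*pi*i*k*t/n)."""
    n = len(spectrum)
    return [
        sum(c * cmath.exp(2j * cmath.pi * k * t / n) for k, c in enumerate(spectrum)) / n
        for t in range(n)
    ]
```

For a constant signal such as `[1, 1, 1, 1]`, all of the energy lands in the zero-frequency coefficient, and applying `inverse_dft` to the output of `dft` recovers the original samples up to floating-point error.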

Course Content

{|
! Week !! Source !! Topic !! Prerequisites !! SLOs
|-
| 1 ||  || Description of the course project. || None. || Understanding of government databases. Conduct basic data exploration. Identify questions answerable with data available for a specific problem.
|-
| 2 ||  || Scripts vs. compiled code. || Previous exposure to computer programming in any language. || Basic numeric operations in scripting vs. compiled code. Clarity about the differences between interpreted and compiled code, and how they affect data analysis. Setting up environments.
|-
| 3 ||  || Ethics in data analysis. || None. || Identification of biases introduced during data collection, storage, analysis, and access.
|-
| 4 - 5 ||  || Linear discriminants || Linear Algebra and Calculus I || Ability to minimize an equation involving matrices and vectors. Mastery of Principal Component Analysis (PCA), Fisher's linear discriminant, and multiple discriminant analysis. Mastery in multi-linear operations in scripting and compiled languages. Understanding of the balance between computational performance and development time.
|-
| 6 ||  || Visualization (basic and advanced). || Scripts vs. compiled code (week 2) || Understanding of different families of visualization techniques. Ability to create Circos plots.
|-
| 7 ||  || Generalized regressions || Linear discriminants (weeks 4-5) || Understanding of mathematical approaches to produce an infinite family of regressions for the purpose of data smoothing.
|-
| 8 ||  || Relational databases || Scripts vs. compiled code (week 2) || Ability to create, access, and use relational databases from within programming environments. Understanding of when to use relational databases.
|-
| 9 ||  || Clustering || Generalized regressions (week 7) || Ability to create basic clusters using multiple definitions of distance.
|-
| 10 ||  || Solution architecture & reproducibility. || Scripts vs. compiled code (week 2) || Capacity to design a complex data analysis solution that guarantees reproducibility, interoperability, and maintainability.
|-
| 11 ||  || Non-linear discriminants (i.e. artificial neural networks). || Clustering (week 9) || Capability to program a fully-connected feed-forward artificial neural network from scratch. Understanding of the effect of multiple activation functions.
|-
| 12 ||  || Management of the configuration. || Scripts vs. compiled code (week 2) || Ability to program in collaborative multi-layered environments. Capacity to resolve conflicts in code, create code branches, and effectively propagate code changes across multiple environments such as development, test, and production.
|-
| 13 ||  || Data analysis plans & standardized reports. || Scripts vs. compiled code (week 2) || Dexterity to break a data analysis problem into multiple interconnected components, and then produce automated reports targeting specific audiences.
|-
| 14 ||  || Project presentations || Entire course || Exposure to presentation of results in front of an audience of experts.
|}
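The week 9 outcome, creating basic clusters using multiple definitions of distance, can be illustrated with a small sketch. The syllabus does not name a specific algorithm, so the example below assumes k-medoids, chosen here because its update step is valid for any distance function, unlike the mean update in k-means. The function names, the initialization from the first `k` points, and the sample data are all hypothetical.

```python
def euclidean(a, b):
    """Straight-line (L2) distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def manhattan(a, b):
    """City-block (L1) distance between two points given as tuples."""
    return sum(abs(x - y) for x, y in zip(a, b))


def assign(points, medoids, distance):
    """Group each point with its nearest medoid under the given distance."""
    clusters = [[] for _ in medoids]
    for p in points:
        nearest = min(range(len(medoids)), key=lambda i: distance(p, medoids[i]))
        clusters[nearest].append(p)
    return clusters


def k_medoids(points, k, distance, max_iter=100):
    """Basic k-medoids clustering with a pluggable distance (illustrative only)."""
    medoids = list(points[:k])  # assumption: naive initialization from the first k points
    for _ in range(max_iter):
        clusters = assign(points, medoids, distance)
        # New medoid: the cluster member with the smallest total distance to the others.
        new_medoids = [
            min(c, key=lambda cand: sum(distance(cand, q) for q in c)) if c else medoids[i]
            for i, c in enumerate(clusters)
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(points, medoids, distance)


# Hypothetical sample data: two well-separated groups of points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids_l2, clusters_l2 = k_medoids(points, 2, euclidean)
medoids_l1, clusters_l1 = k_medoids(points, 2, manhattan)
```

On well-separated data like this, both distance definitions produce the same two groups. On less tidy data the choice of distance can change the assignments, which is the point of comparing several definitions.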