Sentience

A workshop for improving general understanding of artificial intelligence. Cheng Yu Tung College.


Introduction

Welcome to Sentience!

Description:

This workshop is designed to empower students to use artificial intelligence as a fundamental tool for innovation and problem-solving, ensuring they remain indispensable in a world that is increasingly reliant on AI. Students will be expected to engage in live, hands-on practice to adopt these technologies effectively, much as prior generations integrated calculators and computers. The goal is to demystify AI, share the joy of learning with these new tools, and provide the foundational knowledge to explore and apply emerging technologies in each student's own field of study. If you already know a topic, you are welcome to share that knowledge with the class; teaching is the best way to solidify your foundation.

Prerequisite:

NO PRE-REQUISITE KNOWLEDGE IS REQUIRED.

  • Own a computer and be able to access the internet.
  • A foundational understanding of programming is highly encouraged but not required; students already have enough to deal with in their core subjects without adding programming.
  • Familiarity with basic mathematical concepts, including linear algebra and calculus, is useful; high-school mathematics generally covers the basics you will need.

Individual Meeting Information:

  • Weekly Schedule: Wednesdays, 11:00 AM - 12:00 PM (~1 hour). Classes are taught in English; Chinese may be spoken to aid understanding, but students will need to translate terms for themselves.

Passing Policy:

Trying is worth more than copy-pasting answers, so the grading below reflects this philosophy.

  • Marks to pass: 30
  • Attempting a practice: 2 marks / Completing a practice: 3 marks
  • Attendance: 2 marks per session (plus College Activity). This means you can earn the certificate by completing all the assignments without attending a single session; there are many combinations of 'strats' to pass, and simply staying active and trying to learn will be enough.

Resources:

  • Guide to Google Colab (CN): link
  • Channel for many ML and mathematical concepts (CN): link

Week 0 - Introduction to AI, LLMs, and Agents

Introduction to AI, LLMs, and Agents

Lesson Summary

This session establishes the philosophical and conceptual foundation for the course, emphasizing a hands-on, trial-and-error learning approach in an open, interactive classroom environment. The core message frames AI as a tool to be understood and wielded rather than an oracle to blindly trust. The lesson demystifies the "Big 3" hierarchy: Artificial Intelligence as the broad field, Machine Learning as the core methodology, and Deep Learning as the specialized technique powering modern breakthroughs. A significant focus is placed on Large Language Models, explaining their foundation in human language, mathematics, and vast datasets, while introducing AI agents as autonomous systems that perceive, decide, and act in environments.

Key Topics

  • AI as a Tool: Emphasizes understanding and wielding AI through hands-on experimentation rather than blind trust, promoting learning by doing as the core pedagogical approach.
  • The "Big 3" Hierarchy: Explains the nested relationship from AI (broad field) → ML (core methodology) → DL (specialized technique) that structures modern AI development.
  • LLMs Explained: Details how Large Language Models generate human-like text based on language patterns, mathematical foundations, and training on massive datasets.
  • AI Agents: Introduces autonomous systems that perceive environments, make decisions, and act independently, bridging theory to practical applications.

Materials

View Week 0 Slides


Week 1 - Data Storage: The Foundation of AI

The First Step for Anything Technology: Data Storage and File Formats

Lesson Summary

Week 1 focuses on the practical bedrock of AI: data storage and file formats. While common formats like .docx or .jpg are well-known, the real power in AI lies in understanding specialized formats used for models and datasets. The session centers on .safetensors, a secure and efficient format developed by Hugging Face for sharing machine learning models (weights, biases, embeddings), presented as a safer, faster alternative to potentially unsafe formats like PyTorch's .pt (which can execute arbitrary code) or slower formats like TensorFlow's SavedModel or NumPy's .npz. The lesson also covers tensors as multi-dimensional arrays, LoRA for fine-tuning, checkpoints for training states, VAEs for latent compression, GGUF for local deployment, JSON for configurations, .path for file management, and .parquet for efficient large-scale data storage.
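
To make this concrete, the sketch below saves and reloads a small dictionary of tensors with the safetensors library; the file name and tensor shapes are illustrative, and it assumes torch and safetensors are installed.

```python
# A minimal sketch of saving and loading model weights with safetensors.
# Assumes torch and safetensors are installed; file name and shapes are illustrative.
import torch
from safetensors.torch import save_file, load_file

weights = {
    "linear.weight": torch.randn(128, 64),  # a 2-D tensor (matrix)
    "linear.bias": torch.zeros(128),        # a 1-D tensor (vector)
}

# Unlike pickled .pt/.pth files, this writes raw tensor data plus a small header,
# so loading it cannot execute arbitrary code.
save_file(weights, "model.safetensors")

restored = load_file("model.safetensors")
print(restored["linear.weight"].shape)  # torch.Size([128, 64])
```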

Key Topics

  • .safetensors: Secure, efficient format for sharing ML models that prevents code execution and offers faster loading compared to .pt or .pth files.
  • Tensor Fundamentals: Multi-dimensional arrays (scalars, vectors, matrices) that form the fundamental data structure for neural network operations.
  • Model Storage Formats: LoRA adapters for lightweight fine-tuning, checkpoints (.ckpt, .pt, .pth) for saving training progress, VAEs for data compression, and GGUF for quantized local deployment.
  • Data Management: JSON for human-readable configurations, .path files for centralized path management, and .parquet for compressed, column-based large dataset handling.

Materials

View Week 1 Slides


Week 2 - Classical Machine Learning

What alternatives to transformers are there?

Lesson Summary

Week 2 examines the practical split between supervised learning with labeled examples and unsupervised learning that discovers structure from unlabeled data, grounding both with everyday applications like spam filtering, fraud detection, and recommendation systems. The session emphasizes method selection trade-offs, favoring simpler, interpretable baselines when appropriate and moving to ensembles or margin-based methods as complexity and performance needs grow. Key algorithms covered include linear/polynomial/multivariate regression, logistic regression, decision trees, random forests, gradient boosting, SVMs for supervised tasks, and k-means clustering, PCA for dimensionality reduction, and association rule mining for unsupervised pattern discovery. The hands-on lab applies these concepts to bank customer churn prediction and segmentation.
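
As a sketch of the lab's supervised workflow, the snippet below fits an interpretable baseline and an ensemble with scikit-learn; the actual bank-churn dataset is not reproduced here, so synthetic features stand in for it.

```python
# A minimal sketch of the supervised baseline-vs-ensemble comparison from the lab.
# Synthetic features stand in for the bank-churn dataset used in class.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                 # e.g. age, balance, tenure, ...
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # stand-in "churned" label

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Start with an interpretable baseline, then move to an ensemble if needed.
for model in (LogisticRegression(), RandomForestClassifier(n_estimators=100)):
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(type(model).__name__, round(acc, 3))
```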

Key Topics

  • Supervised Learning: Regression (linear, polynomial, multivariate), classification (logistic regression, decision trees, random forests, gradient boosting, SVMs) with emphasis on interpretability vs performance trade-offs.
  • Unsupervised Learning: k-means clustering for customer segmentation, PCA for dimensionality reduction, and association rule mining for discovering hidden patterns in data.
  • Practical Model Selection: Mapping problems to labeled vs unlabeled setups, selecting models based on data size, interpretability requirements, and performance needs.

Materials

View Week 2 Slides
Open Colab Code


Week 3 - LLMs: Tokens to Fine-Tuning

From tokens to fine‑tuning

Lesson Summary

Week 3 builds practical intuition from tokens to training, then applies it in a PyTorch fine-tuning lab. The session begins with tokenization mechanics—how raw text transforms into token IDs—through live demos that reveal how LLMs "see" text. It reviews training fundamentals including loss functions that drive learning via backpropagation, and how activation functions and optimizers shape convergence and model capacity, motivating PyTorch's flexibility for rapid NLP experimentation. The hands-on lab walks through end-to-end fine-tuning of DistilBERT for binary sentiment classification, covering environment setup, dataset loading, model/optimizer configuration, training loops with per-epoch logging, validation, and accuracy tracking. Finally, the lesson surfaces common hallucination causes and practical mitigations including retrieval augmentation, task grounding, and confidence signaling to reduce failure modes in deployment.
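
The sketch below illustrates the two halves of the lab with Hugging Face Transformers: tokenizing raw text into IDs and running one fine-tuning step on DistilBERT. The checkpoint name and the two-sentence toy batch are illustrative only.

```python
# A minimal sketch of tokenization plus one training step for DistilBERT
# sentiment classification. Checkpoint name and toy batch are illustrative.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

texts = ["This workshop is great!", "I did not enjoy the film."]
labels = torch.tensor([1, 0])  # 1 = positive, 0 = negative

# Tokenization: raw text -> token IDs (+ attention mask) the model can consume.
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
print(batch["input_ids"][0][:8])  # how the model "sees" the first sentence

# One optimization step: forward pass, cross-entropy loss, backprop, update.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
outputs = model(**batch, labels=labels)
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
print("loss:", outputs.loss.item())
```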

Key Topics

  • Tokenization Mechanics: Converting raw text to token IDs, understanding how LLMs process textual input at the subword level.
  • Training Fundamentals: Loss functions driving backpropagation, activations, and optimizers shaping convergence; why PyTorch excels for rapid NLP prototyping.
  • End-to-End Fine-Tuning: Complete workflow for DistilBERT sentiment classification including setup, data loading, model configuration, training loops, validation, and accuracy analysis.
  • Hallucinations & Mitigations: Understanding fluent-but-incorrect generations, common triggers (missing context, domain drift), and practical guardrails like retrieval augmentation and confidence cues.

Materials

Open Colab Code


Week 4 - Transformers and Attention

How modern models understand context

Lesson Summary

Traditional RNN and seq2seq models struggled with long-range dependencies and slow sequential processing, whereas transformers attend to all tokens at once and learn relevance across entire sequences. Core ideas include self-attention with queries, keys, and values; positional encodings to restore word order; multi-head attention for diverse relational patterns; and architectural components like residual connections and LayerNorm/RMSNorm for stability and depth. The lesson explains why parallel attention overcomes RNN bottlenecks in speed and context handling, derives scaled dot-product attention with the $\sqrt{d_k}$ scaling factor, distinguishes encoder-decoder from decoder-only architectures, and describes how positional encodings, residuals, and normalization enable deep, stable training.
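
To make the mechanism concrete, here is a minimal NumPy sketch of scaled dot-product attention; the sequence length and dimensions are arbitrary.

```python
# A minimal NumPy sketch of scaled dot-product attention.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # similarity of every query with every key
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                 # weighted mixture of values

seq_len, d_k = 4, 8
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(seq_len, d_k)) for _ in range(3))
print(attention(Q, K, V).shape)        # (4, 8): one context vector per token
```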

Key Topics

  • Self-Attention Mechanism: Query-key-value scoring that allows parallel processing of entire sequences, capturing long-range dependencies impossible for RNNs.
  • Scaled Dot-Product Attention: Mathematical formulation $\text{Attention}(Q, K, V) = \text{softmax}\left( \frac{QK^\top}{\sqrt{d_k}} \right) V$ and why scaling stabilizes gradients during training.
  • Positional Encodings & Multi-Head Attention: Sine/cosine patterns to encode order and relative distance; multiple attention heads capturing complementary relationships in parallel.
  • Architectural Stabilizers: Residual connections and LayerNorm/RMSNorm enabling deeper networks with improved gradient flow and activation stability.

Materials

View Week 4 Slides


Week 5 - Modern AI: MoE, RLHF, and Diffusion

Transformer Architecture 2

Lesson Summary

The lesson contrasts East-vs-West enterprise AI workflows, then traces GPT's trajectory from pretraining and supervised fine-tuning through reward modeling, RLHF, and "thinking longer" at inference, highlighting why models appear better even at similar parameter scales. It presents DeepSeek as a case study for MoE routing, multi-head latent attention, and distillation to cut compute while maintaining quality, followed by a practical primer on diffusion for images and videos. The session culminates in three key takeaways: bigger isn't always better, compression isn't a compromise, and cross-domain learning works effectively for generalization.
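
To illustrate the routing idea, here is a deliberately simplified top-k gating layer in PyTorch; it is not DeepSeek's implementation and omits load-balancing losses and latent attention entirely.

```python
# A simplified sketch of Mixture-of-Experts routing with top-k gating.
# Illustrative only: no load-balancing loss, no DeepSeek-specific details.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, d_model=32, n_experts=4, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)  # router: scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                  # x: (tokens, d_model)
        scores = self.gate(x)                              # (tokens, n_experts)
        topk_vals, topk_idx = scores.topk(self.k, dim=-1)  # keep only k experts per token
        weights = F.softmax(topk_vals, dim=-1)             # renormalize over chosen experts
        out = torch.zeros_like(x)
        # Sparse activation: each token is processed by only k experts.
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(6, 32)
print(TinyMoE()(tokens).shape)  # torch.Size([6, 32])
```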

Key Topics

  • GPT Evolution & RLHF: Timeline from GPT-1's pretraining to GPT-4's multimodality and MoE, with RLHF optimizing for human preferences to mitigate harmful behavior.
  • Test-Time Compute & Efficiency: How models "think longer" per prompt at inference to improve reasoning without simply increasing parameter counts.
  • MoE Mechanics: DeepSeek's implementation of Mixture-of-Experts with specialized subnetworks, gating routers, sparse activation, and load-balancing to avoid expert collapse.
  • Diffusion for Generation: Forward noising and learned reverse denoising processes in Stable Diffusion (VAE, CLIP guidance, U-Net attention) and video generation (Sora's 3D patches, Veo's multimodal training).

Materials

View Week 5 Slides


Week 6 - Modern AI: RAG and Knowledge Augmentation

Retrieval Augmented Generation

Lesson Summary

The lesson explains why RAG is essential: reducing model hallucinations, overcoming training cut-off dates, and enabling fact-checking by providing sources. It details the RAG process—embedding, retrieval, and augmentation—and contrasts it with real-time web search in terms of efficiency and context limitations. The session surveys how RAG is used in modern AI systems and explores how students from business, social science, art, and science majors can apply RAG in their fields through major-specific adaptations.
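
A minimal sketch of the embed, retrieve, and augment steps is shown below; it assumes the sentence-transformers package, and the model name and documents are placeholders.

```python
# A minimal sketch of the embed -> retrieve -> augment pipeline behind RAG.
# Assumes sentence-transformers is installed; model name and documents are illustrative.
import numpy as np
from sentence_transformers import SentenceTransformer

documents = [
    "The workshop meets on Wednesdays at 11:00 AM.",
    "safetensors is a secure format for sharing model weights.",
    "Transformers use self-attention to process whole sequences in parallel.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = model.encode(documents, normalize_embeddings=True)  # embed the knowledge base once

query = "When does the class meet?"
q_vec = model.encode([query], normalize_embeddings=True)[0]

# Dense retrieval: cosine similarity reduces to a dot product on normalized vectors.
scores = doc_vecs @ q_vec
best = documents[int(np.argmax(scores))]

# Augmentation: prepend the retrieved evidence to the prompt sent to the LLM.
prompt = f"Answer using only this context:\n{best}\n\nQuestion: {query}"
print(prompt)
```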

Key Topics

  • Why RAG: Reduces hallucinations by grounding responses in retrieved evidence, provides updated knowledge beyond training data, and enables source verification for trustworthiness.
  • RAG Workflow: Complete pipeline from query → embedding → vector database retrieval → prompt augmentation → LLM generation with semantic similarity matching.
  • Technical Implementation: Embeddings converting text to numerical vectors; dense (semantic), sparse (keyword), and hybrid retrieval methods; chunking strategies and reranking for context optimization.
  • Real-World Applications: Search engines, translation systems, recommendation engines, chatbots, legal/medical document lookup, and major-specific use cases (business finance docs, social science sentiment analysis, art style mixing, science molecule discovery).

Materials

View Week 6 Slides


Week 7 - AI Agents, APIs, and MLOps

AI Agents, Tool Usage, and Production Deployment

Lesson Summary

The lesson explains how AI agents extend LLM capabilities through tool usage, enabling real-time data access, code execution, and multi-step research. It details the ReAct framework (Receive input → Think → Act → Evaluate) and demonstrates API integration for data retrieval, contrasting traditional REST APIs with MCP for safer, more universal AI-tool interaction. The session introduces MLOps as the automation backbone for maintaining AI products, showing how continuous integration pipelines can detect issues, redesign interfaces, and deploy updates automatically. Finally, it covers essential software development protocols and standards including PEP 8, version control with Git, SOLID principles, and modularity through config files.
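
The toy loop below sketches one Receive-Think-Act-Evaluate cycle; `call_llm` and both tools are hypothetical placeholders rather than a real agent framework or API.

```python
# A simplified sketch of a ReAct-style agent loop.
# `call_llm` and the tools are hypothetical placeholders, not a real API.
import json
from datetime import datetime

def get_time(_: str) -> str:
    return datetime.now().isoformat()

def calculator(expression: str) -> str:
    # Toy tool only; never eval untrusted input in a real system.
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"get_time": get_time, "calculator": calculator}

def call_llm(prompt: str) -> str:
    """Placeholder for a real model call; here it always asks for the calculator."""
    return json.dumps({"thought": "I should compute this.", "action": "calculator", "input": "2 + 2"})

def react_step(user_input: str) -> str:
    # Receive input -> Think: ask the model which tool to use and with what input.
    decision = json.loads(call_llm(f"User asked: {user_input}. Choose a tool from {list(TOOLS)}."))
    # Act: run the chosen tool.
    observation = TOOLS[decision["action"]](decision["input"])
    # Evaluate: a real agent would feed the observation back to the model; here we report it.
    return f"Thought: {decision['thought']}\nObservation: {observation}"

print(react_step("What is 2 + 2?"))
```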

Key Topics

  • AI Agents & ReAct Framework: Autonomous systems that use tools (APIs, databases) to extend LLMs beyond training data via structured Receive-Think-Act-Evaluate reasoning cycles.
  • API Integration: REST, GraphQL, WebHooks, and gRPC formats for data sharing via JSON; public/private API authentication; MCP (Model Context Protocol) as Anthropic's solution for safer, universal AI-tool interaction.
  • MLOps Automation: End-to-end product maintenance pipeline from user log analysis → heatmap review → vision model evaluation → code rewriting → containerization → A/B testing → deployment.
  • Software Engineering Standards: PEP 8 Python conventions, Git version control, SOLID principles (Single Responsibility, Open/Closed, Liskov Substitution, Interface Segregation, Dependency Inversion), and YAML/JSON config files for hyperparameters and API keys.

Materials

View Week 7 Notebook