COURSE OUTLINE
Day 1: Foundations of NLP and the Generative Shift
Focus: From linguistics to Large Language Models.
- NLP Fundamentals: Text normalization, tokenization, and part-of-speech tagging in the modern era.
- The Evolution of Language Models: From N-grams to Word2Vec, and the breakthrough of the Transformer architecture.
- Generative AI Concepts: Understanding probability distributions in text and how Gemini "predicts" the next token.
- Hands-on: Setting up a Python environment and performing basic text analysis with the Gemini API.
Day 2: Advanced Prompt Engineering for NLP Tasks
Focus: Directing LLMs to perform complex linguistic work.
- The Prompting Hierarchy: Zero-shot vs. Few-shot learning for specialized NLP tasks.
- Structured Information Extraction: Using Gemini to extract JSON-formatted data from messy, unstructured text.
- Text Transformation: Automated translation, style transfer (formal to casual), and long-form summarization.
- Hands-on Lab: Building an automated "Email Intelligence" tool that categorizes, summarizes, and extracts action items from threads.
Day 3: Embeddings and Vector Databases
Focus: Mapping language to mathematical space.
- Understanding Embeddings: How text is converted into high-dimensional vectors.
- Semantic Similarity: Using Cosine Similarity to find related documents or concepts.
- Vector Search Infrastructure: Introduction to Vertex AI Vector Search and local alternatives like FAISS.
- Hands-on Lab: Creating a semantic recommendation engine that finds similar news articles based on meaning rather than tags.
Day 4: Retrieval-Augmented Generation (RAG)
Focus: Connecting AI to private knowledge bases.
- The RAG Architecture: Orchestrating the flow between User Query → Vector Search → LLM Context.
- Document Processing: Advanced chunking strategies and handling different file types (PDF, Markdown, HTML).
- Grounding and Verification: Reducing hallucinations by forcing Gemini to cite its sources.
- Hands-on Lab: Developing a "Corporate Wiki Assistant" that answers HR and Policy questions using internal company documents.
Day 5: Fine-Tuning and Domain Adaptation
Focus: Specialized models for specialized industries.
- When to Fine-Tune: Evaluating the trade-offs between RAG and Parameter-Efficient Fine-Tuning (PEFT).
- Dataset Preparation: Cleaning and formatting domain-specific data for model training.
- Evaluation Metrics: Understanding BLEU, ROUGE, and using "LLM-as-a-Judge" for qualitative assessment.
- Hands-on Lab: Fine-tuning a smaller model (Gemma) for a specific medical or legal terminology task.
Day 6: Deployment, Ethics, and Governance
Focus: Shipping reliable and responsible AI.
- AI Security: Protecting against prompt injection and ensuring data privacy on Google Cloud.
- Responsible AI Frameworks: Implementing Google’s safety filters and bias detection.
- Production Workflows: Using Vertex AI Pipelines to monitor model performance and data drift.
- Final Project: Building a complete, end-to-end "Insight Engine" that ingests real-time data and provides AI-driven analysis.