This course introduces algorithmic techniques from Machine Learning (ML) for identifying useful and relevant patterns, associations, and relationships in natural language and text data, with the goal of automating the process of learning from these types of data. The student will learn how ideas and methods from probability theory, mathematical statistics, learning theory, optimization, and computational complexity theory are combined to design these algorithmic techniques. Fundamental methods from Natural Language Processing (NLP) such as word and text embeddings, classification, supervised learning, generalization theory, and model reduction will be introduced. Methods for query relevance assessment and relevance ranking will be discussed. Specific examples of industry and business use cases for NLP will be given in the course.
The student is required to work on course projects using modern data analysis software and case studies. This course will focus on the implementation of NLP algorithms in the Python language.
To learn how computational methods and techniques are employed in Natural Language Processing and text mining and to learn the analytical, theoretical, and intuitive ideas that underpin them.
To understand and become familiar with the implementation details of NLP algorithms.
To gain hands-on experience with NLP tools in the Python language.
Week 1: Natural Language Processing Overview and Text Representation
Week 2: Bag-of-Words Approach and Word Embeddings
Week 3: ML classification algorithms for NLP and text mining
Week 4: Introduction to Artificial Neural Networks for NLP
Week 5: Support Vector Machines for NLP
Week 6: Ensemble learning, Boosting, and Bayesian ML for text mining
Week 7: Testing, Verification, Validation, and Visualization for text mining
Week 8: Information retrieval and text ranking
Instructor
Daniel Zanger, Ph.D.
Daniel Zanger, Ph.D., has over 20 years of experience in both industry and the federal government, working extensively in the fields of theoretical and applied machine learning, data analysis, optimization, statistical database privacy, cryptology, quantum computing, and others. He has applied techniques from these fields to problems in areas such as text mining, image processing, operations research, and multi-sensor fusion. Dr. Zanger has authored numerous publications in refereed journals and conference proceedings in various technical fields, including mathematics (partial differential equations), probability theory, information retrieval, statistical learning theory (applied to finance), operations research, and database privacy. He holds a Ph.D. in Mathematics from the Massachusetts Institute of Technology (MIT) as well as a B.A. (with Highest Honors), also in Mathematics, from the University of California at Berkeley.