Difference between revisions of "Fall 2018 CS190I 291A Introduction to Natural Language Processing"


Revision as of 10:27, 17 September 2018

Instructor and Venue

  • Instructor: William Wang
  • TA: Jing Qian jing_qian@cs.ucsb.edu
  • Time: T R 1:00pm - 2:15pm
  • Location: PHELPS 3526
  • TA Office Hours: TBA
  • Instructor Office Hours: Tu 2:30-3:30pm starting Oct 2nd at HFH 1115
  • Prerequisites:
    • Good programming skills and knowledge of data structures (e.g., CS 130A)
    • Basic understanding of automata and parsing (e.g., CS 138)
    • Advanced knowledge of machine learning (CS 165B), artificial intelligence (CS 165A), linear algebra, probability, and calculus.

Course Objective

At the end of the quarter, students should have a good understanding of basic NLP tasks and should be able to implement some fundamental algorithms for simple problems in NLP. Students will also develop an understanding of the open research problems in NLP.

Piazza

Please enroll if you haven't: piazza.com/ucsb/fall2018/cs190i291a

Syllabus

  • 09/25 Introduction & logistics, and NLP applications
  • 09/27 Basic text processing
  • 10/02 N-grams & language models & HW1 out
  • 10/04 Text classification: naive Bayes
  • 10/09 Voted perceptron and logistic regression
  • 10/11 Part-of-speech tagging: HMMs
  • 10/16 HMMs and MEMMs & HW1 due / HW2 out
  • 10/18 Conditional Random Fields
  • 10/23 In-class midterm exam
  • 10/25 Natural language parsing
  • 10/30 Guest lecture on Word sense disambiguation & HW2 due / HW3 out
  • 11/01 Guest lecture on Computational Social Science
  • 11/06 Distributional semantics (1): sparse representation
  • 11/08 Distributional semantics (2): dense representation
  • 11/13 Machine translation & HW3 due / HW4 out
  • 11/15 Question answering
  • 11/20 Deep learning for NLP: RNNs
  • 11/22 Thanksgiving: no class.
  • 11/27 Deep learning for NLP: CNNs
  • 11/29 Final project presentation group
  • 12/04 Course review and class evaluation & HW4 due
  • 12/06 In-class final exam

Course Description

Have you ever used intelligent virtual assistants such as Google Now, Apple Siri, Amazon Alexa, or Microsoft Cortana? Have you wondered what technologies are behind such systems? How did IBM's Watson beat top human Jeopardy players? Or are you just curious about how Google Translate works? Understanding human language is an important goal for Artificial Intelligence, and this course introduces fundamental theories and practical applications in Natural Language Processing (NLP). In particular, this course will focus on the design of basic machine learning algorithms (e.g., classification and structured prediction) for core NLP problems. The concentration of this course is on the mathematical, statistical, and computational foundations of NLP.

Throughout the course, we will cover classic lexical, syntactic, and semantic processing topics in NLP, including language modeling, sentiment analysis, part-of-speech tagging, parsing, word sense disambiguation, distributional semantics, question answering, information extraction, and machine translation. The parallel theme on machine learning algorithms for NLP will focus on classic supervised learning, semi-supervised learning, and unsupervised learning models, including naive Bayes, logistic regression, hidden Markov models, maximum entropy Markov models, conditional random fields, feed-forward neural networks, recurrent neural networks, and convolutional neural networks. We will also study the implicit assumptions made in each of the machine learning models, and understand the pros and cons of these modern statistical tools for solving NLP problems. A key emphasis of this course is on the empirical and statistical analysis of large text corpora, and on distilling useful structured knowledge from large collections of unstructured documents.

Textbook

No official textbook is required for this class, but the following optional textbook is recommended:

  • Speech and Language Processing (2nd ed.), Dan Jurafsky and James H. Martin.

A free draft version of a new edition of this book is available online [1].

Grading

    • Undergraduates: There will be four homework assignments (40%), one midterm exam (20%), and a final exam (40%). Four late days are allowed with no penalty. After that, 50% will be deducted from a submission made within four days after the due date, unless you have a note from a doctor's office. Homework submissions that are five or more days late will receive zero credit. Your grades can be found on GauchoSpace.
    • Graduates: There will be four homework assignments (40%), one midterm exam (20%), a final exam (20%), and a project (20%). Four late days are allowed with no penalty. After that, 50% will be deducted from a submission made within four days after the due date, unless you have a note from a doctor's office. Homework submissions that are five or more days late will receive zero credit. Your grades can be found on GauchoSpace. For the project, you will need to submit a one-page project proposal to the TA via email by Oct 9th before class, a one-page midterm report to the TA via email by Oct 30th before class, and a four-page final report to the TA via email by 11:59pm PT on Dec 8. You will need to work in one-person teams and present your project results for 10 mins (8 slides max + 2 mins for Q&A) in class in the first week of Dec.
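
As a rough illustration of how the percentages above combine, the following Python sketch computes a weighted total from per-component scores. Only the weights are taken from this page; the function name, the 0-100 score scale, and the example scores are assumptions made for illustration, and the late-day deductions are not modeled. It is not an official grade calculator.

```python
# Illustrative sketch only. The weights are copied from the Grading section;
# the function name and the 0-100 score scale are assumptions.
UNDERGRAD_WEIGHTS = {"homework": 0.40, "midterm": 0.20, "final": 0.40}
GRAD_WEIGHTS = {"homework": 0.40, "midterm": 0.20, "final": 0.20, "project": 0.20}


def weighted_total(scores, weights):
    """Combine per-component scores (each on a 0-100 scale) into one total."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    # Components that carry no weight for this group (e.g. "project" for
    # undergraduates) are simply ignored.
    return sum(weights[name] * scores[name] for name in weights)


if __name__ == "__main__":
    # Hypothetical scores, not real student data.
    example = {"homework": 90.0, "midterm": 80.0, "final": 85.0, "project": 95.0}
    print(round(weighted_total(example, UNDERGRAD_WEIGHTS), 2))
    print(round(weighted_total(example, GRAD_WEIGHTS), 2))
```

The same hypothetical scores give different totals for the two groups, because graduate students have half of the final-exam weight moved to the project.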

Academic Integrity

We follow UCSB's academic integrity policy (from UCSB Campus Regulations, Chapter VII: "Student Conduct and Discipline"):

  • It is expected that students attending the University of California understand and subscribe to the ideal of academic integrity, and are willing to bear individual responsibility for their work. Any work (written or otherwise) submitted to fulfill an academic requirement must represent a student’s original work. Any act of academic dishonesty, such as cheating or plagiarism, will subject a person to University disciplinary action. Using or attempting to use materials, information, study aids, or commercial “research” services not authorized by the instructor of the course constitutes cheating. Representing the words, ideas, or concepts of another person without appropriate attribution is plagiarism. Whenever another person’s written work is utilized, whether it be a single phrase or longer, quotation marks must be used and sources cited. Paraphrasing another’s work, i.e., borrowing the ideas or concepts and putting them into one’s “own” words, must also be acknowledged. Although a person’s state of mind and intention will be considered in determining the University response to an act of academic dishonesty, this in no way lessens the responsibility of the student.

More specifically, we follow Stefano Tessaro and William Cohen's policy in this class:

You cannot copy the code or answers to homework questions or exams from your classmates or from other sources. You may discuss course materials and assignments with your classmates, but you cannot write anything down. You must write down the answers / code independently. The presence or absence of any form of help or collaboration, whether given or received, must be explicitly stated and disclosed in full by all involved, on the first page of their assignment. Specifically, each assignment solution must start by answering the following questions:

  • (1) Did you receive any help whatsoever from anyone in solving this assignment? Yes / No.
    • If you answered 'yes', give full details (e.g., "Jane explained to me what is asked in Question 3.4").
  • (2) Did you give any help whatsoever to anyone in solving this assignment? Yes / No.
    • If you answered 'yes', give full details (e.g., "I pointed Joe to section 2.3 to help him with Question 2").
  • No electronics are allowed during exams, but you may prepare an A4-sized note and bring it to the exam.
  • If you have questions, ask the teaching staff.

Academic dishonesty will be reported to the highest line of command at UCSB. Students who engage in such activities will receive an F grade automatically.

Accessibility

Students with a documented disability are asked to contact the DSP office to arrange the necessary academic accommodations.

Discussions

All discussions and questions should be posted on our course Piazza site.