Difference between revisions of "Winter 2018 CS291A Deep Learning for NLP"

Latest revision as of 19:31, 27 February 2018

Instructor: William Wang
Time: T R 1:00pm - 2:50pm
Location: PHELPS 2510
Reader: Ke Ni, ke00@ucsb.edu
Instructor Office Hours: Tu 3-4pm HFH 1115 starting 01/23.
Prerequisites:
- Machine Learning (CS 165B) or equivalent
- Good programming skills and knowledge of data structures (e.g., CS 130A)
- Solid background in machine learning, linear algebra, probability, and calculus.
- Comfortable with deep learning platforms such as TensorFlow, Torch, Theano, MXNet, Caffe, etc.
- Prior experience with AWS / Google Cloud is not required, but could be very useful.

In-class Presentation

01/25 Word embeddings
01/30 Neural network basics (Project proposal due to Grader: Ke Ni <ke00@ucsb.edu>, HW1 out)
02/01 Recursive Neural Networks
02/06 RNNs
02/08 LSTMs/GRUs
02/13 Sequence-to-sequence models and neural machine translation (HW1 due and HW2 out)
02/15 Attention mechanisms
02/20 Convolutional Neural Networks (Mid-term report due to Grader: Ke Ni <ke00@ucsb.edu>)
02/22 Language and vision
02/27 Deep Reinforcement Learning 1 (HW2 due)
03/01 Deep Reinforcement Learning 2
03/06 Unsupervised Learning

Course Objective

At the end of the quarter, students should have a good understanding of basic deep learning models and be able to implement some fundamental algorithms for simple problems in deep learning. Students will also develop an understanding of the open research problems in deep learning, and be able to conduct cutting-edge research with novel contributions to improve existing solutions.

Piazza

http://www.piazza.com/ucsb/winter2018/cs291a

Syllabus

Winter 2018 CS291A Syllabus

Course Description

Deep learning has revolutionized many subfields within AI. DeepMind's AlphaGo combined convolutional neural networks with deep reinforcement learning and Monte Carlo tree search (MCTS), and won many games against top human Go players. In computer vision, most of the leading systems in ImageNet competitions are based on deep neural networks. Deep learning has also changed the game in NLP: for example, Google has recently replaced its phrase-based machine translation system with a neural machine translation system. Throughout the quarter, we will go over some of the basics of neural networks, and we will also go through the deep learning revolution after 2006. In this graduate class, we will also emphasize the development of students' paper reading and presentation abilities: each student will need to present research papers related to this topic. Last but not least, the most important aspect of this course is for students to work on a novel research project on open problems related to NLP and deep learning, and gain hands-on experience doing cutting-edge research.

Text Book

No textbook is required, but the following optional textbook is recommended:

Deep Learning, An MIT Press book, Ian Goodfellow and Yoshua Bengio and Aaron Courville
HTML version of the book: http://www.deeplearningbook.org/

Project

One key aspect of this class is for students to gain hands-on experience with open research problems. To do this, each student will need to propose a research project. The teaching staff will provide feedback on the proposal and track the progress of each student. Computing resources will be provided: each team will receive a sufficient amount of Google Cloud credits for their project.

In the project proposal, each team must clearly address the following aspects of their project:

What is the motivation of the problem?
What is the exact definition of the problem? How do we formulate the problem in machine learning?
What are some existing approaches to this problem?
What are some existing datasets that you can work on?
What is the novelty in your project? New problem? New approach? New dataset?
How are you going to implement your approach and verify the idea?

Good places to look for project inspiration:

Recent papers from ACL, EMNLP, NAACL from ACL Anthology: http://aclweb.org/anthology/
Recent papers from ICML, NIPS, and ICLR conferences: http://jmlr.org/proceedings/ http://papers.nips.cc/

FAQ: Can I use my existing research project / thesis research project as the project in this class?
A: I would prefer that students get out of their comfort zone and try something new in this class. If you are using existing techniques from your existing project, it is unlikely that you will learn anything new during the course project. However, you may still draw inspiration from your research problem to formulate your class project.

Available Datasets

Wikipedia Harassment/Personal Attack Dataset https://figshare.com/projects/Wikipedia_Talk/16731 Ex Machina: Personal Attacks Seen at Scale, https://arxiv.org/abs/1610.08914
Stance Detection / Fake News Detection / Automated Fact-Checking, email William.
Deep learning for abstractive humor generation. Dataset: https://www.cs.ucsb.edu/~william/papers/meme.pdf
NELL Knowledge Graph http://rtw.ml.cmu.edu/
Relation Prediction / Reasoning FB15K-237 https://www.microsoft.com/en-us/download/details.aspx?id=52312
Abstractive summarization datasets https://www.aclweb.org/anthology/D/D15/D15-1044.pdf
WikiHow: learning processes from lists and free text https://github.com/paolo7/KnowHowDataset

Grading

There will be two homework assignments (20%), one project (65%), and an in-class paper presentation (15%). The in-class presentation consists of a 12-minute presentation (12 slides max) and 3 minutes of Q&A. The breakdown of project grading is: 1-page proposal (10%), 2-page mid-term report (10%), final presentation (15%), and a final report (30%). Four late days are allowed with no penalty. After that, 50% will be deducted if the submission is within 4 days after the due date, unless you have a note from a doctor's office. Homework assignment submissions that are five days late will receive zero credit.

Final Report Format

You must use the ICML 2018 LaTeX style files for writing the report. The final report must be 3-5 pages long, including references. You are encouraged to include the following components in your report (not necessarily in this order): abstract, introduction (motivation, task definition, your novel contributions), related work, your technical approach (such as the math formulation of the problem, algorithms, and theorems, if any), experiments, discussion, and conclusion.

Academic Integrity

We follow UCSB's academic integrity policy (from UCSB Campus Regulations, Chapter VII: "Student Conduct and Discipline"):

It is expected that students attending the University of California understand and subscribe to the ideal of academic integrity, and are willing to bear individual responsibility for their work. Any work (written or otherwise) submitted to fulfill an academic requirement must represent a student’s original work. Any act of academic dishonesty, such as cheating or plagiarism, will subject a person to University disciplinary action. Using or attempting to use materials, information, study aids, or commercial “research” services not authorized by the instructor of the course constitutes cheating. Representing the words, ideas, or concepts of another person without appropriate attribution is plagiarism. Whenever another person’s written work is utilized, whether it be a single phrase or longer, quotation marks must be used and sources cited. Paraphrasing another’s work, i.e., borrowing the ideas or concepts and putting them into one’s “own” words, must also be acknowledged. Although a person’s state of mind and intention will be considered in determining the University response to an act of academic dishonesty, this in no way lessens the responsibility of the student.

More specifically, we follow Stefano Tessaro and William Cohen's policy in this class:

You cannot copy the code or answers to homework questions or exams from your classmates or from other sources. You may discuss course materials and assignments with your classmates, but you cannot write anything down. You must write down the answers / code independently. The presence or absence of any form of help or collaboration, whether given or received, must be explicitly stated and disclosed in full by all involved, on the first page of their assignment. Specifically, each assignment solution must start by answering the following questions:

(1) Did you receive any help whatsoever from anyone in solving this assignment? Yes / No.
- If you answered 'yes', give full details (e.g., "Jane explained to me what is asked in Question 3.4").
(2) Did you give any help whatsoever to anyone in solving this assignment? Yes / No.
- If you answered 'yes', give full details (e.g., "I pointed Joe to section 2.3 to help him with Question 2").
No electronics are allowed during exams, but you may prepare an A4-sized note and bring it to the exam.
If you have questions, please ask the teaching staff.

Academic dishonesty will be reported to the highest line of command at UCSB. Students who engage in such activities will receive an F grade automatically.

Accessibility

Students with a documented disability are asked to contact the DSP office to arrange the necessary academic accommodations.

Discussions

All discussions and questions should be posted on our course Piazza site.

@@ Line 12: / Line 12: @@
 ** Prior experiences with AWS / Google Cloud is not required, but could be very useful.
 ====In-class Presentation====
-*1/25
+*01/25 Word embeddings
-**Arya: GloVe: Global Vectors for Word Representation
+**Conner : [https://people.cs.umass.edu/~arvind/emnlp2014.pdf Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space, Neelakantan et al., EMNLP 2014]
-*1/30
+**Sanjana : [http://www.anthology.aclweb.org/D/D14/D14-1162.pdf Glove: Global Vectors for Word Representation, J Pennington, R Socher, CD Manning - EMNLP, 2014]
-**Vivek: "Dropout: A simple way to prevent neural networks from overfitting"
+**Wenhu : [http://www.aclweb.org/anthology/P15-1173 AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes, Rothe and Schutze, ACL 2015]
-**Dan: An overview of gradient descent optimization algorithms, Sebastian Ruder, Arxiv 2016
+*01/30 Neural network basics (Project proposal due to Grader: Ke Ni < ke00@ucsb.edu> , HW1 out)
-*2/6
+**Jashanvir : [http://www.iro.umontreal.ca/~vincentp/ift3395/lectures/backprop_old.pdf Learning representations by back-propagating errors, Nature, 1986]
-**Yifu: Recurrent neural network based language model
+**Metehan : [https://arxiv.org/abs/1609.04747 An overview of gradient descent optimization algorithms, Sebastian Ruder, Arxiv 2016]
-**Lukas: A Learning Algorithm for Continually Running Fully Recurrent Neural Networks, Ronald J. Williams and David Zipser, 1989
+**Vivek P.: [http://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf Dropout: A simple way to prevent neural networks from overfitting (2014), N. Srivastava et al., JMLR 2014]
-*2/13
+*02/01 Recursive Neural Networks
-**Yanju: Sequence to Sequence Learning with Neural Networks, Sutskever et al., NIPS 2014
+**April : [http://www.robotics.stanford.edu/~ang/papers/emnlp12-SemanticCompositionalityRecursiveMatrixVectorSpaces.pdf Semantic Compositionality through Recursive Matrix-Vector Spaces, Socher et al., EMNLP 2012]
-*2/15
+**Zhiyu : [https://nlp.stanford.edu/pubs/SocherBauerManningNg_ACL2013.pdf Parsing with Compositional Vector Grammars, Socher et al., ACL 2013]
-**Jing: Neural Machine Translation by Jointly Learning to Align and Translate
+**Andy : [https://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank, Socher et al., EMNLP 2013]
-**Jiawei: Sequence to Sequence Learning with Neural Networks, Sutskever et al., NIPS 2014
+*02/06 RNNs
-*2/22
+**Lukas : [https://pdfs.semanticscholar.org/8adb/8257a423f55b1f20ba62c8b20118d76a25c7.pdf A Learning Algorithm for Continually Running Fully Recurrent Neural Networks, Ronald J. Williams and David Zipser, 1989]
-**Xiyou: Deep Visual-Semantic Alignments for Generating Image Descriptions, Andrej Karpathy and Li Fei-Fei, CVPR 2015
+**Yifu : [http://www.fit.vutbr.cz/research/groups/speech/publi/2010/mikolov_interspeech2010_IS100722.pdf Recurrent neural network based language model]
-*3/1
+**John : [https://arxiv.org/pdf/1308.0850.pdf Generating Sequences With Recurrent Neural Networks, Alex Graves, 2013 arxiv]
-**Chani: Mastering the game of Go with deep neural networks and tree search (2016), D. Silver et al., Nature
+*02/08 LSTMs/GRUs
-**Trevor: “Playing Atari with Deep Reinforcement Learning”
+**Liu : [http://www.bioinf.jku.at/publications/older/2604.pdf Long short term memory, S. Hochreiter and J. Schmidhuber, Neural Computation, 1997]
-*3/6
+**Nidhi : [https://arxiv.org/pdf/1409.1259.pdf On the Properties of Neural Machine Translation: Encoder–Decoder Approaches, Cho et al., 2014]
-**Hongmin: Generative Adversarial Nets, Goodfellow et al., NIPS 2014
+**Vivek A.: [https://arxiv.org/pdf/1502.02367v3.pdf Gated Feedback Recurrent Neural Networks, Chung et al., ICML 2015]
+*02/13 Sequence-to-sequence models and neural machine translation (HW1 due and HW2 out)
+**Ryan : [https://arxiv.org/pdf/1406.1078.pdf Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, Cho et al., EMNLP 2014]
+**Yanju : [https://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf Sequence to Sequence Learning with Neural Networks, Sutskever et al., NIPS 2014]
+**Karthik : [http://www.aclweb.org/anthology/P16-1100 Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models, Luong and Manning, ACL 2016]
+*02/15 Attention mechanisms
+**Jing : [https://arxiv.org/pdf/1409.0473.pdf NEURAL MACHINE TRANSLATION BY JOINTLY LEARNING TO ALIGN AND TRANSLATE, Bahdanau et al., ICLR 2015]
+**Abhay : [https://arxiv.org/abs/1506.03340 Teaching Machines to Read and Comprehend, NIPS 2015]
+**Ashwini : [http://papers.nips.cc/paper/5846-end-to-end-memory-networks.pdf End-to-end memory networks, NIPS 2015]
+*02/20 Convolutional Neural Networks  (Mid-term report due to Grader: Ke Ni <ke00@ucsb.edu>)
+**Esther : [http://ronan.collobert.com/pub/matos/2011_nlp_jmlr.pdf Natural Language Processing (Almost) from Scratch, Collobert et al., JMLR 2011]
+**Maohua : [https://arxiv.org/pdf/1510.03820.pdf A Sensitivity Analysis of (and Practitioners’ Guide to) Convolutional Neural Networks for Sentence Classification, Zhang and Wallace, Arxiv 2015]
+**Jiawei : [http://papers.nips.cc/paper/5550-convolutional-neural-network-architectures-for-matching-natural-language-sentences Convolutional Neural Network Architectures for Matching Natural Language Sentences, Hu et al., NIPS 2014]
+*02/22 Language and vision
+**Sai : [https://arxiv.org/pdf/1411.4555.pdf Show and Tell: A Neural Image Caption Generator, CVPR 2015]
+**Xiyou : [http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Karpathy_Deep_Visual-Semantic_Alignments_2015_CVPR_paper.pdf Deep Visual-Semantic Alignments for Generating Image Descriptions, Andrej Karpathy and Li Fei-Fei, CVPR 2015]
+**Richika : [http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Zhu_Aligning_Books_and_ICCV_2015_paper.pdf Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books, Zhu et al., ICCV 2015]
+*02/27 Deep Reinforcement Learning 1 (HW2 due)
+**Sharon : [https://aclweb.org/anthology/D16-1127 Deep Reinforcement Learning for Dialogue Generation, Li et al., EMNLP 2016]
+**David : [https://arxiv.org/abs/1603.07954 Improving Information Extraction by Acquiring External Evidence with Reinforcement Learning, Narasimhan et al., EMNLP 2016]
+**Michael : [http://www.aclweb.org/anthology/P16-1153 Deep Reinforcement Learning with a Natural Language Action Space, He et al., ACL 2016]
+*03/01 Deep Reinforcement Learning 2
+**Trevor : [https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf Playing Atari with Deep Reinforcement Learning, Mnih et al., NIPS workshop 2013]
+**Calvin : [https://arxiv.org/pdf/1509.02971.pdf Continuous control with deep reinforcement learning, Lillicrap et al, ICLR 2016]
+**Chani : [https://www.nature.com/articles/nature16961 Mastering the game of Go with deep neural networks and tree search (2016), D. Silver et al., Nature]
+*03/06 Unsupervised Learning
+**Hongmin : [http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf Generative Adversarial Nets, Goodfellow et al., NIPS 2014]
+**Burak : [https://arxiv.org/abs/1312.6114 Auto-encoding variational Bayes, Kingma and Welling, ICLR 2014]
+**Pushkar : [https://arxiv.org/pdf/1511.06434.pdf Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, Radford et al., 2015]
+**Liu : [http://papers.nips.cc/paper/5949-semi-supervised-sequence-learning.pdf Semi-supervised Sequence Learning, Dai et al., NIPS 2015]
 ====Course Objective====

