Winter 2018 CS291A Deep Learning for NLP
Instructor and Venue
- Instructor: William Wang
- Time: T R 1:00pm - 2:50pm
- Location: PHELPS 2510
- Reader: Ke Ni, ke00@ucsb.edu
- Instructor Office Hours: Tu 3-4pm HFH 1115 starting 01/23.
- Prerequisites:
- Machine Learning (CS 165B) or equivalent
- Good programming skills and knowledge of data structures (e.g., CS 130A)
- Solid background in machine learning, linear algebra, probability, and calculus.
- Comfort with deep learning platforms such as TensorFlow, Torch, Theano, MXNet, Caffe, etc. (a minimal sketch of the expected level of familiarity follows this list)
- Prior experience with AWS / Google Cloud is not required, but could be very useful.
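As a rough gauge of the expected familiarity with such platforms, here is a minimal sketch of training a one-layer regression model against the TensorFlow 1.x API that was current at the time (the toy data and variable names are illustrative only, not part of any assignment):

    import numpy as np
    import tensorflow as tf  # assumes TensorFlow 1.x

    # Toy regression data: 100 examples with 4 features each.
    x_data = np.random.rand(100, 4).astype(np.float32)
    true_w = np.array([[1.0], [2.0], [-1.0], [0.5]], dtype=np.float32)
    y_data = x_data @ true_w

    # Placeholders for inputs and targets.
    x = tf.placeholder(tf.float32, shape=[None, 4])
    y = tf.placeholder(tf.float32, shape=[None, 1])

    # One fully connected layer with a mean-squared-error loss.
    pred = tf.layers.dense(x, units=1)
    loss = tf.losses.mean_squared_error(labels=y, predictions=pred)
    train_op = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(loss)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for step in range(200):
            _, cur_loss = sess.run([train_op, loss], feed_dict={x: x_data, y: y_data})
        print("final training loss:", cur_loss)

If you are more at home in Torch, Theano, MXNet, or Caffe, the equivalent exercise in any of those frameworks reflects the same level of preparation.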
In-class Presentation
- 01/25 Word embeddings
- Conner : Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space, Neelakantan et al., EMNLP 2014
- Sanjana : GloVe: Global Vectors for Word Representation, Pennington et al., EMNLP 2014
- Wenhu : AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes, Rothe and Schutze, ACL 2015
- 01/30 Neural network basics (Project proposal due to Grader: Ke Ni <ke00@ucsb.edu>, HW1 out)
- Jashanvir : Learning representations by back-propagating errors, Rumelhart et al., Nature, 1986
- Metehan : An overview of gradient descent optimization algorithms, Sebastian Ruder, Arxiv 2016
- Vivek P. : Dropout: A Simple Way to Prevent Neural Networks from Overfitting, N. Srivastava et al., JMLR 2014
- 02/01 Recursive Neural Networks
- April : Semantic Compositionality through Recursive Matrix-Vector Spaces, Socher et al., EMNLP 2012
- Zhiyu : Parsing with Compositional Vector Grammars, Socher et al., ACL 2013
- Andy : Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank, Socher et al., EMNLP 2013
- 02/06 RNNs
- Lukas : A Learning Algorithm for Continually Running Fully Recurrent Neural Networks, Ronald J. Williams and David Zipser, 1989
- 02/08 LSTMs/GRUs
- Liu : Long Short-Term Memory, S. Hochreiter and J. Schmidhuber, Neural Computation, 1997
- Nidhi : On the Properties of Neural Machine Translation: Encoder–Decoder Approaches, Cho et al., 2014
- Vivek A. : Gated Feedback Recurrent Neural Networks, Chung et al., ICML 2015
- 02/13 Sequence-to-sequence models and neural machine translation (HW1 due and HW2 out)
- Ryan : Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, Cho et al., EMNLP 2014
- Yanju : Sequence to Sequence Learning with Neural Networks, Sutskever et al., NIPS 2014
- Karthik : Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models, Luong and Manning, ACL 2016
- 02/15 Attention mechanisms
- Jing : Neural Machine Translation by Jointly Learning to Align and Translate, Bahdanau et al., ICLR 2015
- 02/20 Convolutional Neural Networks (Mid-term report due to Grader: Ke Ni <ke00@ucsb.edu>)
- Esther : Natural Language Processing (Almost) from Scratch, Collobert et al., JMLR 2011
- Maohua : A Sensitivity Analysis of (and Practitioners’ Guide to) Convolutional Neural Networks for Sentence Classification, Zhang and Wallace, Arxiv 2015
- Jiawei : Convolutional Neural Network Architectures for Matching Natural Language Sentences, Hu et al., NIPS 2014
- 02/22 Language and vision
- Sai : Show and Tell: A Neural Image Caption Generator, Vinyals et al., CVPR 2015
- Xiyou : Deep Visual-Semantic Alignments for Generating Image Descriptions, Andrej Karpathy and Li Fei-Fei, CVPR 2015
- Richika : Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books, Zhu et al., ICCV 2015
- 02/27 Deep Reinforcement Learning 1 (HW2 due)
- Sharon : Deep Reinforcement Learning for Dialogue Generation, Li et al., EMNLP 2016
- David : Improving Information Extraction by Acquiring External Evidence with Reinforcement Learning, Narasimhan et al., EMNLP 2016
- Michael : Deep Reinforcement Learning with a Natural Language Action Space, He et al., ACL 2016
- 03/01 Deep Reinforcement Learning 2
- 03/06 Unsupervised Learning
- Hongmin : Generative Adversarial Nets, Goodfellow et al., NIPS 2014
- Burak : Auto-encoding variational Bayes, Kingma and Welling, ICLR 2014
- Pushkar : Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, Radford et al., 2015
- Liu : Semi-supervised Sequence Learning, Dai et al., NIPS 2015
Course Objective
By the end of the quarter, students should have a good understanding of basic deep learning models and be able to implement fundamental algorithms for simple deep learning problems. Students will also develop an understanding of open research problems in deep learning and be able to conduct cutting-edge research that makes novel contributions beyond existing solutions.
Piazza
http://www.piazza.com/ucsb/winter2018/cs291a
Syllabus
Course Description
Deep learning has revolutionized many subfields within AI. DeepMind's AlphaGo combined convolutional neural networks with deep reinforcement learning and MCTS, and won many games against top human Go players. In computer vision, most of the leading systems in ImageNet competitions are based on deep neural networks. Deep learning has also changed the game in NLP: for example, Google recently replaced its phrase-based machine translation system with a neural machine translation system. Throughout the quarter, we will cover the basics of neural networks and trace the deep learning revolution since 2006. This graduate class also emphasizes the development of students' paper-reading and presentation abilities: each student will present research papers related to the course topics. Last but not least, the most important aspect of this course is for students to work on a novel research project on open problems related to NLP and deep learning, and to gain hands-on experience doing cutting-edge research.
Text Book
No textbook is required, but the following optional textbook is recommended:
- Deep Learning, An MIT Press book, Ian Goodfellow and Yoshua Bengio and Aaron Courville
- HTML version of the book: http://www.deeplearningbook.org/
Project
One key aspect of this class is to have students gain hands-on experience with open research problems. To do this, each student will need to propose a research project. The teaching staff will provide feedback on the proposal and track each student's progress. Computing resources will be provided: each team will receive a sufficient amount of Google Cloud credits for their project.
In the project proposal, each team must clearly mention the following aspects of their project:
- What is the motivation of the problem?
- What is the exact definition of the problem? How do we formulate the problem in machine learning?
- What are some existing approaches to this problem?
- What are some existing datasets that you can work on?
- What is the novelty in your project? New problem? New approach? New dataset?
- How are you going to implement your approach and verify the idea?
Good places to look for project inspirations:
- Recent papers from ACL, EMNLP, NAACL from ACL Anthology: http://aclweb.org/anthology/
- Recent papers from ICML, NIPS, and ICLR conferences: http://jmlr.org/proceedings/ http://papers.nips.cc/
FAQ: Can I use my existing research project / thesis research as the project in this class? A: I would prefer students to get out of their comfort zone and try something new in this class. If you reuse techniques from your existing project, it is unlikely that you will learn anything new from the course project. However, you may still draw inspiration from your research problem to formulate your class project.
Available Datasets
- Wikipedia Harassment / Personal Attack Dataset: https://figshare.com/projects/Wikipedia_Talk/16731 (paper: Ex Machina: Personal Attacks Seen at Scale, https://arxiv.org/abs/1610.08914)
- Stance Detection / Fake News Detection / Automated Fact-Checking, email William.
- Deep learning for abstractive humor generation. Dataset: https://www.cs.ucsb.edu/~william/papers/meme.pdf
- NELL Knowledge Graph http://rtw.ml.cmu.edu/
- Relation Prediction / Reasoning FB15K-237 https://www.microsoft.com/en-us/download/details.aspx?id=52312
- Abstractive summarization datasets https://www.aclweb.org/anthology/D/D15/D15-1044.pdf
- WikiHow: learning processes from lists and free text https://github.com/paolo7/KnowHowDataset
Grading
There will be two homework assignments (20%), one project (65%), and an in-class paper presentation (15%). The in-class presentation consists of a 12-minute talk (12 slides max) and 3 minutes of Q&A. The project grade breaks down as follows: 1-page proposal (10%), 2-page mid-term report (10%), final presentation (15%), and final report (30%). Four late days are allowed with no penalty. After that, 50% will be deducted from submissions that are within four days after the due date, unless you have a note from the doctor's office. Homework submissions that are five days late will receive zero credit.
Final Report Format
You must use the ICML 2018 LaTeX style files for writing the report. The final report must be 3-5 pages long, including references. You are encouraged to include the following components in your report (not necessarily in this order): abstract, introduction (motivation, task definition, your novel contributions), related work, technical approach (e.g., mathematical formulation of the problem, algorithms, theorems if any), experiments, discussion, and conclusion.
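A minimal skeleton of such a report is sketched below, assuming the icml2018.sty and icml2018.bst files from the official ICML 2018 author kit are in the working directory; the front-matter macro names are reproduced from memory, so check them against the example paper that ships with the kit:

    \documentclass{article}
    \usepackage{icml2018}  % style file from the ICML 2018 author kit

    % Short title for the running head (expected by the style file).
    \icmltitlerunning{CS291A Project Report}

    \begin{document}
    \twocolumn[
    \icmltitle{Your Project Title}
    \begin{icmlauthorlist}
    \icmlauthor{Your Name}{ucsb}
    \end{icmlauthorlist}
    \icmlaffiliation{ucsb}{University of California, Santa Barbara}
    \icmlcorrespondingauthor{Your Name}{you@ucsb.edu}
    \vskip 0.3in
    ]
    \printAffiliationsAndNotice{}

    \begin{abstract}
    One-paragraph summary of the problem, approach, and findings.
    \end{abstract}

    \section{Introduction}  % motivation, task definition, novel contributions
    \section{Related Work}
    \section{Approach}      % problem formulation, algorithms, theorems (if any)
    \section{Experiments}
    \section{Discussion and Conclusion}

    \bibliography{references}
    \bibliographystyle{icml2018}
    \end{document}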
Academic Integrity
We follow UCSB's academic integrity policy from UCSB Campus Regulations, Chapter VII ("Student Conduct and Discipline"):
- It is expected that students attending the University of California understand and subscribe to the ideal of academic integrity, and are willing to bear individual responsibility for their work. Any work (written or otherwise) submitted to fulfill an academic requirement must represent a student’s original work. Any act of academic dishonesty, such as cheating or plagiarism, will subject a person to University disciplinary action. Using or attempting to use materials, information, study aids, or commercial “research” services not authorized by the instructor of the course constitutes cheating. Representing the words, ideas, or concepts of another person without appropriate attribution is plagiarism. Whenever another person’s written work is utilized, whether it be a single phrase or longer, quotation marks must be used and sources cited. Paraphrasing another’s work, i.e., borrowing the ideas or concepts and putting them into one’s “own” words, must also be acknowledged. Although a person’s state of mind and intention will be considered in determining the University response to an act of academic dishonesty, this in no way lessens the responsibility of the student.
More specifically, we follow Stefano Tessaro and William Cohen's policy in this class:
You cannot copy code or answers to homework questions or exams from your classmates or from other sources. You may discuss course materials and assignments with your classmates, but you cannot write anything down during those discussions; you must write the answers / code independently. The presence or absence of any form of help or collaboration, whether given or received, must be explicitly stated and disclosed in full by all involved, on the first page of their assignment. Specifically, each assignment solution must start by answering the following questions:
- (1) Did you receive any help whatsoever from anyone in solving this assignment? Yes / No.
- If you answered 'yes', give full details (e.g., "Jane explained to me what is asked in Question 3.4").
- (2) Did you give any help whatsoever to anyone in solving this assignment? Yes / No.
- If you answered 'yes', give full details (e.g., "I pointed Joe to Section 2.3 to help him with Question 2").
- No electronics are allowed during exams, but you may prepare an A4-sized note and bring it to the exam.
- If you have questions, please ask the teaching staff.
Academic dishonesty will be reported to the highest levels of administration at UCSB. Students who engage in such activities will automatically receive an F grade.
Accessibility
Students with a documented disability are asked to contact the DSP office to arrange the necessary academic accommodations.
Discussions
All discussions and questions should be posted on our course Piazza site.