WeSSLLI 2020
Deep Learning and the Nature of Linguistic Representation
July 13-17, 2020
Shalom Lappin
University of Gothenburg, Queen Mary University of London, and King’s College London
All class times are Boston EST.
Class 1: Introduction to Deep Learning in NLP
July 13, 11:00am-12:20pm
1. Machine learning as a source of cognitive insights
2. Basic elements of deep learning (see the sketch after this list)
3. Types of deep neural networks
4. An example application of deep learning to NLP
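As a pointer to item 2, here is a minimal illustrative sketch (in Python with NumPy; not taken from the course materials) of the most basic element of a deep network: a single feedforward layer computing a non-linear transformation of an input vector. All names and dimensions are arbitrary toy choices.

    import numpy as np

    # A single feedforward ("dense") layer: h = g(Wx + b).
    # Weights and dimensions are arbitrary illustrative values.
    rng = np.random.default_rng(0)

    def relu(z):
        return np.maximum(0.0, z)

    def dense_layer(x, W, b, activation=relu):
        """Affine transformation followed by a non-linearity."""
        return activation(W @ x + b)

    # Toy example: map a 4-dimensional "word vector" to a 3-dimensional hidden state.
    x = rng.normal(size=4)          # input vector (e.g. a word embedding)
    W = rng.normal(size=(3, 4))     # learned weight matrix
    b = np.zeros(3)                 # learned bias
    h = dense_layer(x, W, b)
    print(h)

Stacking several such layers, and replacing the fixed weights with parameters updated by gradient descent, gives the basic deep feedforward architecture discussed in this class.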
References
Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio (2015), “Neural Machine Translation by Jointly Learning to Align and Translate”, Proceedings of ICLR 2015.
Yuri Bizzoni and Shalom Lappin (2017), “Deep Learning of Binary and Gradient Judgements for Semantic Paraphrase”, Proceedings of the International Workshop on Computational Semantics 2017, Montpellier, France, September 2017.
Alexander Clark and Shalom Lappin (2011), Linguistic Nativism and the Poverty of the Stimulus, Wiley-Blackwell, Oxford.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova (2019), “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”, Proceedings of NAACL-HLT 2019, Minneapolis MN, pp. 4171–4186.
Jeffrey L. Elman (1990), “Finding Structure in Time”, Cognitive Science 14, pp. 179-211.
Jeffrey Pennington, Richard Socher, and Christopher D. Manning (2014), “GloVe: Global Vectors for Word Representation”, Proceedings of EMNLP 2014, pp. 1532-1543.
Ian Goodfellow, Yoshua Bengio, and Aaron Courville (2016), Deep Learning, MIT Press, Cambridge MA.
Sepp Hochreiter and Jürgen Schmidhuber (1997), “Long Short-Term Memory”, Neural Computation 9(8), pp. 1735-1780.
Shalom Lappin and Stuart Shieber (2007), “Machine Learning Theory and Practice as a Source of Insight into Universal Grammar”, Journal of Linguistics 43(2), pp. 393-427.
Tomáš Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean (2013), “Distributed Representations of Words and Phrases and their Compositionality”, Proceedings of NIPS 2013, Lake Tahoe, Nevada.
Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever (2018), “Improving Language Understanding by Generative Pre-Training”, OpenAI.
David E. Rumelhart, James L. McClelland, and the PDP Research Group (1986), Parallel Distributed Processing, Volume 1, MIT Press, Cambridge MA.
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, and Łukasz Kaiser (2017), “Attention Is All You Need”, Proceedings of NIPS 2017, Long Beach CA.
Class 2: Learning Syntactic Properties with Deep Neural Networks
July 14, 11:00am-12:20pm
1. Using DNNs to identify subject-verb agreement (see the sketch after this list)
2. Experimenting with DNN architecture and parameters
3. DNNs and hierarchical structure
4. Deep learning of tree structures
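For item 1, the following is a minimal sketch (assuming PyTorch; not the authors’ code) of how a recurrent network can be set up for the agreement task studied in Linzen, Dupoux & Goldberg (2016) and Bernardy & Lappin (2017): an LSTM reads the words preceding a verb and predicts whether the verb should be singular or plural. Vocabulary, dimensions, and inputs below are toy placeholders.

    import torch
    import torch.nn as nn

    class AgreementLSTM(nn.Module):
        """LSTM that classifies the number (singular vs. plural) of an upcoming verb."""
        def __init__(self, vocab_size, embed_dim=32, hidden_dim=64):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, 2)  # 2 classes: singular vs. plural

        def forward(self, token_ids):
            embedded = self.embed(token_ids)      # (batch, seq_len, embed_dim)
            _, (h_n, _) = self.lstm(embedded)     # final hidden state of the LSTM
            return self.out(h_n[-1])              # (batch, 2) class scores

    # Toy usage: a batch of two "sentence prefixes" encoded as word indices.
    model = AgreementLSTM(vocab_size=100)
    prefixes = torch.tensor([[5, 17, 42, 3], [8, 9, 1, 60]])
    logits = model(prefixes)
    print(logits.shape)  # torch.Size([2, 2])

Training such a model on corpus-derived (prefix, verb-number) pairs, and varying its architecture and hyperparameters, is the kind of experiment taken up in items 2 and 3.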
References
Marco Baroni, Silvia Bernardini, Adriano Ferraresi, and Eros Zanchetta (2009), “The WaCky Wide Web: A Collection of Very Large Linguistically Processed Web-Crawled Corpora”, Language Resources and Evaluation 43, pp. 209–226.
Jean-Philippe Bernardy and Shalom Lappin (2017), “Using Deep Neural Networks to Learn Syntactic Agreement”, Linguistic Issues in Language Technology 15.2, pp. 1-15.
Idan Blank, Zuzanna Balewski, Kyle Mahowald, and Ev Fedorenko (2016), “Syntactic Processing is Distributed across the Language System”, NeuroImage 127, pp. 307–323.
Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning (2015), “A Large Annotated Corpus for Learning Natural Language Inference”, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 632–642.
Samuel R. Bowman, Jon Gauthier, Abhinav Rastogi, Raghav Gupta, Christopher D. Manning, and Christopher Potts (2016), “A Fast Unified Model for Parsing and Sentence Understanding”, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL), Volume 1: Long Papers, pp. 1466–1477.
Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio (2014), “Learning Phrase Representations Using RNN Encoder–Decoder for Statistical Machine Translation”, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, pp. 1724–1734, Association for Computational Linguistics.
Jihun Choi, Kang Min Yoo, and Sang-goo Lee (2018), “Learning to Compose Task-Specific Tree Structures”, Proceedings of the Thirty-Second Association for the Advancement of Artificial Intelligence Conference on Artificial Intelligence (AAAI-18), volume 2.
Chris Dyer, Adhiguna Kuncoro, Miguel Ballesteros, and Noah A. Smith (2016), “Recurrent Neural Network Grammars”, Proceedings of NAACL-HLT 2016, San Diego, CA, pp. 199-209.
Yoav Goldberg (2019), “Assessing BERT's Syntactic Abilities”, arXiv:1901.05287.
Kristina Gulordava, Piotr Bojanowski, Edouard Grave, Tal Linzen, and Marco Baroni (2018), “Colorless Green Recurrent Networks Dream Hierarchically”, Proceedings of NAACL-HLT 2018, New Orleans LA, pp. 1195–1205.
Amulya Gupta and Zhu Zhang (2018), “To Attend or Not to Attend: A Case Study on Syntactic Structures for Semantic Relatedness”, Proceedings of ACL 2018, Melbourne, Australia, pp. 2116–2125.
John Hewitt and Christopher D. Manning (2019), “A Structural Probe for Finding Syntax in Word Representations”, Proceedings of NAACL-HLT 2019, Minneapolis MN, pp. 4129–4138.
Rafal Jozefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, and Yonghui Wu (2016), “Exploring the Limits of Language Modeling”, arXiv preprint arXiv:1602.02410.
Adhiguna Kuncoro, Chris Dyer, John Hale, Dani Yogatama, Stephen Clark, and Phil Blunsom (2018), “LSTMs Can Learn Syntax-Sensitive Dependencies Well, But Modeling Structure Makes Them Better”, Proceedings of ACL 2018, Melbourne, Australia, pp. 1426–1436.
Adhiguna Kuncoro, Chris Dyer, Laura Rimell, Stephen Clark, and Phil Blunsom (2019), “Scalable Syntax-Aware Language Models Using Knowledge Distillation”, Proceedings of ACL 2019, Florence, Italy, pp. 3472–3484.
Tal Linzen, Emmanuel Dupoux, and Yoav Goldberg (2016), “Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies”, Transactions of the Association for Computational Linguistics 4, pp. 521–535.
Jean Maillard, Stephen Clark, and Dani Yogatama (2017), “Jointly Learning Sentence Embeddings and Syntax with Unsupervised Tree-LSTMs”, arXiv preprint arXiv:1705.09189.
Mitchell P. Marcus, Beatrice Santorini, Mary Ann Marcinkiewicz, and Ann Taylor (1999), Treebank-3, LDC99T42, Linguistic Data Consortium.
Marie-Catherine de Marneffe, Bill MacCartney, and Christopher D. Manning (2006), “Generating Typed Dependency Parses from Phrase Structure Parses”, Proceedings of LREC 2006, Genoa, Italy.
Rebecca Marvin and Tal Linzen (2018), “Targeted Syntactic Evaluation of Language Models”, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, pp. 1192–1202.
Joakim Nivre, Johan Hall, Jens Nilsson, Atanas Chanev, Gülsen Eryigit, Sandra Kübler, Svetoslav Marinov, and Erwin Marsi (2007), “MaltParser: A Language-Independent System for Data-Driven Dependency Parsing”, Natural Language Engineering, pp. 95–135.
Helmut Schmid (1995), “Improvements in Part-of-Speech Tagging with an Application to German”, Proceedings of the ACL SIGDAT-Workshop, Association for Computational Linguistics.
Richard Socher, Jeffrey Pennington, Eric H. Huang, Andrew Y. Ng, and Christopher D. Manning (2011), “Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions”, Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 151–161.
Adina Williams, Nikita Nangia, and Samuel R. Bowman (2018), “A Broad-Coverage Challenge Corpus for Sentence Understanding Through Inference”, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT).
Adina Williams, A. Drozdov, and Samuel R. Bowman (2018a), “Do Latent Tree Learning Models Identify Meaningful Structure in Sentences?”, Transactions of the Association for Computational Linguistics 6, pp. 253–267.
Dani Yogatama, Phil Blunsom, Chris Dyer, Edward Grefenstette, and Wang Ling (2017), “Learning to Compose Words into Sentences with Reinforcement Learning”, Proceedings of the International Conference on Learning Representations (ICLR).
Class 3: Machine Learning and the Sentence Acceptability Task
July 15, 11:00am-12:20pm
1. Gradience in sentence acceptability
2. Predicting acceptability with ML models (see the sketch after this list)
3. Adding tags and trees
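For item 2, here is a small illustrative sketch in Python of two of the sentence-level acceptability measures used in Lau, Clark & Lappin (2015, 2017): mean log probability and SLOR. In practice the per-token log probabilities come from a trained language model and a unigram (frequency) model; here they are supplied directly as toy numbers.

    def mean_logprob(token_logprobs):
        """Mean LP: sentence log probability normalised by sentence length."""
        return sum(token_logprobs) / len(token_logprobs)

    def slor(token_logprobs, unigram_logprobs):
        """SLOR: (log P_model(s) - log P_unigram(s)) / |s|,
        which factors out word frequency as well as length."""
        assert len(token_logprobs) == len(unigram_logprobs)
        return (sum(token_logprobs) - sum(unigram_logprobs)) / len(token_logprobs)

    # Toy example with made-up log probabilities for a four-word sentence.
    lm_lp      = [-2.1, -3.5, -1.2, -4.0]   # log P from a language model
    unigram_lp = [-5.0, -6.2, -4.8, -7.1]   # log P from a unigram model
    print(mean_logprob(lm_lp))
    print(slor(lm_lp, unigram_lp))

Scores of this kind are then correlated with human acceptability ratings to assess how well a probabilistic model tracks gradient judgements.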
References
David Adger (2003), Core Syntax: A Minimalist Approach, Oxford University Press, Oxford.
Alexander Clark and Shalom Lappin (2011), Linguistic Nativism and the Poverty of the Stimulus, Wiley-Blackwell, Oxford.
Danqi Chen and Christopher Manning (2014), “A Fast and Accurate Dependency Parser using Neural Networks”, Proceedings of EMNLP 2014, Doha, Qatar, pp. 740–750.
Adam Ek, Jean-Philippe Bernardy, and Shalom Lappin (2019), “Language Modeling with Syntactic and Semantic Representation for Sentence Acceptability Predictions”, Proceedings of NoDaLiDa 2019, Turku, Finland, pp. 76-85.
Dan Klein and Christopher Manning (2003a), “Accurate Unlexicalized Parsing”, Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL 2003), Sapporo, Japan, pp. 423–430.
Dan Klein and Chris Manning (2003b), “Fast Exact Inference with A Factored Model for Natural Language Parsing”, Advances in Neural Information Processing Systems 15 (NIPS-03), Whistler, Canada, pp. 3–10.
Jey Han Lau, Alexander Clark, and Shalom Lappin (2014), “Measuring Gradience in Speakers’ Grammaticality Judgements”, Proceedings of the Annual Meeting of the Cognitive Science Society 36, Quebec City QC.
Jey Han Lau, Alexander Clark, and Shalom Lappin (2015), “Unsupervised Prediction of Acceptability Judgements”, Proceedings of ACL 2015, Beijing, China, pp. 1618–1628.
Jey Han Lau, Alexander Clark, and Shalom Lappin (2017), “Grammaticality, Acceptability, and Probability: A Probabilistic View of Linguistic Knowledge”, Cognitive Science 41(5), pp. 1202–1241.
Tomáš Mikolov (2012), Statistical Language Models Based on Neural Networks, unpublished doctoral dissertation, Brno University of Technology.
T. Mikolov, S. Kombrink, A. Deoras, L. Burget, and J. Černocký (2011), “RNNLM – Recurrent Neural Network Language Modeling Toolkit”, IEEE Automatic Speech Recognition and Understanding Workshop, Big Island, Hawaii.
Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Yoav Goldberg, Jan Hajič, Christopher D. Manning, Ryan McDonald, Slav Petrov, Sampo Pyysalo, Natalia Silveira, Reut Tsarfaty, and Daniel Zeman (2016), “Universal Dependencies v1: A Multilingual Treebank Collection”, LREC.
Joakim Nivre, Željko Agić, Lars Ahrenberg, et al. (2017), Universal Dependencies 2.0, LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic.
Adam Pauls and Dan Klein (2012), “Large-Scale Syntactic Language Modeling with Treelets”, Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Jeju, Korea, pp. 959–968.
David Vilares and Carlos Gómez-Rodríguez (2018), “A Transition-based Algorithm for Unrestricted AMR Parsing”, Proceedings of NAACL-HLT 2018, New Orleans, Louisiana, pp. 142–149.
Alex Warstadt, Amanpreet Singh, and Samuel R. Bowman (2019), “Neural Network Acceptability Judgments”, Transactions of the Association for Computational Linguistics 7, pp. 625–641.
Class 4: Predicting Human Acceptability Judgments in Context
July 16, 11:00am-12:20pm
1. Judgments in context (see the sketch after this list)
2. Two sets of experiments
3. The compression effect and discourse coherence
4. Predicting acceptability with different DNNs
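For item 1, the following sketch in Python illustrates the basic comparison behind in-context acceptability experiments (cf. Bernardy, Lappin & Lau 2018; Lau et al. 2020): the same sentence is scored by a language model with and without its preceding discourse, and the two length-normalised scores are compared. The score_sentence argument is a hypothetical stand-in for any model that returns per-token log probabilities given an optional context; toy_lm is a made-up placeholder, not an actual language model.

    def mean_logprob(token_logprobs):
        return sum(token_logprobs) / len(token_logprobs)

    def context_effect(score_sentence, sentence, context):
        """Difference between a sentence's length-normalised log probability
        in context and out of context."""
        out_of_context = mean_logprob(score_sentence(sentence, context=None))
        in_context = mean_logprob(score_sentence(sentence, context=context))
        return in_context - out_of_context   # > 0: context raises the score

    # Toy stand-in LM: context shifts every token log probability by a fixed amount.
    def toy_lm(sentence, context=None):
        base = [-3.0 for _ in sentence.split()]
        return [lp + 0.5 for lp in base] if context else base

    print(context_effect(toy_lm, "the ideas sleep furiously", "green ideas were mentioned"))

Comparing such model scores with human ratings collected in and out of context is what reveals the compression effect discussed in item 3.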
References
Jean-Philippe Bernardy, Shalom Lappin, and Jey Han Lau (2018), “The Influence of Context on Sentence Acceptability Judgements”, Proceedings of ACL 2018, Melbourne, Australia.
Yuri Bizzoni and Shalom Lappin (2019), “Predicting Metaphor Paraphrase Judgements in Context”, Proceedings of the 13th International Conference on Computational Semantics, Gothenburg, Sweden, pp. 165-175.
Mickaël Causse, Vsevolod Peysakhovich, and Eve F. Fabre (2016), “High Working Memory Load Impairs Language Processing During a Simulated Piloting Task: An ERP and Pupillometry Study”, Frontiers in Human Neuroscience 10.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova (2019), “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”, Proceedings of NAACL-HLT 2019, Minneapolis MN, pp. 4171–4186.
Felix Hill, Roi Reichart, and Anna Korhonen (2015), “SimLex-999: Evaluating Semantic Models with (Genuine) Similarity Estimation”, Computational Linguistics 41(4), pp. 665–695.
Aine Ito, Martin Corley, and Martin J. Pickering (2018), “A Cognitive Load Delays Predictive Eye Movements Similarly During L1 and L2 Comprehension”, Bilingualism: Language and Cognition 21(2), pp. 251-264.
Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst (2007), “Moses: Open Source Toolkit for Statistical Machine Translation”, Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, Prague, Czech Republic, pp. 177–180.
Jey Han Lau, Timothy Baldwin, and Trevor Cohn (2017), “Topically Driven Neural Language Model”, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, Canada, pp. 355–365.
Jey Han Lau, Carlos Armendariz, Shalom Lappin, Matthew Purver, and Chang Shu (2020), “How Furiously Can Green Ideas Sleep: Sentence Acceptability in Context”, Transactions of the Association for Computational Linguistics 8, pp. 296-310.
Hyangsook Park, Jun-Su Kang, Sungmook Choi, and Minho Lee (2013), “Analysis of Cognitive Load for Language Processing Based on Brain Activities”, Neural Information Processing, Springer, Berlin, Heidelberg, pp. 561–568.
Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever (2019), “Language Models are Unsupervised Multitask Learners”, OpenAI.
John Sweller (1988), “Cognitive Load During Problem Solving: Effects on Learning”, Cognitive Science 12(2), pp. 257–285.
Zhilin Yang, Zihang Dai, Yiming Yang, Jaime G. Carbonell, Ruslan Salakhutdinov, and Quoc V. Le (2019), “XLNet: Generalized Autoregressive Pretraining for Language Understanding”, NeurIPS 2019, Vancouver, BC.
Class 5: Cognitively Viable Computational Models of Linguistic Knowledge
July 17, 11:00am-12:20pm
1. How useful are linguistic theories for NLP applications?
2. Machine learning models vs. formal grammars
3. Explaining language acquisition
4. Conclusions and future work
References
David Adger (2003), Core Syntax: A Minimalist Approach, Oxford University Press, Oxford.
Jean-Philippe Bernardy and Shalom Lappin (2017), “Using Deep Neural Networks to Learn Syntactic Agreement”, Linguistic Issues in Language Technology 15.2, pp. 1-15.
Jean-Philippe Bernardy, Shalom Lappin, and Jey Han Lau (2018), “The Influence of Context on Sentence Acceptability Judgements”, Proceedings of ACL 2018, Melbourne, Australia.
Jihun Choi, Kang Min Yoo, and Sang-goo Lee (2018), “Learning to Compose Task-Specific Tree Structures”, Proceedings of the Thirty-Second Association for the Advancement of Artificial Intelligence Conference on Artificial Intelligence (AAAI-18), volume 2.
Noam Chomsky (1957), Syntactic Structures, Mouton, The Hague.
Alexander Clark (2003), “Combining Distributional and Morphological Information for Part of Speech Induction”, Proceedings of the Tenth Conference of the European Chapter of the Association for Computational Linguistics – Volume 1, EACL ’03, pp. 59–66.
Alexander Clark (2015), “Canonical Context-Free Grammars and Strong Learning: Two Approaches”, Proceedings of the 14th Meeting on the Mathematics of Language (MoL 14), Chicago, IL, pp. 99–111.
Alexander Clark and Shalom Lappin (2011), Linguistic Nativism and the Poverty of the Stimulus, Wiley-Blackwell, Oxford.
Alexander Clark and Shalom Lappin (2013), “Complexity in Language Acquisition”, Topics in Cognitive Science 5(1), pp. 89–110.
Alexander Clark and Ryo Yoshinaka (2014), “Distributional Learning of Parallel Multiple Context-Free Grammars”, Machine Learning 96, pp. 5-31.
Stephen Crain and Rosalind Thornton (1998), Investigations in Universal Grammar: A Guide to Experiments in the Acquisition of Syntax and Semantics, MIT Press, Cambridge, MA.
Adam Ek, Jean-Philippe Bernardy, and Shalom Lappin (2019), “Language Modeling with Syntactic and Semantic Representation for Sentence Acceptability Predictions”, Proceedings of NoDaLiDa 2019, Turku, Finland, pp. 76-85.
Jerry Fodor (2000), The Mind Doesn’t Work that Way, MIT Press, Cambridge, MA.
Jerry Fodor and Zenon Pylyshyn (1988), “Connectionism and Cognitive Architecture: A Critical Analysis”, Cognition 28, pp. 3-71.
Edward Gibson and Evelina Fedorenko (2013), “The Need for Quantitative Methods in Syntax and Semantics Research”, Language and Cognitive Processes 28, pp. 88–124.
Edward Gibson, Steven T. Piantadosi, and Evelina Fedorenko (2013), “Quantitative Methods in Syntax/Semantics Research: A Response to Sprouse and Almeida (2013)”, Language and Cognitive Processes 28, pp. 229–240.
E. Mark Gold (1967), “Language Identification in the Limit”, Information and Control 10(5), pp. 447–474.
John Hewitt and Christopher D. Manning (2019), “A Structural Probe for Finding Syntax in Word Representations”, Proceedings of NAACL-HLT 2019, Minneapolis MN, pp. 4129–4138.
Jennifer Hu, Jon Gauthier, Peng Qian, Ethan Wilcox, and Roger P. Levy (2020), “A Systematic Assessment of Syntactic Generalization in Neural Language Models”, Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 1725–1744.
Yoon Kim (2014), “Convolutional Neural Networks for Sentence Classification”, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, pp. 1746–1751.
Adhiguna Kuncoro, Chris Dyer, Laura Rimell, Stephen Clark, and Phil Blunsom (2019), “Scalable Syntax-Aware Language Models Using Knowledge Distillation”, Proceedings of ACL 2019, Florence, Italy, pp. 3472–3484.
Adhiguna Kuncoro, Lingpeng Kong, Daniel Fried, Dani Yogatama, Laura Rimell, Chris Dyer, and Phil Blunsom (2020), “Syntactic Structure Distillation Pretraining For Bidirectional Encoders”, arXiv:2005.13482.
Shalom Lappin and Jey Han Lau (2018), “Gradient Probabilistic Models vs Categorical Grammars: A Reply to Sprouse et al. (2018)”, The Science of Language (Ted Gibson blog).
Jey Han Lau, Alexander Clark, and Shalom Lappin (2014), “Measuring Gradience in Speakers’ Grammaticality Judgements”, Proceedings of the Annual Meeting of the Cognitive Science Society 36, Quebec City QC.
Jey Han Lau, Alexander Clark, and Shalom Lappin (2015), “Unsupervised Prediction of Acceptability Judgements”, Proceedings of ACL 2015, Beijing, China, pp. 1618–1628.
Jey Han Lau, Alexander Clark, and Shalom Lappin (2017), “Grammaticality, Acceptability, and Probability: A Probabilistic View of Linguistic Knowledge”, Cognitive Science 41(5), pp. 1202–1241.
Jey Han Lau, Carlos Armendariz, Shalom Lappin, Matthew Purver, and Chang Shu (2020), “How Furiously Can Green Ideas Sleep: Sentence Acceptability in Context”, Transactions of the Association for Computational Linguistics 8, pp. 296-310.
R. Thomas McCoy, Robert Frank, and Tal Linzen (2020), “Does Syntax Need to Grow on Trees? Sources of Hierarchical Inductive Bias in Sequence-to-Sequence Networks”, Transactions of the Association for Computational Linguistics 8, pp. 125-140.
Marco Marelli, Luisa Bentivogli, Marco Baroni, Raffaella Bernardi, Stefano Menini, and Roberto Zamparelli (2014), “SemEval-2014 Task 1: Evaluation of Compositional Distributional Semantic Models on Full Sentences through Semantic Relatedness and Textual Entailment”, Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), Dublin, Ireland, pp. 1-8.
Fernando Pereira (2000), “Formal Grammar and Information Theory: Together Again?”, Philosophical Transactions of the Royal Society, Royal Society, London, pp. 1239-1253.
Kai Sheng Tai, Richard Socher, and Christopher D. Manning (2015), “Improved Semantic Representations from Tree-Structured Long Short-Term Memory Networks”, Proceedings of ACL 2015, Beijing, China, pp. 1556–1566.
Richard Socher, Alex Perelygin, Jean Y. Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, and Christopher Potts (2013), “Recursive Deep Models for Semantic Compositionality over a Sentiment Treebank”, Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP), Seattle, WA, pp. 1631–1642.
Jon Sprouse, Carson T. Schütze, and Diogo Almeida (2013), “A Comparison of Informal and Formal Acceptability Judgments Using a Random Sample from Linguistic Inquiry 2001–2010”, Lingua 134, pp. 219–248.
Jon Sprouse and Diogo Almeida (2013), “The Empirical Status of Data in Syntax: A Reply to Gibson and Fedorenko”, Language and Cognitive Processes 28, pp. 229–240.
Jon Sprouse, Beracah Yankama, Sagar Indurkhya, Sandiway Fong, and Robert C. Berwick
Alex Warstadt, Amanpreet Singh, and Samuel R. Bowman (2019), “Neural Network Acceptability Judgments”, Transactions of the Association for Computational Linguistics 7, pp. 625–641.
Adina Williams, A. Drozdov, and Samuel R. Bowman (2018a), “Do Latent Tree Learning Models Identify Meaningful Structure in Sentences?”, Transactions of the Association for Computational Linguistics 6, pp. 253–267.