WeSSLLI 2020

 

Deep Learning and the Nature of Linguistic Representation

July 13-17, 2020

 

Shalom Lappin

University of Gothenburg, Queen Mary University of London, and

King’s College London

            

All class times are US Eastern Daylight Time (Boston)

 

Class 1: Introduction to Deep Learning in NLP

July 13, 11:00am-12:20pm

 

1.   Machine learning as a source of cognitive insights

2.   Basic elements of deep learning

3.   Types of deep neural networks

4.   An example application of deep learning to NLP

 

Class 1 slides

 

References

 

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio (2015), “Neural Machine

    Translation by Jointly Learning to Align and Translate”, Proceedings of ICLR 2015.

Yuri Bizzoni and Shalom Lappin (2017), “Deep Learning of Binary and Gradient Judgements

    for Semantic Paraphrase”, Proceedings of the International Workshop on Computational

    Semantics 2017, Montpellier, France, September 2017.

Alexander Clark and Shalom Lappin (2011), Linguistic Nativism and the Poverty of the

    Stimulus, Wiley-Blackwell, Oxford.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova (2019), “BERT: Pre-

    training of Deep Bidirectional Transformers for Language Understanding”, Proceedings

    of NAACL-HLT 2019, Minneapolis MN, pp. 4171–4186.

Jeffrey L. Elman (1990), “Finding Structure in Time”, Cognitive Science 14, pp. 179-211.

Jeffrey Pennington, Richard Socher, and Christopher D. Manning (2014), “GloVe: Global

    Vectors for Word Representation”, Proceedings of EMNLP 2014, pp. 1532-1543.

Ian Goodfellow, Yoshua Bengio, and Aaron Courville (2016), Deep Learning, MIT Press,

    Cambridge MA. 

Sepp Hochreiter and Jürgen Schmidhuber (1997), “Long Short-Term Memory”, Neural

    Computation 9(8), pp. 1735-1780.

Shalom Lappin and Stuart Shieber (2007), “Machine Learning Theory and Practice as a Source

    of Insight into Universal Grammar”, Journal of Linguistics 43(2), pp. 393-427.

Tomáš Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean (2013),

    “Distributed Representations of Words and Phrases and their Compositionality”,

    Proceedings of NIPS 2013, Lake Tahoe, Nevada.

Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever (2018), “Improving

    Language Understanding by Generative Pre-Training”, OpenAI.

David E. Rumelhart, James L. McClelland and PDP Research Group (1986), Parallel

    Distributed Processing, Volume 1, MIT Press, Cambridge MA.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez,

    Łukasz Kaiser, and Illia Polosukhin (2017), “Attention Is All You Need”, Proceedings of

    NIPS 2017, Long Beach CA.

 

 

Class 2: Learning Syntactic Properties with Deep Neural Networks

July 14, 11:00am-12:20pm

 

1.   Using DNNs to identify subject-verb agreement

2.   Experimenting with DNN architecture and parameters

3.   DNNs and hierarchical structure

4.   Deep learning of tree structures

 

Class 2 slides

 

References

 

Marco Baroni, Silvia Bernardini, Adriano Ferraresi, and Eros Zanchetta (2009), “The WaCky

    Wide Web: A Collection of Very Large Linguistically Processed Web-Crawled Corpora”,

    Language Resources and Evaluation 43, pp. 209–226.

Jean-Philippe Bernardy and Shalom Lappin (2017), “Using Deep Neural Networks to Learn

    Syntactic Agreement”, Linguistic Issues in Language Technology 15.2, pp. 1-15.

Idan Blank, Zuzanna Balewski, Kyle Mahowald, and Evelina Fedorenko (2016), “Syntactic

    Processing is Distributed across the Language System”, NeuroImage 127, pp. 307–323.

Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning (2015),

    “A Large Annotated Corpus for Learning Natural Language Inference”, Proceedings of the

    2015 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp.

    632–642.

Samuel R. Bowman, Jon Gauthier, Abhinav Rastogi, Raghav Gupta, Christopher D. Manning,

    and Christopher Potts (2016), “A Fast Unified Model for Parsing and Sentence

    Understanding”, Proceedings of the 54th Annual Meeting of the Association for

    Computational Linguistics (ACL), Volume 1: Long Papers, pages 1466–1477.

Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi

    Bougares, Holger Schwenk, and Yoshua Bengio (2014), “Learning Phrase Representations

    Using RNN Encoder–Decoder for Statistical Machine Translation”, Proceedings of the

    2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha,

    Qatar, pp. 1724–1734. Association for Computational Linguistics.

Jihun Choi, Kang Min Yoo, and Sang-goo Lee (2018), “Learning to Compose Task-Specific

    Tree Structures”, Proceedings of the Thirty-Second Association for the Advancement of

    Artificial Intelligence Conference on Artificial Intelligence (AAAI-18), volume 2.

Chris Dyer, Adhiguna Kuncoro, Miguel Ballesteros, and Noah A. Smith (2016), “Recurrent

    Neural Network Grammars”, Proceedings of NAACL-HLT 2016, San Diego, CA,

    pp. 199-209.

Yoav Goldberg (2019), “Assessing BERT's Syntactic Abilities”, arXiv preprint arXiv:1901.05287.

Kristina Gulordava, Piotr Bojanowski, Edouard Grave, Tal Linzen, and Marco Baroni (2018),

    “Colorless Green Recurrent Networks Dream Hierarchically” in Proceedings of NAACL-HLT

    2018, New Orleans LA, pp. 1195–1205.

Amulya Gupta and Zhu Zhang (2018), “To Attend or Not to Attend: A Case Study on

    Syntactic Structures for Semantic Relatedness”, Proceedings of ACL 2018, Melbourne,

    Australia, pp. 2116–2125.

John Hewitt and Christopher D. Manning (2019), “A Structural Probe for Finding Syntax in

    Word Representations” in Proceedings of NAACL-HLT 2019, Minneapolis MN, pp. 4129–

    4138.

Rafal Jozefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, and Yonghui Wu (2016),

    “Exploring the Limits of Language Modeling”, arXiv preprint arXiv:1602.02410.

Adhiguna Kuncoro, Chris Dyer, John Hale, Dani Yogatama, Stephen Clark, and Phil

    Blunsom (2018), “LSTMs Can Learn Syntax-Sensitive Dependencies Well, But Modeling

    Structure Makes Them Better”, Proceedings of ACL 2018, Melbourne, Australia,

    pp. 1426–1436.

Adhiguna Kuncoro, Chris Dyer, Laura Rimell, Stephen Clark, and Phil Blunsom (2019),

    “Scalable Syntax-Aware Language Models Using Knowledge Distillation”, Proceedings of

    ACL 2019, Florence, Italy, pp. 3472–3484.

Tal Linzen, Emmanuel Dupoux, and Yoav Goldberg (2016), “Assessing the Ability of

    LSTMs to Learn Syntax-Sensitive Dependencies”, Transactions of the Association for

    Computational Linguistics, 4, pp. 521–535.

Jean Maillard, Stephen Clark, and Dani Yogatama (2017), “Jointly Learning Sentence

    Embeddings and Syntax with Unsupervised Tree-LSTMs”, arXiv preprint arXiv:1705.09189.

Mitchell P. Marcus, Beatrice Santorini, Mary Ann Marcinkiewicz, and Ann Taylor (1999),

    Treebank-3, LDC99T42, Linguistic Data Consortium.

Marie-Catherine de Marneffe, Bill MacCartney, and Christopher D. Manning (2006),  

    “Generating Typed Dependency Parses from Phrase Structure Parses”, Proceedings of

    LREC 2006, Genoa, Italy.

Rebecca Marvin and Tal Linzen (2018), “Targeted Syntactic Evaluation of Language Models”,

    Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing,

    Brussels, Belgium, pp. 1192–1202.

Joakim Nivre, Johan Hall, Jens Nilsson, Atanas Chanev, Gülsen Eryigit, Sandra Kübler,

    Svetoslav Marinov, and Erwin Marsi, (2007), “Maltparser: A Language-Independent

    System for Data-Driven Dependency Parsing”, Natural Language Engineering 13(2), pp. 95–135.

Helmut Schmid (1995), “Improvements in Part-of-Speech Tagging with an Application to

    German”, Proceedings of the ACL SIGDAT-Workshop, Association for Computational

    Linguistics.

Richard Socher, Jeffrey Pennington, Eric H. Huang, Andrew Y. Ng, and Christopher D.

    Manning (2011), “Semi-Supervised Recursive Autoencoders for Predicting Sentiment

    Distributions”, Proceedings of the 2011 Conference on Empirical Methods in Natural

    Language Processing (EMNLP), pp. 151–161.

Adina Williams, Nikita Nangia, and Samuel R. Bowman (2018), “A Broad-Coverage

    Challenge Corpus for Sentence Understanding Through Inference”, Proceedings of the

    2018 Conference of the North American Chapter of the Association for Computational

    Linguistics: Human Language Technologies (NAACL-HLT).

Adina Williams, Andrew Drozdov, and Samuel R. Bowman (2018a), “Do Latent Tree Learning

    Models Identify Meaningful Structure in Sentences?” in Transactions of the Association for

    Computational Linguistics 6, pp. 253–267.

Dani Yogatama, Phil Blunsom, Chris Dyer, Edward Grefenstette, and Wang Ling (2017),

    “Learning to Compose Words into Sentences with Reinforcement Learning”, Proceedings

    of the International Conference on Learning Representations (ICLR).

 

 

Class 3: Machine Learning and the Sentence Acceptability Task

July 15, 11:00am-12:20pm

 

1.   Gradience in sentence acceptability

2.   Predicting acceptability with ML models

3.   Adding tags and trees

 

Class 3 slides

 

References

 

David Adger (2003), Core Syntax: A Minimalist Approach, Oxford University Press, Oxford.

Alexander Clark and Shalom Lappin (2011), Linguistic Nativism and the Poverty of the

    Stimulus, Wiley-Blackwell, Oxford.

Danqi Chen and Christopher Manning (2014), “A Fast and Accurate Dependency Parser

    using Neural Networks”, Proceedings of EMNLP 2014, Doha, Qatar, pp. 740–750.

Adam Ek, Jean-Philippe Bernardy, and Shalom Lappin (2019), “Language Modeling with

    Syntactic and Semantic Representation for Sentence Acceptability Predictions” in

    Proceedings of NoDaLiDa 2019, Turku, Finland, pp. 76-85.

Dan Klein and Christopher Manning (2003a), “Accurate Unlexicalized Parsing”, Proceedings

     of the 41st Annual Meeting of the Association for Computational Linguistics (ACL 2003),

    Sapporo, Japan, pp. 423–430.

Dan Klein and Christopher Manning (2003b), “Fast Exact Inference with A Factored Model for

    Natural Language Parsing”, Advances in Neural Information Processing Systems 15 (NIPS-

    03), Whistler, Canada, pp. 3–10.

Jey Han Lau, Alexander Clark, and Shalom Lappin (2014), “Measuring Gradience in

    Speakers’ Grammaticality Judgements”, Proceedings of the Annual Meeting of the

    Cognitive Science Society 36, Quebec City QC.

Jey Han Lau, Alexander Clark, and Shalom Lappin (2015), “Unsupervised Prediction of

    Acceptability Judgements” in Proceedings of ACL 2015, Beijing, China, pp. 1618–1628.

Jey Han Lau, Alexander Clark, and Shalom Lappin (2017), “Grammaticality, Acceptability,

    and Probability: A Probabilistic View of Linguistic Knowledge”, Cognitive Science 41(5),

    pp. 1202–1241.

Tomáš Mikolov (2012), Statistical Language Models Based on Neural Networks,

    Unpublished doctoral dissertation, Brno University of Technology.

Tomáš Mikolov, Stefan Kombrink, Anoop Deoras, Lukáš Burget, and Jan Černocký (2011), “RNNLM-

    Recurrent Neural Network Language Modeling Toolkit”, IEEE Automatic Speech

    Recognition and Understanding Workshop, Big Island, Hawaii.

Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Yoav Goldberg, Jan Hajič,

    Christopher D. Manning, Ryan McDonald, Slav Petrov, Sampo Pyysalo, Natalia Silveira,

    Reut Tsarfaty, and Daniel Zeman (2016), “Universal Dependencies v1: A Multilingual

    Treebank Collection”, Proceedings of LREC 2016.

Joakim Nivre, Željko Agić, Lars Ahrenberg, et al. (2017), Universal Dependencies 2.0,

    LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics

    (ÚFAL), Faculty of Mathematics and Physics, Charles University, Prague, Czech

    Republic.

Adam Pauls and Dan Klein (2012), “Large-Scale Syntactic Language Modeling with

    Treelets”, Proceedings of the 50th Annual Meeting of the Association for Computational

    Linguistics, Jeju, Korea, pp. 959–968.

David Vilares and Carlos Gómez-Rodríguez (2018), “A Transition-based Algorithm for

    Unrestricted AMR Parsing”, Proceedings of NAACL-HLT 2018, New Orleans, Louisiana,

    pp. 142–149.

Alex Warstadt, Amanpreet Singh, and Samuel R. Bowman (2019), “Neural Network

    Acceptability Judgments”, Transactions of the Association for Computational

    Linguistics 7, pp. 625–641.

 

 

Class 4: Predicting Human Acceptability Judgments in Context

July 16, 11:00am-12:20pm

 

1.   Judgments in context

2.   Two sets of experiments

3.   The compression effect and discourse coherence

4.   Predicting acceptability with different DNNs

 

Class 4 slides

 

References

 

Jean-Philippe Bernardy, Shalom Lappin, and Jey Han Lau (2018), “The Influence of Context

    on Sentence Acceptability Judgements” in Proceedings of ACL 2018, Melbourne, Australia.

Yuri Bizzoni and Shalom Lappin (2019), “Predicting Metaphor Paraphrase Judgements in

    Context”, Proceedings of the 13th International Conference on Computational Semantics,

    Gothenburg, Sweden, pp. 165-175.

Mickaël Causse, Vsevolod Peysakhovich, and Eve F. Fabre (2016), “High Working Memory

    Load Impairs Language Processing During a Simulated Piloting Task: An ERP and

    Pupillometry Study”, Frontiers in Human Neuroscience 10.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova (2019), “BERT: Pre-

    training of Deep Bidirectional Transformers for Language Understanding”, Proceedings

    of NAACL-HLT 2019, Minneapolis MN, pp. 4171–4186.

Felix Hill, Roi Reichart, and Anna Korhonen (2015), “SimLex-999: Evaluating Semantic

    Models with (Genuine) Similarity Estimation”, Computational Linguistics 41(4), pp. 665–695.

Aine Ito, Martin Corley, and Martin J. Pickering (2018), “A Cognitive Load Delays

    Predictive Eye Movements Similarly During L1 and L2 Comprehension”, Bilingualism:

    Language and Cognition 21(2), pp. 251-264.

Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico,

    Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer,

    Ondrej Bojar, Alexandra Constantin, and Evan Herbst (2007), “Moses: Open Source

    Toolkit for Statistical Machine Translation”, Proceedings of the 45th Annual Meeting of the

    Association for Computational Linguistics. Prague, Czech Republic, pp. 177–180.

Jey Han Lau, Timothy Baldwin, and Trevor Cohn (2017), “Topically Driven Neural

    Language Model”, Proceedings of the 55th Annual Meeting of the Association for

    Computational Linguistics, Vancouver, Canada, pp. 355–365.

Jey Han Lau, Carlos Armendariz, Shalom Lappin, Matthew Purver, and Chang Shu (2020),

    “How Furiously Can Green Ideas Sleep: Sentence Acceptability in Context”, Transactions

    of the Association for Computational Linguistics 8, pp. 296-310.

Hyangsook Park, Jun-Su Kang, Sungmook Choi, and Minho Lee (2013), “Analysis of

    Cognitive Load for Language Processing Based on Brain Activities”, Neural Information

    Processing, Berlin, Heidelberg. Springer Berlin Heidelberg, pp. 561–568.

Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever (2019),

    “Language Models are Unsupervised Multitask Learners”, OpenAI.

John Sweller (1988), “Cognitive Load During Problem Solving: Effects on Learning”,

    Cognitive Science, 12(2), pp. 257–285.

Zhilin Yang, Zihang Dai, Yiming Yang, Jaime G. Carbonell, Ruslan Salakhutdinov, and Quoc

    V. Le (2019), “XLNet: Generalized Autoregressive Pretraining for Language

    Understanding”, Proceedings of NeurIPS 2019, Vancouver, BC.

 

 

Class 5: Cognitively Viable Computational Models of Linguistic Knowledge

July 17, 11:00am-12:20pm

 

1.   How useful are linguistic theories for NLP applications?

2.   Machine learning models vs formal grammars

3.   Explaining language acquisition

4.   Conclusions and future work

 

Class 5 slides

 

References

 

David Adger (2003), Core Syntax: A Minimalist Approach, Oxford University Press, Oxford.

Jean-Philippe Bernardy and Shalom Lappin (2017), “Using Deep Neural Networks to Learn

    Syntactic Agreement”, Linguistic Issues in Language Technology 15.2, pp. 1-15.

Jean-Philippe Bernardy, Shalom Lappin, and Jey Han Lau (2018), “The Influence of Context

    on Sentence Acceptability Judgements” in Proceedings of ACL 2018, Melbourne, Australia.

Jihun Choi, Kang Min Yoo, and Sang-goo Lee (2018), “Learning to Compose Task-Specific

    Tree Structures”, Proceedings of the Thirty-Second Association for the Advancement of

    Artificial Intelligence Conference on Artificial Intelligence (AAAI-18), volume 2.

Noam Chomsky (1957), Syntactic Structures, Mouton, The Hague.

Alexander Clark (2003), “Combining Distributional and Morphological Information for Part

    of Speech Induction”, Proceedings of the Tenth Conference of the European Chapter of the

    Association for Computational Linguistics (EACL ’03), Volume 1, pp. 59–66.

Alexander Clark (2015), “Canonical Context-Free Grammars and Strong Learning: Two

    Approaches”, Proceedings of the 14th Meeting on the Mathematics of Language (MoL 14),

    Chicago, IL, pp. 99–111.

Alexander Clark and Shalom Lappin (2011), Linguistic Nativism and the Poverty of the

    Stimulus, Wiley-Blackwell, Oxford.

Alexander Clark and Shalom Lappin (2013), “Complexity in Language Acquisition”, Topics

    in Cognitive Science 5(1), pp. 89–110.

Alexander Clark and Ryo Yoshinaka (2014), “Distributional Learning of Parallel Multiple

    Context-Free Grammars”, Machine Learning 96, pp. 5-31.

Stephen Crain and Rosalind Thornton (1998), Investigations in Universal Grammar: A Guide

    to Experiments in the Acquisition of Syntax and Semantics, MIT Press, Cambridge, MA.

Adam Ek, Jean-Philippe Bernardy, and Shalom Lappin (2019), “Language Modeling with

    Syntactic and Semantic Representation for Sentence Acceptability Predictions” in

    Proceedings of NoDaLiDa 2019, Turku, Finland, pp. 76-85.

Jerry Fodor (2000), The Mind Doesn’t Work that Way, MIT Press, Cambridge, MA.

Jerry Fodor and Zenon Pylyshyn (1988), “Connectionism and Cognitive Architecture: A

    Critical Analysis”, Cognition 28, pp. 3-71.

Edward Gibson and Evelina Fedorenko (2013), “The Need for Quantitative Methods in

    Syntax and Semantics Research”, Language and Cognitive Processes 28, pp. 88–124.

Edward Gibson, Steven T. Piantadosi, and Evelina Fedorenko (2013), “Quantitative Methods

    in Syntax/Semantics Research: A Response to Sprouse and Almeida (2013)”, Language

    and Cognitive Processes 28, pp. 229–240.

E. Mark Gold (1967), “Language Identification in the Limit,” Information and Control 10(5),

    pp. 447–474.

John Hewitt and Christopher D. Manning (2019), “A Structural Probe for Finding Syntax in

    Word Representations” in Proceedings of NAACL-HLT 2019, Minneapolis MN, pp. 4129–

    4138.

Jennifer Hu, Jon Gauthier, Peng Qian, Ethan Wilcox, and Roger P. Levy (2020), “A Systematic

    Assessment of Syntactic Generalization in Neural Language Models”, Proceedings of the 58th

    Annual Meeting of the Association for Computational Linguistics, pp. 1725–1744.

Yoon Kim (2014), “Convolutional Neural Networks for Sentence Classification”,

    Proceedings of the 2014 Conference on Empirical Methods in Natural Language

    Processing (EMNLP), Doha, Qatar, pp. 1746–1751.

Adhiguna Kuncoro, Chris Dyer, Laura Rimell, Stephen Clark, and Phil Blunsom (2019),

    “Scalable Syntax-Aware Language Models Using Knowledge Distillation”, Proceedings of

    ACL 2019, Florence, Italy, pp. 3472–3484.

Adhiguna Kuncoro, Lingpeng Kong, Daniel Fried, Dani Yogatama, Laura Rimell, Chris Dyer,

    and Phil Blunsom (2020), “Syntactic Structure Distillation Pretraining for Bidirectional

    Encoders”, arXiv preprint arXiv:2005.13482.

Shalom Lappin and Jey Han Lau (2018), “Gradient Probabilistic Models vs Categorical

    Grammars: A Reply to Sprouse et al. (2018)”, The Science of Language (Ted Gibson blog).

Jey Han Lau, Alexander Clark, and Shalom Lappin (2014), “Measuring Gradience in

    Speakers’ Grammaticality Judgements”, Proceedings of the Annual Meeting of the

    Cognitive Science Society 36, Quebec City QC.

Jey Han Lau, Alexander Clark, and Shalom Lappin (2015), “Unsupervised Prediction of

    Acceptability Judgements” in Proceedings of ACL 2015, Beijing, China, pp. 1618–1628.

Jey Han Lau, Alexander Clark, and Shalom Lappin (2017), “Grammaticality, Acceptability,

    and Probability: A Probabilistic View of Linguistic Knowledge”, Cognitive Science 41(5),

    pp. 1202–1241.

Jey Han Lau, Carlos Armendariz, Shalom Lappin, Matthew Purver, and Chang Shu (2020),

    “How Furiously Can Green Ideas Sleep: Sentence Acceptability in Context”, Transactions

    of the Association for Computational Linguistics 8, pp. 296-310.

R. Thomas McCoy, Robert Frank, and Tal Linzen (2020), “Does Syntax Need to Grow on Trees?

    Sources of Hierarchical Inductive Bias in Sequence-to-Sequence Networks”, Transactions of the

    Association for Computational Linguistics 8, pp. 125-140.

Marco Marelli, Luisa Bentivogli, Marco Baroni, Raffaella Bernardi, Stefano Menini, and

    Roberto Zamparelli (2014), “SemEval-2014 Task 1: Evaluation of Compositional

    Distributional Semantic Models on Full Sentences through Semantic Relatedness and

    Textual Entailment”, Proceedings of the 8th International Workshop on Semantic

    Evaluation (SemEval 2014), Dublin, Ireland, pp. 1-8.

Fernando Pereira (2000), “Formal Grammar and Information Theory: Together Again?”,

    Philosophical Transactions of the Royal Society of London, pp. 1239-1253.

Kai Sheng Tai, Richard Socher, and Christopher D. Manning (2015), “Improved Semantic

    Representations from Tree-Structured Long Short-Term Memory Networks” in

    Proceedings of ACL 2015, Beijing, China, pp. 1556–1566.

Richard Socher, Alex Perelygin, Jean Y Wu, Jason Chuang, Christopher D Manning, Andrew

    Y Ng, and Christopher Potts (2013), “Recursive Deep Models for Semantic 

    Compositionality over a Sentiment Treebank”, Proceedings of the 2013 Conference on

    Empirical Methods in Natural Language Processing (EMNLP), Seattle, WA,

    pp. 1631–1642.

Jon Sprouse, Carson T. Schütze, and Diogo Almeida (2013), “A Comparison of Informal

    and Formal Acceptability Judgments Using a Random Sample from Linguistic Inquiry

    2001–2010”, Lingua 134, pp. 219–248.

Jon Sprouse and Diogo Almeida (2013), “The Empirical Status of Data in Syntax: A Reply

    to Gibson and Fedorenko”, Language and Cognitive Processes 28, pp. 229–240.

Jon Sprouse, Beracah Yankama, Sagar Indurkhya, Sandiway Fong, and Robert C. Berwick

    (2018), “Colorless Green Ideas do Sleep Furiously: Gradient Acceptability and the Nature

    of the Grammar”, The Linguistic Review 35(3), pp. 575-599.

Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman

    (2018), “GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language

    Understanding”, Proceedings of EMNLP 2018, Brussels, Belgium, pp. 353–355.

Alex Warstadt, Amanpreet Singh, and Samuel R. Bowman (2019), “Neural Network

    Acceptability Judgments”, Transactions of the Association for Computational

    Linguistics 7, pp. 625–641.

Adina Williams, Andrew Drozdov, and Samuel R. Bowman (2018a), “Do Latent Tree Learning

    Models Identify Meaningful Structure in Sentences?” in Transactions of the Association for

    Computational Linguistics 6, pp. 253–267.