Manuscripts
Substance Beats Style: Why Beginning Students Fail to Code with LLMs
Francesca Lucchetti, Zixuan Wu, Arjun Guha, Molly Q Feldman, and Carolyn Jane Anderson.
Preprint
GlyphPattern: An Abstract Pattern Recognition Benchmark for Vision-Language Models
Zixuan Wu, Yoolim Kim, and Carolyn Jane Anderson.
Preprint
Untangling classes of context-sensitivity: a closer look at the semantics of American English tomorrow.
Carolyn Jane Anderson.
2019 draft on LingBuzz
The andative and venitive construction in San Lucas Quiaviní Zapotec.
Carolyn Jane Anderson. 2017. Ms.
Draft on LingBuzz
StarCoder 2 and The Stack v2: The Next Generation
Anton Lozhkov, Raymond Li, Loubna Ben Allal, Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, Yuxiang Wei, Tianyang Liu, Max Tian, Denis Kocetkov, Arthur Zucker, Younes Belkada, Zijian Wang, Qian Liu, Dmitry Abulkhanov, Indraneil Paul, Zhuang Li, Wen-Ding Li, Megan Risdal, Jia Li, Jian Zhu, Terry Yue Zhuo, Evgenii Zheltonozhskii, Nii Osae Osae Dade, Wenhao Yu, Lucas Krauß, Naman Jain, Yixuan Su, Xuanli He, Manan Dey, Edoardo Abati, Yekun Chai, Niklas Muennighoff, Xiangru Tang, Muhtasham Oblokulov, Christopher Akiki, Marc Marone, Chenghao Mou, Mayank Mishra, Alex Gu, Binyuan Hui, Tri Dao, Armel Zebaze, Olivier Dehaene, Nicolas Patry, Canwen Xu, Julian McAuley, Han Hu, Torsten Scholak, Sebastien Paquet, Jennifer Robinson, Carolyn Jane Anderson, Nicolas Chapados, Mostofa Patwary, Nima Tajbakhsh, Yacine Jernite, Carlos Muñoz Ferrandis, Lingming Zhang, Sean Hughes, Thomas Wolf, Arjun Guha, Leandro von Werra, Harm de Vries.
Preprint
2024
Evaluating Computational Representations of Character: An Austen Character Similarity Benchmark
Funing Yang and Carolyn Jane Anderson. Accepted to the 4th International Workshop on Natural Language Processing for Digital Humanities (NLP4DH) at EMNLP 2024. Selected for oral presentation.
Preprint
Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions
Federico Cassano, Luisa Li, Akul Sethi, Noah Shinn, Abby Brennan-Jones, Jacob Ginesin, Edward Berman, George Chakhnashvili, Anton Lozhkov, Carolyn Jane Anderson, Arjun Guha. Accepted to COLM 2024.
Preprint
What Parenthesized Modifiers (May) Mean
Yoolim Kim and Carolyn Jane Anderson. Proceedings of Experiments in Linguistic Meaning (ELM) 3.
Knowledge Transfer from High-Resource to Low-Resource Programming Languages for Code LLMs
Federico Cassano, John Gouwar, Francesca Lucchetti, Claire Schlesinger, Anders Freeman, Carolyn Jane Anderson, Molly Q Feldman, Michael Greenberg, Abhinav Jangda, Arjun Guha. Accepted to OOPSLA 2024.
Preprint
Exploring Language Representation through a Resource Inventory Project
Carolyn Jane Anderson. Teaching Resource accepted to TeachNLP Workshop at ACL 2024.
Preprint
A Prompting Assignment for Exploring Pretrained LLMs
Carolyn Jane Anderson. Teaching Resource accepted to TeachNLP Workshop at ACL 2024.
Preprint
StudentEval: a Benchmark of Student-Written Prompts for Large Language Models of Code
Hannah Babe, Sydney Nguyen, Yangtian Zi, Arjun Guha, Molly Q Feldman, and Carolyn Jane Anderson. Findings of the Association for Computational Linguistics 2024.
arXiv draft
Preprint
HuggingFace dataset
Non-Expert Programmers in the Generative AI Future
Molly Q Feldman and Carolyn Jane Anderson. Accepted to CHIWORK 2024.
Preprint
How Beginning Programmers and Code LLMs (Mis)read Each Other
Sydney Nguyen, Hannah Babe, Yangtian Zi, Arjun Guha, Carolyn Jane Anderson, and Molly Q Feldman. Accepted to CHI 2024.
Paper
2023
StarCoder: May the Source Be With You!
Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, Qian Liu, Evgenii Zheltonozhskii, Terry Yue Zhuo, Thomas Wang, Olivier Dehaene, Mishig Davaadorj, Joel Lamy-Poirier, João Monteiro, Oleh Shliazhko, Nicolas Gontier, Nicholas Meade, Armel Randy, Ming-Ho Yee, Logesh Kumar Umapathi, Jian Zhu, Benjamin Lipkin, Muhtasham Oblokulov, Zhiruo Wang, Rudra Murthy, Jason Stillerman, Siva Sankalp Patel, Dmitry Abulkhanov, Marco Zocca, Manan Dey, Zhihan Zhang, Nour Fahmy, Urvashi Bhattacharyya, Suriya Gunasekar, Wenhao Yu, Swayam Singh, Sasha Luccioni, Paulo Villegas, Maxim Kunakov, Fedor Zhdanov, Manuel Romero, Tony Lee, Nadav Timor, Jennifer Ding, Claire Schlesinger, Hailey Schoelkopf, Jan Ebert, Tri Dao, Mayank Mishra, Alex Gu, Jennifer Robinson, Carolyn Jane Anderson, Brendan Dolan-Gavitt, Danish Contractor, Siva Reddy, Daniel Fried, Dzmitry Bahdanau, Yacine Jernite, Carlos Muñoz Ferrandis, Sean Hughes, Thomas Wolf, Arjun Guha, Leandro von Werra, Harm de Vries. Accepted to Transactions on Machine Learning Research
Preprint
Protagonist-mediated perspective
Carolyn Jane Anderson and Arjun Guha. Proceedings of Sinn und Bedeutung (SuB) 28.
Preprint
Solving and Generating NPR Sunday Puzzles with Large Language Models
MultiPL-E: A Scalable and Extensible Approach to Benchmarking Neural Code Generation
Do All Minority Languages Look the Same to Chat-GPT? Linguistic (Mis)information in a Large Language Model.
Sydney Nguyen and Carolyn Jane Anderson. Poster presented at the Society for Computation in Linguistics (SCiL) 2023.
Cross-linguistic differences in processing parentheticals between English and Korean.
Yoolim Kim and Carolyn Jane Anderson. Accepted for presentation at Comparative Punctuation Worldwide.
SantaCoder: Don’t Reach For the Stars!
Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Terry Yue Zhuo, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Michael Lappert, Ian Yu, Paulo Villegas, Jia Li, David Lansky, Huu Nguyen, Danish Contractor, Luis Villa, Daniel Fried, Dzmitry Bahdanau, Yacine Jernite, Sean Hughes, Arjun Guha, Harm de Vries, Leandro von Werra.
Best Paper Award at the Deep Learning 4 Code (DL4C) workshop. Draft
Grammatical perspective-taking in comprehension and production.
Carolyn Jane Anderson and Brian Dillon. Open Mind.
Paper
Exploring Social Biases of Large Language Models in a College Artificial Intelligence Course
Skylar Kolisko and Carolyn Jane Anderson. Proceedings of the Thirteenth Symposium on Educational Advances in Artificial Intelligence (EAAI-23).
Preprint
2022
Eliciting Associated Motion Constructions in Two Zapotec Languages
Fe Silva-Robles, Felipe H. Lopez, John Duff, and Carolyn Jane Anderson. Semantic Fieldwork Methods
Protagonist-Mediated Perspective
Carolyn Jane Anderson. Talk given at the Narration in Context workshop at the Deutsche Gesellschaft für Sprachwissenschaft (DGfS), 2022.
(Some) parentheses are focus-sensitive operators
Carina Bolaños Lewen and Carolyn Jane Anderson. Proceedings of Sinn und Bedeutung (SuB) 26.
Paper
2021
ProSPer: Probing Human and Neural Network Language Model Understanding of Spatial Perspective.
Tessa Masis and Carolyn Jane Anderson. Accepted to the BlackboxNLP workshop at the Conference on Empirical Methods in Natural Language Processing (EMNLP) 2021.
Preprint
Solver-based Gradual Type Migration.
Luna Phipps-Costin, Carolyn Jane Anderson, Michael Greenberg, and Arjun Guha. Accepted to the ACM SIGPLAN Conference on Object Oriented Programming, Systems, Languages and Applications (OOPSLA) 2021.
Tell Me Everything You Know: A Conversation Update System for the Rational Speech Acts Framework
Carolyn Jane Anderson. Proceedings of the Society for Computation in Linguistics (SCiL) 2021.
Paper
Coming in, or going out? Measuring the effect of discourse factors on perspective prominence
Diagnosing the semantics of perspectival expressions
Carolyn Jane Anderson. Poster presented at the annual meeting of the Linguistic Society of America (LSA) 2021.
Abstract
2020
Shifting the Perspectival Landscape: Methods for Encoding, Identifying, and Selecting Perspectives.
Carolyn Jane Anderson. Dissertation, University of Massachusetts, Amherst.
LingBuzz
Can neural network language models understand spatial perspective?
Carolyn Jane Anderson and Tessa Masis. Paper presented at Bridging AI and Cognitive Science (BAICS), at the International Conference on Learning Representations (ICLR) 2020.
Non-archival paper
2019
Guess Who's Coming (And Who's Going): Bringing Perspective to the Rational Speech Acts Framework.
Carolyn Jane Anderson and Brian Dillon. Proceedings of the Society for Computation in Linguistics (SCiL) 2019.
Paper Poster
"Tomorrow" Isn't Always A Day Away.
Taking other perspectives into account: an RSA model of perspectival reasoning.
Carolyn Jane Anderson and Brian Dillon. Talk given at Rational Approaches in Language Science (RAiLS) 2019.
Explaining the progressive motion verb puzzle in Zapotec.
Carolyn Jane Anderson. Talk given at the Texas Linguistics Society 2019.
Slides
2018
"Tomorrow" Isn't Always A Day Away.
Carolyn Jane Anderson. Poster presented at the 31st annual CUNY Human Sentence Processing Conference (CUNY) 2018.
Abstract
The San Lucas Quiaviní Zapotec Andative and Venitive.
2017
The Andative and Venitive Construction in San Lucas Quiaviní Zapotec.
2016
Negation in Colonial Valley Zapotec.
Carolyn Jane Anderson and Brook Danielle Lillehaugen. Transactions of the Philological Society 114(3).
2015
The Morphosyntax of Negation in Colonial Valley Zapotec.
2014
NetKAT: Semantic Foundations for Networks.
Carolyn J. Anderson, Nate Foster, Arjun Guha, Jean-Baptiste Jeannin, Dexter Kozen, Cole Schlesinger, and David Walker. ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL) 2014.
PDF Slides
La morfosintaxis de la negación en el zapoteco del Valle colonial. [The morphosyntax of negation in Colonial Valley Zapotec.]
Carolyn Jane Anderson and Brook Danielle Lillehaugen. Talk presented at Coloquio sobre Lenguas Otomangues y Vecinas IV: Mario Molina Cruz (COLOV) 2014.
Abstract
"I talk it and I feel it": Language attitudes of Moroccan university students
Carolyn Jane Anderson. Honors thesis, Swarthmore College.
2013
Language Ideology and Human Rights Doctrine in Morocco.
Carolyn Jane Anderson. Talk presented at New Ways of Analyzing Variation (NWAV) 42.