Manuscripts

Substance Beats Style: Why Beginning Students Fail to Code with LLMs

Francesca Lucchetti, Zixuan Wu, Arjun Guha, Molly Q Feldman, and Carolyn Jane Anderson.

Preprint

GlyphPattern: An Abstract Pattern Recognition Benchmark for Vision-Language Models

Zixuan Wu, Yoolim Kim, and Carolyn Jane Anderson.

Preprint

Untangling classes of context-sensitivity: a closer look at the semantics of American English tomorrow.

Carolyn Jane Anderson.

2019 draft on LingBuzz

The andative and venitive construction in San Lucas Quiaviní Zapotec.

Carolyn Jane Anderson. 2017. Ms.

Draft on LingBuzz

StarCoder 2 and The Stack v2: The Next Generation

Anton Lozhkov, Raymond Li, Loubna Ben Allal, Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, Yuxiang Wei, Tianyang Liu, Max Tian, Denis Kocetkov, Arthur Zucker, Younes Belkada, Zijian Wang, Qian Liu, Dmitry Abulkhanov, Indraneil Paul, Zhuang Li, Wen-Ding Li, Megan Risdal, Jia Li, Jian Zhu, Terry Yue Zhuo, Evgenii Zheltonozhskii, Nii Osae Osae Dade, Wenhao Yu, Lucas Krauß, Naman Jain, Yixuan Su, Xuanli He, Manan Dey, Edoardo Abati, Yekun Chai, Niklas Muennighoff, Xiangru Tang, Muhtasham Oblokulov, Christopher Akiki, Marc Marone, Chenghao Mou, Mayank Mishra, Alex Gu, Binyuan Hui, Tri Dao, Armel Zebaze, Olivier Dehaene, Nicolas Patry, Canwen Xu, Julian McAuley, Han Hu, Torsten Scholak, Sebastien Paquet, Jennifer Robinson, Carolyn Jane Anderson, Nicolas Chapados, Mostofa Patwary, Nima Tajbakhsh, Yacine Jernite, Carlos Muñoz Ferrandis, Lingming Zhang, Sean Hughes, Thomas Wolf, Arjun Guha, Leandro von Werra, Harm de Vries.

Preprint

2024

Evaluating Computational Representations of Character: An Austen Character Similarity Benchmark

Funing Yang and Carolyn Jane Anderson. Accepted to the 4th International Workshop on Natural Language Processing for Digital Humanities (NLP4DH) at EMNLP 2024. Selected for oral presentation.

Preprint

Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions

Federico Cassano, Luisa Li, Akul Sethi, Noah Shinn, Abby Brennan-Jones, Jacob Ginesin, Edward Berman, George Chakhnashvili, Anton Lozhkov, Carolyn Jane Anderson, Arjun Guha. Accepted to COLM 2024.

Preprint

What Parenthesized Modifiers (May) Mean

Yoolim Kim and Carolyn Jane Anderson. Proceedings of Experiments in Linguistic Meaning (ELM) 3.

Knowledge Transfer from High-Resource to Low-Resource Programming Languages for Code LLMs

Federico Cassano, John Gouwar, Francesca Lucchetti, Claire Schlesinger, Anders Freeman, Carolyn Jane Anderson, Molly Q Feldman, Michael Greenberg, Abhinav Jangda, Arjun Guha. Accepted to OOPSLA 2024.

Preprint

Exploring Language Representation through a Resource Inventory Project

Carolyn Jane Anderson. Teaching Resource accepted to TeachNLP Workshop at ACL 2024.

Preprint

A Prompting Assignment for Exploring Pretrained LLMs

Carolyn Jane Anderson. Teaching Resource accepted to TeachNLP Workshop at ACL 2024.

Preprint

StudentEval: a Benchmark of Student-Written Prompts for Large Language Models of Code

Hannah Babe, Sydney Nguyen, Yangtian Zi, Arjun Guha, Molly Q Feldman, and Carolyn Jane Anderson. Findings of the Association for Computational Linguistics 2024.

arXiv draft
Preprint
HuggingFace dataset

Non-Expert Programmers in the Generative AI Future

Molly Q Feldman and Carolyn Jane Anderson. Accepted to CHIWORK 2024.

Preprint

How Beginning Programmers and Code LLMs (Mis)read Each Other

Sydney Nguyen, Hannah Babe, Yangtian Zi, Arjun Guha, Carolyn Jane Anderson, and Molly Q Feldman. Accepted to CHI 2024.

Paper

2023

StarCoder: May the Source Be With You!

Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, Qian Liu, Evgenii Zheltonozhskii, Terry Yue Zhuo, Thomas Wang, Olivier Dehaene, Mishig Davaadorj, Joel Lamy-Poirier, João Monteiro, Oleh Shliazhko, Nicolas Gontier, Nicholas Meade, Armel Randy, Ming-Ho Yee, Logesh Kumar Umapathi, Jian Zhu, Benjamin Lipkin, Muhtasham Oblokulov, Zhiruo Wang, Rudra Murthy, Jason Stillerman, Siva Sankalp Patel, Dmitry Abulkhanov, Marco Zocca, Manan Dey, Zhihan Zhang, Nour Fahmy, Urvashi Bhattacharyya, Suriya Gunasekar, Wenhao Yu, Swayam Singh, Sasha Luccioni, Paulo Villegas, Maxim Kunakov, Fedor Zhdanov, Manuel Romero, Tony Lee, Nadav Timor, Jennifer Ding, Claire Schlesinger, Hailey Schoelkopf, Jan Ebert, Tri Dao, Mayank Mishra, Alex Gu, Jennifer Robinson, Carolyn Jane Anderson, Brendan Dolan-Gavitt, Danish Contractor, Siva Reddy, Daniel Fried, Dzmitry Bahdanau, Yacine Jernite, Carlos Muñoz Ferrandis, Sean Hughes, Thomas Wolf, Arjun Guha, Leandro von Werra, Harm de Vries. Accepted to Transactions on Machine Learning Research.

Preprint

Protagonist-mediated perspective

Carolyn Jane Anderson and Arjun Guha. Proceedings of Sinn und Bedeutung (SuB) 28.

Preprint

Solving and Generating NPR Sunday Puzzles with Large Language Models

Jingmiao Zhao and Carolyn Anderson. Accepted to the International Conference on Computational Creativity (ICCC) 2023.

Preprint
Paper

MultiPL-E: A Scalable and Extensible Approach to Benchmarking Neural Code Generation

Federico Cassano, John Gouwar, Daniel Nguyen, Sydney Nguyen, Luna Phipps-Costin, Donald Pinckney, Ming-Ho Yee, Yangtian Zi, Carolyn Jane Anderson, Molly Q Feldman, Arjun Guha, Michael Greenberg, and Abhinav Jangda. Accepted to IEEE Transactions on Software Engineering.

Preprint
Paper

Do All Minority Languages Look the Same to Chat-GPT? Linguistic (Mis)information in a Large Language Model.

Sydney Nguyen and Carolyn Jane Anderson. Poster to be presented at the Society for Computation in Linguistics (SCiL) 2023.

Cross-linguistic differences in processing parentheticals between English and Korean.

Yoolim Kim and Carolyn Jane Anderson. Accepted for presentation at Comparative Punctuation Worldwide.

SantaCoder: Don’t Reach For the Stars!

Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Muñoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy-Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Terry Yue Zhuo, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Michael Lappert, Ian Yu, Paulo Villegas, Jia Li, David Lansky, Huu Nguyen, Danish Contractor, Luis Villa, Daniel Fried, Dzmitry Bahdanau, Yacine Jernite, Sean Hughes, Arjun Guha, Harm de Vries, Leandro von Werra.

Best Paper Award at Deep Learning 4 Code (DL4C) workshop.
Draft

Grammatical perspective-taking in comprehension and production.

Carolyn Jane Anderson and Brian Dillon. Open Mind.

Paper

Exploring Social Biases of Large Language Models in a College Artificial Intelligence Course

Skylar Kolisko and Carolyn Jane Anderson. Proceedings of the Thirteenth Symposium on Educational Advances in Artificial Intelligence (EAAI-23).

Preprint

2022

Eliciting Associated Motion Constructions in Two Zapotec Languages

Fe Silva-Robles, Felipe H. Lopez, John Duff, and Carolyn Jane Anderson. Semantic Fieldwork Methods.

Protagonist-Mediated Perspective

Carolyn Jane Anderson. Talk to be given at the Narration in Context workshop at the Deutsche Gesellschaft für Sprachwissenschaft (DGfS), 2022.

(Some) parentheses are focus-sensitive operators

Carina Bolaños Lewen and Carolyn Jane Anderson. Proceedings of Sinn und Bedeutung (SuB) 26.

Paper

2021

ProSPer: Probing Human and Neural Network Language Model Understanding of Spatial Perspective.

Tessa Masis and Carolyn Jane Anderson. Accepted to the BlackboxNLP workshop at the Conference on Empirical Methods in Natural Language Processing (EMNLP) 2021.

Preprint

Solver-based Gradual Type Migration.

Luna Phipps-Costin, Carolyn Jane Anderson, Michael Greenberg, and Arjun Guha. Accepted to the ACM SIGPLAN Conference on Object Oriented Programming, Systems, Languages and Applications (OOPSLA) 2021.

Tell Me Everything You Know: A Conversation Update System for the Rational Speech Acts Framework

Carolyn Jane Anderson. Proceedings of the Society for Computation in Linguistics (SCiL) 2021.

Paper

Coming in, or going out? Measuring the effect of discourse factors on perspective prominence

Carolyn Jane Anderson. Proceedings of Experiments in Linguistic Meaning (ELM) 1.

Abstract
Paper

Diagnosing the semantics of perspectival expressions

Carolyn Jane Anderson. Poster presented at the annual meeting of the Linguistic Society of America (LSA) 2021.

Abstract

2020

Shifting the Perspectival Landscape: Methods for Encoding, Identifying, and Selecting Perspectives.

Carolyn Jane Anderson. Dissertation, University of Massachusetts, Amherst.

LingBuzz

Can neural network language models understand spatial perspective?

Carolyn Jane Anderson and Tessa Masis. Paper presented at Bridging AI and Cognitive Science (BAICS), at the International Conference on Learning Representations (ICLR) 2020.

Non-archival paper

2019

Guess Who's Coming (And Who's Going): Bringing Perspective to the Rational Speech Acts Framework.

Carolyn Jane Anderson and Brian Dillon. Proceedings of the Society for Computation in Linguistics (SCiL) 2019.

Paper
Poster

"Tomorrow" Isn't Always A Day Away.

Carolyn Jane Anderson. Proceedings of Sinn und Bedeutung 23.

Paper
Slides

Taking other perspectives into account: an RSA model of perspectival reasoning.

Carolyn Jane Anderson and Brian Dillon. Talk given at Rational Approaches in Language Science (RAiLS) 2019.

Explaining the progressive motion verb puzzle in Zapotec.

Carolyn Jane Anderson. Talk given at the Texas Linguistics Society 2019.

Slides

2018

"Tomorrow" Isn't Always A Day Away.

Carolyn Jane Anderson. Poster presented at the 31st annual CUNY Human Sentence Processing Conference (CUNY) 2018.

Abstract

The San Lucas Quiaviní Zapotec Andative and Venitive.

Carolyn Jane Anderson. Talk given at the annual meeting of the Society for the Study of Indigenous Languages of the Americas (SSILA) 2018.

Honorable Mention, Best Student Presentation Award

Abstract
Slides

2017

The Andative and Venitive Construction in San Lucas Quiaviní Zapotec.

Carolyn Jane Anderson. Talk given at the Workshop on Multi-Verb Constructions: Semantic, Syntactic and Typological Perspectives (MVC) 2017.

Abstract
Handout

2016

Negation in Colonial Valley Zapotec.

Carolyn Jane Anderson and Brook Danielle Lillehaugen. Transactions of the Philological Society 114(3).

PDF

2015

The Morphosyntax of Negation in Colonial Valley Zapotec.

Carolyn Jane Anderson and Brook Danielle Lillehaugen. Talk given at the annual meeting of the Society for the Study of Indigenous Languages of the Americas (SSILA) 2015.

Abstract
Handout

2014

NetKAT: Semantic Foundations for Networks.

Carolyn J. Anderson, Nate Foster, Arjun Guha, Jean-Baptiste Jeannin, Dexter Kozen, Cole Schlesinger, and David Walker. ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL) 2014.

PDF
Slides

La morfosintaxis de la negación en el zapoteco del Valle colonial.

Carolyn Jane Anderson and Brook Danielle Lillehaugen. Talk presented at Coloquio sobre Lenguas Otomangues y Vecinas IV: Mario Molina Cruz (COLOV) 2014.

Abstract

"I talk it and I feel it": Language attitudes of Moroccan university students

Carolyn Jane Anderson. Honors thesis, Swarthmore College.

PDF

2013

Language Ideology and Human Rights Doctrine in Morocco.

Carolyn Anderson. Talk presented at New Ways of Analyzing Variation (NWAV) 42.

Handout