BIO

Hello! I’m a Ph.D. Candidate in Sociology at UC Berkeley and a Visiting Sociologist at Meta AI. An economic and cultural sociologist by training, I use mixed methods, including surveys, interviews, international fieldwork, and digital ethnographies, to investigate whether socio-technical systems and AI technologies designed to improve our day-to-day lives meet their moral and social commitments. More specifically, my dissertation explores how monetary and gifting structures in the sharing economy shape economic and social exchange between users. My applied work on AI seeks to alleviate language accessibility issues online by designing ethical translation systems for those who speak low-resource languages.

At Meta AI, I work on making machine translation (see “No Language Left Behind” and “SeamlessM4T”) and large language models more socially and ethically oriented.

My research and ideas have been featured by TIME, NPR (Morning Edition, All Things Considered), WIRED, Reuters, The Verge, Vox, San Francisco Chronicle, Quartz, GQ, Mashable, HuffPost, and USA Today.

REFEREED ARTICLES

Wang, Skyler. “Couch with Strings Attached: Reciprocal Orientations in Relational Work.” Revise and resubmit at American Sociological Review.

The emergence of the sharing economy has given rise to a class of noncommercial platforms that facilitate reciprocal exchanges aimed at cultivating solidarity and community-building. However, how moral are these relational spaces and what kinds of relationships do they stimulate? Using sexual exchange in Couchsurfing, the most intimate form of “sharing” in the gift-based hospitality network, I explore how relational work performed by hosts and guests allows economics and intimacy to coexist. To do so, I develop the concept of “reciprocal orientations” (a set of cognitive constructs comprising generalized, positive direct, and negative direct reciprocities) to show not only how multiple reciprocities shape the formation of indebted and hierarchical relations in gift-based networks, but how interactions born out of these relational configurations produce both social and antisocial outcomes. Drawing on 96 interviews with Couchsurfers and four years of digital and in-person ethnographies, I demonstrate through online host-guest matching, in-person interactions, and user references that disparate invocations of reciprocal orientations produce four sexual forms varying in their degree of sociality: mutuality, transaction, leverage, and violation. The article ultimately concludes with implications for economic sociology and the study of alternative economies in the age of platform society.

Wang, Skyler. “How Platform Exchange and Safeguards Matter: The Case of Sexual Risk and Trust in Airbnb and Couchsurfing.” Revise and resubmit at Computer Supported Cooperative Work and Social Computing (CSCW).

In contrast to platforms facilitating monetary exchange, reciprocity-based systems are often regarded as more social and trust-driven. However, reciprocity fosters indebtedness and relational ambiguities, which may lead to riskier interactions that jeopardize sociality. I test these claims by comparing two network hospitality platforms—Airbnb (monetary) and Couchsurfing (reciprocal). Using sexual risk, an underexplored form of platform danger, and drawing on interviews with 40 female Airbnb and Couchsurfing guests, I argue that Airbnb’s provision of binding monetary exchange and institutional safeguards increases user trust and reduces risk through three mechanisms: casting initial guest-host relation into a buyer-seller arrangement, stabilizing interactional scripts, and formalizing sexual violence recourse. Conversely, Couchsurfing’s facilitation of reciprocal exchange, alongside the lack of safeguards, increases sexual precarity both on- and off-platform. This study demonstrates how platforms with strong social motivations can produce harm and concludes with implications for designs that better serve vulnerable user populations.

Wang, Skyler*, Ned Cooper*, Margaret Eby, and Eun Seo Jo. “From Human-Centered to Social-Centered Artificial Intelligence: Assessing ChatGPT’s Impact through Disruptive Events.” (*equal authorship). Revise and resubmit at Big Data & Society.

Large language models (LLMs) and dialogue agents have existed for years, but the release of recent GPT models has been a watershed moment for artificial intelligence (AI) research and society at large. Immediately recognized for its generative capabilities and versatility, ChatGPT’s impressive proficiency across technical and creative domains led to its widespread adoption. While society grapples with the emerging cultural impacts of ChatGPT, critiques of ChatGPT’s impact within the machine learning community have coalesced around its performance or other conventional Responsible AI evaluations relating to bias, toxicity, and ‘hallucination.’ We argue that these critiques draw heavily on a particular conceptualization of the ‘human-centered’ framework, which tends to cast atomized individuals as the key recipients of both the benefits and detriments of technology. In this article, we direct attention to another dimension of LLMs and dialogue agents’ impact: their effect on social groups, institutions, and accompanying norms and practices. By illustrating ChatGPT’s social impact through three disruptive events, we challenge individualistic approaches in AI development and contribute to ongoing debates around the ethical and responsible implementation of AI systems. We hope this effort will call attention to more comprehensive and longitudinal evaluation tools and compel technologists to go beyond human-centered thinking and ground their efforts through social-centered AI.

Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Cora Meglioli, David Dale, Ning Dong, Paul-Ambroise Duquenne, Hady Elsahar, Hongyu Gong, Kevin Heffernan, John Hoffman, Christopher Klaiber, Pengwei Li, Daniel Licht, Jean Maillard, Alice Rakotoarison, Kaushik Ram Sadagopan, Guillaume Wenzek, Ethan Ye, Bapi Akula, Peng-Jen Chen, Naji El Hachem, Brian Ellis, Gabriel Mejia Gonzalez, Justin Haaheim, Prangthip Hansanti, Russ Howes, Bernie Huang, Min-Jae Hwang, Hirofumi Inaguma, Somya Jain, Elahe Kalbassi, Amanda Kallet, Ilia Kulikov, Janice Lam, Daniel Li, Xutai Ma, Ruslan Mavlyutov, Benjamin Peloquin, Mohamed Ramadan, Abinesh Ramakrishnan, Anna Sun, Kevin Tran, Tuan Tran, Igor Tufanov, Vish Vogeti, Carleigh Wood, Yilin Yang, Bokai Yu, Pierre Andrews†, Can Balioglu†, Marta R. Costa-jussà†, Onur Celebi†, Maha Elbayad†, Cynthia Gao†, Francisco Guzmán†, Justine Kao†, Ann Lee†, Alexandre Mourachko†, Juan Pino†, Sravya Popuri†, Christophe Ropers†, Safiyyah Saleem†, Holger Schwenk†, Paden Tomasello†, Changhan Wang†, Jeff Wang†, and Skyler Wang†. 2023. “SeamlessM4T: Massively Multilingual & Multimodal Machine Translation.” arXiv:2308.11596v1. (†research and engineering leadership; under review at Nature)

TIME Magazine’s Top 200 Inventions of 2023

Select media coverage: Reuters, The Verge, TechCrunch, Axios, Mashable, Engadget, ZDNet

What does it take to create the Babel Fish, a tool that can help individuals translate speech between any two languages? While recent breakthroughs in text-based models have pushed machine translation coverage beyond 200 languages, unified speech-to-speech translation models have yet to achieve similar strides. More specifically, conventional speech-to-speech translation systems rely on cascaded systems composed of multiple subsystems performing translation progressively, putting scalable and high-performing unified speech translation systems out of reach. To address these gaps, we introduce SeamlessM4T (Massively Multilingual & Multimodal Machine Translation), a single model that supports speech-to-speech translation, speech-to-text translation, text-to-speech translation, text-to-text translation, and automatic speech recognition for up to 100 languages. To build this, we used 1 million hours of open speech audio data to learn self-supervised speech representations with w2v-BERT 2.0. Subsequently, we created a multimodal corpus of automatically aligned speech translations, dubbed SeamlessAlign. Filtered and combined with human-labeled and pseudo-labeled data (totaling 406,000 hours), we developed the first multilingual system capable of translating from and into English for both speech and text. On Fleurs, SeamlessM4T sets a new standard for translations into multiple target languages, achieving an improvement of 20% BLEU over the previous state-of-the-art in direct speech-to-text translation. Compared to strong cascaded models, SeamlessM4T improves the quality of into-English translation by 1.3 BLEU points in speech-to-text and by 2.6 ASR-BLEU points in speech-to-speech. On CVSS and compared to a 2-stage cascaded model for speech-to-speech translation, SeamlessM4T-Large’s performance is stronger by 58%.
Preliminary human evaluations of speech-to-text translation outputs evinced similarly impressive results; for translations from English, XSTS scores for 24 evaluated languages are consistently above 4 (out of 5). For into English directions, we see significant improvement over WhisperLarge-v2’s baseline for 7 out of 24 languages. To further evaluate our system, we developed Blaser 2.0, which enables evaluation across speech and text with similar accuracy compared to its predecessor when it comes to quality estimation. Tested for robustness, our system performs better against background noises and speaker variations in speech-to-text tasks (average improvements of 38% and 49%, respectively) compared to the current state-of-the-art model. Critically, we evaluated SeamlessM4T on gender bias and added toxicity to assess translation safety. Compared to the state-of-the-art, we report up to 63% reduction in added toxicity in our translation outputs. Finally, all contributions in this work (including models, inference code, finetuning recipes backed by our improved modeling toolkit Fairseq2, and metadata to recreate the unfiltered 470,000 hours of SeamlessAlign) are open-sourced and accessible here.

NLLB team, Marta R. Costa-jussà,* James Cross,* Onur Çelebi,* Maha Elbayad,* Kenneth Heafield,* Kevin Heffernan,* Elahe Kalbassi,* Janice Lam,* Daniel Licht,* Jean Maillard,* Anna Sun,* Skyler Wang,* Guillaume Wenzek,* Al Youngblood,* Bapi Akula, Loic Barrault, Gabriel Mejia Gonzalez, Prangthip Hansanti, John Hoffman, Semarley Jarrett, Kaushik Ram Sadagopan, Dirk Rowe, Shannon Spruit, Chau Tran, Pierre Andrews, Necip Fazil Ayan, Shruti Bhosale, Sergey Edunov, Angela Fan, Cynthia Gao, Vedanuj Goswami, Francisco Guzmán, Philipp Koehn, Alexandre Mourachko, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, and Jeff Wang. 2022. “No Language Left Behind: Scaling Human-Centered Machine Translation.” arXiv:2207.04672 (*equal authorship). Revise and resubmit at Nature.

Select media coverage: The Verge, CNET, Quartz

Driven by the goal of eradicating language barriers on a global scale, machine translation has solidified itself as a key focus of artificial intelligence research today. However, such efforts have coalesced around a small subset of languages, leaving behind the vast majority of mostly low-resource languages. What does it take to break the 200-language barrier while ensuring safe, high-quality results, all while keeping ethical considerations in mind? In No Language Left Behind, we took on this challenge by first contextualizing the need for low-resource language translation support through exploratory interviews with native speakers. Then, we created datasets and models aimed at narrowing the performance gap between low- and high-resource languages. More specifically, we developed a conditional compute model based on Sparsely Gated Mixture of Experts that is trained on data obtained with novel and effective data mining techniques tailored for low-resource languages. We propose multiple architectural and training improvements to counteract overfitting while training on thousands of tasks. Critically, we evaluated the performance of over 40,000 different translation directions using a human-translated benchmark, Flores-200, and combined human evaluation with a novel toxicity benchmark covering all languages in Flores-200 to assess translation safety. Our model achieves an improvement of 44% BLEU relative to the previous state-of-the-art, laying important groundwork towards realizing a universal translation system. Finally, we open-source all contributions described in this work, accessible at this URL.

Yong, Zheng-Xin, Ruochen Zhang, Jessica Zosa Forde, Skyler Wang, Samuel Cahyawijaya, Holy Lovenia, Genta Indra Winata, Lintang Sutawika, Jan Christian Blaise Cruz, Long Phan, Yin Lin Tan, and Alham Fikri Aji. 2023. “Prompting Multilingual Large Language Models to Generate Code-Mixed Texts: The Case of South East Asian Languages.” In Proceedings of the Sixth Workshop on Computational Approaches to Linguistic Code-Switching (EMNLP 2023).

Media coverage: WIRED

While code-mixing is a common linguistic practice in many parts of the world, collecting high-quality and low-cost code-mixed data remains a challenge for natural language processing (NLP) research. The proliferation of Large Language Models (LLMs) in recent times compels one to ask: can these systems be used for data generation? In this article, we explore prompting multilingual LLMs in a zero-shot manner to create code-mixed data for five languages in South East Asia (SEA): Indonesian, Malay, Chinese, Tagalog, and Vietnamese, as well as the creole language Singlish. We find that ChatGPT shows the most potential, capable of producing code-mixed text 68% of the time when the term “code-mixing” is explicitly defined. Moreover, both ChatGPT’s and InstructGPT’s (davinci-003) performances in generating Singlish texts are noteworthy, averaging a 96% success rate across a variety of prompts. Their code-mixing proficiency, however, is dampened by word choice errors that lead to semantic inaccuracies. Other multilingual models such as BLOOMZ and Flan-T5-XXL are unable to produce code-mixed texts altogether. By highlighting the limited promises of LLMs in a specific form of low-resource data generation, we call for a measured approach when applying similar techniques to other data-scarce NLP contexts.

Mukherjee, Meghna*, Margaret Eby*, Skyler Wang*, Armando Lara-Millán, and Maya Earle. 2022. “Medicalizing Risk: How Experts and Consumers Manage Uncertainty in Direct-to-Consumer Genetic Health Testing.” PLOS ONE 17(8):1-20.
(*equal authorship)

Given the increased prevalence of direct-to-consumer (DTC) genetic health tests in recent years, this paper delves into discourses among researchers at professional genomics conferences and lay DTC genetic test users on Reddit communities to understand the contested value of genetic knowledge and its direct implications on health management. Harnessing ethnographic observations at 5 conferences and text analysis of 52 threads, we find both experts and lay patient-consumers navigate their own versions of “productive uncertainty.” Experts develop genetic technologies to legitimize unsettled genomics as medical knowledge and mobilize resources and products, while lay patient-consumers turn to Internet forums to gain clarity on knowledge gaps, which helps them better manage their genetic risk states. By showing how the uncertain nature of genomics serves as a productive force placing both parties within a mutually cooperative cycle, we argue that experts and patient-consumers co-produce a form of relational medicalization that concretizes “risk” itself as a disease state.

Watson, Ryan J., Shannon Snapp*, and Skyler Wang*. 2017. “What We Know and Where To Go From Here: A Review of Lesbian, Gay, and Bisexual Youth Hookup Literature.” Sex Roles 77(11-12):801-811.
(*equal authorship)

In this paper, we acknowledge and critique the absence of lesbian, gay, and bisexual (LGB) experiences in the recent proliferation of scholarship on “hooking up” among youth (aged 16 to 24). Although previous research has documented that LGB youth hook up at high rates (up to three-quarters of LGB youth), and oftentimes more than heterosexuals, the most basic aspects of hookups (e.g., motivations, experiences, and outcomes) have not been comprehensively explored. This is pertinent because young adulthood, in particular, is a time when young people explore their sexuality. Most scholarship on hooking up has focused on White heterosexual college students, mostly due to sampling constraints and impediments, and so we are left with a critical gap in our knowledge about LGB youth, a population that is typically at higher risk for sexual, mental, and emotional health issues. We begin by reviewing the literature on hooking up among heterosexual young adults as organized by four themes: hookup definitions/frequencies, contexts, motivations, and outcomes. We do this to explicitly highlight and contrast what little is known about LGB youth hookups. We then provide a research agenda that projects how future researchers can advance this area of scholarship and begin to fill its gaps, while considering the hookup experiences of diverse LGB youth.

Wang, Skyler. 2022. “Migrant Allies & Sexual Remittances: How International Students Change the Sexual Attitudes of Those Who Remain Behind.” Sociological Perspectives 65(2):328–349.

How does moving from a sexually conservative country to a liberal one alter the way international students think about homosexuality and same-sex rights, and how does this impact their communities back home? Drawing on survey data with 90 heterosexual Singaporean students studying at the University of British Columbia in Vancouver, as well as interview data with 17 students and 14 of their family members and friends who remained in Singapore, this study finds that despite having a broad spectrum of prior opinions, the majority of the student participants acquired increasingly accepting sexual attitudes after their relocation. Furthermore, many of them send these new conceptions as “sexual remittances” to their originating communities, changing the values of those who remain behind. This study helps lay the groundwork for further investigations of how engagements among international students and their social networks can contribute to evolving understandings of transnational sexuality and the globalization of culture.

NON-REFEREED CONTRIBUTIONS

“ChatGPT Is Cutting Non-English Languages Out of the AI Revolution.” WIRED, May 31, 2023.

“Will virtual dating outlast the pandemic?” Quartz, Published May 14, 2020.

“In Partisan 2019, Listing ‘Moderate’ Can Hurt You On Dating Apps.” HuffPost, Published Nov 8, 2019.

“Signaling Your Politics on Tinder Is a Messy Business.” GQ, Published Jun 24, 2019.

“The Affluent Homeless: A Sleeping Pod, A Hired Desk and A Handful Of Clothes.” National Public Radio (All Things Considered), Published Apr 23, 2019.

“What you need to know about online dating.” University of California Official FB Page, Published Feb 14, 2019.

“Love Data Week: Online dating expert talks data (and 5 tips for online dating).” UC Berkeley Library, Published Feb 12, 2019.

“The rise of the Tinder-themed wedding.” Mashable, Published Feb 10, 2019.

“‘Dating Sunday’: The busiest day of the year for online dating is Jan. 6.” USA Today, Published Jan 5, 2019.

“Are Dating Apps Affecting Our Mental Health?” Wisconsin Public Radio, Published Nov 1, 2018.

“Millennials don’t want to own things. Startups are eager to help.” San Francisco Chronicle, Published Sep 10, 2018.

“Finding Love in a Hopeless Place.” New Hampshire Public Radio, Published Aug 17, 2018.

“Be My (Rural) Valentine: Finding Love Outside of Town.” Jefferson Public Radio, Published Feb 14, 2018.

“What Makes Us Click: How Online Dating Shapes Our Relationships.” National Public Radio (Morning Edition), Published Jan 2, 2018.

“The Unlit Flame: My Tinder Misadventures.” The Ubyssey, Published Feb 10, 2016.

TEACHING

I teach an interdisciplinary university seminar called “Artificial Intelligence & Society: The Promises and Limits of Technological Futures” at UC Berkeley. The class is offered in Spring and Fall 2023.

Little of our lives today remains untouched by Artificial Intelligence (AI), which makes understanding its reach and influence on society increasingly pertinent. This course uses an interdisciplinary approach to critically dissect AI’s origins, proliferation, and ubiquity from social, political, and philosophical angles. We explore questions such as: what makes intelligence of this kind ‘artificial,’ and how does it differ from other types of intelligence (such as those embodied by humans or animals)? What is the relationship between AI, natural language processing, machine learning, big data, and algorithms? Why is creating AI systems that align with human values so challenging? How can we critically examine the production processes of generative or large language models such as ChatGPT to understand who they help and leave behind? What gives AI so much political sway, and how can policy and AI governance alleviate the problems it engenders? By incorporating academic research, sci-fi literature, films, and a variety of guest lectures by AI practitioners, this course offers a dynamic look at the promises and limits of AI in delivering a utopic technological future.

“What Makes You Click: Online Dating in the Age of Modern Romance” is another course I teach at Cal. The course, once featured on NPR, uses sociological perspectives to help students understand the broad cultural patterns and implications of an ever-evolving platform and data-driven orientation toward relationship formation. This class was offered in Spring 2021 and Summer 2021.

In Summer 2022, I taught another original course, “The Give and Take: Sociology of the Sharing Economy.” Please reach out to me if you would like the syllabus of any of these classes.

In recognition of my teaching efforts, I was awarded the Herbert Blumer Fellowship for Excellence in Teaching, the Outstanding Graduate Student Instructor Award, the Certificate in Teaching and Learning in Higher Education, and the Teaching Effectiveness Award for Graduate Student Instructors. Click to read the qualifying essay for this last award, “Going Public: Designing Writing Assignments with Social Impact.”

CV


Curriculum Vitae

Click to view or right-click and “save link as” to download.