[{"content":"A couple of months ago, I talked to the graduating class of the MA in Computational Social Science program at the University of Chicago. As a recent alumna and a member of the program\u0026rsquo;s guinea pig cohort, I was excited to share my experiences from the last two years of being out in the world, and I figured a lot of what I shared with them would be applicable to a wide variety of data science job seekers.\nThere are literally hundreds of articles, listicles, and essays about navigating the data science job market, written by people with far more experience and wisdom than I have to offer. I will add my data point to the mix. A few things about my background have shaped my experiences: I am a foreign national living and working in the United States, I have a background in economics and public policy, I worked as a data scientist in a university research lab, and I recently made the shift to an industry position.\nFind your competitive advantage. Data science jobs have truly grown in scope and variety over the last few years, so it would serve you well to examine where your competitive advantage lies. Allow job descriptions to help you make this determination — be wary of postings that ask for thirty-four years of experience with machine learning, natural language processing, data engineering, statistics, and the ability to assemble a computer from a couple of chips and scrap metal. A posting like that likely means the hiring team isn\u0026rsquo;t entirely clear about why they want to hire someone. Clear, descriptive, and honest job descriptions will give you the information you need to gauge for yourself whether you are well positioned to do what they want you to do.\nHere are a couple of descriptions I like a lot — I can\u0026rsquo;t remember where they\u0026rsquo;re from, unfortunately, so I can\u0026rsquo;t give a shout-out to the thoughtful managers who wrote them.\nExamine closely for fit. 
It is a cliché, but it is also entirely true, that interviews are supposed to be a two-way street. You\u0026rsquo;re going to be spending 40–50 hours a week doing this thing, so you may as well ask for all the information you need and really evaluate whether this job is a good fit for you. Asking these questions also allows the interviewer to get to know you better, and it gives them more information to assess you on. When I interviewed right out of my master\u0026rsquo;s, I was second-guessing having turned down a cushy job, I was in a precarious visa situation, and I was on the verge of running out of health insurance. All of this made me nervous and underconfident, and I ended up saying whatever I thought the interviewer wanted to hear. To no one\u0026rsquo;s surprise, this was not an effective strategy.\nWhen I interviewed for my current job, these are some of the questions I asked my potential manager. A good time to ask them is when you\u0026rsquo;ve been offered the job, or are pretty sure you will be; the answers should help you make your decision.\nWhat will I be working on for the first 3–6 months? Who is the consumer of our team\u0026rsquo;s work? What was this team\u0026rsquo;s last successful project? What is the team\u0026rsquo;s engagement with open-source development? What resources will I have access to for continued education?\nHere are some issues I would recommend discussing openly if you\u0026rsquo;re about to accept a research data science (or an econ pre-doc) position.\nIs this gig a stepping stone to grad school? What authorship ambitions do I have? What is the extent of my collaboration with the PI, with students, and with research staff? How much infrastructure development am I signing up to do? What stage of a project will I be working on? Read between the lines to estimate the division of time across data collection, experiment design, data cleaning, grant-writing, and paper-writing.\nIt can help to have an online presence. 
I hesitate to bring this up, because it certainly favours individuals with pre-existing privilege: those with the time, energy, and resources to dedicate to cultivating a professional presence online, outside of work. It is substantially more difficult for primary caretakers, people working multiple jobs, older candidates, people with interests and responsibilities outside their careers, and people for whom it might not be entirely safe to share information in public. These advantages are also distributed very unevenly along race, gender, and class lines. It risks creating a Red Queen\u0026rsquo;s race, where you have to run faster and faster to stay in the same place: when enough candidates have something \u0026lsquo;extra\u0026rsquo; to offer, it is no longer \u0026lsquo;extra\u0026rsquo;; it becomes what is expected.\nHaving said that, an online presence can be a low-stakes way to share your work, practice writing and communication, and generally offer a glimpse into the way you approach problems. If you\u0026rsquo;re in school now, it\u0026rsquo;s very likely that with a few extra hours of effort, several of your class projects can be repurposed into simple blog posts. At minimum, it helps to be easy to google; even a concise, up-to-date LinkedIn profile is useful. If you are able to devote time to it, an online presence can make a long-form argument for why someone should hire you. It carries substantially more information than a resume, and unlike in an interview, you have full control over what you discuss, and how. Simple landing pages, elaborate blogs, well-documented GitHub READMEs, and detailed LinkedIn profiles are all valid!\nPractice talking about your work. You don\u0026rsquo;t have to be a drag and talk about your work all the time, but it is useful to get comfortable talking about it informally. 
Practice talking about your work to people who are not your advisor, manager, or collaborator — talk to folks on the fringes of your discipline, and those entirely outside it. This will help you figure out which aspects of your work are interesting to talk about and provoke curiosity and follow-up questions, as opposed to which ones felt most challenging to do. Most of us tend to talk about the difficult work because we\u0026rsquo;re understandably proud of having done it, but it might not be the best reflection of our abilities, nor help us have an animated, engaged conversation, which is ultimately what we want an interview to be.\nLearning to talk about my work has been difficult for me — I love what I do, but I\u0026rsquo;m usually loath to talk about it when I\u0026rsquo;m off the clock. I often explain things to my mother when I want a sanity check — she\u0026rsquo;s smart, willing to listen because I\u0026rsquo;m her daughter, and from a completely different discipline, so I\u0026rsquo;m able to figure out when something that seems obvious to me does indeed need to be stated explicitly. Learn to use context and situational cues to figure out whether results-focused, methods-focused, or process-focused explanations are best suited to showcase your work.\nIf you have a connection, use it. Again, I wish this weren\u0026rsquo;t the case, but unfortunately, it is. Credible referrals from people who know you in a capacity that allows them to speak to your skills, knowledge, and work ethic can go a long way. At the very least, a referral usually gets a pair of human eyes on your application. I\u0026rsquo;ve found that academic research positions aren\u0026rsquo;t always advertised, so if there are any professors or labs you want to work with, just reach out: the worst outcome is that someone hears their work is admired.\nIf you\u0026rsquo;re asking for a referral or even just for information, ask NICELY. 
Remember that you\u0026rsquo;re not entitled to their time and energy. Ask specific, bounded questions (don\u0026rsquo;t say \u0026ldquo;hey, can you tell me about this job?\u0026rdquo; — they\u0026rsquo;re not obligated to guess what you want to know and write you an essay about their job). Offer a few ways to get in touch and let them choose whichever is most convenient. You don\u0026rsquo;t need to, and shouldn\u0026rsquo;t, grovel, but you definitely should be kind and polite! Send a thank-you note afterwards.\nIf you\u0026rsquo;re cold-applying, really optimize your resume by matching keywords to the job description. Include a basic cover letter, even if it is optional. Don\u0026rsquo;t hesitate to prune your resume ruthlessly to foreground your most relevant experiences.\nLeverage your social science background. I went to college at an institute of technology where, as an Economics major, I was basically regarded as a Poetry major (zero shade at Poetry, but you know what I mean). Partly because \u0026lsquo;technology\u0026rsquo; was seen as so distinct from social science during undergrad, I was surprised by how many things now packaged as \u0026lsquo;data science\u0026rsquo; are decades-old statistical concepts commonly applied across economics, political science, and quantitative sociology.\nThe term data science is relatively recent, but social scientists have been using these ideas for a very long time. We have long traditions of analyzing surveys, of using econometrics to draw conclusions about populations from samples, and of running a wide variety of experiments to make causal claims. Forecasting, time series analysis, experiment design, randomized trials (A/B testing? 
it\u0026rsquo;s a remix) are all squarely within the toolkit quantitative social scientists use to understand the world.\nHere are some podcasts and talks by very successful data scientists with social science educations that touch on this issue:\nSean J. Taylor, of Prophet fame, talked to Casual Inference about using causal inference at Lyft. Chris Albon, political scientist and maker of flashcards, discussed his approach to hiring at Devoted Health on the Google Cloud Platform Podcast. Stephanie Kirmer gave an insightful keynote address at #satRdaysChicago last year about what data science can learn from sociology.\nPython or R? This image from Kieran Healy never fails to make me laugh, but I really think the answer to this question is \u0026ldquo;do not gatekeep, comment your code, and learn SQL\u0026rdquo;.\nData science Twitter turns into a battleground every few months to hash this out (interspersed with tidyverse vs. base R wars), and it almost never seems to yield anything useful. Use whatever has the most comprehensive infrastructure for the task at hand.\nPython is more ubiquitous, probably more production-friendly, has a more established machine learning/deep learning ecosystem, probably a more sophisticated set of NLP libraries, and a solid set of tools for working with networks. I would assume it\u0026rsquo;s easier for people with CS backgrounds to pick up Python quickly.\nR was created by statisticians for statistical computing, and it excels at this. The tidyverse ecosystem has made exploring, wrangling, and visualizing data easy and pleasurable. CRAN has a rapidly growing collection of cutting-edge libraries implementing methods in causal inference, spatial analysis, and time series statistics. My favourite thing about R is the wonderful community that surrounds it.\nThe important thing, though, is that these gaps are closing rapidly. 
We\u0026rsquo;re seeing more instances of R deployed to production, more statistical packages in Python, and reticulate (an R interface to Python) to help the two languages play nicely with each other. Whichever you decide is your primary language, I would highly recommend being at least conversant in the other and able to read and review code in it.\nTalk respectfully and transparently about money. We\u0026rsquo;ve hopefully learned by now that a culture of never talking about money with your peers has played a part in allowing dramatic pay disparities to persist in a wide variety of contexts. Even Meredith Grey didn\u0026rsquo;t know she was being lowballed until her coworkers shared their incomes with her.\nIf it can inform a friend\u0026rsquo;s or coworker\u0026rsquo;s leverage or decisions, don\u0026rsquo;t hesitate to share details of your compensation. When I switched from a university position to a tech industry position, I would have underestimated my own market value by close to 30% if it hadn\u0026rsquo;t been for the transparency and wisdom of three women from R-Ladies Chicago.\nAs a general rule, don\u0026rsquo;t disclose your previous salary when applying to a new job; it is illegal in several US states for potential employers to ask for this information. Push back against naming a range until the end of the hiring process, when you have all the information you need. At that point, do your homework! Check LinkedIn, Glassdoor, and BuiltIn, talk to your networks, and arrive at a realistic range that you\u0026rsquo;d be happy to accept. 
If you\u0026rsquo;re in school, talk to your career services team; they probably have plenty of experience with salary negotiations.\nIf you\u0026rsquo;re negotiating at your current job: I don\u0026rsquo;t have experience with this, but I really like the idea of keeping a running document of all the work you\u0026rsquo;re accomplishing. Julia Evans calls this a brag document, and it sounds like a great way to get your work recognized.\nRemember, it\u0026rsquo;s one job! It\u0026rsquo;s important to find a role that\u0026rsquo;s a great fit for you, but it is still just one job: most likely, 5–10% of your career. If your dream job doesn\u0026rsquo;t feel accessible to you at the moment, take the time to pick up the skills you need to get it. Making career changes is hard, and I\u0026rsquo;ve noticed that it\u0026rsquo;s easier to change along one axis at a time: geography, skillset, or domain. For me, this looked like: econ policy job → master\u0026rsquo;s → data science job in policy/research → data science job in tech. If you\u0026rsquo;re international, you\u0026rsquo;re already making a significant geography change.\nThe data science job market is no longer undersupplied: apart from the multitude of dedicated graduate programs that have come up in the past few years, other quantitative disciplines have woken up to the need to incorporate computational skills into their training. Vicki Boykis wrote a brilliant blog post on this, and it\u0026rsquo;s probably even more relevant now. 
The following are all real data science positions. Even \u0026lsquo;data science\u0026rsquo; is a new, made-up term, and we\u0026rsquo;re still negotiating the boundaries of what it actually means.\ndata analyst, research analyst, business analyst; data scientist, research scientist, ML engineer; experimentation, causal inference, people analytics; decision scientist, systems engineer, applied statistician; data manager, research associate, pre-doctoral research assistant.\nI\u0026rsquo;m still learning to do this myself, but I think we\u0026rsquo;d all be better served if we let go of preconceived notions of hierarchy across these jobs. You should definitely think about leveling (your responsibilities and compensation should be in line with your skills and experience), but there are several real, valid ways to do data science, and it\u0026rsquo;s a waste of time to tell ourselves otherwise.\nTake risks that are calibrated to your specific situation. It is completely legitimate to take a job that feels non-optimal in some ways to pay your bills, to maintain visa status, for health insurance, or to be near a loved one. It is also valid to pass up non-optimal offers if you have a few feet of runway before you must have a job. You don\u0026rsquo;t owe anyone an explanation — do what\u0026rsquo;s right for you!\n","permalink":"https://sushmitagopalan.com/posts/navigating-data-science-careers/","summary":"\u003cp\u003eA couple of months ago, I talked to the graduating class of the MA in Computational Social Science program at the University of Chicago. 
As a recent alumna, and member of the program\u0026rsquo;s guinea pig cohort, I was excited to share my experiences from the last two years of being out in the world, and I figured a lot of what I shared with them is potentially applicable to a wide variety of data science job seekers.\u003c/p\u003e","title":"Navigating Data Science Careers"},{"content":"I\u0026rsquo;ve been feeling with growing certainty that, barring extreme circumstances and mental or physical illness, whether you feel happy on a daily basis is mostly a function of habits. I\u0026rsquo;ve noticed, anecdotally, that I tend to be happier on days I get adequate sunshine and exercise, don\u0026rsquo;t use my phone too much, and actually watch TV as opposed to letting Grey\u0026rsquo;s Anatomy play while I scroll aimlessly through Twitter. Conversely, I\u0026rsquo;m more likely to wake up early, go outdoors, be productive, read, and engage with other human beings on days I\u0026rsquo;m feeling happier.\nLately, I\u0026rsquo;ve become curious about the Quantified Self movement, which advocates the pursuit of self-knowledge through numbers. Because tech is ubiquitously (and creepily) tracking our every action, plenty of data is already being passively collected about each of us. For data about myself, I pulled my daily step counts off my iPhone using this app, along with my Netflix history, financial transactions (to track daily activities; everything costs money except the lake, parks, and libraries), and transcripts of WhatsApp conversations. To account for factors beyond my control, I pulled temperature, location, and hours of daylight for each day of observation.\nIf I want to understand the predictors of my happy days, I need some measure of my daily happiness. 
Since I haven\u0026rsquo;t been actively tracking my moods with a mood journal or an app like Happify or Track Your Happiness, I have to wade back into this data and come up with a metric to use as a proxy.\nI live and work on a continent that\u0026rsquo;s very far away from my family and several friends; as a result, some of my closest and most meaningful relationships are conducted online. Since I often talk about my day over text message with people who weren\u0026rsquo;t part of it, I wondered whether the words I use to describe my day could be mined to compute a happiness score, for want of a better term. I\u0026rsquo;m going to walk you through the exploratory analysis I did to figure out whether this made sense.\nTo do this, I first exported WhatsApp transcripts of conversations with four different friends to understand what this data looked like. All four are friends I have relatively substantive conversations with, so these chats are unlikely to be cluttered with forwards, tweets, or cat pictures. I split the text messages into sentences and used functions from the sentimentr package to compute a polarity score for each sentence, using Baccianella, Esuli and Sebastiani\u0026rsquo;s (2010) SentiWord lexicon. With sentimentr, I was able to incorporate valence shifters; these account for words that negate, amplify, de-amplify, or overrule adjacent words with high polarity scores. Scores are then grouped by date and sender and averaged. For instance, the average sentiment of the text messages I sent my friend Pranathi on Oct 21, 2018 was 0.32.\nNeutral sentences like \u0026lsquo;How are you?\u0026rsquo; are scored 0, more positive sentences receive positive scores, and more negative sentences receive negative scores. Here\u0026rsquo;s a quick look at the overall distribution of sentiment scores. 
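Before looking at that distribution, here is a minimal sketch of the scoring pipeline described above. It is written in Python rather than R and does not use sentimentr. The lexicon, the shifter weights, the function names (score_sentence, daily_sentiment), and the sample messages are all hypothetical. The sketch splits each message into sentences and sums lexicon polarities per sentence, letting the immediately preceding word negate, amplify, or de-amplify a polarized word. It then averages sentence scores by date and sender. sentimentr itself is considerably more involved: it looks at a window of several words around each polarized word and also handles adversative conjunctions such as 'but'.

```python
import re
from collections import defaultdict
from statistics import mean

# Hypothetical toy lexicon (word -> polarity); these are NOT SentiWord values.
POLARITY = {'great': 0.5, 'happy': 0.5, 'amazing': 0.75,
            'tired': -0.25, 'awful': -0.75}

# Hypothetical valence shifters: negators flip the sign of the next polarized
# word, amplifiers scale it up, and de-amplifiers scale it down.
NEGATORS = {'not', 'never'}
AMPLIFIERS = {'very': 1.5, 'really': 1.5}
DEAMPLIFIERS = {'slightly': 0.5, 'kinda': 0.5}


def score_sentence(sentence):
    # Sum word polarities; only the word immediately before a polarized word
    # is treated as a shifter (a simplification of sentimentr's window).
    words = [w.strip(',;:').lower() for w in sentence.split()]
    total = 0.0
    for i, word in enumerate(words):
        if word not in POLARITY:
            continue
        value = POLARITY[word]
        prev = words[i - 1] if i > 0 else ''
        if prev in NEGATORS:
            value = -value
        elif prev in AMPLIFIERS:
            value *= AMPLIFIERS[prev]
        elif prev in DEAMPLIFIERS:
            value *= DEAMPLIFIERS[prev]
        total += value
    return total


def daily_sentiment(messages):
    # messages: iterable of (date, sender, text) tuples.
    # Split each message into sentences, score each sentence, then average
    # the sentence scores for every (date, sender) pair.
    scores = defaultdict(list)
    for date, sender, text in messages:
        for sentence in re.split('[.!?]+', text):
            if sentence.strip():
                scores[(date, sender)].append(score_sentence(sentence))
    return {key: mean(values) for key, values in scores.items()}


# Hypothetical messages, for illustration only.
messages = [
    ('2019-01-05', 'me', 'Had a really great day! Not tired at all.'),
    ('2019-01-05', 'friend', 'How are you? Work was awful.'),
]
print(daily_sentiment(messages))
# {('2019-01-05', 'me'): 0.5, ('2019-01-05', 'friend'): -0.375}
```

Keying the averages on (date, sender) mirrors the grouping described above, so each person in a conversation gets a separate daily series that can be plotted and compared. A neutral sentence such as 'How are you' contains no lexicon words and scores 0, which pulls the friend's daily average toward neutral.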
As I\u0026rsquo;d expected, it looks roughly normal: most sentences are relatively neutral, and more extreme sentiment scores become progressively less frequent.\nI know that I\u0026rsquo;m susceptible to the Sunday blues, so I wanted to see whether I tend to send more despondent messages on Sundays. Turns out, not at all! The valence of my text messages is, in fact, most positive on Sundays! Perhaps this is because we\u0026rsquo;re all animatedly discussing the wild, exciting Saturday nights (lol) of our twenties, but silently sulking through laundry and chores when the full force of Monday-dread hits. Adding to the noise is the time difference: this data spans at least three time zones, so my friends and I are often in different moods and frames of mind when we talk to each other.\nThe next thing I was curious to explore was: whose moods are getting reflected in the text? There\u0026rsquo;s naturally a fair degree of interdependence here; even when I\u0026rsquo;m having a great day, I\u0026rsquo;m unlikely to sound buoyant and pumped while talking to a friend who\u0026rsquo;s feeling miserable, and vice versa. Conversations (ideally, anyway) involve listening and responding, so one could hypothesize that the person initiating a conversation sets the tone and the other person takes their cue from them. I\u0026rsquo;ve had the phone I\u0026rsquo;m using since Nov 2017, so I plotted daily averages over close to two years. I don\u0026rsquo;t really know what to make of these trends, other than that I appear to use more positively scored words than my friends do.\nTo break this down further, I split these trends up by friend, and I rather like what this reveals! In the words of the ultimate object of my bicep-envy (Michelle Obama), \u0026ldquo;When they go low, we go high\u0026rdquo;. I should also probably find a way to model my penchant for quoting things completely out of context, but look! 
There are several instances where a local minimum in one person\u0026rsquo;s curve is accompanied by a local maximum of somewhat similar magnitude in the other person\u0026rsquo;s curve. It appears that when one of us is having a rough day, the other is reassuring and responds with positivity and optimism! Before we cue sunshine and rainbows, however, does this also mean that when one person is talking about something very positive, the other is bringing them down? Without accounting for lag, I can\u0026rsquo;t say for sure, but this also makes me wonder whether highly positive (or highly negative) information is more likely to be shared over a phone conversation.\nThese curves also appear to suggest that \u0026lsquo;moods\u0026rsquo; are somewhat persistent and cyclical. I looked at the valence of my text messages alone, at two different levels of granularity, to see whether they roughly correlated with my own memories of how I felt at the time, and whether any patterns I was seeing were artifacts of ggplot\u0026rsquo;s smoothing functions. I won\u0026rsquo;t go into too much detail, but they don\u0026rsquo;t consistently correlate, which suggests that the words I use are a function not only of my own moods but also of my friends\u0026rsquo; moods. I am, of course, an unreliable narrator of my own past and could be conflating the feelings I had while doing certain things with the feelings that surfaced when those actions bore fruit. Weekly cyclical patterns emerge when the \u0026lsquo;span\u0026rsquo; parameter of geom_smooth\u0026rsquo;s loess fit is set to a lower value. The span sets the fraction of the data used in each local fit, and therefore how much smoothing occurs; lower values produce wigglier lines.\nSo, should I use sentiment scores on my text messages as a proxy for how good my mood was on a given day? Shouldn\u0026rsquo;t I be trying a more sophisticated, tailored approach to classifying sentiment first? 
I\u0026rsquo;ve used a completely unaltered, off-the-shelf, dictionary-based implementation with SentiWord. It doesn\u0026rsquo;t account for the specifics of context, or for the fact that a non-trivial share of my text messages consists of transliterated Hindi, slang, expletives, emojis, and GIFs. For instance, \u0026lsquo;Haha\u0026rsquo; and \u0026lsquo;Hahaha\u0026rsquo; are both rated -0.59, while \u0026lsquo;hahahahaha\u0026rsquo; is rated 0. \u0026ldquo;What the fuck!\u0026rdquo; is rated -0.29, but \u0026ldquo;wtf!\u0026rdquo; is rated 0. Inexplicably, \u0026ldquo;This is amazing\u0026rdquo; gets a -0.23.\nBefore I whip out TensorFlow, though, I think I have enough reason to believe that, even with more accurate and specific sentiment classification, this is probably not the best idea.\nMy text messages are not responses to my day alone; they are also responses to my friends\u0026rsquo; days. There\u0026rsquo;s a lot of missing data: I\u0026rsquo;m not texting my friends when I\u0026rsquo;m hanging out with them IRL or speaking to them on the phone. What I choose to convey over text is not random, and is likely correlated with the types of emotions I\u0026rsquo;m swayed by at that moment. It would also be a fairly strong assumption that anyone is consistently authentic and honest in communicating their feelings; some of my most cheery Instagram posts were made on rather miserable days. While I continue to think of alternative proxies for my daily happiness, I\u0026rsquo;ve started tracking my moods with the Daylio app. If you have other ideas, I\u0026rsquo;d love to hear them! Code for this can be found here.\n","permalink":"https://sushmitagopalan.com/posts/happiness-an-outcome-variable/","summary":"\u003cp\u003eI\u0026rsquo;ve been feeling with growing certainty that barring extreme circumstances and mental or physical illness, whether you feel happy on a daily basis is mostly a function of habits. 
I\u0026rsquo;ve noticed, anecdotally, that I tend to be happier on days I get adequate sunshine, exercise, don\u0026rsquo;t use my phone too much, actually watch tv as opposed to letting Grey\u0026rsquo;s Anatomy play while I scroll aimlessly through Twitter, etc. Conversely, I\u0026rsquo;m more likely to wake up early, go outdoors, be productive, read and engage with other human beings on days I\u0026rsquo;m feeling happier.\u003c/p\u003e","title":"Happiness: An Outcome Variable"}]