Why are you doing your data project? This is the key question of the second step in The Data Lifecycle: Motivation. All data projects have a driving force behind them, often many at once. A clear, transparent definition of all the reasons why a project is being done is the route towards greater equity and better data science.


Data teams often spend a lot of time talking about how they will implement data science: which methods they will use, what datasets and features are best, what results they will need. They often spend very little time talking about why they will do each of these things. Yet why you and your team design and produce a data product largely determines all the next steps.


A first attempt to define a project’s motivation might centre around the subject of the questions being asked. Something like: “Our study about access to clean water is motivated by the desire to increase access to clean water”. No question, what you want answers about is at the core of what motivates a project. However, this type of obvious and shallow definition is a huge hindrance to getting clarity about what your project is actually about. Let’s expand our definition.


Let’s take a step back and talk about money. How does step 1 of The Data Lifecycle, Funding, affect step 2, Motivation? Data projects are often completely framed by pragmatic restrictions around money. Who is funding a project can have major impacts. How invested they are in a certain kind of result, how the money is allocated, and even the schedule on which it is distributed all affect the ‘why’. “Our study about access to clean water is motivated by the desire to increase access to clean water and is funded by the municipal government” is a very different project than “Our study about access to clean water is motivated by the desire to increase access to clean water and is funded by a water treatment corporation”. Neither is better or worse here. The motivation step of The Data Lifecycle isn’t about deciding which motivations are good or bad; it’s about clarifying what they are.


Next, let’s incorporate time. Time is one of the strongest motivators of data projects. The deadlines that a project must hit will impact methodology, scope, analysis, and publication, but they are rarely talked about transparently. Our hypothetical water study probably has three or more layers of motivating timelines. How urgent is the water access for study participants? When does the report have to be given to the committee/board/stakeholders? How long until you need results to secure an additional round of funding? Again, in no way are we suggesting that a short-term project is inherently worse than a 70-year study. Being transparent about your internal and external timelines is simply going to help the design of the project, the understanding of your audience, and the satisfaction of your stakeholders. “Our study about access to clean water is motivated by the desire to increase access to clean water and will be completed in time for a report to be delivered to the committee in October before the new budget is voted on”.


Our initial definition based on the purpose of the project was a little shallow because we didn’t ask why we chose to ask these questions. A few basic questions that might get you started in this area include: How was the subject of the project chosen? Is this data project in some way required (compliance, funding renewal, oversight, etc.)? Who stands to gain from this project, both externally (people getting access to water) and internally (staff, the organization, etc.)? Sometimes the internal motivations will seem too sensitive to publicize. We encourage you to try anyway, as you might be surprised at the positive outcomes of a little uncomfortable transparency. Either way, in order for your project to not be held up by a bunch of unspoken yet overwhelming criteria, your team needs to be aware of all the reasons behind the ‘why’ of a project. “Our study about access to clean water is motivated by the desire to increase access to clean water, prompted by a recent health crisis in the area, with the additional long-term goal of increasing the percentage of the municipal budget allocated to water treatment studies”.


What are the data product goals of this project? Even the most academic of projects would do well to consider their work in terms of ‘data products’. At the end of the project are you hoping for a specific result (as is the case in the majority of data projects)? Will you publish the results no matter what they are? Are the results intended to persuade, inform, make a decision, or confirm an existing result or theory? How will the results be communicated? Who are your audiences for the project? All of these factors about the motivation of the project impact the way the project is carried out, but they often go undiscussed or, worse, are intentionally buried. “Our study about access to clean water is intended to persuade the local government and community about the importance of a new water treatment plant, and therefore intends to find whatever measurable benefits exist while maintaining as rigorous a scientific impartiality as possible. The results will be communicated in a print report to the committee with an associated presentation, but also an interactive video for community members.”


How to Use Your Motivation Information


At We All Count, we think that the best tool for data science projects is a ‘Motivation Statement’ composed of all the information available about the ‘why’ of any data project. Combining the example sentences from above and adding the remaining information from the We All Count Motivation Checklist into a couple of paragraphs would set you apart immediately from the closed facade of most data projects. Most likely you will have internal and external versions of the statement. Whichever versions you keep, make sure that you have them, because your team can’t design a project effectively without this information.
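For teams that keep project documentation alongside their code, a Motivation Statement can also be stored as a small structured record. The following is only a minimal sketch in Python: the class name, the field names, and the `missing_sections` and `render` helpers are our own illustrative assumptions drawn from the themes in this article (subject, funding, timeline, origin, who stands to gain, and data product goals). It is not the We All Count Motivation Checklist itself.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class MotivationStatement:
    """Hypothetical record of the 'why' of a data project.

    Field names are illustrative assumptions, not an official checklist.
    """
    subject: Optional[str] = None        # what the project wants answers about
    funding: Optional[str] = None        # who pays, and on what terms
    timeline: Optional[str] = None       # internal and external deadlines
    origin: Optional[str] = None         # how the subject was chosen
    beneficiaries: Optional[str] = None  # who stands to gain, inside and out
    product_goals: Optional[str] = None  # intended results, format, audiences

    def missing_sections(self) -> List[str]:
        """Return the names of sections that have been left empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def render(self) -> str:
        """Render every section, marking empty ones so omissions are visible."""
        lines = []
        for f in fields(self):
            value = getattr(self, f.name)
            label = f.name.replace("_", " ").capitalize()
            lines.append(f"{label}: {value if value else '[not stated]'}")
        return "\n".join(lines)


# Example built from the water study sentences used in this article.
water_study = MotivationStatement(
    subject="Increase access to clean water",
    timeline="Report due to the committee in October, before the budget vote",
    origin="Prompted by a recent health crisis in the area",
    product_goals="Print report and presentation for the committee",
)

print(water_study.missing_sections())
print(water_study.render())
```

Rendering empty sections as "[not stated]" instead of dropping them is a deliberate choice in this sketch: it keeps omissions visible to the team and to readers.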


A standard practice of publishing Motivation Statements will provide donors, participants, stakeholders, researchers, etc., with a succinct explanation of the context of a project, and also an easy test to see what is being omitted that might affect equity. “Why didn’t they fill out the funding section of their motivation statement?” or “Oh I see why this project had to be done in October, that makes sense.” or “I don’t see any of the internal motivations posted here, I want to know all the people who stand to gain from this project” or “Oh cool, so they do intend to publish the results in a format the community can engage with”. Once you know all the ‘whys’ of your project, you are ready to move on to the next step in The Data Lifecycle.