Final Project
The final project for our course is designed to allow you to explore a topic of your choice and apply the concepts we’ve learned throughout the course. You will have the opportunity to select a topic that interests you, conduct research and source relevant data, analyze the data, and present your findings at one of our final class sessions.
Selecting a Topic / Research Question
You are free to choose any topic that interests you, provided that you can find relevant data to analyze. Finding a dataset is crucial, and obtaining and cleaning data can often be the most time-consuming part of any data analysis project. To mitigate this challenge, it is recommended that you search for well-curated and well-documented datasets that are readily available online before you finalize your topic.
A good place to start is to check Google Dataset Search or Kaggle for datasets that are relevant to your topic. There are also many other sources of datasets, such as government databases, academic repositories, and other open data platforms. For example, the Israeli Central Bureau of Statistics (CBS) publishes a wealth of data on various topics. Sports data availability is sometimes limited, depending on the sport and the league, but leagues are increasingly making data available.
Once you have a general topic in mind and have confirmed that you can find relevant data, you can start to refine your research question. A good research question should be specific, measurable, and insightful:
- Specific: The question should be focused and not too broad. For example, instead of asking “How does war affect the economy?”, you might ask “How does the outbreak of war in a country affect its GDP growth rate in the following year?”
- Measurable: The question should be answerable with data. For example, instead of asking “Is climate change real?”, you might ask “What is the trend in global average temperatures over the past 50 years?”
- Insightful: There are lots of questions that can be answered with data, but not all of them are interesting or insightful. A good research question should lead to new insights or a better understanding of the topic. For example, rather than asking “Who has the most goals in the Premier League?”, you might ask “How does the presence of other star players on a team affect the goal-scoring performance of a player in the Premier League?”
As we talk about uncertainty quantification and hypothesis testing throughout the course, you may want to consider how they can be applied to your research question.
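As one illustration of how a research question can be turned into a hypothesis test, here is a minimal sketch of a permutation test for a difference in group means. The function name `permutation_test`, the group names, and all of the numbers are made up for illustration; they are not real data and this is not necessarily the method we will use in class.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the labels: under the null hypothesis they carry no information.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # The add-one correction keeps the estimated p-value strictly above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical goals-per-game values, invented purely for illustration.
with_star = [0.42, 0.55, 0.61, 0.38, 0.70, 0.49, 0.58, 0.66]
without_star = [0.35, 0.41, 0.29, 0.50, 0.33, 0.44, 0.39, 0.31]

observed_diff, p_value = permutation_test(with_star, without_star)
print(f"observed difference: {observed_diff:.3f}, p-value: {p_value:.4f}")
```

A small p-value here would only say that a difference this large is unlikely if the group labels were irrelevant; it would not by itself show that one thing causes the other.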
Expectations
The first main goal is that your project should apply some of the uncertainty-related concepts we’ve covered in the course. Hypothesis testing, confidence intervals, and prediction intervals are all fair game.
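For a sense of what a confidence interval might look like in practice, the sketch below computes a bootstrap percentile interval for a mean using only the Python standard library. The helper name `bootstrap_ci` and the sample values are hypothetical, and the percentile bootstrap is just one of several reasonable ways to build an interval.

```python
import random
from statistics import mean


def bootstrap_ci(sample, stat=mean, n_resamples=5000, level=0.95, seed=0):
    """Approximate percentile bootstrap confidence interval for a statistic."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement and recompute the statistic each time.
    estimates = sorted(
        stat(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    alpha = 1 - level
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical measurements, invented purely for illustration.
sample = [12.1, 9.8, 11.4, 10.7, 13.2, 9.5, 10.9, 12.6, 11.0, 10.3]

lower, upper = bootstrap_ci(sample)
print(f"sample mean: {mean(sample):.2f}")
print(f"approximate 95% CI for the mean: ({lower:.2f}, {upper:.2f})")
```

Passing a different function as `stat` (for example `statistics.median`) gives an interval for that statistic instead, which can be handy when the mean is not the quantity your research question is about.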
The second main goal is that you should learn something new about your topic! I want this course and this project to make you feel empowered to use data to answer questions and gain insights. You should aim to uncover new information or perspectives on your topic that you didn’t know before.
Of secondary importance is that you should work on the skills surrounding the presentation of your findings. This includes thoughtful and professional-looking data visualizations, clear and concise explanations of your methods and results, and a well-structured presentation that both informs and engages your audience. 1
Presentation
You will have 20-25 minutes to present your findings at one of our final class sessions. I am hoping that some of the program organizers (and possibly other fellows and mentors) will be able to attend. You are expected to prepare slides for your presentation, which should include:
- An introduction to your topic and research question
- A description of the data you used
- An explanation of your methods and analysis
- A summary of your findings and insights (with specific statistics and supporting data visualizations)
- A conclusion that ties everything together and discusses the implications of your findings
Timeline
The exact dates will depend on how quickly we get through course material, but roughly speaking, the recommended timeline is as follows:
- 7/18 - 7/21: Choose a topic, find relevant data.
- 7/22 - 7/25: Refine your research question and start on data cleaning and exploratory data analysis.
- 7/28 - 8/1: We will have covered the main course material by this point, so you can start to apply the concepts we’ve learned to your analysis.
- 8/2 - 8/5: Nearly full-time work on your project. You should be basically done with your data analysis by the end of this period.
- 8/6 - 8/7: Prepare your presentation slides and practice your presentation.
- 8/7 or 8/8: Present your findings in class.
Footnotes
1. It is of course exceptionally difficult to do this in your second or third language. Don’t worry about it – you’re not being graded and all that matters is that you do your best to communicate your ideas.