Example Report Template for a Data Analysis Project
The structure below is one possible setup for a report stemming from a data analysis project. It loosely follows the structure of a standard scientific manuscript. Adjust as needed. You don’t need exactly these sections, but the content they cover should be addressed.
Department of Microbiology, University of Georgia, Athens, GA, USA.
Department of Population Health, University of Georgia, GA, USA.
University of Georgia.
\(*\) These authors contributed equally to this work.
\(\land\) Corresponding author: some@email.com
\(\dagger\) Disclaimer: The opinions expressed in this article are the author’s own and do not reflect the views of their employer.
1 Summary
This project was part of an exercise for the MADA course to learn about data cleaning and the READY workflow.
2 Methods
Describe your methods. That should describe the data, the cleaning processes, and the analysis approaches. You might want to provide a shorter description here and all the details in the supplement.
Data included height, weight, gender, pets owned, and number of books read in the last year. Data was cleaned by removing individuals with missing values. Various plots and tables were made to explore the data. A linear model was then fit to the data for the final analysis.
2.1 Data acquisition
Data was made up by Esther Palmer in order to have something for this exercise.
2.2 Data import and cleaning
Packages used:
library(readxl) #for loading Excel files
library(dplyr) #for data processing/cleaning
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
library(tidyr) #for data processing/cleaning
library(skimr) #for nice visualization of data
library(here) #to set paths
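The cleaning step described in the Methods (removing individuals with missing data) could look roughly like the sketch below. The toy data frame and its values are hypothetical and stand in for the real data; base R is used here, although tidyr::drop_na() would give the same result.

```r
# Hypothetical toy data; values are made up for illustration only
raw <- data.frame(
  Height = c(180, NA, 165, 170),
  Weight = c(80, 70, NA, 60),
  Gender = factor(c("M", "F", "O", "F"))
)

# Keep only rows that have no missing value in any column
clean <- raw[complete.cases(raw), ]

# Report how many individuals were dropped
n_removed <- nrow(raw) - nrow(clean)
print(n_removed)
```

Dropping incomplete rows is the simplest option; whether it is appropriate depends on how much data is lost and why values are missing.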
Use a combination of text/tables/figures to explore and describe your data. Show the most important descriptive results here. Additional ones should go in the supplement. Even more can be in the R and Quarto files that are part of your project.
Note that the data is loaded using a relative path with the ../../ notation (two dots mean one folder up). You never want to specify an absolute path like C:\yourname\yourproject\results\ because if you share this with someone, it won’t work for them since they don’t have that path. You can also use the here R package to create paths. See examples of that below. I generally recommend the here package.
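As a minimal sketch of the two approaches, the code below builds the same file location once with here() and once with a relative path. The folder and file names are hypothetical placeholders, and the relative version assumes the document sits two folders below the project root.

```r
library(here)

# Hypothetical folder and file names; adjust to your project layout.
# here() builds the path starting from the project root.
data_location <- here::here("data", "raw-data", "exampledata.xlsx")

# Relative-path version, assuming this file lives two folders
# below the project root
relative_location <- file.path("..", "..", "data", "raw-data", "exampledata.xlsx")

print(data_location)
print(relative_location)
```

The here() version keeps working if the document is moved to another folder inside the project, which is the main reason to prefer it.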
Table 1: Data summary table. All caption text goes here.
| skim_type | skim_variable | n_missing | complete_rate | character.min | character.max | character.empty | character.n_unique | character.whitespace | factor.ordered | factor.n_unique | factor.top_counts | numeric.mean | numeric.sd | numeric.p0 | numeric.p25 | numeric.p50 | numeric.p75 | numeric.p100 | numeric.hist |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| character | Pets_owned | 0 | 1 | 3 | 6 | 0 | 3 | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| factor | Gender | 0 | 1 | NA | NA | NA | NA | NA | FALSE | 3 | M: 4, F: 3, O: 2 | NA | NA | NA | NA | NA | NA | NA | NA |
| numeric | Height | 0 | 1 | NA | NA | NA | NA | NA | NA | NA | NA | 165.666667 | 15.97655 | 133 | 156 | 166 | 178 | 183 | ▂▁▃▃▇ |
| numeric | Weight | 0 | 1 | NA | NA | NA | NA | NA | NA | NA | NA | 70.111111 | 21.24526 | 45 | 55 | 70 | 80 | 110 | ▇▂▃▂▂ |
| numeric | Number_books_read | 0 | 1 | NA | NA | NA | NA | NA | NA | NA | NA | 7.777778 | 11.08803 | 0 | 2 | 3 | 6 | 32 | ▇▁▁▁▁ |
3.2 Basic statistical analysis
To get some further insight into your data, if reasonable you could compute simple statistics (e.g. simple models with 1 predictor) to look for associations between your outcome(s) and each individual predictor variable. Though note that unless you pre-specified the outcome and main exposure, any “p<0.05 means statistical significance” interpretation is not valid.
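One way to run such single-predictor models is to loop over the predictors and fit one simple linear model for each. The sketch below uses a hypothetical toy data frame (the values are invented, not the project data) with Height as the assumed outcome.

```r
# Hypothetical toy data; values are made up for illustration only
dat <- data.frame(
  Height = c(180, 165, 172, 158, 190, 175),
  Weight = c(82, 60, 70, 55, 95, 77),
  Books  = c(2, 10, 5, 12, 1, 4)
)

outcome <- "Height"
predictors <- c("Weight", "Books")

# Fit one linear model per predictor, e.g. Height ~ Weight
fits <- lapply(predictors, function(p) {
  lm(reformulate(p, response = outcome), data = dat)
})
names(fits) <- predictors

# Extract the slope (second coefficient) from each model
slopes <- sapply(fits, function(f) coef(f)[[2]])
print(slopes)
```

The slopes are meant for exploration only; as noted above, p-values from such unplanned comparisons should not be read as tests of significance.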
Figure 1 shows a scatterplot figure produced by one of the R scripts.
Figure 1: Height and weight stratified by gender.
3.3 Full analysis
Use one or several suitable statistical/machine learning methods to analyze your data and to produce meaningful figures, tables, etc. This might again be code that is best placed in one or several separate R scripts that need to be well documented. You want the code to produce figures and data ready for display as tables, and save those. Then you load them here.
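The save-then-load pattern described above might look like the following sketch. The data values and the file name are hypothetical, and a temporary folder is used only so the example runs anywhere; in a real project the file would go into a results folder.

```r
# Hypothetical toy data; values are made up for illustration only
dat <- data.frame(
  Height = c(180, 165, 172, 158, 190, 175),
  Weight = c(82, 60, 70, 55, 95, 77)
)

# --- In the analysis script: fit the model and save the result table ---
fit <- lm(Height ~ Weight, data = dat)
fit_table <- as.data.frame(coef(summary(fit)))

# Hypothetical file location; a temporary folder keeps the example portable
results_file <- file.path(tempdir(), "resulttable.rds")
saveRDS(fit_table, results_file)

# --- In the report: load the saved table and display it ---
loaded_table <- readRDS(results_file)
print(loaded_table)
```

Keeping the model fitting in a script and only loading the saved object in the report keeps the report fast to render and the analysis code easy to rerun on its own.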
Example Table 2 shows a summary of a linear model fit.
Table 2: Linear model fit table.
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 149.2726967 | 23.3823360 | 6.3839942 | 0.0013962 |
| Weight | 0.2623972 | 0.3512436 | 0.7470519 | 0.4886517 |
| GenderM | -2.1244913 | 15.5488953 | -0.1366329 | 0.8966520 |
| GenderO | -4.7644739 | 19.0114155 | -0.2506112 | 0.8120871 |
4 Discussion
4.1 Conclusions
It seems that reading books is negatively, although not significantly, correlated with height.