why reproducible

The fundamental idea behind a robust, reproducible analysis is a clean, repeatable script-based workflow (i.e. the sequence of tasks from the start to the end of a project) that links raw data through to clean data and to final analysis outputs.

principles of a good analysis workflow

Any cleaning, merging, transforming, etc. of data should be done in scripts, not manually.
Split your workflow (scripts) into logical thematic units. For example, you might separate your code into scripts that:
1. load, merge and clean data
2. analyse data
3. produce outputs like figures and tables
Eliminate code duplication by packaging useful code into custom functions (see Programming: write a function). Comment your functions thoroughly, explaining their expected inputs and outputs, what they do, and why.
Document your code and data as comments in your scripts or by producing separate documentation (see Programming and Reproducible reports).
Keep any intermediate outputs generated by your workflow separate from the raw data.
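
As a minimal sketch of the advice on custom functions, here is one way a small, documented helper might look. The function name and the temperature conversion are hypothetical and chosen only for illustration; they do not come from the original workflow.

```r
# Hypothetical helper: convert temperatures from Fahrenheit to Celsius.
#
# Input:  temp_f, a numeric vector of temperatures in degrees Fahrenheit.
# Output: a numeric vector of the same length, in degrees Celsius.
# Why:    if several scripts need this conversion, defining it once avoids
#         copying the formula around and keeps any fix in a single place.
fahrenheit_to_celsius <- function(temp_f) {
  stopifnot(is.numeric(temp_f))
  (temp_f - 32) * 5 / 9
}

# Freezing and boiling points of water.
fahrenheit_to_celsius(c(32, 212))
```

Saving helpers like this in their own script and loading them with source() lets every analysis script reuse the same definition.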

file system structure

The data folder contains all input data (and metadata) used in the analysis.
The docs folder contains the manuscript.
The figs directory contains figures generated by the analysis.
The output_data folder contains any type of intermediate or output files (e.g. simulation outputs, models, processed datasets, etc.). You might split this further and keep cleaned data in a separate cleaned-data folder.
The scripts folder contains R scripts with function definitions.
The rmd folder contains R Markdown files and reports that document the analysis or report on results.
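
The layout above can be created from R in a few lines. This is only a sketch: the folder names are taken from the list above, while project_root is a hypothetical location (a temporary directory here, so that running the example leaves no files behind).

```r
# Hypothetical project location; replace with your own project path.
project_root <- file.path(tempdir(), "my-project")

# Folder names as described in the list above.
folders <- c("data", "docs", "figs", "output_data", "scripts", "rmd")

# Create each folder; recursive = TRUE also creates project_root if needed.
for (folder in folders) {
  dir.create(file.path(project_root, folder),
             recursive = TRUE, showWarnings = FALSE)
}

# List the folders that now exist directly under the project root.
list.dirs(project_root, full.names = FALSE, recursive = FALSE)
```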

good name principle

machine readable

Use delimiters deliberately to separate the important pieces of metadata in a file name.
Avoid spaces, punctuation, accented characters, and names that differ only by case.
Use “_” to separate metadata fields that will be extracted as strings later on.
Use “-” in place of spaces within a field (or swap the roles of “_” and “-”), but do not mix the two conventions.
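
To illustrate why consistent delimiters matter, the sketch below recovers metadata from file names by splitting on “_”. The file names and the field meanings (date, site, variable) are hypothetical examples, not files from the original project.

```r
# Hypothetical file names: fields are separated by "_",
# and words within a field are separated by "-".
files <- c(
  "2019-03-01_site-a_soil-moisture.csv",
  "2019-03-02_site-b_soil-moisture.csv"
)

# Drop the extension, then split each name on "_" to get the fields.
stems <- tools::file_path_sans_ext(files)
fields <- do.call(rbind, strsplit(stems, "_", fixed = TRUE))

# Assemble the fields into a data frame of metadata.
metadata <- data.frame(
  date = as.Date(fields[, 1]),
  site = fields[, 2],
  variable = fields[, 3],
  stringsAsFactors = FALSE
)
metadata
```

Because “-” is never used as a field separator, a value such as site-a survives the split intact.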

human readable

Ensure file names also include an informative description of the file contents.
Adapt the concept of the slug (a short, descriptive label, as used in readable URLs) to link outputs with the scripts in which they are generated.
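
One possible way to apply the slug idea is to reuse the script name, minus its extension, as a prefix for every output that script writes. The script name and output names below are hypothetical; the figs and output_data folders follow the structure described earlier.

```r
# Hypothetical script name.
script_name <- "02_fit-model.R"

# Use the script name without its extension as the slug.
slug <- tools::file_path_sans_ext(script_name)

# Build output paths that carry the slug, so each file points back
# to the script that generated it.
fig_path <- file.path("figs", paste0(slug, "_residuals.png"))
model_path <- file.path("output_data", paste0(slug, "_model.rds"))

fig_path
model_path
```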

easy to order by default

Starting file names with a number helps.
For data, this might be a date, allowing chronological ordering.
Make sure to use the ISO 8601 format (YYYY-MM-DD) to avoid confusion between differing local dating conventions.
For scripts, you could use a number indicating the position of the script in the analysis sequence, e.g. 01_download-data.R.
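
The sketch below shows how both conventions could be generated from R. The step names and the _survey.csv suffix are hypothetical. Zero-padding the step number is an assumption suggested by the 01_ prefix in the example above; it keeps step 10 from sorting before step 2.

```r
# Zero-padded step numbers for scripts.
steps <- c("download-data", "clean-data", "analyse")
script_names <- sprintf("%02d_%s.R", seq_along(steps), steps)
script_names

# ISO 8601 dates sort chronologically even when sorted as plain text.
dates <- as.Date(c("2020-11-05", "2020-02-17", "2019-12-31"))
data_names <- paste0(format(dates, "%Y-%m-%d"), "_survey.csv")
sort(data_names)
```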

workflow: Organising projects for reproducibility

Jixing Liu
