melmzv/huwiwiQuartoThesis


A reproducible Quarto-based template for theses at HU Berlin’s School of Business and Economics


Why use this template?

This repository is a modular Quarto + LaTeX template for empirical theses at HU Berlin’s Institute of Accounting & Auditing and Finance Group. The template complies with the WiWi faculty and Institute guidelines. Simply clone the repository and render the template! ✨

Based on the TRR 266 TREAT template, this project combines Quarto for rendering, Python for data processing, and LaTeX for typesetting in one seamless framework - so you can focus on research rather than the technical details of formatting and data management.

This template also showcases a complete thesis example using the public Gapminder dataset: from pulling and preparing the data, to computing summary statistics and generating plots, all the way through to a fully rendered PDF. Use it as a blueprint for your own reproducible empirical thesis.

Why use Quarto?

Quarto lets you embed code, plots, tables, and output in a single document, making it a powerful alternative to traditional LaTeX-only workflows, which often require separate tools and manual steps to embed code results.

This makes Quarto well suited to empirical research where transparency, version control, and automation are essential: your analysis stays reproducible from raw data to final PDF.

Still not impressed? You might want to read this blog post by Guillaume Dehaene, which explains why "Quarto is better" for modern academic workflows.

Why not just use Overleaf?

  • While Overleaf is a convenient cloud-based platform, it depends on an internet connection, and its servers occasionally experience downtime. ⚡
  • Plus, Zotero integration - necessary for managing references - is only available for premium users. And let’s be honest: we’d rather not pay for something we can do better locally. 💸
  • Quarto, on the other hand, allows you to work offline - ensuring your research is always accessible - and integrates with Git for full version control and collaboration (branching, pull requests), so you get the same collaborative benefits as Overleaf plus the safety of Git: if your laptop breaks, your work is safe.


Where do I start?

"Talk is cheap. Show me the code."
Linus Torvalds

You start by setting up a few tools on your system:

  • If you are new to Python, follow the Real Python installation guide, which gives a good overview of how to set up Python on your system.

  • Additionally, you will need to set up an Integrated Development Environment (IDE) or a code editor. We recommend using VS Code; please follow the Getting started with Python in VS Code Guide.

  • You will also need Quarto, a scientific and technical publishing system used for documenting this project. Please follow the Quarto installation guide to install Quarto on your system. I recommend installing the Quarto Extension for VS Code as well, which streamlines the workflow. You can find out more about the system here.

  • To render this Quarto thesis template to PDF, you need to install TinyTeX directly via Quarto using:

    quarto install tinytex
  • Finally, you will also need to have make installed on your system, if you want to use it. It reads instructions from a Makefile and helps automate the execution of these tasks, ensuring that complex workflows are executed correctly and efficiently.

    • For Linux users this is usually already installed.
    • For macOS users, you can install make by running brew install make in the terminal.
    • For Windows users, there are a few options to install make, and they depend on how you have set up your system. For example, if you have installed the Windows Subsystem for Linux (WSL), you can install make by running sudo apt-get install make in the terminal. If not, you are probably better off searching for how to install make on Windows and following a reliable source.
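Before moving on, it can help to confirm that the tools listed above are actually reachable from your terminal. The following is a minimal sketch, not part of the template: it only checks whether each command is on your PATH. The command names are assumptions (on some systems, for example Windows, the interpreter is called python rather than python3), and the function name find_missing_tools is made up for this example.

```python
import shutil

# Command names are assumptions: the interpreter may be called
# "python" rather than "python3" on some systems (e.g. Windows).
REQUIRED_TOOLS = ("python3", "quarto", "make")


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = find_missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required tools were found.")
```

This only tells you that a command exists, not that its version is suitable, and it does not check whether TinyTeX is installed.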

📂 Next, explore the repository to familiarize yourself with its folders and their contents:

  • config: This directory holds configuration files that are called by the program scripts in the code directory. Keeping the configuration separate from the code makes it easier to adjust the workflow to your needs. In this project, the pull_data_cfg.yaml file outlines the variables and settings needed to extract the necessary data from the Gapminder database. The prepare_data_cfg.yaml file specifies the settings for preprocessing and cleaning the data before analysis, ensuring consistency and accuracy in the dataset and applying the filtering requirements of your paper. The do_analysis_cfg.yaml file contains parameters and settings for performing the final analysis on the prepared data.

  • code: This directory holds the program scripts used to pull data directly from Gapminder using Python, prepare the data, run the analysis, and create the output files. Using pickle instead of Excel/CSV for output files is preferable because it is a native Python format: it enables faster read and write operations, preserves data types more accurately, and works directly with Python data structures and libraries.

Pickling process illustration

The picture illustrates the process of serializing Python objects into a binary format (pickling) for storage in a file and deserializing them back into Python objects (unpickling) for reuse in analysis or other workflows.

  • data: A directory where data is stored. It is used to organize and manage all data files involved in the project, ensuring clear separation between external, pulled, and generated data. Go through the sub-directories and a README file that explains their purpose.

  • doc: This directory contains the Quarto file (.qmd) that includes the text and program instructions for rendering the paper. The file is rendered through the Quarto process using Python and the VS Code extension, integrating code, results, and literal text seamlessly.
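To make the pickling round trip described for the code directory more concrete, here is a minimal sketch using only the Python standard library. The records are made-up placeholders rather than real Gapminder values, and the file name example.pickle is hypothetical. The template's own scripts may store different objects. The CSV round trip is included only to show the difference in how data types survive.

```python
import csv
import io
import pickle
import tempfile
from datetime import date
from pathlib import Path

# Hypothetical records; the values are made up for illustration only.
records = [
    {"country": "Examplia", "year": 2007, "as_of": date(2007, 12, 31)},
    {"country": "Samplestan", "year": 2007, "as_of": date(2007, 12, 31)},
]

# Pickling: write the objects to a binary file, then read them back.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.pickle"  # hypothetical file name
    with path.open("wb") as f:
        pickle.dump(records, f)
    with path.open("rb") as f:
        restored = pickle.load(f)

# CSV round trip for comparison: every value comes back as a string.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["country", "year", "as_of"])
writer.writeheader()
writer.writerows(records)
buffer.seek(0)
from_csv = list(csv.DictReader(buffer))

print(type(restored[0]["year"]).__name__)   # int
print(type(restored[0]["as_of"]).__name__)  # date
print(type(from_csv[0]["year"]).__name__)   # str
```

One caveat worth keeping in mind: unpickling can execute arbitrary code, so only load pickle files that you created yourself or that come from a source you trust.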

Tip

  • To present your thesis directory structure (e.g., to your supervisor), you can copy and paste it directly from within VS Code using the Extension.
  • Download the VSCode Extension for duplicating files. It lets you duplicate files directly within Visual Studio Code rather than copying and pasting them manually in Finder (Mac) or File Explorer (Windows). 😉
  • You can synchronise vertical or horizontal scrolling in split view in VS Code. To enable it, run the action Toggle Locked Scrolling Across Editors from the Command Palette. This is particularly useful when aligning the config file with the corresponding Python file, for example. 👩‍💻
  • A tip for the references.bib file: if you're working with multiple citation formats, consider setting up Zotero's Quick Copy feature to copy BibTeX-formatted references directly into the bib file. This can save time and keep your bibliography consistent. Learn more about the Quick Copy feature 👉 here.
  • If the project requires a word count, use the VSCode Extension to display the word count in the status bar. Very convenient!
  • Use the Data Wrangler Extension to view and analyze data in parquet format, including column statistics and visualizations. It is particularly useful if your computer cannot handle large pulled CSV files.

Note

  • 👉 To display and work with LaTeX .tex files, download the LaTeX Workshop Extension for VS Code. This extension provides a comprehensive set of tools for editing and compiling LaTeX documents, including syntax highlighting, code completion, and PDF preview capabilities.

You also see an output directory but it is empty. Why? Because the output paper is created locally on your computer.

How do I create the template output?

Assuming that you have followed the steps above, this should be relatively straightforward.

  1. Clone the repository to your local machine by running the following commands in your terminal:
cd /path/to/your/projects                                    # navigate to your projects folder
git clone https://github.com/melmzv/huwiwiQuartoThesis.git   # clone the repo
cd huwiwiQuartoThesis                                        # enter the repo directory
  2. Open the repository in VS Code and launch a new terminal.
  3. It is advisable to create a virtual environment for the project:
python3 -m venv venv # Create a virtual environment in the `venv` directory.
source venv/bin/activate # Activate the virtual environment on Linux or macOS.
# venv\Scripts\activate.bat # If you are using Windows Command Prompt.
# venv\Scripts\Activate.ps1 # If you are using Windows PowerShell and have allowed script execution.
  4. With an active virtual environment, install the required packages by running pip install -r requirements.txt in the terminal. This installs the packages for the project into the virtual environment.

  5. Run make all in the terminal. I use the Makefile Tools extension in VS Code to run the Makefile and create the necessary output files in the output directory. Otherwise, you can run the following commands in the terminal:

python code/python/pull_data.py
python code/python/prepare_data.py
python code/python/do_analysis.py
quarto render doc/paper.qmd
mv doc/paper.pdf output
rm -f doc/paper.ttt doc/paper.fff
  6. Eventually, you will be greeted with the "paper.pdf" file in the output directory. You have successfully used an open science resource and reproduced the analysis. Congratulations! 🚀
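If you cannot or do not want to install make, the manual commands above can also be wrapped in a small Python script. This is only a sketch under a few assumptions: it is not shipped with the template, it must be run from the repository root, and the names STEPS, in_virtualenv, run_pipeline, and collect_output are made up for this example. The script and file paths are the ones from the manual commands above.

```python
import subprocess
import sys
from pathlib import Path

# Script and file names are taken from the manual commands in this README.
STEPS = [
    [sys.executable, "code/python/pull_data.py"],
    [sys.executable, "code/python/prepare_data.py"],
    [sys.executable, "code/python/do_analysis.py"],
    ["quarto", "render", "doc/paper.qmd"],
]


def in_virtualenv():
    """Return True when the interpreter runs inside a virtual environment."""
    return sys.prefix != sys.base_prefix


def run_pipeline(steps=STEPS, runner=subprocess.run):
    """Run each step in order and stop at the first failure."""
    for command in steps:
        print("Running:", " ".join(command))
        runner(command, check=True)  # raises CalledProcessError on failure


def collect_output(doc_dir=Path("doc"), output_dir=Path("output")):
    """Move the rendered PDF to the output directory and drop leftovers."""
    output_dir.mkdir(exist_ok=True)
    (doc_dir / "paper.pdf").replace(output_dir / "paper.pdf")
    for leftover in ("paper.ttt", "paper.fff"):
        (doc_dir / leftover).unlink(missing_ok=True)
```

To use it, call run_pipeline() followed by collect_output() from the repository root, ideally after checking in_virtualenv() so that the packages from requirements.txt are available. Unlike make, this sketch always reruns every step, and it needs Python 3.8 or newer because of missing_ok.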

Note

I have saved a copy of the rendered "paper.pdf" in the doc directory for a quick preview on GitHub.

Setting up your Thesis Project

To start your own thesis project, follow the same steps as in the previous section, but note the following:

  1. Remove any files that you don’t need for your specific project.
  2. If your project requires access to a private database (e.g. WRDS) rather than the open-source Gapminder data, copy the file _secrets.env to secrets.env in the project's main directory. Then edit secrets.env by adding your database credentials.

Caution

Ensure your database credentials are stored securely in secrets.env. Sharing this file or exposing its contents could compromise access to sensitive data.
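For orientation, here is a minimal sketch of how a script could read such a file without hard-coding credentials. It assumes that secrets.env contains simple KEY=VALUE lines. The function name read_env_file is made up for this example, and the template's own scripts may read the file differently (for instance through a dedicated package).

```python
from pathlib import Path


def read_env_file(path):
    """Parse a simple KEY=VALUE file into a dict, skipping blanks and comments."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Ignore empty lines, comment lines, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values
```

A call such as read_env_file("secrets.env") then returns the credentials as a dictionary that you can pass to your database connection, so they never appear in the code you commit.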

Contributing

If you have suggestions for improvements, bug fixes, or new features, please feel free to fork the repo and submit a pull request or open an issue. Contributions are welcome! 🙌

Licensing

This project builds on the template used in the collaborative research center TRR 266 Accounting for Transparency.

The repository is licensed under the MIT license. I would like to give the following credit:

This repository was built based on the ['treat' template for reproducible research](https://github.com/trr266/treat).

