Tutorial on Leveraging Princeton's Computing Clusters for Managing Large Data Sets
At Princeton University's Center for Policy Research on Energy and the Environment (C-PREE), an intern used the university's Adroit computing cluster, funded by the High Meadows Environmental Institute (HMEI), for their project. The internship examined the effect of anomalous weather on economic activity, a topic of growing importance as the climate changes.
Adroit offers about 375 GB of RAM and can be accessed through either a graphical interface or the command line. The intern found the graphical interface particularly useful because it is familiar and easy to use for logistical tasks and data visualization.
To gain access to Adroit, students submit a registration form with their information. Once approved, they can connect to the cluster via secure shell (SSH) to submit and manage large data processing jobs. The cluster's parallel processing capabilities and its job scheduling system (Slurm or a similar workload manager) enable efficient handling of large data sets.
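As an illustration of what submitting a job to a scheduler involves, here is a minimal sketch in Python that writes out the text of a Slurm batch script. It assumes the cluster runs Slurm; the job name, the resource amounts, and the script `estimate_model.py` are hypothetical placeholders rather than details from the intern's project.

```python
def render_sbatch_script(job_name, command, cpus=4, mem_gb=32, time_limit="02:00:00"):
    """Return the text of a Slurm batch script for a single-node job.

    All resource values are illustrative defaults, not recommendations
    for any particular cluster.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",    # label shown in the job queue
        "#SBATCH --nodes=1",                 # run on a single node
        "#SBATCH --ntasks=1",                # one task (process)
        f"#SBATCH --cpus-per-task={cpus}",   # CPU cores available to that task
        f"#SBATCH --mem={mem_gb}G",          # memory requested for the job
        f"#SBATCH --time={time_limit}",      # wall-clock limit (HH:MM:SS)
        "",
        command,                             # the actual work to run
    ]
    return "\n".join(lines) + "\n"


# Hypothetical job: the script name is a placeholder.
script = render_sbatch_script("weather-panel", "python estimate_model.py")
print(script)
```

Saving this text to a file and passing it to `sbatch` on the cluster would queue the job. Suitable values for memory, cores, and time depend on the analysis and on the cluster's current limits.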
For the project, the intern used Adroit to formulate a model of the impact of weather anomalies on economic activity, using a large panel data set of American businesses. The data set covers over 170,000 businesses, with 400 data points for each variable, and its size made Adroit an indispensable tool for the task.
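A rough size estimate shows the scale involved. The sketch below assumes the figures above mean 400 observations per business for each variable, stored as 8-byte floating-point numbers. Both the storage format and the count of 20 variables are illustrative assumptions, not details from the project.

```python
N_BUSINESSES = 170_000   # lower bound stated in the text ("over 170,000")
N_POINTS = 400           # data points per business for each variable
BYTES_PER_VALUE = 8      # assumption: values stored as 64-bit floats
N_VARIABLES = 20         # hypothetical number of variables in the panel


def variable_size_gb(n_units, n_points, bytes_per_value):
    """Approximate in-memory size of one variable, in decimal gigabytes."""
    return n_units * n_points * bytes_per_value / 1e9


values_per_variable = N_BUSINESSES * N_POINTS  # 68,000,000 values
per_variable_gb = variable_size_gb(N_BUSINESSES, N_POINTS, BYTES_PER_VALUE)
panel_gb = per_variable_gb * N_VARIABLES

print(f"values per variable: {values_per_variable:,}")
print(f"one variable: about {per_variable_gb:.2f} GB")
print(f"{N_VARIABLES} variables: about {panel_gb:.2f} GB")
```

Under these assumptions a single variable holds 68 million values and occupies roughly half a gigabyte. Estimation routines typically need working memory well beyond the raw data, which is where a machine with about 375 GB of RAM becomes useful.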
The intern hopes that Adroit will prove just as valuable to other students in their internships and projects. Having found it a powerful and helpful tool for estimating econometric models, they plan to continue using it in future projects.
For those interested in accessing Adroit, the general procedure is to request access via Princeton OIT's HPC page or to contact your department's HPC coordinator. Once access is granted, users can transfer data with secure copy (scp), use the job scheduler to run large-scale batch jobs, and load available modules or environments to set up software dependencies.
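The transfer and submission steps can be sketched as a pair of shell commands assembled in Python. This is a dry run that only prints the commands. The host name `adroit.princeton.edu`, the NetID `jdoe`, and the file and directory names are assumptions for illustration, so check the current access instructions before relying on them.

```python
import shlex

CLUSTER_HOST = "adroit.princeton.edu"  # assumption: confirm the current login host


def build_workflow_commands(netid, local_file, remote_dir, batch_script):
    """Return shell commands that upload a file and submit a batch job.

    Nothing is executed here; the commands are returned as strings so they
    can be reviewed before being run in a terminal.
    """
    remote = f"{netid}@{CLUSTER_HOST}"
    # Copy the local data file into the remote project directory.
    upload = ["scp", local_file, f"{remote}:{remote_dir}/"]
    # Log in over SSH, change to that directory, and queue the batch script.
    remote_cmd = f"cd {shlex.quote(remote_dir)} && sbatch {shlex.quote(batch_script)}"
    submit = ["ssh", remote, remote_cmd]
    return [shlex.join(upload), shlex.join(submit)]


# Hypothetical NetID, file, and directory names.
commands = build_workflow_commands("jdoe", "panel.csv", "panel_project", "job.slurm")
for cmd in commands:
    print(cmd)
```

Software dependencies would normally be set up inside the batch script itself, for example with a `module load` line. The module names available differ from cluster to cluster.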
Because onboarding details may vary, consult Princeton University’s OIT website or reach out to the HPC support team for the latest access instructions and usage best practices for research projects involving large data sets. The internship at C-PREE shows how valuable Adroit can be for handling big data in future projects.
In short, the Adroit cluster, accessible via secure shell (SSH) or a graphical interface, let the intern at C-PREE efficiently handle a data set of over 170,000 businesses, with 400 data points for each variable, while examining the effect of anomalous weather on economic activity.