Abstract

The Deepwater Horizon oil discharge in the Gulf of Mexico is considered one of the worst environmental disasters to date. The spread of the oil had wide-ranging environmental consequences. The National Oceanic and Atmospheric Administration (NOAA), in conjunction with the Environmental Protection Agency (EPA), the US Fish and Wildlife Service, and the American Statistical Association (ASA), has made several datasets on the oil spill publicly available. In this paper, we analyzed four of these datasets to explore how applied statistics and machine learning methods can be used to understand the spread of the oil spill. In particular, we analyzed the "gliders, floats, boats" and "birds" data. The former contains measurements of sea water, such as salinity and temperature, together with spatial location, depth, and time. The latter contains information on the condition of the birds, such as whether each bird was alive or dead, its oiling condition, location, and time. A varying-coefficient logistic regression was fitted to the birds data. The results indicated that the oil spread more quickly along the east-west direction. Analyses using boosted trees and logistic regression yielded similar results on these data.