Use RAPIDS on a single GPU
This notebook describes a machine learning training workflow using the famous NYC Taxi Dataset. That dataset contains information on taxi trips in New York City.
In this exercise, we attempt to answer this classification question:
based on characteristics that can be known at the beginning of a trip, will this trip result in a high tip?
RAPIDS is a collection of libraries which enable you to take advantage of NVIDIA GPUs to accelerate machine learning workflows. This exercise uses the following RAPIDS packages to execute code on a GPU, rather than a CPU:
cudf: data frame manipulation, similar to
cuml: machine learning training and evaluation, similar to
The code below loads the data into a
cudf data frame. This is similar to a
pandas dataframe, but it lives in GPU memory and most operations on it are done on the GPU.
import cudf taxi = cudf.read_csv( "https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2019-01.csv", parse_dates=["tpep_pickup_datetime", "tpep_dropoff_datetime"], )
Many dataframe operations that you would execute on a
pandas dataframe also work for a
Now that the data have been prepped, it’s time to build a model!
For this task, we’ll use the
cuml. If you’ve never used a random forest or need a refresher, consult “Forests of randomized trees” in the
scikit-learn documentation. We cast to 32-bit types for compatibility with older versions of
X = taxi[["PULocationID", "DOLocationID", "passenger_count"]].astype("float32").fillna(-1) y = (taxi["tip_amount"] > 1).astype("int32")
from cuml.ensemble import RandomForestClassifier rfc = RandomForestClassifier(n_estimators=100)
_ = rfc.fit(X, y)
We’ll use another month of taxi data for the test set and calculate the AUC score
taxi_test = cudf.read_csv( "https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2019-02.csv", parse_dates=["tpep_pickup_datetime", "tpep_dropoff_datetime"], ) X_test = taxi_test[["PULocationID", "DOLocationID", "passenger_count"]].astype("float32").fillna(-1) y_test = (taxi_test["tip_amount"] > 1).astype("int32")
from cuml.metrics import roc_auc_score preds = rfc.predict_proba(X_test) roc_auc_score(y_test, preds)
Need help, or have more questions? Contact us at:
- On Intercom, using the icon at the bottom right corner of the screen
We'll be happy to help you and answer your questions!