May 21, 2021 | 11:00AM ET

Dask DataFrame groupby. Why it can fail and how to compensate.

Hugo Shi, Saturn Cloud

Dask DataFrame groupby operations are common and powerful. However, because a Dask DataFrame is distributed across many partitions, groupby operations can fail in unexpected ways. This talk covers mitigation strategies for these problems, including using set_index to optimize data layout and using the split_out and split_every parameters to optimize computation.