I’m actually not too familiar with how Impala reads from S3, since I’ve only used it on-premise, but my guess is that this article doesn’t apply to the S3 case. Impala over HDFS schedules scans to take advantage of data locality. Impala over S3 has no such locality optimization, so the hotspotting described here is probably not an issue there.

