I’m actually not too familiar with how Impala reads from S3 (I’ve only used it on-premise), but my guess is that this article is not relevant to the S3 case. Unlike Impala over HDFS, there is no data-locality optimization when reading from S3, so hotspotting is probably not an issue there.

Adir Mashiach
