Interview questions for a senior engineer role (10 years of experience)

- OOP basics (for modular ETL jobs)
- Generators and iterators (for memory efficiency)
- Write complex queries (joins, window functions, CTEs)
- Understand indexing, partitioning, and performance tuning
- Partitioning and bucketing
- Shuffles, joins, and broadcast variables
- Cache vs. persist
- Avoiding wide transformations early
- Airflow or Azure Data Factory
- Understand DAGs, triggers, retries, and dependencies
- Parquet, ORC, and Avro vs. CSV and JSON
- Compression (Snappy, gzip)
- Partitioning strategies on cloud storage (S3, ADLS)
- Handling nulls, duplicates, and schema mismatches
- Type casting and filtering bad rows
- Column-level transformations (e.g., timestamp to epoch, JSON flattening)
- Normalization/standardization
- Skew handling techniques (salting keys)
- Idempotency (no duplicates if rerun)
- Incremental loads (using a watermark or last-updated timestamp)
- Data partitioning (by date or ID for scale)
- Logging, monitoring, and alerting
- Backfill handling
- Retry strategies...
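For the "OOP basics (for modular ETL jobs)" item, here is a minimal sketch of what "modular" usually means in practice: each transformation is a small class behind a shared interface, and a pipeline simply chains them. `Step`, `DropNullAmounts`, `AddTax`, and `Pipeline` are hypothetical names chosen for illustration, not part of any framework.

```python
from abc import ABC, abstractmethod
from typing import Iterable, Iterator, List


class Step(ABC):
    """One reusable ETL stage: consumes records and yields records."""

    @abstractmethod
    def run(self, records: Iterable[dict]) -> Iterator[dict]:
        ...


class DropNullAmounts(Step):
    """Filter step: drops rows whose 'amount' is missing or None."""

    def run(self, records: Iterable[dict]) -> Iterator[dict]:
        for r in records:
            if r.get("amount") is not None:
                yield r


class AddTax(Step):
    """Derived-column step; the rate is injected so the class is reusable."""

    def __init__(self, rate: float):
        self.rate = rate

    def run(self, records: Iterable[dict]) -> Iterator[dict]:
        for r in records:
            yield {**r, "amount_with_tax": round(r["amount"] * (1 + self.rate), 2)}


class Pipeline:
    """Chains steps so a job is just a list of small, testable classes."""

    def __init__(self, steps: List[Step]):
        self.steps = steps

    def run(self, records: Iterable[dict]) -> List[dict]:
        stream: Iterable[dict] = records
        for step in self.steps:
            stream = step.run(stream)  # lazily wire each step to the previous one
        return list(stream)


rows = [{"id": 1, "amount": 100.0}, {"id": 2, "amount": None}]
print(Pipeline([DropNullAmounts(), AddTax(0.1)]).run(rows))
# [{'id': 1, 'amount': 100.0, 'amount_with_tax': 110.0}]
```

Because each step consumes and yields an iterator, the same classes also fit a memory-efficient, generator-based style, and each step can be unit-tested on its own.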
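For "Generators and iterators (for memory efficiency)", a small sketch: records are parsed one at a time and grouped into fixed-size batches, so only one batch is held in memory regardless of input size. The `id,amount` line format and the names `read_records` and `batched` are assumptions for illustration. Python 3.12+ also ships `itertools.batched`, which yields tuples; the hand-written version below works on older versions.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def read_records(lines: Iterable[str]) -> Iterator[dict]:
    """Parse hypothetical 'id,amount' lines lazily, one record at a time."""
    for line in lines:
        record_id, amount = line.strip().split(",")
        yield {"id": int(record_id), "amount": float(amount)}


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Group any iterable into lists of at most `size` items without materializing it."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Any iterable of lines works here, including an open file handle.
source = (f"{i},{i * 1.5}" for i in range(5))
for batch in batched(read_records(source), size=2):
    print(batch)  # e.g. write each batch to the sink, then let it be garbage-collected
```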
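For "Write complex queries (joins, window functions, CTEs)", one common interview pattern combines all three: a CTE uses `ROW_NUMBER()` to keep only the latest version of each order, the result is joined to a dimension table, and a second window function computes a running total per customer. The sketch runs against an in-memory SQLite database through Python's `sqlite3` module, assuming SQLite 3.25 or newer (the first version with window functions); the tables and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL, updated_at TEXT);
INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben');
INSERT INTO orders VALUES
  (10, 1, 50.0, '2025-06-01'),
  (10, 1, 55.0, '2025-06-02'),  -- later version of order 10
  (11, 1, 20.0, '2025-06-03'),
  (12, 2, 70.0, '2025-06-03');
""")

query = """
WITH latest_orders AS (
    -- Deduplicate: rank versions of each order, newest first.
    SELECT order_id, customer_id, amount,
           ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY updated_at DESC) AS rn
    FROM orders
)
SELECT c.name,
       o.order_id,
       o.amount,
       SUM(o.amount) OVER (PARTITION BY c.customer_id ORDER BY o.order_id) AS running_total
FROM latest_orders AS o
JOIN customers AS c ON c.customer_id = o.customer_id
WHERE o.rn = 1
ORDER BY c.name, o.order_id;
"""

for row in conn.execute(query):
    print(row)
# ('Asha', 10, 55.0, 55.0)
# ('Asha', 11, 20.0, 75.0)
# ('Ben', 12, 70.0, 70.0)
```

The `WHERE o.rn = 1` filter is applied before the outer window function runs, so the running total only sees the deduplicated rows.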
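The items on handling nulls, duplicates, and schema mismatches, type casting, filtering bad rows, and standardization can be shown together in one small cleaning function. This is a sketch under stated assumptions: the target `SCHEMA`, the `DEFAULTS` for nullable columns, and "first row wins" for duplicate IDs are all hypothetical choices. The key idea is that bad rows go to a reject list with a reason instead of failing the whole job.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

# Hypothetical target schema: column name -> casting function.
SCHEMA: Dict[str, Callable[[Any], Any]] = {"id": int, "amount": float, "country": str}
# Hypothetical columns where a null gets a default instead of rejecting the row.
DEFAULTS: Dict[str, Any] = {"country": "UNKNOWN"}


def clean(rows: Iterable[Dict[str, Any]]) -> Tuple[List[dict], List[Tuple[dict, str]]]:
    """Cast rows to SCHEMA, standardize values, and route bad rows to a reject list."""
    good: List[dict] = []
    rejected: List[Tuple[dict, str]] = []
    seen_ids = set()
    for row in rows:
        missing = sorted(SCHEMA.keys() - row.keys())
        if missing:  # schema mismatch: a required column is absent
            rejected.append((row, f"missing columns {missing}"))
            continue
        try:
            typed: Dict[str, Any] = {}
            for col, cast in SCHEMA.items():
                value = row[col]
                if value is None or value == "":  # null handling
                    if col not in DEFAULTS:
                        raise ValueError(f"null in required column {col!r}")
                    typed[col] = DEFAULTS[col]
                else:
                    typed[col] = cast(value)  # type casting
        except (TypeError, ValueError) as exc:
            rejected.append((row, str(exc)))
            continue
        if typed["id"] in seen_ids:  # duplicate business key: keep the first one
            rejected.append((row, "duplicate id"))
            continue
        seen_ids.add(typed["id"])
        typed["country"] = typed["country"].strip().upper()  # standardization
        good.append(typed)
    return good, rejected


rows = [
    {"id": "1", "amount": "10.5", "country": "in"},
    {"id": "2", "amount": "abc", "country": "US"},  # bad type
    {"id": "1", "amount": "3", "country": "IN"},    # duplicate id
    {"id": "3", "amount": "7", "country": None},    # null -> default
    {"id": "4", "amount": "1"},                     # missing column
]
good, rejected = clean(rows)
print(good)
# [{'id': 1, 'amount': 10.5, 'country': 'IN'}, {'id': 3, 'amount': 7.0, 'country': 'UNKNOWN'}]
for _, reason in rejected:
    print(reason)
```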
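For "Column-level transformations (e.g., timestamp to epoch, JSON flattening)", a minimal sketch of both. It assumes ISO-8601 timestamps and treats naive timestamps as UTC, which is a policy choice to confirm per source. Before Python 3.11, `datetime.fromisoformat` does not accept a trailing `Z`, so the example uses `+00:00`. `flatten` joins nested keys with `_` and leaves lists untouched; exploding arrays into rows would be a separate step.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict


def to_epoch_seconds(ts: str) -> int:
    """ISO-8601 string -> Unix epoch seconds; naive timestamps are assumed to be UTC."""
    dt = datetime.fromisoformat(ts)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


def flatten(obj: Dict[str, Any], parent: str = "", sep: str = "_") -> Dict[str, Any]:
    """Turn nested dicts into flat columns, e.g. {"a": {"b": 1}} -> {"a_b": 1}."""
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))  # recurse into nested objects
        else:
            flat[name] = value
    return flat


event = json.loads('{"user": {"id": 7, "geo": {"country": "IN"}}, "ts": "2025-06-01T00:00:00+00:00"}')
row = flatten(event)
row["ts_epoch"] = to_epoch_seconds(row.pop("ts"))
print(row)
# {'user_id': 7, 'user_geo_country': 'IN', 'ts_epoch': 1748736000}
```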
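For "Skew handling techniques (salting keys)", the idea can be simulated in plain Python without a cluster: each row on the large, skewed side gets a random salt in `[0, SALT_BUCKETS)`, the small side is replicated once per salt value, and the join runs on `(key, salt)`. One hot key is thereby spread over several partitions while the join result stays the same. This is an illustrative simulation, not Spark code; `SALT_BUCKETS` and the data are hypothetical, and in Spark the same idea is usually expressed as a salt column on the large side and an exploded salt range on the small side.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

SALT_BUCKETS = 4  # hypothetical; in practice sized to how skewed the hot key is

SaltedKey = Tuple[str, int]


def salt_large_side(rows: List[Tuple[str, int]], rng: random.Random) -> List[Tuple[SaltedKey, int]]:
    """Attach a random salt so rows for one hot key land in different partitions."""
    return [((key, rng.randrange(SALT_BUCKETS)), value) for key, value in rows]


def explode_small_side(rows: Dict[str, str]) -> Dict[SaltedKey, str]:
    """Replicate each small-side row once per salt so every salted key finds its match."""
    return {(key, salt): value for key, value in rows.items() for salt in range(SALT_BUCKETS)}


def salted_join(large: List[Tuple[str, int]], small: Dict[str, str], seed: int = 0) -> Dict[SaltedKey, list]:
    rng = random.Random(seed)
    lookup = explode_small_side(small)
    partitions: Dict[SaltedKey, list] = defaultdict(list)  # stands in for shuffle partitions
    for salted_key, value in salt_large_side(large, rng):
        if salted_key in lookup:  # inner join on (key, salt)
            partitions[salted_key].append((salted_key[0], value, lookup[salted_key]))
    return dict(partitions)


large = [("hot", i) for i in range(1000)] + [("cold", 1)]
small = {"hot": "H", "cold": "C"}
partitions = salted_join(large, small)
hot_sizes = [len(rows) for key, rows in partitions.items() if key[0] == "hot"]
print(hot_sizes)  # the 1000 "hot" rows are split across several partitions
```

The trade-off is that the small side grows by a factor of `SALT_BUCKETS`, so salting pays off only when the skew costs more than that replication.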
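"Idempotency (no duplicates if rerun)" and "Incremental loads (using a watermark or last-updated timestamp)" fit together: read only rows updated at or after the stored watermark, and write them as upserts keyed by primary key rather than appends. A rerun then rewrites the same keys with the same values. This is a sketch assuming a hypothetical in-memory `target` and ISO date strings, which compare correctly as plain strings only because they share one fixed format.

```python
from typing import Dict, List, Tuple

# Hypothetical source rows: (id, value, last_updated as an ISO date string).
SourceRow = Tuple[int, str, str]


def incremental_load(source: List[SourceRow], target: Dict[int, Tuple[str, str]],
                     watermark: str) -> str:
    """Upsert rows changed at or after `watermark`; return the new watermark."""
    # `>=` re-reads boundary rows; that is harmless because the write below is an upsert.
    changed = [row for row in source if row[2] >= watermark]
    for row_id, value, updated in changed:
        current = target.get(row_id)
        if current is None or updated >= current[1]:  # never let an older version win
            target[row_id] = (value, updated)
    return max((row[2] for row in changed), default=watermark)


source = [(1, "a", "2025-06-01"), (2, "b", "2025-06-02")]
target: Dict[int, Tuple[str, str]] = {}
wm = incremental_load(source, target, "1970-01-01")  # first run
incremental_load(source, target, "1970-01-01")       # rerun: same result, no duplicates
source.append((1, "a2", "2025-06-03"))               # id 1 updated upstream
wm = incremental_load(source, target, wm)
print(wm, target)
# 2025-06-03 {1: ('a2', '2025-06-03'), 2: ('b', '2025-06-02')}
```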
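For "Partitioning strategies on cloud storage", "Data partitioning (by date or ID for scale)", and "Backfill handling", a common convention is Hive-style date partitions (`dt=YYYY-MM-DD`), with a backfill iterating over the date range and rebuilding one partition per day. The sketch below only builds the paths; the bucket name is hypothetical, and the actual write (overwriting exactly that partition so reruns stay idempotent) depends on the engine in use.

```python
from datetime import date, timedelta
from typing import Iterator


def partition_path(base: str, day: date) -> str:
    """Hive-style date partition, e.g. s3://example-bucket/orders/dt=2025-06-01/."""
    return f"{base.rstrip('/')}/dt={day.isoformat()}/"


def backfill_days(start: date, end: date) -> Iterator[date]:
    """Yield every day in [start, end] so each partition can be rebuilt independently."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)


for day in backfill_days(date(2025, 6, 1), date(2025, 6, 3)):
    # A real job would overwrite only this partition, keeping the backfill rerunnable.
    print(partition_path("s3://example-bucket/orders", day))
```

Processing one day at a time also means a failed backfill can resume from the last incomplete partition instead of starting over.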
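For "Retry strategies", a widely used pattern is capped exponential backoff with jitter, retrying only errors believed to be transient. Orchestrators such as Airflow and Azure Data Factory also expose task-level retry settings; this sketch shows the logic inside a single task. The `retry` helper and its parameters are hypothetical. `sleep` is injectable so the behavior can be tested without waiting.

```python
import random
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(fn: Callable[[], T], max_attempts: int = 5, base_delay: float = 1.0,
          max_delay: float = 30.0,
          retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
          sleep: Callable[[float], None] = time.sleep,
          rng: Optional[random.Random] = None) -> T:
    """Call fn, retrying transient errors with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    rng = rng or random.Random()
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to logging/alerting
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(rng.uniform(0, delay))  # full jitter avoids synchronized retries
    raise AssertionError("unreachable")


calls = {"n": 0}


def flaky_extract() -> str:
    """Hypothetical source call that fails twice with a transient error."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network error")
    return "ok"


delays = []
result = retry(flaky_extract, sleep=delays.append)  # record delays instead of sleeping
print(result, calls["n"], len(delays))  # ok 3 2
```

Non-transient errors (for example, a schema mismatch) are deliberately not in `retry_on`, since retrying them only delays the alert.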
Showing posts from June, 2025