Posts

Showing posts from June, 2025
Interview questions for a senior engineer role (10 years of experience)

- OOP basics (for modular ETL jobs)
- Generators & iterators (for memory efficiency)
- Write complex queries (joins, window functions, CTEs)
- Understand indexing, partitioning, performance tuning
- Partitioning & bucketing
- Shuffles, joins, broadcast variables
- Cache vs persist
- Avoiding wide transformations early
- Airflow or Azure Data Factory
- Understand DAGs, triggers, retries, dependencies
- Parquet, ORC, Avro vs CSV, JSON
- Compression (snappy, gzip)
- Partitioning strategies on cloud storage (S3, ADLS)
- Handling nulls, duplicates, schema mismatch
- Type casting, filtering bad rows
- Column-level transformations (e.g., timestamp to epoch, JSON flattening)
- Normalization/standardization
- Skew handling techniques (salting keys)
- Idempotency (no duplicates if rerun)
- Incremental loads (using watermark, last updated timestamp)
- Data partitioning (by date or id for scale)
- Logging, monitoring, alerting
- Backfill handling
- Retry strategies...
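
Two of the topics above, idempotency and watermark-based incremental loads, are easier to discuss with something concrete in hand. The following is only a minimal sketch under stated assumptions, not a production pattern: the `target` table, its columns, and the function names `create_target` and `incremental_load` are all hypothetical, and SQLite stands in for whatever warehouse is actually used. It filters source rows by a last-updated watermark and writes them with a keyed upsert, so rerunning the same batch does not create duplicates.

```python
import sqlite3


def create_target(conn):
    # Hypothetical target table; the primary key is what makes reruns safe.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS target ("
        "id INTEGER PRIMARY KEY, value TEXT, updated_at TEXT)"
    )


def incremental_load(source_rows, conn, watermark):
    """Load rows changed after `watermark` and return the new watermark.

    `source_rows` is a list of dicts with keys id, value, updated_at.
    Timestamps are ISO-8601 strings, which compare correctly as text.
    """
    # Incremental: only pick up rows modified since the last successful run.
    new_rows = [r for r in source_rows if r["updated_at"] > watermark]

    # Idempotent: INSERT OR REPLACE overwrites the row with the same id,
    # so replaying a batch leaves the table unchanged.
    conn.executemany(
        "INSERT OR REPLACE INTO target (id, value, updated_at) "
        "VALUES (:id, :value, :updated_at)",
        new_rows,
    )
    conn.commit()

    # Advance the watermark only as far as the data actually loaded.
    return max((r["updated_at"] for r in new_rows), default=watermark)
```

In a real pipeline the returned watermark would be persisted (for example in a control table) only after the load commits, so a failed run is retried from the previous value.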

Python Scripts

 https://medium.com/@yashwanthnandam/12-time-saving-python-automation-scripts-you-didnt-know-you-needed-bc400ad28d0a