MCQ Screening Questions for a Data Engineer
Use these 20 multiple-choice questions to quickly filter data engineer applicants, even if you're not a technical expert.
20 Knockout Questions for Data Engineers
| # | Question | A | B | C | D | Answer | Knockout Rule |
|---|---|---|---|---|---|---|---|
| 1 | What is an ETL pipeline? | A deployment process | Extract, Transform, Load — moving and preparing data | A cloud database | A frontend tool | B | Wrong = Hard Knockout |
| 2 | What is Apache Spark used for? | Frontend development | Large-scale distributed data processing | Database management | API development | B | Wrong = Knockout for big data roles |
| 3 | What is a data warehouse? | A physical storage room | A system for storing and analyzing large volumes of structured data | A file storage system | A NoSQL database | B | Wrong = Knockout |
| 4 | What is the difference between a data lake and a data warehouse? | No difference | A data lake stores raw data; a warehouse stores processed structured data | A warehouse is cheaper | A data lake is faster | B | Wrong = Knockout |
| 5 | What does partitioning a table in a database do? | Deletes old data | Divides a large table into smaller parts to improve query performance | Backs up the data | Encrypts the table | B | Wrong = Red flag |
| 6 | What is Apache Kafka used for? | Running ML models | Real-time data streaming between systems | Database management | Writing SQL queries | B | Wrong = Knockout for streaming roles |
| 7 | What is dbt (data build tool) used for? | Deploying applications | Transforming data inside the data warehouse using SQL | Managing cloud costs | Running containers | B | Wrong = Red flag for modern data stacks |
| 8 | What is a star schema in data modeling? | A cloud architecture | A data model with a central fact table surrounded by dimension tables | A database backup method | A streaming pattern | B | Wrong = Knockout |
| 9 | What is BigQuery? | A SQL editor | Google Cloud's serverless data warehouse | A data pipeline tool | An ML platform | B | Wrong = Knockout for GCP stacks |
| 10 | What does data lineage mean? | Cleaning data | Tracking the origin and movement of data through pipelines | Storing data in a lake | Encrypting data | B | Wrong = Red flag |
| 11 | What is the purpose of Apache Airflow? | Frontend deployment | Orchestrating and scheduling data pipelines | Writing SQL queries | Managing containers | B | Wrong = Knockout |
| 12 | What is data normalization? | Encrypting data | Organizing data to reduce redundancy and improve integrity | Backing up a database | Indexing tables | B | Wrong = Red flag |
| 13 | What is Snowflake? | A weather app | A cloud-based data warehousing platform | A data streaming tool | A BI tool | B | Wrong = Knockout for Snowflake stacks |
| 14 | What is a slowly changing dimension (SCD)? | A fast database | A method for handling changes in dimension data over time | A streaming tool | A schema type | B | Wrong = Red flag for senior data engineers |
| 15 | What is the role of a data catalog? | Writing SQL | Documenting and discovering available data assets in an organization | Running pipelines | Storing raw data | B | Wrong = Red flag |
| 16 | What does 'schema-on-read' mean? | Defining schema before storing | Applying schema when reading data, not when storing it | A database type | A normalization method | B | Wrong = Red flag |
| 17 | What is Redshift? | A monitoring tool | Amazon's cloud data warehouse service | A frontend framework | A data streaming tool | B | Wrong = Knockout for AWS stacks |
| 18 | What is data quality monitoring? | Deleting bad data | Continuously checking data for accuracy, completeness, and consistency | Backing up data | Encrypting pipelines | B | Wrong = Red flag |
| 19 | What is a medallion architecture (Bronze, Silver, Gold)? | A cloud cost model | A multi-layer data lake design pattern for progressively refining data | A database backup strategy | A streaming pattern | B | Wrong = Red flag for modern data platforms |
| 20 | How does ELT differ from ETL? | They are the same | In ELT, data is loaded first and then transformed inside the warehouse | ELT is always faster | ETL uses more storage | B | Wrong = Knockout |
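Interviewers who want a concrete picture of the difference behind questions 1 and 20 can use the minimal sketch below. It assumes an in-memory SQLite database stands in for the warehouse, and the table names and sample orders are hypothetical. ETL cleans the rows in Python before loading them. ELT loads the raw rows first and then runs the same cleanup as SQL inside the database.

```python
import sqlite3

# Hypothetical raw source records: amounts arrive as strings in cents.
RAW_ORDERS = [
    {"order_id": 1, "amount_cents": "1250", "country": "us"},
    {"order_id": 2, "amount_cents": "990", "country": "de"},
]


def etl(conn):
    """ETL: transform in the pipeline (Python), then load the cleaned rows."""
    cleaned = [
        (r["order_id"], int(r["amount_cents"]) / 100, r["country"].upper())
        for r in RAW_ORDERS
    ]
    conn.execute("CREATE TABLE orders_etl (order_id INTEGER, amount REAL, country TEXT)")
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", cleaned)


def elt(conn):
    """ELT: load the raw rows unchanged, then transform them with SQL in the database."""
    conn.execute("CREATE TABLE orders_raw (order_id INTEGER, amount_cents TEXT, country TEXT)")
    conn.executemany(
        "INSERT INTO orders_raw VALUES (:order_id, :amount_cents, :country)", RAW_ORDERS
    )
    conn.execute(
        """
        CREATE TABLE orders_elt AS
        SELECT order_id,
               CAST(amount_cents AS INTEGER) / 100.0 AS amount,
               UPPER(country) AS country
        FROM orders_raw
        """
    )


conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
etl(conn)
elt(conn)
etl_rows = conn.execute("SELECT * FROM orders_etl ORDER BY order_id").fetchall()
elt_rows = conn.execute("SELECT * FROM orders_elt ORDER BY order_id").fetchall()
print(etl_rows)  # [(1, 12.5, 'US'), (2, 9.9, 'DE')]
print(elt_rows)  # same rows, produced inside the database
```

Both paths end with the same clean table. The practical difference is where the transformation runs, and ELT also keeps the untouched `orders_raw` table available for reprocessing.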
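The minimal sketch below shows what question 8's star schema looks like in practice. It assumes SQLite as the database, and every table, column, and row in it is a hypothetical example. One fact table holds the measurable events (sales) along with foreign keys to the dimension tables (product, date), and a typical query joins the fact table to its dimensions and then aggregates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: descriptive attributes, one row per entity.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, day TEXT, month TEXT);

    -- Fact table: one row per sale, holding measures plus keys into each dimension.
    CREATE TABLE fact_sales (
        sale_id INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        quantity INTEGER,
        revenue REAL
    );

    INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture');
    INSERT INTO dim_date VALUES
        (20240101, '2024-01-01', '2024-01'),
        (20240102, '2024-01-02', '2024-01');
    INSERT INTO fact_sales VALUES
        (1, 1, 20240101, 2, 2000.0),
        (2, 2, 20240101, 1, 300.0),
        (3, 1, 20240102, 1, 1000.0);
    """
)

# Typical star-schema query: join the central fact table to its dimensions, then aggregate.
rows = conn.execute(
    """
    SELECT p.category, d.month, SUM(f.revenue) AS total_revenue
    FROM fact_sales AS f
    JOIN dim_product AS p ON f.product_id = p.product_id
    JOIN dim_date AS d ON f.date_id = d.date_id
    GROUP BY p.category, d.month
    ORDER BY p.category
    """
).fetchall()
print(rows)  # [('Electronics', '2024-01', 3000.0), ('Furniture', '2024-01', 300.0)]
```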
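Question 14 does not name a specific technique for slowly changing dimensions. The sketch below shows one widely used approach, SCD Type 2, which preserves history: it closes the old row instead of overwriting it. The function name `apply_scd2` and its row fields (`valid_from`, `valid_to`, `is_current`) are hypothetical, and plain Python lists stand in for a dimension table.

```python
from datetime import date


def apply_scd2(dimension, customer_id, new_city, change_date):
    """Return a new list of rows with a city change recorded as SCD Type 2."""
    updated = []
    for row in dimension:
        if row["customer_id"] == customer_id and row["is_current"]:
            if row["city"] == new_city:
                return list(dimension)  # attribute unchanged: nothing to record
            # Close out the current version rather than overwriting it.
            updated.append({**row, "valid_to": change_date, "is_current": False})
        else:
            updated.append(row)
    # Append the new version as the current row (this also covers brand-new customers).
    updated.append(
        {
            "customer_id": customer_id,
            "city": new_city,
            "valid_from": change_date,
            "valid_to": None,
            "is_current": True,
        }
    )
    return updated


customers = [
    {"customer_id": 42, "city": "Berlin", "valid_from": date(2022, 1, 1),
     "valid_to": None, "is_current": True},
]
customers = apply_scd2(customers, 42, "Munich", date(2024, 6, 1))
for row in customers:
    print(row["city"], row["valid_from"], row["valid_to"], row["is_current"])
# Berlin 2022-01-01 2024-06-01 False
# Munich 2024-06-01 None True
```

By contrast, a Type 1 approach would simply overwrite `city` and lose the Berlin history. A candidate who can explain that trade-off usually understands the concept.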
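To illustrate question 16, the minimal sketch below stores raw JSON events exactly as they arrived and applies a schema only at read time. The event data, the `SCHEMA` mapping, and the `read_with_schema` helper are all hypothetical. Missing fields become `None`, extra fields are ignored, and records whose values cannot be cast to the declared type are set aside.

```python
import json

# Raw events stored as received: nothing was validated when they were written.
RAW_EVENTS = [
    '{"user": "a1", "amount": "19.99", "ts": "2024-05-01"}',
    '{"user": "b2", "amount": 5}',
    '{"user": "c3", "amount": "oops", "ts": "2024-05-02", "extra": true}',
]

# Hypothetical read-time schema: field name -> type used to cast the value.
SCHEMA = {"user": str, "amount": float, "ts": str}


def read_with_schema(lines, schema):
    """Parse raw JSON lines, projecting and casting them to `schema` at read time."""
    rows, rejected = [], []
    for line in lines:
        try:
            record = json.loads(line)
            # Keep only schema fields; missing fields become None, extras are dropped.
            rows.append(
                {name: cast(record[name]) if name in record else None
                 for name, cast in schema.items()}
            )
        except (TypeError, ValueError):  # bad JSON or a value that cannot be cast
            rejected.append(line)
    return rows, rejected


rows, rejected = read_with_schema(RAW_EVENTS, SCHEMA)
print(rows)
# [{'user': 'a1', 'amount': 19.99, 'ts': '2024-05-01'}, {'user': 'b2', 'amount': 5.0, 'ts': None}]
print(len(rejected))  # 1
```

With schema-on-write, the third event would have been rejected at load time. Here it sits in storage and only fails when this particular schema is applied, so a different reader could still interpret it in another way.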
"Asking about the difference between a data lake and a data warehouse has been a surprisingly effective filter for finding serious data engineers."
- Emily R., Hiring Manager