50 Basic GenAI Engineer Interview Questions
Posted on Tue 02 June 2026 in GenAI
A starter question bank for screening entry-level GenAI engineers. The questions are grouped by theme and run from fundamentals through production concerns.
Fundamentals
-
What is generative AI vs discriminative AI? — Generative models learn the distribution of the data itself so they can produce new samples; discriminative models learn the mapping from inputs to labels (a decision boundary) to classify or predict.
-
What is a large language model (LLM)? — A neural network trained on large text corpora to predict tokens, capable of broad language tasks.
-
Transformer architecture — A high-level grasp of attention-based sequence modeling, which processes all tokens in parallel instead of step by step through recurrence.
-
Attention mechanism — How a model weighs the relevance of different tokens when producing a representation.
-
Self-attention vs cross-attention — Self-attention relates tokens within one sequence; cross-attention relates one sequence to another.
-
Tokens and tokenization — How raw text is split into model-readable units and why the scheme matters.
-
Context window — The maximum number of tokens a model can attend to at once, counting both the prompt and the output generated so far.
-
Embeddings — Dense vector representations of meaning, used for search, clustering, and retrieval.
-
Encoder vs decoder vs encoder-decoder — Architectural families and the task types each suits.
-
Pretraining vs fine-tuning — Broad foundational training vs targeted adaptation to a task or domain.
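
A candidate who can explain attention in words can often sketch it in code too. The block below is a minimal, standard-library-only illustration of scaled dot-product self-attention over a toy sequence. It is a sketch under simplifying assumptions: the learned query/key/value projections, multiple heads, and masking are all omitted, and the token vectors are made-up numbers. Passing queries from a different sequence than the keys and values would turn the same function into cross-attention.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention.

    Each argument is a list of float vectors, one per token.
    Returns (outputs, weights), where weights[i][j] is how strongly
    token i attends to token j.
    """
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        w = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        mixed = [
            sum(w[j] * values[j][c] for j in range(len(values)))
            for c in range(len(values[0]))
        ]
        outputs.append(mixed)
        weights.append(w)
    return outputs, weights


# Toy 3-token sequence with 2-dimensional vectors (illustrative values only).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, tokens, tokens)
```

Each row of `weights` sums to 1, which is a useful follow-up point: attention redistributes a fixed budget of focus across the tokens in the sequence.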
Prompting & Inference
-
Prompt engineering — Designing inputs to reliably steer model behavior.
-
Zero-shot vs few-shot prompting — No examples vs in-context examples to guide the output.
-
Chain-of-thought prompting — Eliciting intermediate reasoning steps to improve answers.
-
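Temperature and top-p — Temperature rescales the logits before softmax (lower values sharpen the distribution, higher values flatten it); top-p restricts sampling to the smallest set of tokens whose cumulative probability reaches p.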
-
Greedy decoding vs sampling — Always taking the top token vs probabilistic selection for diversity.
-
System prompt — Instructions that set persistent behavior and role for the model.
-
Hallucinations — Why models invent plausible-but-false content and how to mitigate it.
-
Max tokens parameter — Caps output length and affects cost and truncation.
-
In-context learning — Learning a task purely from examples in the prompt, no weight updates.
-
Prompt injection — Adversarial inputs that hijack instructions and defenses against them.
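
The decoding questions above (temperature, top-p, greedy vs sampling) are easier to probe with a concrete sampler in hand. What follows is a minimal sketch using only the standard library, not any particular provider's implementation. The function names and the toy logits are hypothetical, and treating `temperature == 0` as greedy decoding is an assumed convention rather than a universal rule.

```python
import math
import random


def apply_temperature(logits, temperature):
    """Turn logits into probabilities after dividing by temperature (> 0)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most-probable tokens whose cumulative
    probability reaches top_p, then renormalize over that set."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample_token(logits, temperature=1.0, top_p=1.0, rng=random):
    """Pick one token id from the logits."""
    if temperature == 0:
        # Assumed convention: zero temperature means greedy decoding.
        return max(range(len(logits)), key=lambda i: logits[i])
    nucleus = top_p_filter(apply_temperature(logits, temperature), top_p)
    ids = list(nucleus)
    return rng.choices(ids, weights=[nucleus[i] for i in ids], k=1)[0]


logits = [2.0, 1.0, 0.1]  # made-up scores for a 3-token vocabulary
probs = apply_temperature(logits, temperature=1.0)
nucleus = top_p_filter(probs, top_p=0.9)
token_id = sample_token(logits, temperature=0.7, top_p=0.9,
                        rng=random.Random(0))
```

With these toy logits, `top_p=0.9` drops the least likely token entirely, so it can never be sampled no matter how the random draw falls.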
RAG (Retrieval-Augmented Generation)
-
What is RAG and why use it? — Grounding generation in retrieved documents to reduce hallucination and add fresh knowledge.
-
Vector database — Storage and similarity search over embeddings.
-
Semantic vs keyword search — Meaning-based retrieval vs literal term matching.
-
Chunking and chunk size — Splitting documents for retrieval and the tradeoffs of granularity.
-
Cosine similarity — A measure of vector closeness used to rank retrieved items.
-
Embedding model in RAG — Converts queries and documents into the shared vector space.
-
Re-ranking — A second-stage scoring pass to refine retrieved candidates.
-
Evaluating a RAG system — Measuring retrieval relevance and answer faithfulness.
-
Hybrid search — Combining sparse (keyword) and dense (vector) retrieval.
-
Handling context overflow — Strategies when retrieved content exceeds the window.
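
The retrieval step at the heart of these questions can be illustrated in a few lines. The sketch below ranks documents by cosine similarity against a query vector. It assumes the embeddings already exist: the three-dimensional vectors and document ids are invented for illustration, and a real system would get them from an embedding model and typically store them in a vector database rather than a dict.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=2):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [
        (cosine_similarity(query_vec, vec), doc_id)
        for doc_id, vec in index.items()
    ]
    scored.sort(reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:k]]


# Hypothetical pre-computed embeddings (illustrative values only).
index = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "api-rate-limits": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.0]
results = top_k(query, index, k=2)
```

Here `results` lists `refund-policy` first and `shipping-times` second. The text of those top documents is what would be placed into the prompt, and a re-ranker would slot in between `top_k` and prompt assembly.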
Fine-tuning & Training
-
Full vs parameter-efficient fine-tuning — Updating all weights vs training only a small set of added or selected parameters while the rest stay frozen, which cuts compute and memory cost.
-
LoRA — Low-rank adapters that fine-tune efficiently with few trainable parameters.
-
Quantization — Reducing numerical precision to shrink models and speed inference.
-
RLHF — Aligning models to human preferences by optimizing against a reward signal, typically a reward model trained on human comparisons of outputs.
-
Instruction tuning — Training on instruction-response pairs to improve task following.
-
Catastrophic forgetting — Loss of prior capabilities after further training.
-
SFT vs RLHF — Supervised fine-tuning vs preference-based reinforcement.
-
Fine-tune vs RAG vs prompting — When each approach is the right tool.
-
Data needed to fine-tune — Quantity, quality, and format requirements.
-
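Overfitting — Fitting the training data so closely that performance on held-out data degrades, and how to detect it (for example, validation loss rising while training loss keeps falling).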
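
For the LoRA question, a little arithmetic makes the "few trainable parameters" claim concrete. The sketch below assumes the usual LoRA formulation, where a frozen weight matrix W (d_out × d_in) is adjusted by a low-rank product scaled by alpha / r, giving W + (alpha / r) · B · A. The helper names and the tiny matrices are hypothetical, and plain Python lists stand in for tensors.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b)))
         for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_merge(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A.

    w: d_out x d_in frozen weights
    b: d_out x r adapter matrix
    a: r x d_in adapter matrix
    """
    r = len(a)
    delta = matmul(b, a)
    scale = alpha / r
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


def trainable_params(d_out, d_in, r):
    """Compare trainable parameter counts for one weight matrix."""
    return {"full": d_out * d_in, "lora": r * (d_out + d_in)}


# Rank-1 adapter on a 2x2 identity matrix (illustrative values only).
merged = lora_merge(
    w=[[1.0, 0.0], [0.0, 1.0]],
    a=[[0.5, 0.5]],
    b=[[1.0], [2.0]],
    alpha=1.0,
)
counts = trainable_params(d_out=4096, d_in=4096, r=8)
```

For a single 4096 × 4096 matrix at rank 8, `counts` shows 16,777,216 parameters for full fine-tuning against 65,536 for the adapter, a 256-fold reduction for that layer.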
Evaluation & Production
-
Evaluating LLM output quality — Approaches to judging open-ended generation.
-
Generative text metrics — BLEU, ROUGE, perplexity and their limits.
-
LLM-as-a-judge — Using a model to score outputs and its biases.
-
Reducing latency — Techniques like streaming, caching, and smaller models.
-
Controlling inference cost — Token budgeting, model selection, and caching.
-
Model distillation — Training a smaller model to mimic a larger one.
-
Guardrails — Constraints on inputs and outputs for safety and compliance.
-
Monitoring in production — Tracking quality, drift, cost, and failures.
-
Safety and ethical risks — Bias, misuse, privacy, and misinformation concerns.
-
Agent vs simple LLM call — Multi-step tool-using systems vs single-shot generation.
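
Two of the metrics named in this section are simple enough to compute by hand, which also exposes their limits. The sketch below assumes per-token natural-log probabilities are available for perplexity and uses a deliberately simplified, ROUGE-1-style unigram recall with whitespace tokenization. Real ROUGE implementations add stemming and more careful tokenization, so treat `unigram_recall` as a hypothetical stand-in rather than the official metric. Both functions assume non-empty inputs.

```python
import math
from collections import Counter


def perplexity(token_logprobs):
    """exp of the negative mean natural-log probability per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def unigram_recall(candidate, reference):
    """Share of reference words that also appear in the candidate,
    with counts clipped so repeated words are not over-credited."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    return overlap / sum(ref.values())


# A model that assigns probability 0.25 to every token has perplexity 4.
ppl = perplexity([math.log(0.25)] * 4)

reference = "the cat sat on the mat"
partial = unigram_recall("the cat sat", reference)
scrambled = unigram_recall("the mat sat on the cat", reference)
```

The scrambled candidate scores a perfect 1.0 even though it says something different from the reference. That is the kind of limit the question on generative text metrics is probing: word overlap rewards matching vocabulary, not preserved meaning.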