Juspay · Final Two Rounds · SDE Interview

Interview prep curriculum, Nilashis.

Include real-world scenarios and key questions to prepare for the SDE interview.

8 modules · Concurrency — priority · System Design — priority · OS — good baseline
01
Solving coding problems
DSA — pattern recognition, core structures, thinking under pressure
Practice daily
Situations to practice — core structures
Given a list of orders arriving in a stream, find the most frequent order type in any window of last K orders.
Sliding window + hash map frequency tracking
You have a payment log. Find two transaction amounts that sum to a target fraud threshold.
Two pointer on sorted array, or hash set lookup
A delivery system logs events. At any point, find the median delivery time across all completed deliveries so far.
Two heaps — max-heap for lower half, min-heap for upper half
Build a system that evicts the least recently used session. Access and insert in O(1).
LRU Cache — doubly linked list + hash map, no library allowed
You're given a folder hierarchy as a tree. Find the deepest common folder between two files.
Lowest Common Ancestor on a general (n-ary) tree, since a folder can have any number of children
A dispatcher assigns tasks to workers. Some tasks depend on others. Find a valid execution order, or detect a cycle.
Topological sort using BFS (Kahn's algorithm)
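A minimal sketch of the dispatcher situation above, assuming tasks are numbered 0..n-1 and dependencies arrive as (before, after) pairs. The function name `execution_order` and that input shape are illustrative choices, not something the syllabus prescribes.

```python
from collections import deque


def execution_order(num_tasks, dependencies):
    """Return a valid order of task ids, or None if the dependencies contain a cycle.

    dependencies is a list of (before, after) pairs: `before` must run first.
    """
    successors = [[] for _ in range(num_tasks)]
    indegree = [0] * num_tasks
    for before, after in dependencies:
        successors[before].append(after)
        indegree[after] += 1

    # Start with every task that has no unmet dependency.
    ready = deque(t for t in range(num_tasks) if indegree[t] == 0)
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for nxt in successors[task]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    # Any task never emitted is on a cycle or depends on one.
    return order if len(order) == num_tasks else None
```

Cycle detection falls out of the same pass: if fewer than `num_tasks` tasks were emitted, some indegree never reached zero.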
Pattern recognition drill — do this before every problem
  • Read the problem. Before writing any code, write one sentence: "This is a ___ problem because ___." Forces you to name the pattern. This is your actual weakness — train it deliberately.
  • State your brute-force approach out loud first, then say what makes it slow, then optimize. Interviewers want to see you reason through complexity, not jump to the clever answer silently.
  • After solving, ask yourself: "What if the input is empty? A single element? All duplicates?" Edge case enumeration — catches bugs before the interviewer does.
Juspay-style problems to implement from scratch
  • Serialize a binary tree to a string, then reconstruct it exactly. No library. Tests tree traversal + string handling together
  • Given a dictionary of words and a sentence, split the sentence into valid words in all possible ways. Backtracking + memoization — Word Break II
  • Given K sorted lists of payment records, merge them into one sorted list. Heap-based merge — comes up in log aggregation at Juspay's scale
  • Given a matrix of 0s and 1s representing a city grid, count the number of isolated islands. BFS/DFS on a grid — tests your ability to apply tree traversal to non-tree shapes
  • Find the shortest path between two network nodes where each edge has a cost. Some edges can have negative cost. Bellman-Ford for graphs with negative weights
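For the last problem in this list, a sketch of Bellman-Ford under assumed inputs: nodes numbered 0..n-1 and directed edges given as (u, v, cost) tuples. Returning None on a reachable negative cycle is a choice made for this example; an interviewer may want a different signal.

```python
def bellman_ford(num_nodes, edges, source):
    """Return shortest distances from source, or None if a negative cycle is reachable.

    edges is a list of (u, v, cost) directed edges; unreachable nodes stay at infinity.
    """
    INF = float("inf")
    dist = [INF] * num_nodes
    dist[source] = 0

    # A shortest path uses at most num_nodes - 1 edges, so that many passes suffice.
    for _ in range(num_nodes - 1):
        changed = False
        for u, v, cost in edges:
            if dist[u] != INF and dist[u] + cost < dist[v]:
                dist[v] = dist[u] + cost
                changed = True
        if not changed:
            break  # distances settled early

    # One extra pass: any further improvement means a reachable negative cycle.
    for u, v, cost in edges:
        if dist[u] != INF and dist[u] + cost < dist[v]:
            return None
    return dist
```

Dijkstra is not safe here because it finalizes a node's distance before a later negative edge can lower it; Bellman-Ford pays O(V·E) to avoid that assumption.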
02
Operating Systems
How the machine underneath your code actually works
Good baseline
Situations to understand — not just memorize
Your Node.js server seems slow despite low CPU usage. What are the possible reasons?
Blocking I/O, context switching, memory pressure — process vs thread model
Two services are waiting for each other's resource and neither makes progress. How do you diagnose and fix it?
Deadlock — four conditions, detection via resource graph, fix via lock ordering
Your app is using 2GB RAM but the machine has only 1GB. It still runs. Why?
Virtual memory + demand paging — the OS brings pages from disk on access
A background job is taking CPU from your web server requests. How does the OS decide who gets to run?
CPU scheduling — priority queues, preemption, Round Robin
epoll lets a single thread handle 10,000 connections. How is that possible without threads?
I/O event notification — the kernel tells you which connections have data, you don't block per-connection
A zombie process is piling up on your server. What happened and how do you fix it?
Parent didn't call wait() on child's exit — process lifecycle, signal handling
Questions an interviewer will actually ask
  • What happens, step by step, when you run a program? Start from the shell command. fork → exec → loader → memory map → scheduler — trace the full path
  • Why is creating a thread cheaper than creating a process? What do they share? Stack is separate, heap/code/file descriptors are shared in threads
  • What is a page fault, and when is it normal vs a problem? Normal: first access to a page. Problem: thrashing — memory too small for working set
  • How does Node.js handle thousands of connections without spawning thousands of threads? Event loop + epoll — libuv sits between your JS and the OS kernel
  • Describe a scenario where you'd prefer multiple processes over multiple threads. Isolation — if one worker crashes, it shouldn't take the whole service down
03
Computer Networks
How data moves — from your app to the other side of the world
Active focus now
Situations to walk through until you can do it eyes closed
A user types "https://juspay.in" and presses Enter. Walk through every network event until the page loads.
DNS lookup → TCP handshake → TLS → HTTP request → response — the full stack
A video call drops quality when the network gets congested. Why doesn't it stop entirely like a file download would?
UDP — accepts packet loss, TCP would stall waiting for retransmit
Your API client sends 6 parallel requests to the same server over HTTP/1.1 and things get slow. What's happening?
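Head-of-line blocking — each HTTP/1.1 connection carries one request at a time and browsers cap connections at about 6 per host, so any further request queues behind a slow one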
A user in Mumbai downloads a file from a US server. How does a CDN make this faster?
Edge node close to user serves cached content, reducing round-trip latency
Your WebSocket chat app works fine locally but messages get lost after 30 seconds of no activity in production. Why?
Idle connection timeout — load balancer kills it. Fix: heartbeat ping every 20s
A mobile client retries a failed payment request 3 times. The server receives all 3. What should the server do?
Idempotency key — you've already practiced this! Extend your mental model here.
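A small in-memory sketch of the retry situation above, assuming the client sends the same idempotency key on every retry of one payment. `PaymentService`, `charge`, and the response fields are hypothetical names for illustration; a real service would keep the keys in a durable store rather than a dict.

```python
import threading


class PaymentService:
    """In-memory sketch: one charge per idempotency key, however many retries arrive."""

    def __init__(self):
        self._lock = threading.Lock()
        self._results = {}      # idempotency key -> stored response
        self.charges_made = 0   # counts real charges, for demonstration only

    def charge(self, idempotency_key, amount):
        with self._lock:
            if idempotency_key in self._results:
                # Retry of a request already handled: replay the stored response.
                return self._results[idempotency_key]
            self.charges_made += 1
            response = {
                "status": "charged",
                "amount": amount,
                "charge_id": self.charges_made,
            }
            self._results[idempotency_key] = response
            return response
```

Three retries with the same key produce one charge and three identical responses, which is the behaviour the mobile client needs.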
Things you've already practiced — go deeper on these
  • WebSocket upgrade: describe the exact HTTP headers exchanged, and what the server must return. 101 Switching Protocols, Upgrade: websocket, Sec-WebSocket-Accept header
  • Resumable download: your client downloaded 60MB of a 100MB file and the connection dropped. What HTTP feature lets you resume? Range: bytes=62914560-, server responds 206 Partial Content with Content-Range
  • DNS: a fresh browser with no cache looks up "api.juspay.in" — what series of servers does it contact? It asks a recursive resolver, which then queries root server → .in TLD server → juspay.in authoritative server → returns A record
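For the WebSocket upgrade item above, this sketch shows how a server derives Sec-WebSocket-Accept from the client's Sec-WebSocket-Key as RFC 6455 describes: append a fixed GUID, SHA-1 the result, and base64-encode the digest. The function name `websocket_accept` is made up for this example.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key):
    """Compute the Sec-WebSocket-Accept value for a given Sec-WebSocket-Key."""
    combined = (sec_websocket_key + WEBSOCKET_GUID).encode("ascii")
    digest = hashlib.sha1(combined).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key from RFC 6455, section 1.3
print(websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

The header proves the server understood the WebSocket handshake rather than echoing a cached HTTP response; it is not a security mechanism.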
Questions to practice
  • Why does TLS 1.2 need two round trips before data can flow (TLS 1.3 needs only one)? What does each step verify? Key exchange, certificate validation, session key agreement — then symmetric encryption
  • HTTP/2 multiplexes streams on a single connection. If the connection has packet loss, what goes wrong? Head-of-line blocking reappears at the TCP layer — HTTP/3 solves this with QUIC over UDP
  • How does exponential backoff with jitter work, and why does jitter matter? Without jitter, all retrying clients hit the server at the same moment — thundering herd
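One common way to implement the backoff question above is "full jitter": the ceiling doubles per attempt and the actual delay is drawn uniformly below it. The `base` and `cap` defaults here are arbitrary values chosen for the sketch.

```python
import random


def backoff_delay(attempt, base=0.5, cap=30.0, rng=random):
    """Full jitter: pick a random delay in [0, min(cap, base * 2**attempt)] seconds."""
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0, ceiling)


if __name__ == "__main__":
    for attempt in range(6):
        print(attempt, round(backoff_delay(attempt), 3))
```

Because each client draws its own random delay, retries that started together spread out across the window instead of arriving as one spike.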
04
Concurrency & thread safety
Making code that runs correctly when multiple things happen at once
High priority for Juspay
Juspay's most common concurrency question: "here's code you wrote — now make it thread-safe." Practice writing broken code first, then fixing it. That's the real skill.
Code scenarios to implement from scratch
Two payment workers both read a balance of ₹500, both deduct ₹300, both write back ₹200. After two deductions the balance should be -₹100 (or the second deduction should be rejected), so one update was lost. Fix this.
Race condition on shared mutable state — mutex around read-modify-write
Build a job queue where producers add jobs and workers pick them up. Workers should sleep when there's nothing, not spin in a loop.
Blocking queue — mutex + condition variable, not busy waiting
A config object is read by 100 threads constantly and written once every 5 minutes. Locking every read is too slow. Fix it.
Read-write lock — many readers can proceed together, writer gets exclusive access
Thread A holds lock 1 and waits for lock 2. Thread B holds lock 2 and waits for lock 1. Nothing moves. Fix it.
Deadlock from unordered locking — always acquire locks in the same order across all threads
You need a config object that's initialized once when first needed and never again — and it must work correctly if 10 threads hit it simultaneously on startup.
Thread-safe singleton — double-checked locking with volatile / atomic flag
Build a rate limiter that allows a user a maximum of 100 API calls per minute, shared across multiple server threads.
Token bucket with mutex — threads decrement the token count atomically
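A sketch of the rate limiter situation above for a single user, assuming a bucket of 100 tokens refilled continuously at 100 per minute. The class name, the injectable `clock` parameter, and the defaults are illustrative; a per-user limiter would keep one bucket per user id. Note that a token bucket permits bursts, so a full bucket plus refill can admit more than 100 calls inside one 60-second window.

```python
import threading
import time


class TokenBucket:
    def __init__(self, capacity=100, refill_per_second=100 / 60, clock=time.monotonic):
        self._capacity = capacity
        self._refill = refill_per_second
        self._clock = clock              # injectable so tests can control time
        self._tokens = float(capacity)   # start full
        self._last = clock()
        self._lock = threading.Lock()

    def allow(self):
        """Return True and consume a token if one is available, else False."""
        with self._lock:
            # Refill and consume happen under one lock, so no two threads
            # can spend the same token.
            now = self._clock()
            elapsed = now - self._last
            self._tokens = min(self._capacity, self._tokens + elapsed * self._refill)
            self._last = now
            if self._tokens >= 1:
                self._tokens -= 1
                return True
            return False
```

If a hard "never more than 100 in any minute" guarantee is required, a sliding-window log or counter is the stricter alternative.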
The "convert to thread-safe" drill — practice this format
  • Take your LRU Cache from Module 1. Now make it safe for 10 threads reading and writing simultaneously. This is the single most likely Juspay concurrency question — DSA + concurrency merged.
  • Take a simple counter class with increment() and get(). Add thread safety without making get() unnecessarily slow. Atomic types for reads — locks only when writing. Or ReentrantReadWriteLock.
  • You have an event bus where components subscribe and publish. Add thread safety so subscribe() and publish() can be called from any thread. CopyOnWriteArrayList for subscribers, or synchronized blocks around the listener list.
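For the first item in this drill, one possible shape of a thread-safe LRU cache: a hand-rolled doubly linked list plus a dict, guarded by a single lock. All names are illustrative. A read-write lock does not help much here because get() also mutates the recency order.

```python
import threading


class _Node:
    __slots__ = ("key", "value", "prev", "next")

    def __init__(self, key=None, value=None):
        self.key, self.value = key, value
        self.prev = self.next = None


class ThreadSafeLRUCache:
    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._map = {}
        # Sentinels: head.next is the most recently used, tail.prev the least.
        self._head, self._tail = _Node(), _Node()
        self._head.next, self._tail.prev = self._tail, self._head
        # get() reorders the list too, so reads take the same exclusive lock.
        self._lock = threading.Lock()

    def _unlink(self, node):
        node.prev.next, node.next.prev = node.next, node.prev

    def _push_front(self, node):
        node.prev, node.next = self._head, self._head.next
        self._head.next.prev = node
        self._head.next = node

    def get(self, key, default=None):
        with self._lock:
            node = self._map.get(key)
            if node is None:
                return default
            self._unlink(node)
            self._push_front(node)   # mark as most recently used
            return node.value

    def put(self, key, value):
        with self._lock:
            node = self._map.get(key)
            if node is not None:
                node.value = value
                self._unlink(node)
                self._push_front(node)
                return
            if len(self._map) >= self._capacity:
                lru = self._tail.prev   # least recently used entry
                self._unlink(lru)
                del self._map[lru.key]
            node = _Node(key, value)
            self._map[key] = node
            self._push_front(node)
```

A single coarse lock is the simplest correct answer; be ready to discuss sharding the cache by key hash if the interviewer asks about lock contention.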
Event loop situations (relevant to your Node.js background)
  • You have three async functions that must all finish before you proceed. How do you do this in Node.js without sequential await? Promise.all() — all three run concurrently, you await the array
  • A setTimeout with 0ms delay runs after your synchronous code finishes. Why? Explain the exact execution order. Call stack empties first, then event queue runs — microtask queue (Promises) before macrotask queue (setTimeout)
  • Your Node.js server blocks for 500ms on a CPU-heavy calculation. What happens to all incoming requests during that time? Event loop is stuck — single-threaded. Fix: move to worker_threads or break into smaller async chunks.
05
System Design
How to design something that handles millions of users without falling over
Priority area
Every time you design a system, go through five steps out loud: clarify requirements → estimate scale → draw the diagram → go deep on one component → name the trade-offs. Don't skip steps.
Full system designs to practice end-to-end
  • Design WhatsApp — you've done this, now go deeper. How does it handle message ordering when two people send simultaneously? How does it store 100B messages without a single DB? Sequence numbers per conversation, sharded message stores, Cassandra or similar wide-column store
  • Design the Juspay payment gateway. A merchant sends a charge request. Walk through every system the request touches until money moves. Idempotency key → fraud check → bank routing → ledger write → webhook to merchant
  • Design a URL shortener (like bit.ly) that handles 10,000 redirects per second. A short code must redirect in under 5ms. Cache in Redis with consistent hashing, DB as fallback, no redirects hit DB in hot path
  • Design a notification system that sends 50 million push notifications within 10 minutes of a product launch. Fan-out queue per user segment, workers pull from Kafka, batch to APNS/FCM
  • Design Redis from scratch. What data structures does it support internally? How does it survive a server restart? Hash tables, skip lists, RDB snapshots vs AOF append log — persistence trade-offs
Component-level situations — understand one deeply before moving on
Your cache is getting 100x more traffic than your DB on a normal day. Then you deploy and 1,000 requests all miss the empty cache simultaneously, hammering the DB. Fix it.
Cache stampede — mutex lock per key, or probabilistic early expiry
You add a new server to your cluster. How do you redistribute only 1/N of the keys instead of rehashing everything?
Consistent hashing — only keys between the new node and its predecessor need to move
Your payment service is down and orders are piling up. You want to process them once the service recovers, in order, without losing any. What do you use?
Durable message queue (Kafka) — producers write without waiting for consumer availability
A microservice is slow. You don't want slow requests to your DB to also slow down your cache reads. How do you isolate them?
Bulkhead pattern — separate thread pools / connection pools per downstream
Your DB has one writer and five read replicas. A user writes a record and immediately reads it but gets a 404. Why?
Replication lag — read-after-write consistency needs a sticky session or reading from primary after writes
Design a feature flag system that can toggle a feature for 0.1% of users, update the setting, and propagate to 10,000 servers in under a second.
Central config store → pub/sub push to all servers → local in-memory cache with TTL
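The "0.1% of users" part of the feature flag situation is often done with deterministic hashing, so every server makes the same decision for the same user without coordination. This sketch covers only that bucketing step, not propagation; `is_enabled` and the 100,000-bucket resolution are assumptions made for the example.

```python
import hashlib


def is_enabled(flag_name, user_id, rollout_percent):
    """Deterministically place a user in one of 100,000 buckets for this flag."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100_000
    # rollout_percent=0.1 enables buckets 0..99, i.e. 0.1% of the bucket space.
    return bucket < rollout_percent * 1000
```

Including the flag name in the hash input keeps the same users from always landing in the early cohort of every flag.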
06
Real-world engineering discussions
Systems the interviewer might bring up — be ready to reason about them
Payments — Juspay's actual domain, go deep here
  • A customer pays ₹10,000 for a flight. The network drops after the bank approves but before Juspay confirms to the airline. What happens? How do you prevent the customer from being charged twice? Idempotency key + two-phase status: "pending → confirmed" written atomically
  • You need to move ₹500 from Account A to Account B. The debit succeeds but the credit fails. How do you make sure neither account ends up wrong? Saga pattern — compensating transaction reverses the debit if credit fails
  • How does UPI route a payment from a customer's HDFC app to a merchant's Paytm account? What systems are in the middle? NPCI switch → issuer bank → acquirer bank → settlement at end of day
  • A fraudster is sending 500 payment attempts per second across different devices. How do you detect and stop this in real time? Velocity rules + ML scoring on device fingerprint + IP — block at API gateway before hitting payment logic
Caching & data consistency
  • Your cache shows a user's old account balance for 30 seconds after they top up. How do you fix this without making the system slow? Write-through cache — update cache and DB together on every write
  • You cache user sessions in Redis. Redis goes down. 5 million users get logged out. How do you architect against this? Redis cluster with replicas + local in-memory fallback with short TTL
  • A search feature queries the cache first, then the DB if not found. Attackers query millions of non-existent keys to hit the DB. Fix it. Bloom filter — check if a key could exist before querying anything
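A compact Bloom filter sketch for the last item above. The bit-array size, the number of hashes, and the double-hashing scheme built on SHA-256 are choices made for this example; production systems usually size these from the expected key count and target false-positive rate.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self._num_bits = num_bits
        self._num_hashes = num_hashes
        self._bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive k bit positions from two 64-bit halves of one digest (double hashing).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self._num_bits for i in range(self._num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self._bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        """False means definitely absent; True means possibly present."""
        return all(
            self._bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )
```

The asymmetry is the point: a False answer lets the service reject the lookup without touching cache or DB, while a True answer still needs the real lookup.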
Distributed systems reasoning
  • The network between two data centers splits. Servers on each side can't talk to the other. Both sides keep accepting writes. When the split heals, what do you do with conflicting data? CAP theorem in practice — you chose Availability over Consistency. Resolution via version vectors or last-write-wins.
  • You deploy a new version to 1 out of 10 servers as a canary. A user hits server 3 (new version) and then server 7 (old version). Their session breaks. Why? Sticky sessions or session state in shared Redis — can't rely on local memory when deploying
  • A service calls another service which calls another. A timeout cascades and takes down everything. How do you prevent this? Circuit breaker pattern — after N failures, open the circuit and return fast errors for a cooldown period
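A minimal circuit breaker along the lines of the last item: closed until N consecutive failures, then open for a cooldown, then one trial call (half-open). The thresholds, the `CircuitOpenError` name, and the injectable clock are assumptions for this sketch, and it is deliberately not thread-safe to keep the state machine readable.

```python
import time


class CircuitOpenError(Exception):
    """Raised instead of calling the downstream while the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, cooldown_seconds=10.0, clock=time.monotonic):
        self._threshold = failure_threshold
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._cooldown:
                raise CircuitOpenError("circuit open, failing fast")
            # Cooldown elapsed: half-open, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self._threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Failing fast while open is what stops the cascade: callers get an immediate error instead of holding threads and connections until a timeout.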
07
Defending your own projects
They'll go deep on Scribe, Delivery Hub, the DICOM app — be ready
For every project, be ready to answer six things: what problem it solves, how it's architected, why you made each technical choice, the hardest thing you debugged, what would break at 100x scale, and what you'd change if you started over.
Scribe — real-time chat (Flask + Socket.io)
  • Your resume says "supports up to 5 concurrent users per room." Why that limit? What would break at 500? Be honest: Python's GIL, single Flask process, in-memory room state. At 500: move to Redis pub/sub for room distribution across servers.
  • If I close my laptop mid-conversation and reopen it, what happens to my messages? Do I see what I missed? MongoDB persistent storage handles history. But: no mechanism for "unread since disconnect" — explain how you'd add it.
  • How would you make Scribe work across multiple servers? Socket.io with Redis adapter — each server subscribes to a shared Redis pub/sub channel per room
Delivery Hub — logistics platform (Node.js + MySQL)
  • Your resume mentions simulating "race conditions" in tests. Give me a concrete example of one you found and how you fixed it. Two delivery partners accepting the same order simultaneously — fixed with SELECT FOR UPDATE / optimistic locking
  • How does order assignment work? Can two drivers both be assigned the same delivery? If not handled: yes. Walk through your solution — atomic DB update, first writer wins
  • If Delivery Hub had 100,000 active orders, what's the first thing in your current architecture that breaks? MySQL without read replicas, no caching on order status queries, no queue for assignment events
UCC DICOM app — AES-256 encryption (Flask)
  • Explain what a timing attack is, like I'm a backend developer who's never done security work. Attacker measures how long operations take to infer information — e.g., password comparison that exits early on first mismatch leaks the correct prefix
  • Why does every encryption operation use a different IV (initialization vector)? What goes wrong if you reuse one? Same key + same IV + same message = same ciphertext every time — patterns become visible to an attacker
  • You added "random delays" to prevent timing attacks. But can't an attacker just average out many measurements? Yes — this is why constant-time operations are the real fix. Random delay only raises the bar. Acknowledge this trade-off.
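To make the timing-attack answers above concrete, this sketch contrasts an early-exit comparison with the standard library's hmac.compare_digest. The two function names are illustrative, and nothing here is taken from the DICOM app's actual code.

```python
import hmac


def leaky_equals(a: bytes, b: bytes) -> bool:
    """Exits at the first mismatch, so run time depends on the matching prefix length."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True


def constant_time_equals(a: bytes, b: bytes) -> bool:
    """hmac.compare_digest is designed to avoid content-dependent early exits."""
    return hmac.compare_digest(a, b)
```

Both functions return the same answers; the difference is only in how much the elapsed time reveals, which is why random delays are a weaker fix than swapping the comparison.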
Student Portal — PHP + MySQL (1,500+ users)
  • 1,500 students hit the attendance page at 9am every day. How does your system handle the spike? If it doesn't have caching: be honest. Then explain what you'd add: Redis cache for read-heavy attendance data, 5-minute TTL
  • How did you secure the admin dashboard so students can't access faculty data? Walk through your role-based access control implementation — middleware check on every protected route
08
Thinking live in the interview
How to handle questions you haven't seen before — the real test
Juspay's open-ended round is evaluating how you think, not what you know. The candidate who says "I'm not sure, but here's how I'd reason about it…" and stays calm beats the candidate who knows most answers but panics on the one they don't.
A live-thinking protocol — use this every time
Step 1: Restate the problem in your own words before answering.
Confirms you understood correctly. Buys you thinking time. Makes you look careful.
Step 2: Ask one or two clarifying questions about scale or constraints.
"Are we optimizing for latency or throughput?" shows engineering maturity immediately.
Step 3: Name two or three possible approaches before picking one.
Don't silently pick the clever answer. Interviewers want to see you evaluate options.
Step 4: When you pick an approach, say what you're giving up.
"I'll use a cache here which means I accept stale data for up to 30 seconds."
Step 5: Start simple. Say "let me solve the simple case first, then handle edge cases."
A working simple solution beats a half-built optimal one. Always.
Step 6: At the end, say what you'd improve with more time.
"Given more time I'd add X because Y" — shows you can see beyond your own work.
Open-ended questions to practice out loud (record yourself)
  • Your payment service is processing 1 million transactions per minute. At 10am on a Monday it slows to a crawl. Walk me through how you'd find the cause. Don't jump to "add more servers." Start with metrics — latency, error rate, DB connections, queue depth.
  • A customer says "I was charged but the order didn't go through." Debug this live with me. Trace the request: did payment service confirm? Did order service receive the event? Did the webhook to merchant fire?
  • How would you design the backend for a QR code payment system used at 10 million shops in India? No single right answer — clarify: online or offline? UPI overlay? Merchant onboarding flow?
  • You need to build a system that sends an SMS exactly once when a user's balance drops below ₹100, even if the system crashes mid-send. At-least-once delivery from queue + idempotency check before SMS — exactly-once with deduplication
  • How would you make a system highly available if you only have one database? Replication for read scaling, automatic failover, health checks, circuit breaker on DB layer
The "derive it" drill — for things you might blank on
  • If you forget why TCP needs a 3-way handshake: ask yourself "what problem does the handshake solve?" — both sides need to confirm the other is alive and agree on sequence numbers. Derive the handshake from that. You don't need to memorize — reason from the problem backward to the mechanism.
  • If you forget database indexing: "sequential scan of 10M rows is too slow" → need a structure where you can find a row without scanning everything → B-tree index. First principles beats memorization when you're nervous.
Juspay SDE Interview Syllabus · Nilashis Saha · Final Two Rounds