
Building a Cloud File Storage Service

My thoughts, decisions, and the problems I ran into while building a micro-services-based file storage system using Bun, AWS S3, and Qwik.

2025-04-10 · 15 min read
bun · typescript · aws-s3 · qwik · microservices · postgresql · drizzle


Quick heads-up before we start — this is not a tutorial. I'm not going to walk you through setting up an S3 bucket, configuring IAM roles, or bootstrapping a Qwik app from scratch. That's what documentation is for.

This is a chapter from my engineering book. Think of it as reading someone's notes. I'm writing down the decisions I made, the tradeoffs I was thinking about, and the problems I ran into while building this for myself. Two voices you'll notice here: one is me talking to past-me as I was designing things, and the other is me talking to you — pointing out things I think are worth knowing.


The Goal

A few things I wanted out of this project:

  1. Actually understand AWS — specifically S3 and how to interact with it using aws-sdk.
  2. Build something clean with minimal dependencies using Bun as the runtime, and lean into its native APIs as much as possible.
  3. Understand and build microservices.
  4. Use Docker.

Ideas and Process

After a while of planning — going back and forth in my head, sketching things out — I landed on these decisions.

Back End

The goal: stay minimal, stay clean.

Two isolated microservices — one for auth, one for files. Both written in Bun with TypeScript. Honestly, this is just a great combo for small services. Easy setup, clean APIs, barely any configuration needed, and Bun's native APIs cover most of what you'd reach for a library for. Also, because these are microservices, each one gets its own database — otherwise, what's even the point of calling them microservices?

For the database, I always reach for PostgreSQL. It's open source, has been around forever, and it's packed with features. The jsonb type alone is huge — being able to store and query JSON natively, using those special -> and ->> operators, is genuinely useful. And I love how flexible it is when modeling data: separate things cleanly, then build relations. Yes, relations can affect performance as the data grows — that's a real tradeoff worth thinking about — but for this project it wasn't the concern.

On ORMs — I used Drizzle ORM. For TypeScript projects it's my first pick. You get real freedom over how you write your queries, schema definitions that directly infer TypeScript types, and it never gets in your way. Relations are a little awkward to get used to at first — not gonna lie — but the docs are clean and actually readable. Worth going through.

Front End

My first thought was to just go with plain HTML, CSS, and JS — 3 or 4 pages, nothing fancy. Then I thought: why not use a framework? Everyone does. But frameworks come with tradeoffs. Bundle size is one of them, and I didn't want to ship a massive JavaScript bundle just to render a few pages. I also didn't want to deal with hydration errors — something I kept running into with Next.js and React Server Components.

That's when I found Qwik. And yes, it's actually quick.

The thing that sold me: Qwik doesn't hydrate. Most SSR frameworks ship the full app as JavaScript to the client and "replay" everything on load — that's hydration. Qwik skips all of that through a concept called Resumability. The app serializes its state on the server and resumes exactly from that point on the client. No replay, no wasted work.

Qwik only downloads the JavaScript for an interaction when the user actually triggers it — hover over a button, and only then does that button's JS get fetched. That's a fundamentally different model, and for this kind of app it's excellent.

The path was clear: learn a framework that genuinely solves the hydration problem → Qwik + Qwik City with Resumability.

Honest note on the frontend side of this project — I used AI to generate styled components with small Tailwind animations, and to debug some frontend logic bugs. I'm not hiding that. For a learning project like this, it made sense: stay focused on the architecture and let AI handle the repetitive UI wiring. Also I didn't do much refactoring or cleanup on the Qwik app. Things like DRY, extracting repeated logic into shared utilities, or making components more generic were not a priority here. It works, but it reads like first-draft code. Something to go back to.

That said, if I were building this for production, I'd use AI differently. I'd start by designing a proper theme — a unique visual identity — and building a set of reusable components up front. That way the dev experience stays consistent and clean throughout. AI is great at that kind of scaffolding when you give it direction and context. The key difference: using it to skip thinking versus using it to move faster while thinking.


Design and Coding Decisions

Auth Service — GitHub

Kept this simple and secure:

  • JWT with short-lived access tokens and long-lived refresh tokens.
  • Track every active session per user so they can invalidate everything at once — logout from all devices.
  • On each request: validate the token, check expiration, and check a revokedAt column in the refreshTokens table. This means even a technically-valid token can be revoked server-side.
  • Background job to clean up expired/revoked tokens — no point keeping garbage in the DB forever.
  • Custom middleware for route protection and request auditing. No extra dependencies — just readable, purpose-built code.
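The revocation check above boils down to a small pure function. Here's a minimal sketch, where the SessionRecord shape and isSessionValid are my names, not the service's actual code:

```typescript
// Shape of a row in the refreshTokens table (illustrative).
type SessionRecord = {
  revokedAt: Date | null; // set server-side on "logout from all devices"
  expiresAt: Date;
};

// A token passes only if it is both unexpired AND not revoked,
// so a cryptographically valid JWT can still be rejected server-side.
function isSessionValid(session: SessionRecord, now: Date = new Date()): boolean {
  if (session.revokedAt !== null) return false;
  if (session.expiresAt.getTime() <= now.getTime()) return false;
  return true;
}
```

The middleware runs this after the JWT signature check, which is what makes server-side revocation possible at all.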

Files Service — GitHub

This is where things get more interesting.

The core idea I kept coming back to: the server should never touch the actual file bytes. If multiple users are uploading large files simultaneously and the server is the one moving those bytes, things will break — or at least get slow and expensive. The server's job here is to be a coordinator, not a carrier.

Here's how the pieces fit together:

  • Server handles routing, stores metadata in the database, decides the upload strategy (single-part or multipart), generates presigned URLs via aws-sdk, and returns them to the client.
  • Client uploads directly to S3 using those presigned URLs — no server in the middle for the actual transfer.
  • Server acts as the secure gateway: it controls permissions, exposes only what's needed, and nothing else.

This approach is secure, predictable, and efficient. The server never sees file contents — only metadata and signals.
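The "decides the upload strategy" step is essentially a size check. A sketch of that decision, assuming an 8 MB chunk size to match the multipart section later (chooseStrategy is my name):

```typescript
const CHUNK_SIZE = 8 * 1024 * 1024; // 8 MB, kept consistent with the client

type UploadStrategy =
  | { kind: "single" }
  | { kind: "multipart"; partCount: number };

// Small files get one presigned PUT URL; large files get one URL per part.
function chooseStrategy(fileSize: number, chunkSize: number = CHUNK_SIZE): UploadStrategy {
  if (fileSize <= chunkSize) return { kind: "single" };
  return { kind: "multipart", partCount: Math.ceil(fileSize / chunkSize) };
}
```

Everything downstream (how many presigned URLs to generate, whether an UploadId is needed) follows from this one decision.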

We'll go deeper on the upload flow in the next section.

Web App (Qwik) — GitHub

For state, Qwik gives you useSignal for primitive values and useStore for objects — both work well, each has its sweet spot. For data fetching and mutations, routeLoader$ and routeAction$ are your main tools. They're reactive, efficient, and play well with Qwik's resumable model.

One thing to keep in mind: everything that passes through Qwik's serialization boundaries must be serializable. No functions, no class instances, no circular references. This is the price of resumability — it's worth it, but you'll notice it when something breaks in a non-obvious way.

useTask$ is clean and useful for side effects — fire something when a signal changes, react when an action completes, etc.

Here's a quick example of how these APIs come together:

import { component$, useSignal, useTask$ } from "@builder.io/qwik";

export default component$(() => {
  const showModal = useSignal<"login" | "register" | null>(null);
  const registerAction = useRegisterAction();
  const loginAction = useLoginAction();

  useTask$(({ track }) => {
    // Re-runs whenever the register action resolves
    const result = track(() => registerAction.value);
    if (result?.message && !result?.error) {
      // On successful registration, switch to the login modal after a beat
      setTimeout(() => (showModal.value = "login"), 1500);
    }
  });

  return (/* ... JSX omitted ... */);
});

My take on Qwik overall: Clean, fast, and the mental model makes sense once it clicks. Whether it can replace everything Next.js does with all its ecosystem and features — honestly, I don't know. But for this project it got the job done with way less headache. I'll take that.

A routeAction$ Limitation You Should Know About

For the reader — routeAction$ is great for standard form submissions and one-shot mutations. But it has a hard constraint: it runs once per invocation and doesn't stay alive across async sequences. If you're building anything that drives multiple rounds of async work — like a chunked file upload — don't reach for routeAction$. It will silently stall on you.

When I tried to initiate uploads for large files using routeAction$ with a queue and concurrency logic, things broke in a subtle way. The upload would start, get through the first chunk of the first file, and then just stop. No error, no retry — it'd silently stall.

After digging through the docs and going back and forth with AI to understand what was happening, I found the issue: routeAction$ runs exactly once. When it's done, it's done. That single execution can't drive a multi-step, long-running upload process — it doesn't have the lifecycle for it.

The fix was to move the upload initiation out of routeAction$ entirely and use fetch against a proper API route under routes/api/, which Qwik City handles cleanly without that lifecycle constraint.

Here's the idea behind routes/api/uploads/initiate/index.ts — Source code for this part →. Sketched here in simplified TypeScript (the linked source has the full handler):

import type { RequestHandler } from "@builder.io/qwik-city";

export const onPost: RequestHandler = async ({ request, cookie, env, json, send }) => {
  // Generate a request ID for tracing
  const requestId = crypto.randomUUID();

  try {
    const token = cookie.get("access_token")?.value;
    const endpoint = env.get("FILES_SERVICE_ENDPOINT");

    // Forward the original body upstream with the user's token attached
    const upstream = await fetch(`${endpoint}/uploads/initiate`, {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": request.headers.get("Content-Type") ?? "application/json",
      },
      body: await request.arrayBuffer(),
    });

    // Stream the upstream response back to the client (status + headers intact)
    send(upstream);
  } catch (err) {
    console.error(`[${requestId}] upload initiate proxy failed:`, err);
    json(502, { error: "Upstream service unavailable" });
  }
};

The route acts as a thin proxy — picks up the access token from the cookie, forwards the request to the Files Service, and streams the response back. Clean, stateless, and it actually works for long-running upload sequences.


The Main Design — How the Upload Flow Works

There are four players: Client, File Service, Database, and S3 (from now on: const s3 = fileStorage).

Every request starts at the client and ends at the client. The reason is the same one I mentioned earlier — the server should never handle file bytes directly.

Here's the flow, step by step:

Step 1 — Client reads and prepares
The client reads files and folders, extracts metadata, preserves the folder hierarchy (so the file tree structure stays intact on S3), and sends everything to the File Service to initiate the upload.

Step 2 — Server stores and decides
The File Service receives the metadata, stores it in the database, and based on file size decides the strategy: single-part upload for small files, multipart for large ones. It then generates the appropriate presigned URL(s) using aws-sdk.

Step 3 — Server returns presigned URLs
The generated URLs are sent back to the client. This is where the server's job as "coordinator" ends for now.

Step 4 — Client uploads directly to S3
The client takes those presigned URLs and sends PUT requests directly to the S3 bucket. The server is not involved in the actual transfer. This is the point.

Step 5 — Client signals completion
After the upload finishes, the client collects the ETag headers from each S3 response and sends them back to the File Service to finalize the upload.

Step 6 — Server completes the upload on S3
The File Service calls CompleteMultipartUpload on S3 with the collected parts and their ETags, and updates the database record accordingly.

[Interactive diagram — Upload Flow: Client (Qwik App) ↔ File Service (Bun + TypeScript) ↔ S3 Bucket (AWS) / Database (PostgreSQL)]

A couple of important things from this design:

  • Concurrency on large files — When dealing with multipart uploads, we use Promise.all to upload parts in parallel. This is the whole point of breaking the file into chunks.
  • ETags are not optional — Each PUT request to S3 returns an ETag header in the response. You must capture these, because CompleteMultipartUpload requires an array of { PartNumber: number, ETag: string }. Miss one and the completion step fails.
  • Giant files (future problem) — For files in the 10GB+ range, generating all presigned URLs upfront becomes a massive JSON payload sent in a single response — that's inefficient. The cleaner approach is for the client to request presigned URLs in batches (e.g., 80 at a time, then the next 80, until done). For now, I capped uploads at 100MB (AWS free tier, learning purposes), so bumping the chunk size is good enough for the moment. But I'm aware of it.
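On the client side, the first two bullets can be sketched like this (splitIntoParts and uploadParts are my names; error handling is minimal):

```typescript
const CHUNK_SIZE = 8 * 1024 * 1024; // must match the server's part size

// Split a file into fixed-size chunks, paired with 1-based S3 part numbers.
function splitIntoParts(file: Blob, chunkSize: number = CHUNK_SIZE) {
  const parts: { partNumber: number; chunk: Blob }[] = [];
  for (let offset = 0, n = 1; offset < file.size; offset += chunkSize, n++) {
    parts.push({ partNumber: n, chunk: file.slice(offset, offset + chunkSize) });
  }
  return parts;
}

// PUT every part to its presigned URL in parallel, collecting the ETag
// from each response; CompleteMultipartUpload needs them later.
async function uploadParts(file: Blob, presigned: { PartNumber: number; url: string }[]) {
  const chunks = splitIntoParts(file);
  return Promise.all(
    presigned.map(async ({ PartNumber, url }) => {
      const res = await fetch(url, { method: "PUT", body: chunks[PartNumber - 1].chunk });
      if (!res.ok) throw new Error(`part ${PartNumber} failed with ${res.status}`);
      return { PartNumber, ETag: res.headers.get("ETag")! };
    }),
  );
}
```

Note that Promise.all alone fires every part at once; a real client caps concurrency with a queue, which is what the project's upload logic layers on top of this idea.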

AWS Multipart Upload

When a file is large, you have to use the Multipart Upload API — S3 won't accept a single PUT beyond 5 GB. The idea: split the file into chunks, upload each one independently, then tell S3 to stitch them back together.

It starts with this command:

const cmd = new CreateMultipartUploadCommand({
  Bucket: process.env.AWS_BUCKET!,
  Key: s3Key,
  ContentType: mimeType,
});
 
const { UploadId } = await s3.send(cmd);

This returns an UploadId. Save it to the database — you'll need it for every subsequent step. It's how S3 links all the parts together. Lose this and the upload is orphaned.

Next, generate presigned URLs for each part. I hardcoded CHUNK_SIZE = 8MB and kept it consistent across server and client — the client needs to split the file into chunks of the same size. (S3's minimum part size is 5 MB, except for the last part.) The total number of parts is just Math.ceil(fileSize / CHUNK_SIZE).

Then inside a loop, use UploadPartCommand + getSignedUrl and push each call into a promises array. Resolve them all with Promise.all for better performance — no need to wait for one part's URL before generating the next.

Each resolved promise gives you a { PartNumber, url } object. Those URLs go to the client.

Source code for this part →

After all parts are uploaded by the client, we close the loop with CompleteMultipartUpload — this tells S3 to assemble everything using the ETags collected from the PUT responses.

The full multipart upload APIs needed:

  • CreateMultipartUpload — initialize, get the UploadId
  • UploadPart — upload each chunk (via presigned URL from the client)
  • CompleteMultipartUpload — finalize with all parts and their ETags
  • AbortMultipartUpload — clean up on failure (important — orphaned uploads still cost money on S3)
  • ListParts / ListMultipartUploads — useful for debugging or building a resume-upload feature

If you want to go deeper, the AWS docs on CreateMultipartUpload are actually solid. Recommended (but, honestly, you may end up overwhelmed).


Notes to Self

A few things to close out with.

Rules I actually applied — Version control (always, even on a solo side project with no one watching). Build, refactor, handle errors properly. Testing? I failed myself on that one. No unit tests, no integration tests. But progress is progress — I'll take the small win.

Why aws-sdk and not Bun's native S3 API? — Bun has a built-in S3 client, which I would've loved to use. But at the time of building this, it didn't support multipart uploads or presigned URLs. Given that I'm always aiming for minimal dependencies, using aws-sdk specifically for those features was the right call.

The API Gateway question — If you're building microservices seriously, you need an API gateway. It handles routing between services, auditing, monitoring, rate limiting, and load balancing — all the cross-cutting concerns that shouldn't live inside individual services. I skipped it because I only have two services and this was a learning project. But if this were to grow, that's non-negotiable.

Redis for rate limiting — One thing I'd add in a production setup: a key-value store like Redis at the gateway level, for distributed rate limiting across containers. If you're spinning up multiple instances of a service, you can't do rate limiting in-memory per instance — you need a shared counter somewhere. Redis is the obvious answer.


Conclusion

This project taught me more than I expected — not because it was hard, but because I forced myself to actually understand what I was building. Big shoutout to the author of this article, which made the whole upload design click in my head.

The formula that worked for me in the age of AI and agents: fundamentals first → books on your tech stack and problem-solving → articles from people who already solved what you're trying to solve → other people's code and open source → then AI, with context and intention. In that order.

Books here means the stuff that builds your foundation — how the technology actually works, how to think about problems, the principles that don't expire. Articles means reading how real developers tackled real problems — not for copy-pasting answers, but for the thinking behind them. And reading other people's code, especially open source, is underrated. You learn patterns and decisions that no tutorial will ever show you.

AI is a multiplier. But a multiplier of what you bring to it. If you bring understanding, it accelerates you. If you bring nothing, it just generates code you can't debug, maintain, or own. The goal these days is to control the AI: give it direction, question its output, know when it's wrong. Don't let it drive. Code generated by an AI agent is your responsibility. Be responsible.

Questions or feedback are always welcome — midaghdour@gmail.com