Where Do All These Files Come From?

I've talked a lot in the last week about the dependency graph that Webpack builds from your entrypoint, and how the rules that you define (which are usually composed of loaders) tell Webpack how to deal with each type of file that it encounters as it walks the graph.

So where do all of these other files come from in my output directory? For instance, if I use HtmlWebpackPlugin to generate an HTML file from a template, it will copy any images that it encounters in my HTML and update the src attributes appropriately in the built file. How does it know to do that?

Plugins can kick off their own compilations, which have their own dependency graphs, and can create arbitrary files (called "emitting"). Loaders can emit arbitrary files too. In fact, HtmlWebpackPlugin can only process images in your HTML template if you've configured a rule with an appropriate loader to do so.
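To make "emitting" a bit more concrete, here's a rough sketch of a Webpack 5 plugin that adds one extra file to the output. The plugin name (EmitAssetListPlugin) and the emitted file (asset-list.txt) are made up for illustration, and this is not how HtmlWebpackPlugin itself is written. It only relies on hooks that Webpack 5 exposes to every plugin.

```javascript
// Hypothetical plugin: the class name and the emitted file name are illustrative.
class EmitAssetListPlugin {
  apply(compiler) {
    // Webpack 5 exposes its own classes on compiler.webpack,
    // so the plugin doesn't need to require("webpack") itself.
    const { Compilation, sources } = compiler.webpack;

    compiler.hooks.thisCompilation.tap("EmitAssetListPlugin", (compilation) => {
      compilation.hooks.processAssets.tap(
        {
          name: "EmitAssetListPlugin",
          stage: Compilation.PROCESS_ASSETS_STAGE_ADDITIONAL,
        },
        (assets) => {
          // List the assets that exist at this stage, then emit one more file.
          const body = Object.keys(assets).sort().join("\n");
          compilation.emitAsset("asset-list.txt", new sources.RawSource(body));
        }
      );
    });
  }
}
```

Adding new EmitAssetListPlugin() to the plugins array of a config should be enough to see the extra file show up in the output directory.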

Does your brain hurt yet? Let's walk through a basic example. Say I have a very simple repository that looks like this:

(as usual, code formatting looks better on the web. I'm going to fix this soon, I promise)

webpack.config.js
src/
 - img.jpg
 - index.js
 - index.html
    // index.html includes <img src="./img.jpg" />
    // but NO reference to index.js
dist/
 // here's where our built files go

I have a Webpack config that looks like this:

const HtmlWebpackPlugin = require("html-webpack-plugin");
const path = require("path");

module.exports = {
  entry: "./src/index.js", // not strictly needed, as this is the default
  output: {
    filename: "[name]-[contenthash].js",
    path: path.resolve(__dirname, "dist")
  },
  plugins: [
    new HtmlWebpackPlugin({ template: "./src/index.html" })
  ],
  module: {
    rules: [
      {
        test: /\.html$/,
        use: ["html-loader"]
      },
      {
        test: /\.(png|gif|jp(e?)g)$/,
        type: "asset/resource"
      }
    ]
  }
};

Running Webpack with this config produces contents in the dist directory that look like this:

dist/
 - 16ce1537430d8823fcd6.jpg
    // this is a copy of img.jpg
 - index.html
    // includes <img src='./16ce1537430d8823fcd6.jpg' />
    // and <script src='./main-a572965581558bddf992.js' />
 - main-a572965581558bddf992.js
    // this is the built JS bundle

How did we get here? Who's responsible for keeping track of which files are bundled into which hashed results, and where do their references get updated?

The file main-a572965581558bddf992.js is our main JavaScript bundle. It's produced by starting at the entrypoint listed in the config file (src/index.js) and walking its dependencies to produce the final JavaScript file. For today's example, let's assume that this file only contains console.log('hello world'). In this case, that's more or less what our output bundle will contain too. We'll cover more complex scenarios in coming days.

The interesting thing here is index.html. It doesn't show up anywhere in index.js's dependency graph. It doesn't contain any reference to index.js in itself. Yet it got copied to the dist directory, along with the files it referred to (img.jpg). Those references got updated with new filenames and the main JS bundle was injected into the resulting file.

Here's how it went down:

When we run Webpack with this config, it looks first to the entrypoint to build its initial dependency graph for generating our main bundle. In this case we're just doing hello world, but it could just as easily be a sprawling monstrosity of an application. If we had specified no rules or plugins in our config file, the output would be just that one main JavaScript bundle.

Plugins are able to hook into many different points of the compilation process, and are passed the compilation object itself--meaning that plugins have access to the main dependency graph, the entry point(s), and generated bundles. In this case, the HtmlWebpackPlugin is kicked off in parallel with the main compilation. It looks to the value of the template option that we pass it (src/index.html) and parses that file. As it parses, it will process any external dependencies that it finds. In our case, this is the src attribute of our img tag. As I said earlier, we need an appropriate loader for any file type that it comes across. This is why there's a rule testing image file extensions and marking them as type asset/resource. Asset types are a new feature of Webpack 5. In Webpack 4, you'd use a separate loader called file-loader.
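If you're still on Webpack 4, the image rule might look something like the sketch below instead. It assumes file-loader is installed as a dev dependency, and the name option is my own choice to mimic the hashed filenames in this example rather than anything required.

```javascript
// Webpack 4 sketch: assumes `file-loader` is installed as a dev dependency.
const imageRule = {
  test: /\.(png|gif|jp(e?)g)$/,
  use: [
    {
      loader: "file-loader",
      // Illustrative naming choice: content hash plus the original extension.
      options: { name: "[contenthash].[ext]" },
    },
  ],
};

// Only the relevant part of the config is shown here.
module.exports = { module: { rules: [imageRule] } };
```

The test regex is the same one as in the Webpack 5 rule; only the mechanism that copies the file changes.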

As the plugin processes dependencies and emits them into the dist folder, it updates the src references in the generated HTML with the new file names. We'll talk about why filenames get mangled another day. As a final step, when the main compilation of our JS bundle is complete (remember it was happening in parallel while the plugin was processing our HTML), the plugin, which has access to the main compilation object, injects a script tag with a src attribute that points to the main bundle.
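Conceptually, that last injection step boils down to something like the following. To be clear, injectScript is a hypothetical helper I wrote for illustration, not the plugin's real code, and where the real plugin places the tag is configurable.

```javascript
// Hypothetical helper: a simplification, not html-webpack-plugin's implementation.
function injectScript(html, bundleFilename) {
  const tag = `<script src="${bundleFilename}"></script>`;
  // Put the tag just before </body> when there is one, otherwise append it.
  return html.includes("</body>")
    ? html.replace("</body>", `${tag}</body>`)
    : html + tag;
}

// The image src has already been rewritten by this point in the process.
const built = injectScript(
  "<html><body><img src='./16ce1537430d8823fcd6.jpg' /></body></html>",
  "main-a572965581558bddf992.js"
);
console.log(built);
```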

Tomorrow: Chunks, Hashing, and Cache-busting


