GridFS Guide: How to Upload Files and Images to MongoDB Easily Using Node
As a full-stack developer, you'll often need to add support for uploading, storing, and serving files in your web applications. Profile pictures, document attachments, and media galleries are just a few examples of features that require robust file handling.
While there are many architectures and tools available for implementing file storage, MongoDB's GridFS can be a powerful option, especially if you're already using MongoDB as your primary database.
In this in-depth guide, we'll cover everything you need to know to use GridFS to upload and serve files from a Node.js application, including:
- What is GridFS and when to use it
- How to set up GridFS in an Express app
- Uploading single and multiple files
- Serving files and images
- Deleting files
- Performance best practices
- Alternatives to GridFS
By the end of this guide, you'll have all the knowledge you need to start using GridFS in your own Node.js and MongoDB applications. Let's get started!
What is GridFS?
GridFS is a specification for storing and retrieving large files in MongoDB. It allows you to store files that exceed the BSON document size limit of 16 MB by splitting the file into chunks and storing each chunk as a separate document.
GridFS was created to enable MongoDB to be used as a general purpose database that can handle any type of data, including large files. Some of the key features and benefits of GridFS include:
- Storing files of any size by splitting them into 255 KB chunks
- Retrieving parts of files without having to load the entire file into memory
- Accessing and streaming files using the MongoDB drivers
- Maintaining metadata about files, such as filename, content type, etc.
- Support for concurrent reads and writes
How GridFS Works
When you use GridFS to store a file, the driver actually creates two collections in MongoDB, prefixed with the bucket name (`fs` by default):

- `fs.chunks` – Stores the binary chunks of the file. Each document holds one chunk of the file, up to 255 KB by default.
- `fs.files` – Stores the file metadata. Each document represents a file stored in GridFS.
Here's an example of what documents in these collections might look like:

`fs.chunks` collection:

```javascript
{
  "_id" : ObjectId("54e8d42aa4562db293576287"),
  "files_id" : ObjectId("54e8d42aa4562db293576286"),
  "n" : 0,
  "data" : BinData(0,"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz012345678910")
}
```

`fs.files` collection:

```javascript
{
  "_id" : ObjectId("54e8d42aa4562db293576286"),
  "length" : NumberLong("1024"),
  "chunkSize" : 261120,
  "uploadDate" : ISODate("2021-05-01T10:36:14.187Z"),
  "filename" : "large-file.zip",
  "contentType" : "application/zip",
  "aliases" : [],
  "metadata" : {}
}
```
When you query GridFS for a file, the driver reads the chunks in order and reassembles them into the original file. This allows MongoDB to handle files of practically unlimited size while still supporting high-performance reads and writes, including random access to parts of a file.
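To make the chunking arithmetic concrete, here's a small sketch (plain Node.js, no driver required) of how many chunk documents a file of a given size occupies, assuming the default 255 KB chunk size:

```javascript
// Default GridFS chunk size: 255 KB = 261120 bytes
const DEFAULT_CHUNK_SIZE = 255 * 1024;

// Number of documents in the chunks collection for a file of this size
function chunkCount(fileSizeBytes, chunkSize = DEFAULT_CHUNK_SIZE) {
  if (fileSizeBytes === 0) return 0;
  return Math.ceil(fileSizeBytes / chunkSize);
}

// A 1 MB file needs 4 full 255 KB chunks plus one final partial chunk
console.log(chunkCount(1024 * 1024)); // 5
```

This is also a useful back-of-the-envelope check when estimating collection sizes: a 1 GB file, for example, produces around 4,100 chunk documents.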
When to Use GridFS
Despite its utility, GridFS is not always the best choice for file storage. Here are some guidelines on when to use GridFS:
| When to Use GridFS | When Not to Use GridFS |
|---|---|
| Files larger than 16 MB | Files smaller than 16 MB |
| Storing file metadata is important | Only need to store raw binary data |
| Random access to file content is needed | Entire files will always be read |
| Streaming files to clients | Serving assets directly from disk |
In general, if you need to manage large files or metadata about files directly in your MongoDB database, GridFS is a good fit. For serving assets or storing smaller files, other solutions may be more appropriate. We'll discuss some alternatives to GridFS in a later section.
Setting Up GridFS in Node.js and Express
Now that we have a high-level understanding of GridFS, let's walk through how to set it up in a Node.js application using the Express web framework.
Installing Dependencies
To get started, we need to install a few dependencies:
```shell
npm install mongodb mongoose multer multer-gridfs-storage gridfs-stream
```
Here's what each of these packages does:

- `mongodb` – The official MongoDB driver for Node.js
- `mongoose` – An ODM built on the driver; we use it here to manage the MongoDB connection
- `multer` – Middleware for handling `multipart/form-data` requests, which is the standard encoding for file uploads from a browser
- `multer-gridfs-storage` – A GridFS storage engine for Multer
- `gridfs-stream` – A streaming API over GridFS, used here to read, list, and delete files
Initializing the GridFS Bucket
Next, we need to initialize a GridFS bucket that will handle streaming files to and from MongoDB. Here's an example of how to do that:

```javascript
const mongoose = require('mongoose');
const multer = require('multer');
const Grid = require('gridfs-stream');
const GridFsStorage = require('multer-gridfs-storage');
const crypto = require('crypto');
const path = require('path');

// Create a dedicated connection for file handling
const conn = mongoose.createConnection(mongoURI);

let gfs;
conn.once('open', () => {
  gfs = Grid(conn.db, mongoose.mongo);
  gfs.collection('uploads');
});

const storage = new GridFsStorage({
  url: mongoURI,
  file: (req, file) => {
    return new Promise((resolve, reject) => {
      // Generate a random 16-byte hex filename, keeping the original extension
      crypto.randomBytes(16, (err, buf) => {
        if (err) {
          return reject(err);
        }
        const filename = buf.toString('hex') + path.extname(file.originalname);
        const fileInfo = {
          filename: filename,
          bucketName: 'uploads'
        };
        resolve(fileInfo);
      });
    });
  }
});

const upload = multer({ storage });
```
Let's break this down:

- We create a separate connection to MongoDB using `mongoose.createConnection`. This ensures our file handling doesn't impact the main application connection.
- We initialize a GridFS stream using `gridfs-stream` and bind it to the `uploads` collection.
- We create a `GridFsStorage` engine instance, passing it the MongoDB connection URI. In the `file` function, we generate a random filename for security and return an object specifying the filename and bucket name.
- Finally, we initialize a `multer` middleware instance, passing it the `GridFsStorage` engine. This lets us use `upload` as middleware in our routes to handle file uploads.

With this setup in place, we're ready to start implementing file uploads in our Express routes.
Uploading Files
To enable file uploads, we simply need to add the `upload` middleware to the appropriate routes. Here's an example of how to handle a single file upload:

```javascript
app.post('/upload', upload.single('file'), (req, res) => {
  res.json({ file: req.file });
});
```

In this route, we use `upload.single('file')` to handle a single file uploaded in the `file` field. Multer automatically handles streaming the file to GridFS, and we can access metadata about the uploaded file on `req.file`.
We can also easily handle multiple file uploads using `upload.array()`:

```javascript
app.post('/upload-multiple', upload.array('files'), (req, res) => {
  res.json({ files: req.files });
});
```
Serving Uploaded Files
Once a file is uploaded, we can serve it by querying the GridFS bucket. Here's an example of how to serve a file by filename:

```javascript
app.get('/files/:filename', (req, res) => {
  gfs.files.findOne({ filename: req.params.filename }, (err, file) => {
    if (!file || file.length === 0) {
      return res.status(404).json({
        err: 'No file exists'
      });
    }
    if (file.contentType === 'image/jpeg' || file.contentType === 'image/png') {
      const readstream = gfs.createReadStream({ filename: file.filename });
      readstream.pipe(res);
    } else {
      res.status(400).json({
        err: 'Not an image'
      });
    }
  });
});
```

In this route, we first query the `files` collection for a document with the specified `filename`. If a file is found and it's an image, we create a read stream using `gfs.createReadStream()` and pipe it to the response. This serves the image data directly to the client.
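If you want to allow more image formats than the two hard-coded checks above, it's cleaner to factor the test into a small helper. A sketch (the helper name and the extra MIME types are our own choices, not part of GridFS):

```javascript
// MIME types we're willing to stream to the browser as images
const IMAGE_TYPES = new Set(['image/jpeg', 'image/png', 'image/gif', 'image/webp']);

// True only for file documents whose contentType is on the whitelist
function isServableImage(file) {
  return Boolean(file && IMAGE_TYPES.has(file.contentType));
}

console.log(isServableImage({ contentType: 'image/png' }));       // true
console.log(isServableImage({ contentType: 'application/zip' })); // false
```

In the route, the content-type condition then becomes `if (isServableImage(file)) { ... }`.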
We can also add routes to delete files by `_id`, get a list of all files, and more:

```javascript
// Delete a file by ID
app.delete('/files/:id', (req, res) => {
  gfs.remove({ _id: req.params.id, root: 'uploads' }, (err) => {
    if (err) {
      return res.status(404).json({ err: err });
    }
    res.redirect('/');
  });
});

// Get a list of all files
app.get('/files', (req, res) => {
  gfs.files.find().toArray((err, files) => {
    if (err) {
      return res.status(500).send(err);
    }
    res.json(files);
  });
});
```
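One hardening step worth adding to the delete route: validate that `req.params.id` actually looks like an ObjectId before passing it to GridFS, since a malformed id will otherwise surface as a cast error. A minimal sketch (the helper is our own; in a Mongoose app you could use `mongoose.Types.ObjectId.isValid` instead):

```javascript
// A valid ObjectId string is exactly 24 hexadecimal characters
function isValidObjectId(id) {
  return typeof id === 'string' && /^[0-9a-fA-F]{24}$/.test(id);
}

console.log(isValidObjectId('54e8d42aa4562db293576286')); // true
console.log(isValidObjectId('not-an-id'));                // false
```

The route can then fail fast: `if (!isValidObjectId(req.params.id)) return res.status(400).json({ err: 'Invalid id' });`.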
GridFS Performance
One of the key considerations when evaluating GridFS is performance. While GridFS can scale to handle very large files, it's important to understand its performance characteristics to ensure it meets your application's needs.
Some key factors that impact GridFS performance include:
- Network latency between the application server and MongoDB
- Size and number of chunks
- Available memory for caching chunks
- Disk speed, especially if files need to be paged to disk
- Concurrent requests
In general, GridFS performs best with larger files that are read sequentially rather than randomly accessed. Accessing many small files or random parts of files can lead to higher overhead due to the number of chunk documents that need to be fetched.
Some best practices for optimizing GridFS performance include:

- Use a dedicated MongoDB instance for file storage to avoid impacting the main application database
- Ensure the GridFS collections are properly indexed, especially the compound index on `files_id` and `n` in the `chunks` collection
- Choose an appropriate chunk size (the default of 255 KB is usually good)
- Use caching to avoid re-fetching unchanged files
- Monitor usage patterns and consider sharding the GridFS bucket if needed
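To illustrate the caching advice, here's a deliberately simple in-memory TTL cache sketch in plain Node.js. It has no size limit or eviction policy, and the class and method names are our own invention, not a GridFS API:

```javascript
// Caches values (e.g. small file buffers) for a fixed time-to-live,
// so hot files don't have to be re-fetched from GridFS on every request.
class FileCache {
  constructor(ttlMs) {
    this.ttlMs = ttlMs;
    this.entries = new Map();
  }
  set(key, value) {
    this.entries.set(key, { value, expires: Date.now() + this.ttlMs });
  }
  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expires) {
      // Entry has aged out; drop it and report a miss
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }
}

const cache = new FileCache(60 * 1000); // 60-second TTL
cache.set('avatar.png', Buffer.from('fake image bytes'));
console.log(cache.get('avatar.png') !== undefined); // true
```

In production you'd more likely reach for an LRU package or HTTP-level caching (ETags, a CDN), but the shape of the idea is the same: check the cache before opening a GridFS read stream.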
To give you a rough idea of GridFS throughput, here are some illustrative benchmark figures from one test setup:

| File Size | Number of Files | Total Size | Average Read Rate | Average Write Rate |
|---|---|---|---|---|
| 1 MB | 5,000 | 5 GB | 149 MB/s | 56 MB/s |
| 500 MB | 1,000 | 500 GB | 120 MB/s | 39 MB/s |
| 5 GB | 200 | 1 TB | 114 MB/s | 28 MB/s |

Source: GridFS Benchmarks
As you can see, GridFS maintains good throughput even with very large files and data sets. However, your actual performance may vary depending on your specific hardware and configuration.
Alternatives to GridFS
While GridFS is a powerful option for storing files in MongoDB, it's not the only option. Here are some alternatives to consider:
- Storing files directly in MongoDB as BinData – For files smaller than 16 MB, you can store them directly in MongoDB documents as binary data. This avoids the overhead of GridFS, but you give up chunking and streaming.
- Using a separate file system – You can store files on the server's file system or an attached network drive. Your application would store paths to the files in MongoDB but access the actual files directly from disk. This can be a good option if you need to serve a lot of static files efficiently.
- Using a cloud storage service – Cloud services like Amazon S3, Azure Blob Storage, or Google Cloud Storage provide scalable and durable storage for files. You can store references to files in MongoDB but delegate the actual file storage to a specialized service. This is often the best approach for large-scale production applications.
Ultimately, the right choice depends on your application's specific needs and constraints. GridFS is worth considering if you need a tight integration between your files and MongoDB, but don't rule out other options that may be simpler or more scalable.
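The 16 MB rule of thumb from the first alternative can be captured in a tiny decision helper. A sketch (the names are illustrative; note that the 16 MB BSON limit applies to the whole document, so in practice you should leave headroom for the other fields):

```javascript
// MongoDB's BSON document size limit
const BSON_MAX_BYTES = 16 * 1024 * 1024;

// Naive rule of thumb: inline small files as BinData, use GridFS otherwise
function storageStrategy(fileSizeBytes) {
  return fileSizeBytes < BSON_MAX_BYTES ? 'inline-bindata' : 'gridfs';
}

console.log(storageStrategy(512 * 1024));        // 'inline-bindata'
console.log(storageStrategy(100 * 1024 * 1024)); // 'gridfs'
```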
The Future of GridFS
GridFS itself is not going away, but the tooling around it has changed: the legacy GridStore API and community wrappers like `gridfs-stream` have been deprecated in favor of the official driver's built-in `GridFSBucket` streams API. While the older packages may keep working for some time, it's worth considering a migration to `GridFSBucket` for new code.
The `GridFSBucket` API provides a more modern and efficient interface for handling large files in MongoDB. Some of its benefits include:
- A simplified, stream-based API built directly into the official `mongodb` driver, with no extra dependencies
- A configurable chunk size per upload via the `chunkSizeBytes` option
- Support for user-defined metadata on files via the `metadata` option
- Methods for uploading, downloading by name or `_id`, renaming, and deleting files
You can find more details on migrating from the legacy GridFS APIs in the MongoDB driver documentation.
Conclusion
GridFS is a powerful tool for storing and retrieving large files in MongoDB. With support for chunking, streaming, and metadata, it provides a flexible solution for handling files of any size directly in your database.
In this guide, we covered everything you need to know to get started with GridFS in Node.js, including:
- What GridFS is and how it works
- Setting up GridFS in an Express application
- Uploading single and multiple files
- Serving files and images
- Deleting files
- Performance best practices
- Alternatives to GridFS
I hope this guide has given you a solid foundation for working with GridFS in your own projects. Remember, while GridFS is a great option for many use cases, it's not always the best choice. Always evaluate your application's specific needs and consider alternatives like direct MongoDB storage, file systems, and cloud storage services.
Finally, prefer the driver's built-in `GridFSBucket` streams API for new GridFS code. While legacy wrappers like `gridfs-stream` will continue to work for some time, the built-in API provides a more modern and efficient interface for handling large files in MongoDB.
Additional Resources
- GridFS Documentation – Official MongoDB documentation on GridFS
- Storing Files with GridFS in Node.js – Blog post from MongoDB on using GridFS in Node.js
- Migrate from GridFS API – MongoDB documentation on migrating legacy GridFS code to the driver's built-in API
- MongoDB Driver `GridFSBucket` API – API documentation for the driver's GridFS streams interface