Python Create File – How to Append and Write to a Text File

As a full-stack developer, you'll frequently need to read and write data to files in your Python backend services. File input/output (I/O) is an essential skill for everything from saving user preferences and caching to logging and data analysis. Virtually every non-trivial Python web app or script will need to persist data to the file system at some point.

In this in-depth guide, we'll cover all the key aspects of creating, writing to, appending, and reading files using Python's built-in functions and modules. You'll see detailed code examples, learn best practices and performance optimizations, and get expert tips for working with files in real-world projects. Let's get started!

Creating and Writing Text Files

To create a new file or overwrite an existing one, you use the open() function in write ('w') mode:

with open('example.txt', 'w') as file:
    file.write('Hello World!\n')
    file.write('This is a new file.\n')

The open() function takes two key arguments – the file path and the mode. For writing to a file, we pass 'w' as the mode. This opens the file for writing and truncates (clears) it if it already exists. If the file doesn't exist, open() will create it.

Using the file handle returned by open(), we write strings to the file with the write() method. Unlike the print() function, write() does not automatically add newline characters, so you need to include them manually (\n) to write multiple lines.
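If you'd rather have newlines added for you, one option is to pass the file handle to print(), which appends one by default:

with open('example.txt', 'w') as file:
    print('Hello World!', file=file)         # print() supplies the trailing \n
    print('This is a new file.', file=file)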

It's important to always close a file after you're done writing to it. The best way to do this is using a with block as shown above. This ensures the file is closed automatically once the with block's scope ends, even if an exception is raised. If you don't use with, be sure to call close() on the file handle explicitly.
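For reference, the with block above is roughly equivalent to this explicit try/finally version:

file = open('example.txt', 'w')
try:
    file.write('Hello World!\n')
finally:
    file.close()  # runs even if write() raises an exception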

Appending to Files

Sometimes you want to add to an existing file without overwriting its contents. For this, we use append mode by passing 'a' to open():

with open('example.txt', 'a') as file:
    file.write('This line is appended.\n')

Append mode preserves the current contents and writes any new data to the end of the file. If the file doesn't exist, opening it in append mode will create it just like write mode.

Be careful when mixing write and append mode on the same file, as it's easy to accidentally overwrite data. Usually it's best to pick one mode and stick with it for a given file in your program.
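If you want a hard guarantee that an existing file is never clobbered, you can open it in exclusive creation ('x') mode, which raises FileExistsError instead of truncating:

try:
    with open('example.txt', 'x') as file:
        file.write('Only written if the file did not already exist.\n')
except FileExistsError:
    print('example.txt already exists - refusing to overwrite')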

Reading File Contents

Once you've written some data to a file, you'll probably want to read it back at some point. For this, we use read mode by passing 'r' to open():

with open('example.txt', 'r') as file:
    contents = file.read()
    print(contents)

This reads the entire contents of the file into a string using read(). If you want to process the lines individually, you can use a for loop:

with open('example.txt', 'r') as file:
    for line in file:
        print(line.strip())

This iterates over the file line by line. The strip() method removes the newline character from the end of each line before printing.

For large files, it's often better to process them line by line instead of reading the whole file at once to avoid using too much memory. Python will automatically buffer the file I/O for efficient reading and writing.
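For binary data or files without convenient line breaks, you can also read in fixed-size chunks. Here's a minimal sketch (the 64 KB chunk size is an arbitrary choice to tune for your workload):

def process_in_chunks(path, chunk_size=64 * 1024):
    with open(path, 'rb') as file:
        while True:
            chunk = file.read(chunk_size)
            if not chunk:  # an empty bytes object signals end of file
                break
            # process the chunk here, e.g. feed it to a hash or parser
            print(len(chunk))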

Text vs Binary Mode

By default, open() operates in text mode, which automatically handles line endings and text encoding for you. When reading, \r\n (and lone \r) line endings are translated to \n; when writing, \n is translated back to the platform's native line separator, so \r\n on Windows and \n on Unix/Linux.
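You can control this translation with open()'s newline parameter. For instance, passing newline='' disables it so you see and write line endings exactly as they appear on disk:

with open('example.txt', 'r', newline='') as file:
    raw = file.read()  # any \r\n sequences are preserved, not collapsed to \n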

However, sometimes you need to read or write binary data like images, sound files, or serialized Python objects. In this case, you need to use binary mode by adding 'b' to the mode string:

with open('example.bin', 'wb') as file:
    file.write(b'\x00\x01\x02\x03')

In binary mode, data is read and written as raw bytes with no formatting translation. When reading a binary file, read() returns a bytes object instead of a str.
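Reading the file back demonstrates this:

with open('example.bin', 'rb') as file:
    data = file.read()
    print(type(data))  # <class 'bytes'>
    print(data.hex())  # 00010203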

Be careful not to use binary mode for text data or vice versa, as this can lead to formatting errors and corrupt data.

Paths and Directories

So far we've used simple filenames that put the file in the same directory as the Python script. For larger projects, you'll often need to specify paths to organize files into subdirectories.

Python's os module provides many helpful functions for working with the file system in a cross-platform manner:

import os

# Get current working directory
cwd = os.getcwd()

# Create a new directory
os.mkdir('example_dir')

# Change current directory
os.chdir('example_dir')

# Get list of files and subdirectories
files = os.listdir('.')

# Construct file path
file_path = os.path.join(cwd, 'example_dir', 'file.txt')

with open(file_path, 'w') as file:
    file.write('File in a subdirectory.\n')

Always use os.path.join() to construct file paths instead of manually concatenating strings. This ensures your code works correctly on different operating systems.

Serializing Objects

Python's pickle module allows you to easily serialize and deserialize Python objects to and from files. This is useful for caching expensive computations or saving complex application state.

Here's how to pickle a dictionary to a file:

import pickle

data = {
    'name': 'Alice',
    'age': 30,
    'hobbies': ['reading', 'running', 'coding']
}

with open('data.pkl', 'wb') as file:
    pickle.dump(data, file)

And here's how to unpickle it:

with open('data.pkl', 'rb') as file:
    loaded_data = pickle.load(file)
    print(loaded_data)

Pickle uses a binary format, so you need to use binary read/write mode. Also, never unpickle data from an untrusted source, as it can lead to arbitrary code execution.

Structured File Formats

In addition to working with raw text and binary files, Python has excellent support for structured formats like CSV, JSON, and XML.

To read and write CSV files, use the built-in csv module:

import csv

with open('example.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(['Name', 'Age', 'City'])
    writer.writerow(['Alice', 30, 'New York'])
    writer.writerow(['Bob', 25, 'San Francisco'])

with open('example.csv', 'r') as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)

For JSON, use the json module:

import json

data = {
    'name': 'Alice',
    'age': 30,
    'city': 'New York'
}

with open('example.json', 'w') as file:
    json.dump(data, file)

with open('example.json', 'r') as file:
    loaded_data = json.load(file)
    print(loaded_data)

These modules handle the details of serializing and parsing the structured formats for you, so you can focus on working with the data itself.

Asynchronous File I/O

In modern async Python web frameworks like FastAPI and Starlette, you'll often need to perform file I/O without blocking the main event loop. For this, we can use the aiofiles library to perform non-blocking file operations:

import aiofiles

async def write_file():
    async with aiofiles.open('example.txt', 'w') as file:
        await file.write('Hello async world!\n')

async def read_file():
    async with aiofiles.open('example.txt', 'r') as file:
        contents = await file.read()
        print(contents)

The aiofiles API is very similar to the standard open() function, but uses async with and await to perform the I/O asynchronously. This allows your async web server to handle many concurrent file operations without blocking.
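These coroutines have to be driven by an event loop. Inside FastAPI or Starlette the framework runs the loop for you; in a standalone script you could run them like this:

import asyncio

async def main():
    await write_file()
    await read_file()

asyncio.run(main())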

File I/O Performance Tips

When working with large files or many small files in performance-critical code, optimizing your file I/O can have a big impact. Here are a few tips:

  • Use buffering to read and write data in chunks instead of one byte at a time. The default buffer size is usually sufficient (4096 or 8192 bytes on most systems), but you can experiment with different sizes to find the optimum; see the sketch after this list.

  • When processing structured data, consider binary formats like pickle, msgpack, or parquet instead of JSON or CSV. Binary formats are typically more compact and faster to serialize and parse, though the actual speedup depends on your data shapes and library versions, so benchmark with your own workload.

  • If you need an entire file in memory anyway, a single read() is usually faster than readlines() or iterating over the file object, since it avoids splitting the data and building a list of line strings.

  • Understand how the global interpreter lock (GIL) interacts with multithreaded file I/O. Python releases the GIL during blocking I/O calls, so threads can overlap reads and writes, but any CPU-heavy processing of the data still serializes on the GIL. Consider multiprocessing for CPU-bound workloads and async I/O for I/O-bound ones.

  • Profile and measure before optimizing. Use Python's built-in cProfile module or third-party tools like py-spy to identify file I/O bottlenecks in your code. Don't waste time optimizing code paths that aren't performance-critical.
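As a concrete example of the buffering tip, here's a sketch that passes an explicit buffer size to open(). The 1 MB value is purely illustrative, a starting point to benchmark against the default:

# Copy a file using an explicit 1 MB buffer (size chosen for illustration)
buf_size = 1024 * 1024

with open('input.bin', 'rb', buffering=buf_size) as src, \
     open('output.bin', 'wb', buffering=buf_size) as dst:
    while True:
        chunk = src.read(buf_size)
        if not chunk:
            break
        dst.write(chunk)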

Real-World Examples

To solidify these file I/O concepts, let's look at a few real-world examples you're likely to encounter as a Python web developer.

Web Scraping

When scraping websites, you'll often want to cache the HTML locally to avoid re-fetching it on every run. Here's a simple caching function using file I/O:

import os
import requests

def cached_get(url, cache_dir='cache'):
    # Derive the cache filename from the last URL segment. This is
    # deliberately naive: it assumes the URL has a filename-like final
    # segment and no query string.
    filename = url.split('/')[-1]
    filepath = os.path.join(cache_dir, filename)

    if os.path.exists(filepath):
        # Cache hit: serve the saved copy from disk
        with open(filepath, 'r') as file:
            content = file.read()
    else:
        # Cache miss: fetch over the network and save for next time
        os.makedirs(cache_dir, exist_ok=True)

        response = requests.get(url)
        content = response.text

        with open(filepath, 'w') as file:
            file.write(content)

    return content

This function takes a URL and a cache directory, fetches the contents of the URL, and saves it to a file in the cache directory. On subsequent calls with the same URL, it reads the cached file instead of re-fetching the data. This can significantly speed up scrapers that revisit the same pages often.
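A quick usage sketch (the URL is just a placeholder):

html = cached_get('https://example.com/articles/page1.html')  # fetched over the network
html = cached_get('https://example.com/articles/page1.html')  # served from cache/page1.html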

Data Processing

In data-heavy web services, you'll frequently need to export data from a database to a file for analysis or import data from files into a database. Here's an example of exporting MySQL data to a CSV file:

import csv
import mysql.connector

db = mysql.connector.connect(
    host='localhost',
    user='user',
    password='password',
    database='example'
)

cursor = db.cursor()
cursor.execute('SELECT * FROM users')

with open('users.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow([i[0] for i in cursor.description])
    writer.writerows(cursor)

db.close()

This code connects to a MySQL database, executes a SELECT query, and writes the results to a CSV file using the csv module. The cursor.description attribute provides the column names for the header row.

You can adapt this pattern to export data from any database to any file format, or to import data from files into a database using SQL INSERT statements.
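Going the other direction, here's a minimal sketch of a CSV import using executemany() with parameterized INSERT statements. It assumes an open connection and cursor like the ones above (opened fresh, since the export example closed its connection), and the column list is an assumption for illustration:

with open('users.csv', 'r', newline='') as file:
    reader = csv.reader(file)
    next(reader)  # skip the header row
    cursor.executemany(
        'INSERT INTO users (name, age, city) VALUES (%s, %s, %s)',
        list(reader)
    )
db.commit()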

Logging

Proper logging is crucial for debugging and monitoring production web services. Python's built-in logging module makes it easy to write logs to files:

import logging

logging.basicConfig(
    filename='example.log',
    level=logging.DEBUG,
    format='%(asctime)s %(levelname)s: %(message)s'
)

logging.debug('This is a debug message')
logging.info('This is an info message')
logging.warning('This is a warning message')
logging.error('This is an error message')
logging.critical('This is a critical message')

This configures the logging module to write logs to a file called example.log with a custom format including the timestamp, log level, and message. You can then use the different logging methods (debug(), info(), etc.) to write messages at different severity levels.

The logging module supports rotating log files, different handlers for different log levels, and much more. See the official docs for details.
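For example, a rotating handler caps the log file size and keeps a fixed number of backups; the limits below are illustrative:

import logging
from logging.handlers import RotatingFileHandler

handler = RotatingFileHandler('example.log', maxBytes=1_000_000, backupCount=5)
handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s: %(message)s'))

logger = logging.getLogger('myapp')
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.info('This goes to a size-capped, rotating log file')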

Conclusion

In this comprehensive guide, we've covered everything you need to know to work with files effectively in Python. Some key takeaways:

  • Use open() in write mode ('w') to create or overwrite a file, append mode ('a') to add to an existing file, and read mode ('r') to read a file's contents.
  • Always close files after you're done with them, preferably using a with block to ensure cleanup even if an exception is raised.
  • Use binary mode ('b') for non-text data like images, archives, and serialized objects.
  • The os and os.path modules provide portable functions for working with files and directories across different operating systems.
  • Python has built-in support for reading and writing structured file formats like CSV and JSON using the csv and json modules.
  • When working with async frameworks, use libraries like aiofiles for non-blocking file I/O.
  • Optimize file I/O by using buffering, minimizing disk seeks, and choosing efficient file formats for your use case.

With these skills and best practices, you're well-equipped to handle any file-related tasks in your Python web dev projects. So get out there and start reading and writing!

For further reading, check out Python's official open() docs, File and Directory Access, and Structured File Formats chapters. The RealPython File I/O Guide and Google's Python File I/O Tutorial are also excellent resources to deepen your understanding.

Happy coding, and may your file handling be bug-free and performant!
