Lesson 17: Files & Strings#
Overview#
Programs often need to save data or read information created elsewhere. In this lesson, you’ll learn how to work with text files and manipulate string data.
Learning Objectives#
By the end of this lesson, you should be able to:
Open, read, write, and append text files safely.
Differentiate file modes (“r”, “w”, “a”) and understand overwriting vs appending.
Apply common string methods (strip, split, join, replace) to clean and combine text.
Load and dump basic JSON structures and explain pretty-printing and character encoding.
Prerequisites#
Comfortable with navigating folders and file paths
Basic variable and loop syntax
Lesson Outline#
Introduction (5 min)
When you think about your daily life outside of programming, you might not realize how often you interact with text-based data. Text files may not feel like they are everywhere, but boarding passes, class schedules, directions, notifications, and receipts are all built on structured text. That text is often messy, with inconsistent spacing, stray symbols, or formatting differences between sources.
Because we rely so much on text, the ability to read, write, and manipulate this data gives your programs a way to interact with the world. String operations make this information usable, and almost every interaction your program will have with external data involves strings in some way.
In this lesson, we’ll explore how to work safely with files and build on some string tools you’ve seen earlier in the course, applying them to more realistic text.
1. Working with files (15 min)#
1. Accessing files#
The first step to interacting with external information is opening and reading files safely. Python provides several ways to work with files. The pathlib module is a reliable choice because it lets you build paths without hand-writing absolute paths as raw strings. A Path object is a reference to a location on your computer, which you can then open with a context manager.
Opening files with a context manager (a with statement) is what makes this approach ‘safe’: it guarantees the file is closed when the block ends, even if an error occurs. That prevents files from staying ‘locked’ and avoids incomplete writes.
from pathlib import Path
path = Path(" .txt")
with path.open(mode="r", encoding="utf-8") as f:
    for line in f:
        print(repr(line))
This example opens a file in read mode and prints each line. The
repr() function shows invisible characters like newline characters.
As soon as the code in the with block ends, Python automatically closes
the file.
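To see the automatic closing in action, here is a minimal sketch. The file name notes.txt is hypothetical, and the file is created in a temporary directory so the example runs anywhere without touching your own files.

```python
import tempfile
from pathlib import Path

# Work in a throwaway directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "notes.txt"  # hypothetical file name
    path.write_text("first\nsecond\n", encoding="utf-8")

    with path.open(mode="r", encoding="utf-8") as f:
        lines = [repr(line) for line in f]
        open_inside = not f.closed  # the file is still open inside the block

    closed_after = f.closed  # the file is closed once the block ends

print(lines)                      # ["'first\\n'", "'second\\n'"]
print(open_inside, closed_after)  # True True
```

The f.closed attribute is only used here to make the closing visible; in everyday code you rarely need to check it.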
2. Reading files#
Using the “r” mode opens a file for reading only. Reading line by line is memory-efficient, because Python only needs to hold one line in memory at a time instead of loading the entire file. This is especially important when working with log files, transcripts, large datasets, or long text documents.
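The difference between the two reading styles can be sketched as follows. The log file name and its contents are made up for illustration, and the file lives in a temporary directory.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "server.log"  # hypothetical log file
    path.write_text("INFO start\nERROR disk full\nINFO done\n", encoding="utf-8")

    # Line by line: lines are handed to the loop one at a time.
    line_count = 0
    with path.open(mode="r", encoding="utf-8") as f:
        for line in f:
            line_count += 1

    # All at once: the whole file becomes a single string in memory.
    with path.open(mode="r", encoding="utf-8") as f:
        whole_text = f.read()

print(line_count)       # 3
print(len(whole_text))  # 37
```

For a three-line file the two approaches are interchangeable. The loop starts to matter when the file is too large to hold comfortably in memory.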
3. Writing and appending#
Writing to a file is similar to reading, but you choose a different mode depending on what you want to do.
with open(" .txt", "w", encoding="utf-8") as f:
    f.write("first line\n")
with open(" .txt", "a", encoding="utf-8") as f:
    f.write("another line\n")
Opening a file in “w” mode overwrites any existing content, which effectively creates a fresh file even if one existed before. Using “a” mode appends new content to the end without erasing what was already there. Both modes create the file if it does not exist yet.
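A small experiment makes the overwrite/append difference concrete. This is a sketch using a hypothetical file name, demo.txt, inside a temporary directory.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.txt"  # hypothetical file name

    with path.open("w", encoding="utf-8") as f:
        f.write("first line\n")
    with path.open("a", encoding="utf-8") as f:
        f.write("another line\n")
    after_append = path.read_text(encoding="utf-8")

    # Reopening in "w" mode discards both earlier lines.
    with path.open("w", encoding="utf-8") as f:
        f.write("fresh start\n")
    after_overwrite = path.read_text(encoding="utf-8")

print(repr(after_append))     # 'first line\nanother line\n'
print(repr(after_overwrite))  # 'fresh start\n'
```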
4. Misconceptions#
Working with files introduces many challenges. A “file not found” error
usually means the path is incorrect. Reading text created on different
systems may result in encoding problems, which is why explicitly using
“utf-8”, a very common text encoding standard, is a good habit. Another
common issue is forgetting to include newlines (\n) between writes
to a file, which can smash lines together and lead to readability issues
when opening later.
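Two of these pitfalls are easy to reproduce. The sketch below uses hypothetical file names in a temporary directory: it checks for a missing file with Path.exists() before trying to read it, and it shows what happens when the newline is left out between writes.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    missing = Path(tmp) / "does_not_exist.txt"  # hypothetical name
    found = missing.exists()  # False, so opening it in "r" mode would fail

    path = Path(tmp) / "scores.txt"  # hypothetical name
    with path.open("w", encoding="utf-8") as f:
        f.write("alice 90")  # no "\n" here...
        f.write("bob 85")    # ...so this lands on the same line
    smashed = path.read_text(encoding="utf-8")

    with path.open("w", encoding="utf-8") as f:
        f.write("alice 90\n")
        f.write("bob 85\n")
    separated = path.read_text(encoding="utf-8")

print(found)            # False
print(repr(smashed))    # 'alice 90bob 85'
print(repr(separated))  # 'alice 90\nbob 85\n'
```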
2. String Methods (15 min)#
1. Trimming whitespace#
Many files include extra spaces, tabs, or newline characters. Methods
like strip(), lstrip(), and rstrip() remove these in
different ways.
text = " Hello, world! \n"
print(repr(text))           # ' Hello, world! \n'
print(repr(text.strip()))   # 'Hello, world!'
print(repr(text.lstrip()))  # 'Hello, world! \n'
print(repr(text.rstrip()))  # ' Hello, world!'
2. Case transformations#
Changing the case of text is useful for standardizing comparisons between text or creating clean output.
print(repr(text.lower()))
print(repr(text.upper()))
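As a minimal sketch of a standardized comparison, the two made-up strings below differ only in case and surrounding whitespace. Comparing them directly fails, but comparing cleaned copies succeeds.

```python
first = "  Jane.Doe@Example.com "   # hypothetical messy input
second = "jane.doe@example.com\n"   # hypothetical second source

same_raw = first == second
same_clean = first.strip().lower() == second.strip().lower()

print(same_raw)    # False
print(same_clean)  # True
```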
3. Splitting and joining#
Many files store lists of items separated by a special character called a delimiter. The comma is one of the most common delimiters and gives Comma-Separated Values (CSV) files their name. Similarly, Tab-Separated Values (TSV) files use tabs as their delimiter. These formats look like plain text when opened, but the list or table structure comes from how the delimiters separate the text.
One way to parse this type of text is through Python’s split()
method.
sentence = "apples,bananas,pears"
words = sentence.split(",")
print(words) # ['apples', 'bananas', 'pears']
You can also combine a list of strings with the join() method by
specifying the desired delimiter.
joined = " | ".join(words)
print(joined)  # apples | bananas | pears
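Real rows are rarely as tidy as the example above. The sketch below, using a made-up row with stray spaces and a trailing newline, combines split(), strip(), and join() to clean the fields and rewrite the row with a tab delimiter.

```python
row = " apples , bananas,pears \n"  # hypothetical messy CSV row

# Split on the delimiter, then strip the whitespace around each field.
fields = [field.strip() for field in row.split(",")]
print(fields)  # ['apples', 'bananas', 'pears']

# Rejoin with a tab to produce a TSV-style line.
tsv_line = "\t".join(fields)
print(repr(tsv_line))  # 'apples\tbananas\tpears'
```

This simple approach assumes no field contains a comma itself. Quoted fields with embedded commas need a dedicated parser such as Python's csv module.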
4. Searching and replacing#
Searching within a string might help locate information you care about.
For example, you might want to check a file for a specific keyword such
as “ERROR”. The find() method returns the index where the substring
appears (or -1 if it isn’t found), which makes it useful for filtering
text or detecting certain patterns.
Replacing text allows you to correct certain patterns, update terminology, or convert text into specific formats that might make it easier for your program to handle.
message = "hello world"
print(message.find("world")) # 6
print(message.find("Python")) # -1
updated = message.replace("world", "Python")
print(updated)  # hello Python
print(updated.startswith("hello")) # True
print(updated.endswith("Python")) # True
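The keyword-filtering idea mentioned above can be sketched like this. The log lines and the "[!]" replacement marker are invented for illustration.

```python
log_lines = [  # hypothetical log content
    "INFO service started\n",
    "ERROR disk full\n",
    "INFO retrying\n",
    "ERROR timeout\n",
]

errors = []
for line in log_lines:
    # find() returns -1 when the keyword is absent.
    if line.find("ERROR") != -1:
        errors.append(line.strip().replace("ERROR", "[!]"))

print(errors)  # ['[!] disk full', '[!] timeout']
```

When you only need a yes/no answer, the expression "ERROR" in line does the same job as the find() check and reads more naturally.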
5. Other useful tools#
Python provides several other useful string operations.
len() returns the length of a string
in checks if a substring appears anywhere in a larger string
isdigit() and isalpha() detect types of characters
startswith() and endswith() detect prefixes/suffixes in a string
count() checks how many times a substring appears
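A quick sketch of these tools on a made-up value:

```python
code = "room101"  # hypothetical sample value

length = len(code)                 # 7 characters
has_101 = "101" in code            # True: substring check
all_digits = code.isdigit()        # False: the string contains letters
tail_digits = code[4:].isdigit()   # True: "101"
head_letters = code[:4].isalpha()  # True: "room"
an_count = "banana".count("an")    # 2 non-overlapping matches

print(length, has_101, all_digits, tail_digits, head_letters, an_count)
# 7 True False True True 2
```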
3. JSON basics#
Not all text files are unstructured paragraphs or even
delimiter-separated data (like CSV files). JSON (JavaScript Object
Notation) is another widely used structured text format because it
represents lists and dictionaries in a very simple way. The json module
provides a straightforward way of working with JSON files.
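import json
# Load the JSON file into Python objects (dicts, lists, strings, numbers).
with open("config.json", encoding="utf-8") as f:
    data = json.load(f)
# Write the data back out; indent=2 pretty-prints it for readability.
with open("output.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2)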
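To see what pretty-printing and character handling look like without needing a config.json on disk, here is a minimal sketch that works on strings with json.dumps() and json.loads(). The sample data is invented. By default json escapes non-ASCII characters, and passing ensure_ascii=False keeps them readable.

```python
import json

data = {"name": "Zoë", "tags": ["a", "b"]}  # hypothetical sample data

# Default output: one line, non-ASCII characters escaped.
compact = json.dumps(data)
print(compact)  # {"name": "Zo\u00eb", "tags": ["a", "b"]}

# Pretty-printed output: indented, non-ASCII characters kept as-is.
pretty = json.dumps(data, indent=2, ensure_ascii=False)
print(pretty)

# Both forms load back into the same Python structure.
restored = json.loads(pretty)
print(restored == data)  # True
```

If you write ensure_ascii=False output to a file, opening that file with encoding="utf-8" keeps the characters intact.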
4. Guided practice#
Exercise 1: Reading and cleaning file output#
Write a short program that:
Asks the user to enter a file name
Opens the file safely
Prints each line with leading and trailing whitespace removed
Prints the total number of lines in the file
Exercise 2: Word counter#
Read the file
Split each line into individual words
Normalize words
Count how many times each distinct word appears
Print the most frequent word
Exercise 3: JSON modification#
Load JSON data from a file
Modify one field based on user input
Save the updated JSON back to a new file
5. Assessment#
In many real systems such as healthcare, customer support, cybersecurity, etc., contact information is often messy and inconsistent. You will write a Python function that reads a text file containing a few messy contact records. Each line of the file contains one record, and each record may vary in spacing, capitalization, formatting, and missing fields. Your job is to transform each record into a clean, standardized format.
Input file example:#
name: jAnE DOE , email= Jane.Doe@Example.com , phone= 555-1234
NAME: alice Johnson , phone= 222-444
name: MARK TWAIN, email= mark.twain@books.org
Email=someone@nowhere.net , name: nobody special
name: John Johnson
Write a function:#
def clean_contact(record):
    ...
The function must:#
Read the file line-by-line, treating each line as one contact record
Trim any leading or trailing whitespace
Normalize labels case-insensitively
Split each record into cleaned fields
Properly capitalize the name
Standardize email formatting
If a field is missing, fill in with “UNKNOWN”
Return a pipe-delimited (|) file
Expected Output:#
NAME: Jane Doe | EMAIL: jane.doe@example.com | PHONE: 555-1234
NAME: Alice Johnson | EMAIL: UNKNOWN | PHONE: 222-444
NAME: Mark Twain | EMAIL: mark.twain@books.org | PHONE: UNKNOWN
NAME: Nobody Special | EMAIL: someone@nowhere.net | PHONE: UNKNOWN
NAME: John Johnson | EMAIL: UNKNOWN | PHONE: UNKNOWN
6. Challenge problems#
7. Summary#
Key takeaways:#
Use context managers (with statements) to handle files safely
Understand file modes (“r”, “w”, “a”) and when to use each
Apply string operations to clean and interpret real-world data
Recognize that even simple tasks often require combining file I/O with multiple string methods
Connection to next lessons:#
The next lessons (L18/L19) will build on this by introducing error handling, where you’ll learn to:
Detect and handle missing files or incorrect file names
Address improperly formatted data
Anticipate edge cases
Write code that fails safely rather than crashing