The UnicodeDecodeError you’re seeing is a common problem when reading files in Python. It means that while decoding the file’s contents, Python came across a byte or sequence of bytes that doesn’t map to a valid character in the expected encoding (in this case, the platform default encoding, which is often ‘cp1252’ on Windows).
Here are steps to resolve the issue:
Explicitly Specify the Encoding:
Often, CSV files and text data are encoded in UTF-8. You can specify the encoding when you open the file:
with open('filename.csv', 'r', encoding='utf-8') as file:
    ...
If you’re not sure about the file’s encoding, you might need to find out from its source or through trial and error with common encodings like ‘utf-8’, ‘latin1’, ‘iso-8859-1’, etc.
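One way to do that trial and error programmatically is to try candidate encodings on the raw bytes and see which decodes cleanly. A minimal sketch (the sample bytes are illustrative, not your file’s data; 0xE9 is ‘é’ in cp1252/latin1 but invalid as a lone UTF-8 byte):

```python
# Illustrative sketch: try candidate encodings until one decodes cleanly.
raw = b'caf\xe9, 42'  # stand-in for bytes read from your file in 'rb' mode

for encoding in ['utf-8', 'cp1252', 'latin1']:
    try:
        text = raw.decode(encoding)
        print(f'{encoding}: {text!r}')
        break
    except UnicodeDecodeError as exc:
        print(f'{encoding} failed: {exc}')
```

Note that ‘latin1’ maps every possible byte to a character, so it never raises an error; keep it last in the list and sanity-check the decoded output rather than trusting the first “success”.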
Use a Different Error Handler:
If you’re okay with ignoring or replacing characters that aren’t valid in the expected encoding, you can pass an errors argument to the open function:
errors='ignore': This will skip any bytes that can’t be decoded, so problematic characters simply disappear from the result.
errors='replace': This will replace problematic characters with the Unicode replacement character U+FFFD (�).
with open('filename.csv', 'r', encoding='utf-8', errors='replace') as file:
    ...
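To see the difference between the two handlers, you can decode a small byte string containing an invalid UTF-8 byte (a made-up sample, not your file’s contents):

```python
raw = b'caf\xe9 latte'  # 0xe9 is not valid UTF-8 on its own

# 'ignore' drops the bad byte entirely: 'caf latte'
print(raw.decode('utf-8', errors='ignore'))

# 'replace' substitutes U+FFFD for the bad byte: 'caf\ufffd latte'
print(raw.decode('utf-8', errors='replace'))
```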
Investigate the File:
If you’re still having trouble, you might need to investigate the file itself. The error message provides the byte position of the problematic character (6125 in your case). You can extract a portion of the file around this position to see the context:
with open('filename.csv', 'rb') as file:
    file.seek(6100)        # a bit before the problematic position
    print(file.read(50))   # read 50 bytes around the problem
This will give you a byte sequence that includes the problematic character. This might give you clues as to the file’s encoding or the nature of the data corruption.
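As a self-contained illustration of this technique, the sketch below fabricates a file with a cp1252-only byte (0x92, a curly apostrophe) near position 6125, then inspects the bytes around it in the same way. The file contents and path here are invented for the example:

```python
import os
import tempfile

# Fabricate a sample file containing a cp1252 byte (0x92) near position 6125.
path = os.path.join(tempfile.mkdtemp(), 'sample.csv')
with open(path, 'wb') as f:
    f.write(b'A' * 6120 + b'it\x92s broken' + b'B' * 20)

with open(path, 'rb') as f:
    f.seek(6100)        # a bit before the problematic position
    chunk = f.read(50)  # read 50 bytes around the problem

print(chunk)                   # the lone 0x92 byte stands out in the output
print(chunk.decode('cp1252'))  # 0x92 is a right single quote in cp1252
```

Seeing which single-byte encodings produce readable text for the extracted chunk (here, cp1252) is often enough to identify the file’s real encoding.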