A Python tool for parsing Outlook email files (.olm) and extracting email content, including subject, sender, body, and attachments.
Have you ever had a large .olm email backup archive (the only backup you had) become corrupt, so that nothing else could repair the .olm structure and you just wanted to salvage the data?
Maybe it happened because, out of curiosity, you decided to inspect the exported backup to check that it contained all your accounts.
Or perhaps you changed the file extension from .olm to .zip (to confirm the payload, accounts, etc.) and, once satisfied with the contents, renamed the extension back to .olm, which somehow corrupted the file.
This project provides a simple way to parse the Outlook message ('.xml') files stored inside an .olm archive and extract their email content.
The tool uses Python's built-in xml.etree.ElementTree library to parse the XML structure of the email file and extract the relevant information.
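As a rough illustration of that parsing step, the sketch below reads a small XML sample with `xml.etree.ElementTree` and pulls out the subject, sender, and body. The element names (`email`, `subject`, `sender`, `body`) and the `parse_email_xml` helper are placeholders invented for this example. Real OLM exports use their own element names, so inspect one of your message files before adapting it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the element names are placeholders,
# not the real OLM schema.
SAMPLE_XML = """<emails>
  <email>
    <subject>Quarterly report</subject>
    <sender>alice@example.com</sender>
    <body>Please find the report attached.</body>
  </email>
</emails>"""


def parse_email_xml(xml_text):
    """Return one dict per <email> element found in the XML text."""
    root = ET.fromstring(xml_text)
    messages = []
    for email_el in root.iter("email"):
        messages.append({
            # findtext returns the default when the child element is missing
            "subject": email_el.findtext("subject", default=""),
            "sender": email_el.findtext("sender", default=""),
            "body": email_el.findtext("body", default=""),
        })
    return messages


messages = parse_email_xml(SAMPLE_XML)
print(messages[0]["subject"])  # Quarterly report
```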
- Python 3.6 or later
- No external dependencies required
- Clone this repository using `git clone https://github.com/SipTech/extract_olm_data_to_eml.git`
- Navigate to the project directory using `cd extract_olm_data_to_eml`
- Run the tool using `python parse_xml_to_email.py`
- Rename the corrupt backup file's extension to `.zip` (this lets you browse it like a regular directory).
- Place the backup file (`.zip`) in the project's root directory or on any external disk, e.g. `/volume/myDrive/Microsoft_Outlook_backup.zip` or `/<mnt|media>/myDrive/Microsoft_Outlook_backup.zip`, and use that path as your `zip_file_path`.
- Provide the output directory to hold the resulting `.eml` files, e.g. `/volume/myDrive/Microsoft_Outlook_EML_EXTRACT` or `/<mnt|media>/myDrive/Microsoft_Outlook_EML_EXTRACT`, as your `output_dir` (it will be created if it does not exist).
- Run the script using `python3 extract.py <zip_file_path> <output_dir>`
- The tool will extract all email content recursively and save each message to a new file with the same name as the original file, but with a `.eml` extension.
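The steps above can be sketched roughly as follows. This is not the project's actual script, only a minimal illustration under stated assumptions: the function names `xml_to_eml_bytes` and `extract_zip_to_eml` are made up for this example, the element names `subject`, `sender`, and `body` are placeholders rather than the real OLM schema, and mirroring the archive's folder layout in the output directory is a design choice of the sketch.

```python
import zipfile
import xml.etree.ElementTree as ET
from email.message import EmailMessage
from pathlib import Path, PurePosixPath


def _header(text):
    """Collapse whitespace so the value is safe to use as a header."""
    return " ".join(text.split())


def xml_to_eml_bytes(xml_bytes):
    """Build an .eml payload from one message XML (placeholder tag names)."""
    root = ET.fromstring(xml_bytes)
    msg = EmailMessage()
    subject = _header(root.findtext(".//subject", default=""))
    sender = _header(root.findtext(".//sender", default=""))
    if subject:
        msg["Subject"] = subject
    if sender:
        msg["From"] = sender
    msg.set_content(root.findtext(".//body", default=""))
    return msg.as_bytes()


def extract_zip_to_eml(zip_file_path, output_dir):
    """Convert every .xml member of the archive to an .eml file."""
    out_root = Path(output_dir)
    out_root.mkdir(parents=True, exist_ok=True)  # create if missing
    written = []
    with zipfile.ZipFile(zip_file_path) as archive:
        for info in archive.infolist():
            if info.is_dir() or not info.filename.lower().endswith(".xml"):
                continue
            try:
                eml = xml_to_eml_bytes(archive.read(info))
            except (ET.ParseError, zipfile.BadZipFile) as exc:
                # Report unreadable members and carry on with the rest
                print(f"Skipping {info.filename}: {exc}")
                continue
            rel = PurePosixPath(info.filename).with_suffix(".eml")
            # Drop root and ".." parts so nothing lands outside output_dir
            parts = [p for p in rel.parts if p not in ("/", "..")]
            target = out_root.joinpath(*parts)
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(eml)
            written.append(target)
    return written
```

Calling `extract_zip_to_eml(zip_file_path, output_dir)` returns the list of files it wrote. Note that this sketch converts every `.xml` member it finds, including any that are not messages, so a real run would likely need a filter on the member path or root element.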
The tool will output a new file with the same name as the original file, but with a .eml extension. For example, if the original file is named message_03349.xml, the output file will be named message_03349.eml.
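The naming rule described here amounts to swapping the file suffix. A minimal sketch, using a hypothetical helper called `eml_name_for` that is not taken from the project:

```python
from pathlib import PurePosixPath


def eml_name_for(xml_name):
    """Return the output file name: same stem, .eml extension."""
    return PurePosixPath(xml_name).with_suffix(".eml").name


print(eml_name_for("message_03349.xml"))  # message_03349.eml
```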
- Make sure to first change the corrupt Outlook file's extension from ".olm" to ".zip" (this lets it be opened as a regular zip archive)
- Check the console output for any error messages
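Before running the extraction, it can help to confirm that the renamed file is still readable as a zip archive. The sketch below is only a suggestion and not part of the project: `copy_olm_as_zip` and `list_zip_members` are hypothetical helpers, and copying instead of renaming is an assumption made here so that the original backup stays untouched.

```python
import shutil
import zipfile
from pathlib import Path


def copy_olm_as_zip(olm_path):
    """Copy backup.olm to backup.zip, leaving the original file in place."""
    src = Path(olm_path)
    dst = src.with_suffix(".zip")
    shutil.copyfile(src, dst)
    return dst


def list_zip_members(zip_path):
    """Return the member names, or None if the file is not a readable zip."""
    if not zipfile.is_zipfile(zip_path):
        return None
    with zipfile.ZipFile(zip_path) as archive:
        return archive.namelist()
```

If `list_zip_members` returns `None`, the archive's zip structure itself is damaged and the standard `zipfile` module will not be able to open it.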
This project is open-source and welcomes contributions from the community. If you'd like to contribute, please fork this repository and submit a pull request with your changes.
This project is licensed under the MIT License. See the LICENSE file for more information.