Python Wheels: lzma, a Powerful Tool for File Compression

Posted on August 7, 2025 (August 14, 2025) by 桔子菌

Contents

Use cases
Importing the module
Usage
Summary

Original link: http://www.juzicode.com/python-module-lzma

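Python's lzma module is the standard-library interface to the LZMA and LZMA2 compression algorithms. It provides efficient compression and decompression, reads and writes the .xz file format, and is particularly well suited to situations that call for a high compression ratio.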

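Use cases

Efficient storage of large datasets
Compressing software distribution packages
Archiving log files
Compressing data for network transfer
Compressing database backups
Long-term storage of research data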

Importing the module

import lzma
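
As a quick check that the module is available, the sketch below round-trips a byte string in memory with lzma.compress() and lzma.decompress(). The sample payload is made up for illustration and is not from the original article.

```python
import lzma

# Sample payload (illustrative only): repetitive data compresses well
data = b"juzicode " * 1000

compressed = lzma.compress(data)        # returns a complete .xz stream as bytes
restored = lzma.decompress(compressed)  # the container format is auto-detected

print(len(data), len(compressed), restored == data)
```

Both functions work on whole byte strings, so they are best kept for data that comfortably fits in memory.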

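Usage

1) Compressing a file

# juzicode.com / WeChat official account: juzicode
import lzma
import os
import shutil

# Original file
input_file = 'large_data.csv'
# Compressed output file
compressed_file = 'large_data.csv.xz'

# Compress the file with LZMA, copying in chunks so that
# the whole file is never held in memory at once
with open(input_file, 'rb') as fin:
    with lzma.open(compressed_file, 'wb') as fout:
        shutil.copyfileobj(fin, fout)

print(f"Compression finished, original size: {os.path.getsize(input_file)} bytes, "
      f"compressed size: {os.path.getsize(compressed_file)} bytes")

Result: the compressed file is created and the sizes before and after compression are printed:

Compression finished, original size: 24021 bytes, compressed size: 172 bytes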
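
The example above needs an existing large_data.csv. The following sketch is a self-contained variant that first generates a throwaway CSV file and then reports the compression ratio. The helper name compress_file, the file name sample.csv and the generated rows are all assumptions made for illustration.

```python
import lzma
import os
import shutil
import tempfile

def compress_file(src, dst):
    """Compress src into dst (.xz) and return (original_size, compressed_size)."""
    with open(src, 'rb') as fin, lzma.open(dst, 'wb') as fout:
        shutil.copyfileobj(fin, fout)
    return os.path.getsize(src), os.path.getsize(dst)

# Build a throwaway CSV so the example runs without any existing file
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'sample.csv')
with open(src, 'w', encoding='utf-8') as f:
    f.write('id,name,value\n')
    for i in range(1000):
        f.write(f'{i},item,{i % 10}\n')

original, compressed = compress_file(src, src + '.xz')
ratio = compressed / original
print(f"original: {original} bytes, compressed: {compressed} bytes, ratio: {ratio:.1%}")
```

The exact ratio depends heavily on how redundant the input is, so the figures printed for real data will differ.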

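2) Decompressing a file

# juzicode.com / WeChat official account: juzicode
import lzma
import shutil

compressed_file = 'large_data.csv.xz'
decompressed_file = 'decompressed_data.csv'

# Decompress the LZMA file, copying in chunks to keep memory use low
with lzma.open(compressed_file, 'rb') as fin:
    with open(decompressed_file, 'wb') as fout:
        shutil.copyfileobj(fin, fout)

print("Decompression finished")

Result: the original file is restored as decompressed_data.csv.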
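
When the compressed file contains text, it often does not need to be written back to disk at all. A minimal sketch, assuming a small UTF-8 CSV: lzma.open() in text mode ('wt' / 'rt') returns a file object that can be iterated line by line. The file name demo.csv.xz and its content are hypothetical.

```python
import lzma
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.csv.xz')

# Write a small compressed text file (hypothetical sample content)
with lzma.open(path, 'wt', encoding='utf-8') as f:
    f.write('id,value\n1,10\n2,20\n')

# Read it back line by line without creating a decompressed copy on disk
rows = []
with lzma.open(path, 'rt', encoding='utf-8') as f:
    for line in f:
        rows.append(line.rstrip('\n').split(','))

print(rows)
```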

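3) Compressing data as a stream

# juzicode.com / WeChat official account: juzicode
import lzma

# Create a compressor object
compressor = lzma.LZMACompressor()

# Process the data chunk by chunk
chunks = [b'abcdefg' * 100, b'13800138000' * 100, b'ABCDEFG' * 100]
compressed_chunks = []

for chunk in chunks:
    compressed_chunks.append(compressor.compress(chunk))

# Finish the stream; flush() returns any data still buffered in the compressor
compressed_chunks.append(compressor.flush())

# Join the compressed pieces
compressed_data = b''.join(compressed_chunks)
print(f"Original size: {sum(len(c) for c in chunks)} bytes, "
      f"compressed size: {len(compressed_data)} bytes")

Result: the sizes of the original and compressed data are printed:

Original size: 2500 bytes, compressed size: 112 bytes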
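
The reverse direction works the same way with lzma.LZMADecompressor. The sketch below feeds compressed bytes to the decompressor in small slices, as if they were arriving over a network connection. The 32-byte slice size is an arbitrary choice for illustration.

```python
import lzma

original = b'abcdefg' * 100 + b'13800138000' * 100 + b'ABCDEFG' * 100
compressed_data = lzma.compress(original)

decompressor = lzma.LZMADecompressor()
pieces = []

# Feed the compressed bytes in 32-byte slices
for start in range(0, len(compressed_data), 32):
    pieces.append(decompressor.decompress(compressed_data[start:start + 32]))

restored = b''.join(pieces)

# eof becomes True once the end-of-stream marker has been seen
print(decompressor.eof, restored == original)
```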

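4) Custom compression parameters

# juzicode.com / WeChat official account: juzicode
import lzma
import shutil

# Custom filter chain: LZMA2 at preset 7 with the "extreme" flag
custom_filters = [
    {"id": lzma.FILTER_LZMA2, "preset": 7 | lzma.PRESET_EXTREME},
    # More filters can be added; the compression filter (LZMA2) must come last
]

# Compress the file with the custom settings
with open('large_data.csv', 'rb') as fin:
    with lzma.open('custom_compressed.xz', 'wb', filters=custom_filters) as fout:
        shutil.copyfileobj(fin, fout)

print("Compression with custom parameters finished")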
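
To illustrate a chain with more than one filter, the sketch below puts a delta filter in front of LZMA2. The sample data (steadily increasing 16-bit values) is invented for illustration. Whether a delta filter helps depends entirely on the data, so the sizes it prints should be treated as an experiment, not a guarantee.

```python
import lzma
import struct

# Hypothetical sample: 2000 little-endian 16-bit values that increase by 1
samples = struct.pack('<2000H', *range(1000, 3000))

filters = [
    {"id": lzma.FILTER_DELTA, "dist": 2},    # byte distance = width of one sample
    {"id": lzma.FILTER_LZMA2, "preset": 6},  # the compression filter goes last
]

with_delta = lzma.compress(samples, format=lzma.FORMAT_XZ, filters=filters)
plain = lzma.compress(samples, preset=6)

# The filter chain is recorded in the .xz headers, so no options are needed here
restored = lzma.decompress(with_delta)

print(len(samples), len(plain), len(with_delta), restored == samples)
```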

5) Multithreaded compression

# juzicode.com / WeChat official account: juzicode
import lzma
import concurrent.futures

def compress_chunk(chunk):
    """Compress one chunk into a complete, independent .xz stream."""
    return lzma.compress(chunk)

# Compress a large file chunk by chunk
def parallel_compress(input_file, output_file, chunk_size=1024*1024):
    with open(input_file, 'rb') as fin:
        with concurrent.futures.ThreadPoolExecutor() as executor:
            futures = []
            while True:
                chunk = fin.read(chunk_size)
                if not chunk:
                    break
                futures.append(executor.submit(compress_chunk, chunk))

            with open(output_file, 'wb') as fout:
                # Write results in submission order. as_completed() yields
                # futures in completion order, which would scramble the data.
                for future in futures:
                    fout.write(future.result())

    print("Multithreaded compression finished")

# Example usage
parallel_compress('huge_file.bin', 'huge_file.bin.xz')
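
A file produced this way is a concatenation of several independent .xz streams. The sketch below shows that lzma.decompress() accepts such multi-stream data and returns the joined content, provided the streams were written in their original order. The chunk contents are made up for illustration.

```python
import lzma

chunks = [b'first chunk ' * 50, b'second chunk ' * 50, b'third chunk ' * 50]

# Each call yields a complete .xz stream; joining them gives multi-stream data
multi_stream = b''.join(lzma.compress(c) for c in chunks)

# All concatenated streams are decompressed and their outputs are joined
restored = lzma.decompress(multi_stream)

print(restored == b''.join(chunks))
```

Because every chunk is compressed on its own, the total size is usually somewhat larger than compressing the whole input as a single stream.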

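6) Creating a compressed file with a checksum

# juzicode.com / WeChat official account: juzicode
import lzma
import hashlib

def create_verified_archive(input_file, output_file):
    """Create a compressed file plus a SHA-256 checksum of the original data."""
    # Read the original file once and compute its checksum
    with open(input_file, 'rb') as fin:
        data = fin.read()
    sha256_original = hashlib.sha256(data).hexdigest()

    # Compress the data. The .xz container also embeds its own integrity
    # check (CRC64 by default); the SHA-256 file is an additional check.
    with lzma.open(output_file, 'wb') as fout:
        fout.write(data)

    # Write the checksum to a separate file
    with open(output_file + '.sha256', 'w') as f:
        f.write(sha256_original)

    print(f"Archive created, checksum saved to {output_file}.sha256")

# Example usage
create_verified_archive('important_data.db', 'important_data.db.xz')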
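
The checksum is only useful if it is checked again later. A minimal sketch of the matching verification step follows. The function name verify_archive and the demo file names are hypothetical, and it assumes the .sha256 file contains just the hex digest, as written above.

```python
import hashlib
import lzma
import os
import tempfile

def verify_archive(archive_file, checksum_file=None):
    """Return True if the decompressed content matches the stored SHA-256."""
    checksum_file = checksum_file or archive_file + '.sha256'
    with open(checksum_file, 'r') as f:
        expected = f.read().strip()

    digest = hashlib.sha256()
    with lzma.open(archive_file, 'rb') as fin:
        # Hash in 1 MiB pieces so large archives need not fit in memory
        for block in iter(lambda: fin.read(1024 * 1024), b''):
            digest.update(block)
    return digest.hexdigest() == expected

# Demo with throwaway files (hypothetical names and content)
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, 'demo.db.xz')
payload = b'important data' * 100
with lzma.open(archive, 'wb') as fout:
    fout.write(payload)
with open(archive + '.sha256', 'w') as f:
    f.write(hashlib.sha256(payload).hexdigest())

print(verify_archive(archive))
```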

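Summary

Key strengths of the lzma module:

Very high compression ratio, usually better than gzip and zlib
Part of the standard library since Python 3.3, so nothing extra to install
Supports streaming compression and decompression
Offers multiple compression presets and custom filter options

Caveats:

Compression is slow, especially at high compression levels
Memory usage is relatively high
Full decompression needs the complete stream: lzma.decompress() raises LZMAError on truncated input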
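
The last caveat can be seen with a small experiment. The sketch below cuts the final bytes off a compressed stream to simulate an interrupted transfer. The sample data and the 10-byte cut are arbitrary choices for illustration.

```python
import lzma

data = b'log line\n' * 500
compressed = lzma.compress(data)
truncated = compressed[:-10]  # simulate an interrupted download

try:
    lzma.decompress(truncated)
    complete = True
except lzma.LZMAError as exc:
    complete = False
    print(f"decompression failed: {exc}")

# The incremental API still returns whatever could be decoded before the cut
partial = lzma.LZMADecompressor().decompress(truncated)
print(complete, len(partial), len(data))
```

The one-shot function refuses incomplete input, while LZMADecompressor hands back the bytes it managed to decode, which can help when salvaging a damaged file.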
