Python Error Collection: TesseractError: (1, 'Error opening data file d:devTesseract-OCR5.0.0tessdata/chi_sim.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.

Posted on September 2, 2021 (updated November 18, 2021) by 桔子菌

Contents

Error message:
Cause:
Solution:
Further notes:

Original link: http://www.juzicode.com/python-error-tesseracterror-opening-data-file-tessdata-prefix-environment-variable-error

Error message:

When the config option is used to set the language data path for tesseract, the call fails with TesseractError: (1, 'Error opening data file d:devTesseract-OCR5.0.0tessdata/chi_sim.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language \'chi_sim\' Tesseract couldn\'t load any languages! Could not initialize tesseract.')

#juzicode.com / WeChat official account: 桔子code
import pytesseract as ts
text = ts.image_to_string('bookseg.png','chi_sim',config='--tessdata-dir d:\\dev\\Tesseract-OCR5.0.0\\tessdata')
print(text)

==========Output:
---------------------------------------------------------------------------
TesseractError                            Traceback (most recent call last)
<ipython-input-18-14ca11a7f8d2> in <module>
      1 #juzicode.com / WeChat official account: 桔子code
      2 import pytesseract as ts
----> 3 text = ts.image_to_string('bookseg.png','chi_sim',config='--tessdata-dir d:\\dev\\Tesseract-OCR5.0.0\\tessdata')
      4 print(text)

d:\python\python38\lib\site-packages\pytesseract\pytesseract.py in image_to_string(image, lang, config, nice, output_type, timeout)
    407     args = [image, 'txt', lang, config, nice, timeout]
    408 
--> 409     return {
    410         Output.BYTES: lambda: run_and_get_output(*(args + [True])),
    411         Output.DICT: lambda: {'text': run_and_get_output(*args)},

d:\python\python38\lib\site-packages\pytesseract\pytesseract.py in <lambda>()
    410         Output.BYTES: lambda: run_and_get_output(*(args + [True])),
    411         Output.DICT: lambda: {'text': run_and_get_output(*args)},
--> 412         Output.STRING: lambda: run_and_get_output(*args),
    413     }[output_type]()
    414 

d:\python\python38\lib\site-packages\pytesseract\pytesseract.py in run_and_get_output(image, extension, lang, config, nice, timeout, return_bytes)
    285         }
    286 
--> 287         run_tesseract(**kwargs)
    288         filename = kwargs['output_filename_base'] + extsep + extension
    289         with open(filename, 'rb') as output_file:

d:\python\python38\lib\site-packages\pytesseract\pytesseract.py in run_tesseract(input_filename, output_filename_base, extension, lang, config, nice, timeout)
    261     with timeout_manager(proc, timeout) as error_string:
    262         if proc.returncode:
--> 263             raise TesseractError(proc.returncode, get_errors(error_string))
    264 
    265 

TesseractError: (1, 'Error opening data file d:devTesseract-OCR5.0.0tessdata/chi_sim.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language \'chi_sim\' Tesseract couldn\'t load any languages! Could not initialize tesseract.')

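Cause:

1. When the language data path is passed through the config parameter, it must not contain backslashes (\); only forward slashes (/) work. The error message shows why: the path arrives as d:devTesseract-OCR5.0.0tessdata, with every backslash stripped out, so tesseract looks for chi_sim.traineddata in a directory that does not exist.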
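The stripped backslashes can be reproduced without tesseract installed. The sketch below assumes that pytesseract splits the config string with shell-style rules through Python's `shlex.split`; that is an assumption about pytesseract's internals and may not hold for every version. The variable names are illustrative only.

```python
import shlex

# The config string from the failing call. In Python source, '\\' is one backslash.
broken_config = '--tessdata-dir d:\\dev\\Tesseract-OCR5.0.0\\tessdata'

# Assumption: the config string is tokenized like a POSIX shell command line.
# In that mode an unquoted backslash escapes the next character and is dropped.
broken_args = shlex.split(broken_config)
print(broken_args)
# ['--tessdata-dir', 'd:devTesseract-OCR5.0.0tessdata']

# Forward slashes have no special meaning, so the path survives intact.
fixed_config = '--tessdata-dir d:/dev/Tesseract-OCR5.0.0/tessdata'
fixed_args = shlex.split(fixed_config)
print(fixed_args)
# ['--tessdata-dir', 'd:/dev/Tesseract-OCR5.0.0/tessdata']
```

The first result matches the mangled path in the error message, which supports the explanation even if the exact mechanism differs in your installed version.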

Solution:

1. Change how the path is written: replace the backslashes with forward slashes.

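#juzicode.com / WeChat official account: 桔子code
import pytesseract as ts

img_fn = 'bookseg.png'  # same image as in the failing example
lang = 'chi_sim'        # Simplified Chinese language data
# Fails: the backslashes are lost before the path reaches tesseract
#text = ts.image_to_string(img_fn,lang,config='--tessdata-dir d:\\dev\\Tesseract-OCR5.0.0\\tessdata')
text = ts.image_to_string(img_fn,lang,config='--tessdata-dir d:/dev/Tesseract-OCR5.0.0/tessdata')
print(text)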
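If the tessdata path comes from somewhere else (a settings file, user input) and may contain backslashes, it can be converted before it goes into the config string. This is a minimal sketch; `tessdata_config` is a hypothetical helper name, not part of pytesseract, and it does not handle paths that contain spaces.

```python
from pathlib import PureWindowsPath


def tessdata_config(tessdata_dir: str) -> str:
    """Return a --tessdata-dir option with the path written using forward slashes.

    Hypothetical helper for illustration; paths containing spaces are not handled.
    """
    # PureWindowsPath parses Windows-style paths on any OS;
    # as_posix() renders the same path with '/' separators.
    forward_slash_path = PureWindowsPath(tessdata_dir).as_posix()
    return f'--tessdata-dir {forward_slash_path}'


config = tessdata_config('d:\\dev\\Tesseract-OCR5.0.0\\tessdata')
print(config)
# --tessdata-dir d:/dev/Tesseract-OCR5.0.0/tessdata
```

The resulting string can then be passed on as before, for example `ts.image_to_string(img_fn, lang, config=config)`.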

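Further notes:

If this article has not fully resolved your question, you can also leave me a message through the WeChat official account "桔子code". Discussion is welcome.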
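The error message itself suggests another route: setting the TESSDATA_PREFIX environment variable to the tessdata directory. The sketch below sets it from Python before calling pytesseract, on the assumption that the tesseract process inherits the environment of the Python process. `use_tessdata_dir` is a hypothetical helper name. Because the path never passes through the config string, backslashes are not a problem here.

```python
import os


def use_tessdata_dir(tessdata_dir: str) -> None:
    """Point tesseract at a tessdata directory through the environment.

    Hypothetical helper; assumes tesseract is launched as a child process
    that inherits os.environ.
    """
    os.environ['TESSDATA_PREFIX'] = tessdata_dir


# A raw string keeps the backslashes exactly as written.
use_tessdata_dir(r'd:\dev\Tesseract-OCR5.0.0\tessdata')
print(os.environ['TESSDATA_PREFIX'])
```

With the variable set, the call would no longer need a `--tessdata-dir` option at all, for example `ts.image_to_string('bookseg.png', 'chi_sim')`.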
