PySpark fails with TypeError: an integer is required (got type bytes): cause and fix



Running PySpark 2.4.x in a Python 3.8 environment fails with TypeError: an integer is required (got type bytes). This note explains why the error occurs and how to fix it.

Error message

The error message may look like this:

Traceback (most recent call last):
  File "/xxx/xxx/xxx.py", line 2, in <module>
    from pyspark.sql import SparkSession
  File "/xxx/xxx/lib/python3.8/site-packages/pyspark/__init__.py", line 51, in <module>
    from pyspark.context import SparkContext
  File "/xxx/xxx/lib/python3.8/site-packages/pyspark/context.py", line 31, in <module>
    from pyspark import accumulators
  File "/xxx/xxx/lib/python3.8/site-packages/pyspark/accumulators.py", line 97, in <module>
    from pyspark.serializers import read_int, PickleSerializer
  File "/xxx/xxx/lib/python3.8/site-packages/pyspark/serializers.py", line 72, in <module>
    from pyspark import cloudpickle
  File "/xxx/xxx/lib/python3.8/site-packages/pyspark/cloudpickle.py", line 145, in <module>
    _cell_set_template_code = _make_cell_set_template_code()
  File "/xxx/xxx/lib/python3.8/site-packages/pyspark/cloudpickle.py", line 126, in _make_cell_set_template_code
    return types.CodeType(
TypeError: an integer is required (got type bytes)

Cause and fix

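The error above is raised because Spark 2.4.x does not support Python 3.8. Python 3.8 added a new argument to the types.CodeType constructor, and the cloudpickle module bundled with PySpark 2.4.x still calls it with the older argument list, which is why the traceback ends at the types.CodeType call. To fix it, downgrade the Python environment that runs the code to version 3.7 or earlier.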
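
As a rough illustration of the rule above, the following sketch checks for the incompatible combination before PySpark is imported. The function name is_known_bad_combo and the example version string "2.4.5" are assumptions made for this example; they are not part of PySpark. The check covers only the Spark 2.4.x and Python 3.8+ case described in this note and says nothing about other combinations.

```python
import sys


def is_known_bad_combo(spark_version, python_version=None):
    """Return True only for the combination described in this note:
    Spark 2.4.x running on Python 3.8 or newer.

    spark_version: version string such as "2.4.5" (hypothetical example value).
    python_version: (major, minor) tuple; defaults to the running interpreter.
    """
    if python_version is None:
        python_version = sys.version_info[:2]
    # Compare only the major and minor parts of the Spark version.
    spark_major_minor = tuple(int(part) for part in spark_version.split(".")[:2])
    return spark_major_minor == (2, 4) and tuple(python_version[:2]) >= (3, 8)


if __name__ == "__main__":
    # "2.4.5" is a placeholder; substitute the PySpark version you have installed.
    if is_known_bad_combo("2.4.5"):
        print("Spark 2.4.x does not support this Python version; use Python 3.7 or earlier.")
    else:
        print("This check found no known incompatibility.")
```

Running a check like this at the top of a job script gives a clearer message than the TypeError raised deep inside the pyspark import.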