# Reed–Solomon Codes (Translation)

## Translator's Preface

Reed–Solomon codes are a particular family of error-correcting codes. They are among the oldest error-correction schemes, yet they remain in wide use today, and they are one of the few highly efficient algorithms of this kind available in the public domain.

## Principles of error correction

1. Detect which words are corrupted (because they do not appear in the reduced dictionary);
2. Correct the corrupted words by finding the most similar word in the dictionary.

A reduced dictionary of three allowed words:

t h i s
t h a t
c o r n

The same dictionary with four check characters appended to each word:

t h i s a b c d
t h a t b c d e
c o r n c d e f

A received word with erasures (unreadable characters are marked `*`); the surviving characters still identify a unique dictionary entry:

t * * * a b * d

A received word with errors; comparing both the word and its check characters against the dictionary reveals the most similar entry:

t h o s b c d e
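The second principle above, picking the most similar dictionary word, can be sketched with plain Hamming distance. This is a minimal illustration; the helper names and word list are ours, not part of any standard:

```python
def hamming_distance(a, b):
    '''Number of positions at which two equal-length words differ.'''
    return sum(ca != cb for ca, cb in zip(a, b))

def correct_word(received, dictionary):
    '''Return the dictionary word with the fewest character differences.'''
    return min(dictionary, key=lambda w: hamming_distance(received, w))

# "thos" differs from "this" in only 1 position, from "that" in 2, from "corn" in 4
print(correct_word("thos", ["this", "that", "corn"]))  # this
```

Real Reed–Solomon codes apply the same idea, but over symbols in a finite field rather than letters, which makes the nearest-codeword search tractable algebraically.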


## QR code structure

| Error correction level | Level indicator bits | ECC bytes | Data bytes |
|---|---|---|---|
| L | 01 | 7 | 19 |
| M | 00 | 10 | 16 |
| Q | 11 | 13 | 13 |
| H | 10 | 17 | 9 |

### Message data

Message data bytes: 40 d2 75 47 76 17 32 06 27 26 96 c6 c6 96 70 ec



### Decoding

(The length field sizes above are only valid for the smaller QR code versions.)

## BCH codes

### BCH error detection

def qr_check_format(fmt):
    g = 0x537 # = 0b10100110111 in python 2.6+
    for i in range(4,-1,-1):
        if fmt & (1 << (i+10)):
            fmt ^= g << i
    return fmt


Python note: the `range` function may be unfamiliar to non-Python programmers. Here it produces the numbers from 4 down to 0. In C-like languages the loop would be written as `for (i = 4; i >= 0; i--)`; in Pascal-like languages, `for i := 4 downto 0`.

Python note 2: the `&` operator performs a bitwise AND and `<<` is a left bit shift, consistent with C-like languages.

encoded_format = (format<<10) ^ qr_check_format(format<<10)


$10100110111 =$ $1 x^{10} + 0 x^9 + 1 x^8 + 0 x^7 + 0 x^6$ $+ 1 x^5 + 1 x^4 + 0 x^3 + 1 x^2 + 1 x + 1$ $= x^{10} + x^8 + x^5 + x^4 + x^2 + x + 1$
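As a quick sanity check of the scheme above: a properly encoded format value leaves a zero remainder when checked, since the encoded codeword is by construction a multiple of the generator. A standalone sketch (the sample format value is arbitrary):

```python
def qr_check_format(fmt):
    g = 0x537  # the BCH generator 0b10100110111
    for i in range(4, -1, -1):
        if fmt & (1 << (i + 10)):
            fmt ^= g << i
    return fmt

fmt = 0b00011  # an arbitrary 5-bit format value
encoded = (fmt << 10) ^ qr_check_format(fmt << 10)  # append the 10 check bits
print(qr_check_format(encoded))  # 0: valid codewords leave no remainder
```

Any nonzero remainder therefore signals that at least one bit of the received format information is wrong.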

### BCH error correction

def hamming_weight(x): # count the number of bits set to 1 in a binary number
    weight = 0
    while x > 0:
        weight += x & 1
        x >>= 1
    return weight

def qr_decode_format(fmt):
    best_fmt = -1
    best_dist = 15
    for test_fmt in range(0,32):
        test_code = (test_fmt<<10) ^ qr_check_format(test_fmt<<10)
        test_dist = hamming_weight(fmt ^ test_code)
        if test_dist < best_dist:
            best_dist = test_dist
            best_fmt = test_fmt
        elif test_dist == best_dist:
            best_fmt = -1
    return best_fmt


>>> from qr import *
>>> qr_decode_format(int("000111101011001",2))  # no errors
3
>>> qr_decode_format(int("111111101011001",2))  # 3 bit-errors
3
>>> qr_decode_format(int("111011101011001",2))  # 4 bit-errors
-1


## Finite field arithmetic

### Introduction to mathematical fields

$10101010 =$ $1 \times x^7 + 0 \times x^6 + 1 \times x^5$ $+ 0 \times x^4 + 1 \times x^3 + 0 \times x^2 + 1 \times x + 0$ $= x^7 + x^5 + x^3 + x$

### Addition and subtraction

0101 + 0110 = 0101 - 0110 = 0101 XOR 0110 = 0011


$(x^2 + 1) + (x^2 + x)$ $= 2 x^2 + x + 1$ $= 0 x^2 + x + 1 = x + 1$

def gf_add(x, y):
    return x ^ y

def gf_sub(x, y):
    return x ^ y # in binary galois field, subtraction is just the same as addition (since we mod 2)


### Multiplication

$(x^7 + x^3 + 1) (x^5 + x^3 + x)$ $= x^7 (x^5 + x^3 + x) + x^3 (x^5 + x^3 + x) + 1 (x^5 + x^3 + x)$ $= x^{12} + x^{10} + 2 x^8 + x^6 + x^5 + x^4 + x^3 + x$ $= x^{12} + x^{10} + x^6 + x^5 + x^4 + x^3 + x$

def cl_mul(x,y):
    '''Bitwise carry-less multiplication on integers'''
    z = 0
    i = 0
    while (y>>i) > 0:
        if y & (1<<i):
            z ^= x<<i
        i += 1
    return z


def gf_mult_noLUT(x, y, prim=0):
    '''Multiplication in a Galois field without using a precomputed lookup table (and thus slower),
    using standard carry-less multiplication + modular reduction by an irreducible primitive polynomial.'''

    ### Define the bitwise carry-less operations as inner functions ###
    def cl_mult(x,y):
        '''Bitwise carry-less multiplication on integers'''
        z = 0
        i = 0
        while (y>>i) > 0:
            if y & (1<<i):
                z ^= x<<i
            i += 1
        return z

    def bit_length(n):
        '''Compute the position of the most significant 1 bit of an integer. Equivalent to int.bit_length()'''
        bits = 0
        while n >> bits: bits += 1
        return bits

    def cl_div(dividend, divisor=None):
        '''Bitwise carry-less long division of integers; returns the remainder.'''
        # Compute the position of the most significant 1 bit of each integer
        dl1 = bit_length(dividend)
        dl2 = bit_length(divisor)
        # If the dividend is smaller than the divisor, just exit
        if dl1 < dl2:
            return dividend
        # Else, align the most significant 1 of the divisor with the most significant 1 of the dividend (by shifting the divisor)
        for i in range(dl1-dl2,-1,-1):
            # Check that the dividend is divisible (useless for the first iteration but important for the next ones)
            if dividend & (1 << i+dl2-1):
                # If divisible, then shift the divisor to align the most significant bits and XOR (carry-less subtraction)
                dividend ^= divisor << i
        return dividend

    ### Main GF multiplication routine ###

    # Multiply the GF numbers
    result = cl_mult(x,y)
    # Then do the modular reduction (ie, remainder of the division) with an irreducible primitive polynomial, so that we stay inside the bounds of the field
    if prim > 0:
        result = cl_div(result, prim)

    return result


>>> a = 0b10001001
>>> b = 0b00101010
>>> print(bin(gf_mult_noLUT(a, b, 0))) # multiplication only
0b1010001111010
>>> print(bin(gf_mult_noLUT(a, b, 0x11d))) # multiplication + modular reduction
0b11000011


def gf_mult_noLUT(x, y, prim=0, field_charac_full=256, carryless=True):
    '''Galois Field integer multiplication using the Russian Peasant Multiplication algorithm (faster than the standard multiplication + modular reduction).
    If prim is 0 and carryless=False, then the function produces the result of a standard integer multiplication (no carry-less arithmetic nor modular reduction).'''
    r = 0
    while y: # while y is above 0
        if y & 1: r = r ^ x if carryless else r + x # y is odd, then add the corresponding x to r (the sum of all x's corresponding to odd y's will give the final product). Note that since we're in GF(2), the addition is in fact an XOR (very important because in GF(2) the multiplication and additions are carry-less, thus it changes the result!).
        y = y >> 1 # equivalent to y // 2
        x = x << 1 # equivalent to x*2
        if prim > 0 and x & field_charac_full: x = x ^ prim # GF modulo: if x >= 256 then apply modular reduction using the primitive polynomial (we just subtract, but since the primitive number can be above 256 then we directly XOR).

    return r


### Multiplication with logarithms

$\alpha^0 = 00000001$ $\alpha^4 = 00010000$ $\alpha^8 = 00011101$ $\alpha^{12} = 11001101$
$\alpha^1 = 00000010$ $\alpha^5 = 00100000$ $\alpha^9 = 00111010$ $\alpha^{13} = 10000111$
$\alpha^2 = 00000100$ $\alpha^6 = 01000000$ $\alpha^{10} = 01110100$ $\alpha^{14} = 00010011$
$\alpha^3 = 00001000$ $\alpha^7 = 10000000$ $\alpha^{11} = 11101000$ $\alpha^{15} = 00100110$

$10001001 \times 00101010$ $= \alpha^{74} \times \alpha^{142}$ $= \alpha^{74 + 142} = \alpha^{216} = 11000011$

gf_exp = [0] * 512 # Create a list of 512 elements. In Python 2.6+, consider using a bytearray
gf_log = [0] * 256

def init_tables(prim=0x11d):
    '''Precompute the logarithm and anti-log tables for faster computation later, using the provided primitive polynomial.'''
    # prim is the primitive (binary) polynomial. Since it's a polynomial in the binary sense,
    # it's in fact just a single galois field value between 0 and 255, and not a list of gf values.
    global gf_exp, gf_log
    gf_exp = [0] * 512 # anti-log (exponential) table
    gf_log = [0] * 256 # log table
    # For each possible value in the galois field 2^8, we will pre-compute the logarithm and anti-logarithm (exponential) of this value
    x = 1
    for i in range(0, 255):
        gf_exp[i] = x # compute the anti-log for this value and store it in a table
        gf_log[x] = i # compute the log at the same time
        x = gf_mult_noLUT(x, 2, prim)

        # If you use only generator==2 or a power of 2, you can use the following which is faster than gf_mult_noLUT():
        #x <<= 1 # multiply by 2 (change 1 by another number y to multiply by a power of 2^y)
        #if x & 0x100: # similar to x >= 256, but a lot faster (because 0x100 == 256)
        #    x ^= prim # subtract the primitive polynomial from the current value (instead of 255, so that we get a unique set made of coprime numbers), this is the core of the tables generation

    # Optimization: double the size of the anti-log table so that we don't need to mod 255 to
    # stay inside the bounds (because we will mainly use this table for the multiplication of two GF numbers, no more).
    for i in range(255, 512):
        gf_exp[i] = gf_exp[i - 255]
    return [gf_log, gf_exp]


Python note: the upper bound of `range` is exclusive, so `gf_exp[255]` is not set twice by the loop above.

def gf_mul(x,y):
    if x==0 or y==0:
        return 0
    return gf_exp[gf_log[x] + gf_log[y]] # should be gf_exp[(gf_log[x]+gf_log[y])%255] if gf_exp wasn't oversized
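To check the worked example $\alpha^{74} \cdot \alpha^{142} = \alpha^{216}$ above, here is a standalone sketch that repeats the table construction (with the usual QR primitive polynomial 0x11d) and multiplies the two numbers through the tables:

```python
def gf_mult_noLUT(x, y, prim=0x11d):
    '''Russian Peasant multiplication in GF(2^8), used only to build the tables.'''
    r = 0
    while y:
        if y & 1: r ^= x
        y >>= 1
        x <<= 1
        if x & 0x100: x ^= prim  # modular reduction by the primitive polynomial
    return r

# Build the log/anti-log tables for generator alpha = 2
gf_exp = [0] * 512
gf_log = [0] * 256
x = 1
for i in range(255):
    gf_exp[i] = x
    gf_log[x] = i
    x = gf_mult_noLUT(x, 2)
for i in range(255, 512):  # oversize the anti-log table to skip the mod 255
    gf_exp[i] = gf_exp[i - 255]

def gf_mul(a, b):
    '''Multiply via log tables: add the exponents, look up the result.'''
    if a == 0 or b == 0:
        return 0
    return gf_exp[gf_log[a] + gf_log[b]]

print(bin(gf_mul(0b10001001, 0b00101010)))  # 0b11000011
```

The result matches the earlier `gf_mult_noLUT(a, b, 0x11d)` doctest, as it must: both compute the same product, one by carry-less multiply-and-reduce, the other by exponent addition.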


### Division

def gf_div(x,y):
    if y==0:
        raise ZeroDivisionError()
    if x==0:
        return 0
    return gf_exp[(gf_log[x] + 255 - gf_log[y]) % 255]


Python note: the `raise` statement throws an exception and aborts execution of the `gf_div` function. Division is the inverse of multiplication: `gf_div(gf_mul(x,y), y) == x` for any `x` and any nonzero `y`.

### Power and inverse

def gf_pow(x, power):
    return gf_exp[(gf_log[x] * power) % 255]

def gf_inverse(x):
    return gf_exp[255 - gf_log[x]] # gf_inverse(x) == gf_div(1, x)


### Polynomials

$00000001 x^4 + 00001111 x^3 + 00110110 x^2$ $+ 01111000 x + 01000000$ $= 01 x^4 + 0f x^3 + 36 x^2 + 78 x + 40$

def gf_poly_scale(p,x):
    r = [0] * len(p)
    for i in range(0, len(p)):
        r[i] = gf_mul(p[i], x)
    return r


Note to Python programmers: this function is not written in a "pythonic" style. It could be expressed quite elegantly as a list comprehension, but I have limited myself to language features that are easier to translate to other programming languages.

def gf_poly_add(p,q):
    r = [0] * max(len(p),len(q))
    for i in range(0,len(p)):
        r[i+len(r)-len(p)] = p[i]
    for i in range(0,len(q)):
        r[i+len(r)-len(q)] ^= q[i]
    return r


def gf_poly_mul(p,q):
    '''Multiply two polynomials, inside Galois Field'''
    # Pre-allocate the result array
    r = [0] * (len(p)+len(q)-1)
    # Compute the polynomial multiplication (just like the outer product of two vectors,
    # we multiply each coefficient of p with all coefficients of q)
    for j in range(0, len(q)):
        for i in range(0, len(p)):
            r[i+j] ^= gf_mul(p[i], q[j]) # equivalent to: r[i + j] = gf_add(r[i+j], gf_mul(p[i], q[j]))
                                         # -- you can see it's your usual polynomial multiplication
    return r


$01 x^4 + 0f x^3 + 36 x^2 + 78 x + 40$ $= (((01 x + 0f) x + 36) x + 78) x + 40$

def gf_poly_eval(poly, x):
    '''Evaluates a polynomial in GF(2^p) given the value for x. This is based on Horner's scheme for maximum efficiency.'''
    y = poly[0]
    for i in range(1, len(poly)):
        y = gf_mul(y, x) ^ poly[i]
    return y
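The nested form above is Horner's scheme. Over ordinary integers it looks like this (an illustrative sketch with a hypothetical helper name; `gf_poly_eval` is the same loop with `gf_mul` and XOR in place of `*` and `+`):

```python
def horner_eval(coeffs, x):
    '''Evaluate a polynomial given as [highest-degree coefficient, ..., constant].'''
    y = coeffs[0]
    for c in coeffs[1:]:
        y = y * x + c  # one multiply and one add per coefficient
    return y

# 1*x^2 + 2*x + 3 evaluated at x = 10 gives 123
print(horner_eval([1, 2, 3], 10))  # 123
```

Horner's scheme needs only n multiplications for a degree-n polynomial, instead of recomputing each power of x from scratch.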


## Reed–Solomon codes

### RS encoding

#### Exception management

class ReedSolomonError(Exception):
    pass


raise ReedSolomonError("Some error message")


try:
    raise ReedSolomonError("Some error message")
except ReedSolomonError as e:
    pass # do something here


#### RS generator polynomial

Reed–Solomon codes use a generator polynomial similar to BCH codes (not to be confused with the generator number α). The generator is the product of factors $(x - \alpha^n)$, starting with $n = 0$ for QR codes. For example:

$g_4(x) = (x - \alpha^0) (x - \alpha^1) (x - \alpha^2) (x - \alpha^3)$ $= 01 x^4 + 0f x^3 + 36 x^2 + 78 x + 40$

def rs_generator_poly(nsym):
    '''Generate an irreducible generator polynomial (necessary to encode a message into Reed-Solomon)'''
    g = [1]
    for i in range(0, nsym):
        g = gf_poly_mul(g, [1, gf_pow(2, i)])
    return g
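To check that this reproduces $g_4(x)$ above, here is a standalone sketch that repeats the finite-field helpers from the previous section and prints the coefficients for `nsym=4`:

```python
def gf_mult_noLUT(x, y, prim=0x11d):
    '''Russian Peasant multiplication in GF(2^8), used to build the tables.'''
    r = 0
    while y:
        if y & 1: r ^= x
        y >>= 1
        x <<= 1
        if x & 0x100: x ^= prim
    return r

gf_exp, gf_log = [0] * 512, [0] * 256
x = 1
for i in range(255):
    gf_exp[i], gf_log[x] = x, i
    x = gf_mult_noLUT(x, 2)
for i in range(255, 512):
    gf_exp[i] = gf_exp[i - 255]

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else gf_exp[gf_log[a] + gf_log[b]]

def gf_pow(a, power):
    return gf_exp[(gf_log[a] * power) % 255]

def gf_poly_mul(p, q):
    r = [0] * (len(p) + len(q) - 1)
    for j in range(len(q)):
        for i in range(len(p)):
            r[i + j] ^= gf_mul(p[i], q[j])
    return r

def rs_generator_poly(nsym):
    g = [1]
    for i in range(nsym):
        g = gf_poly_mul(g, [1, gf_pow(2, i)])
    return g

print([hex(c) for c in rs_generator_poly(4)])  # ['0x1', '0xf', '0x36', '0x78', '0x40']
```

The coefficients match $01 x^4 + 0f x^3 + 36 x^2 + 78 x + 40$ as expected.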


#### Polynomial division

def gf_poly_div(dividend, divisor):
    '''Fast polynomial division by using Extended Synthetic Division and optimized for GF(2^p) computations
    (doesn't work with standard polynomials outside of this galois field, see the Wikipedia article for the generic algorithm).'''
    # CAUTION: this function expects polynomials to follow the opposite convention at decoding:
    # the terms must go from the biggest to lowest degree (while most other functions here expect
    # a list from lowest to biggest degree). eg: 1 + 2x + 5x^2 = [5, 2, 1], NOT [1, 2, 5]

    msg_out = list(dividend) # Copy the dividend
    #normalizer = divisor[0] # precomputing for performance
    for i in range(0, len(dividend) - (len(divisor)-1)):
        #msg_out[i] /= normalizer # for general polynomial division (when polynomials are non-monic), the usual way of using
                                  # synthetic division is to divide the divisor g(x) by its leading coefficient, but not needed here.
        coef = msg_out[i] # precaching
        if coef != 0: # log(0) is undefined, so we need to avoid that case explicitly (and it's also a good optimization).
            for j in range(1, len(divisor)): # in synthetic division, we always skip the first coefficient of the divisor,
                                             # because it's only used to normalize the dividend coefficient
                if divisor[j] != 0: # log(0) is undefined
                    msg_out[i + j] ^= gf_mul(divisor[j], coef) # equivalent to the more mathematically correct
                                                               # (but xoring directly is faster): msg_out[i + j] += -divisor[j] * coef

    # The resulting msg_out contains both the quotient and the remainder, the remainder being the size of the divisor
    # (the remainder has necessarily the same degree as the divisor -- not length but degree == length-1 -- since it's
    # what we couldn't divide from the dividend), so we compute the index where this separation is, and return the quotient and remainder.
    separator = -(len(divisor)-1)
    return msg_out[:separator], msg_out[separator:] # return quotient, remainder.


#### Main encoding function

def rs_encode_msg(msg_in, nsym):
    '''Reed-Solomon main encoding function'''
    gen = rs_generator_poly(nsym)

    # Pad the message, then divide it by the irreducible generator polynomial
    _, remainder = gf_poly_div(msg_in + [0] * (len(gen)-1), gen)
    # The remainder is our RS code! Just append it to our original message to get our full codeword (this represents a polynomial of max 256 terms)
    msg_out = msg_in + remainder
    # Return the codeword
    return msg_out


def rs_encode_msg(msg_in, nsym):
    '''Reed-Solomon main encoding function, using polynomial division (algorithm Extended Synthetic Division)'''
    if (len(msg_in) + nsym) > 255: raise ValueError("Message is too long (%i when max is 255)" % (len(msg_in)+nsym))
    gen = rs_generator_poly(nsym)
    # Init msg_out with the values inside msg_in and pad with len(gen)-1 bytes (which is the number of ecc symbols).
    msg_out = [0] * (len(msg_in) + len(gen)-1)
    # Initializing the Synthetic Division with the dividend (= input message polynomial)
    msg_out[:len(msg_in)] = msg_in

    # Synthetic division main loop
    for i in range(len(msg_in)):
        # Note that it's msg_out here, not msg_in. Thus, we reuse the updated value at each iteration
        # (this is how Synthetic Division works: instead of storing the intermediate values in a temporary register,
        # we directly commit them to the output).
        coef = msg_out[i]

        # log(0) is undefined, so we need to manually check for this case. There's no need to check
        # the divisor here because we know it can't be 0 since we generated it.
        if coef != 0:
            # in synthetic division, we always skip the first coefficient of the divisor, because it's only used to normalize the dividend coefficient (which is here useless since the divisor, the generator polynomial, is always monic)
            for j in range(1, len(gen)):
                msg_out[i+j] ^= gf_mul(gen[j], coef) # equivalent to msg_out[i+j] += gf_mul(gen[j], coef)

    # At this point, the Extended Synthetic Division is done, msg_out contains the quotient in msg_out[:len(msg_in)]
    # and the remainder in msg_out[len(msg_in):]. Here for RS encoding, we don't need the quotient but only the remainder
    # (which represents the RS code), so we can just overwrite the quotient with the input message, so that we get
    # our complete codeword composed of the message + code.
    msg_out[:len(msg_in)] = msg_in

    return msg_out


>>> msg_in = [ 0x40, 0xd2, 0x75, 0x47, 0x76, 0x17, 0x32, 0x06,
...            0x27, 0x26, 0x96, 0xc6, 0xc6, 0x96, 0x70, 0xec ]
>>> msg = rs_encode_msg(msg_in, 10)
>>> for i in range(0,len(msg)):
...    print(hex(msg[i]), end=' ')
...
0x40 0xd2 0x75 0x47 0x76 0x17 0x32 0x6 0x27 0x26 0x96 0xc6 0xc6 0x96 0x70 0xec
0xbc 0x2a 0x90 0x13 0x6b 0xaf 0xef 0xfd 0x4b 0xe0


Python version note: the syntax of the `print` function has changed; this example uses Python 3. In earlier versions (particularly Python 2.x), replace the print line with `print hex(msg[i]),` (including the trailing comma) and replace `range` with `xrange`.

### RS decoding

#### Decoding outline

Reed–Solomon decoding is the process of returning a corrected message from a potentially corrupted message and its RS code. In other words, decoding repairs the message using the previously computed RS code.

1. Compute the syndromes polynomial. This allows us to analyze which characters are in error using Berlekamp–Massey (or another algorithm), and also to quickly check whether the input message is corrupted at all.
2. Compute the erasure/error locator polynomial (from the syndromes). This is computed by Berlekamp–Massey, and is a detector that tells us exactly which characters are corrupted.
3. Compute the erasure/error evaluator polynomial (from the syndromes and the erasure/error locator polynomial). It is needed to evaluate how much the characters were tampered with (ie, it helps to compute the magnitudes).
4. Compute the erasure/error magnitude polynomial (from all three polynomials above): this polynomial can also be called the corruption polynomial, since in fact it stores exactly the values that need to be subtracted from the received message to get the original, correct message (ie, with the correct values for the erased characters). In other words, we extract the noise into this polynomial, and we just have to remove this noise from the input message to repair it.
5. Repair the input message simply by subtracting the magnitude polynomial from it.

#### Syndrome calculation

def rs_calc_syndromes(msg, nsym):
    '''Given the received codeword msg and the number of error correcting symbols (nsym), computes the syndromes polynomial.
    Mathematically, it's essentially equivalent to a Fourier Transform (Chien search being the inverse).
    '''
    # Note the "[0] +" : we add a 0 coefficient for the lowest degree (the constant). This effectively shifts the syndrome, and will shift every computation depending on the syndromes (such as the errors locator polynomial, errors evaluator polynomial, etc. but not the errors positions).
    # This is not necessary: you can adapt subsequent computations to start from 0 instead of skipping the first iteration (ie, the often seen range(1, n-k+1)).
    synd = [0] * nsym
    for i in range(0, nsym):
        synd[i] = gf_poly_eval(msg, gf_pow(2,i))
    return [0] + synd # pad with one 0 for mathematical precision (else we can end up with weird calculations sometimes)


>>> synd = rs_calc_syndromes(msg, 10)
>>> print(synd)
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0] # not corrupted message = all 0 syndromes
>>> msg[0] = 0  # deliberately damage the message
>>> synd = rs_calc_syndromes(msg, 10)
>>> print(synd)
[64, 192, 93, 231, 52, 92, 228, 49, 83, 245] # when corrupted, the syndromes will be non zero


def rs_check(msg, nsym):
    '''Returns true if the message + ecc has no error, or false otherwise (may not always catch a wrong decoding or a wrong message, particularly if there are too many errors -- above the Singleton bound --, but it usually does)'''
    return ( max(rs_calc_syndromes(msg, nsym)) == 0 )


#### Erasure correction

def rs_find_errata_locator(e_pos):
    '''Compute the erasures/errors/errata locator polynomial from the erasures/errors/errata positions
    (the positions must be relative to the x coefficient, eg: "hello worldxxxxxxxxx" is tampered to "h_ll_ worldxxxxxxxxx"
    with xxxxxxxxx being the ecc of length n-k=9, here the string positions are [1, 4], but the coefficients are reversed
    since the ecc characters are placed as the first coefficients of the polynomial, thus the coefficients of the
    erased characters are n-1 - [1, 4] = [18, 15] = erasures_loc to be specified as an argument.'''

    e_loc = [1] # just to init because we will multiply, so it must be 1 so that the multiplication starts correctly without nulling any term
    # erasures_loc = product(1 - x*alpha**i) for i in erasures_pos and where alpha is the alpha chosen to evaluate polynomials.
    for i in e_pos:
        e_loc = gf_poly_mul( e_loc, gf_poly_add([1], [gf_pow(2, i), 0]) )
    return e_loc


def rs_find_error_evaluator(synd, err_loc, nsym):
    '''Compute the error (or erasures if you supply sigma=erasures locator polynomial, or errata) evaluator polynomial Omega
    from the syndrome and the error/erasures/errata locator Sigma.'''

    # Omega(x) = [ Synd(x) * Error_loc(x) ] mod x^(n-k+1)
    _, remainder = gf_poly_div( gf_poly_mul(synd, err_loc), ([1] + [0]*(nsym+1)) ) # first multiply syndromes * errata_locator, then do a polynomial division to truncate the polynomial to the required length

    # Faster way that is equivalent:
    #remainder = gf_poly_mul(synd, err_loc) # first multiply the syndromes with the errata locator polynomial
    #remainder = remainder[len(remainder)-(nsym+1):] # then slice the list to truncate it (which represents the polynomial), which is equivalent to dividing by a polynomial of the length we want

    return remainder


def rs_correct_errata(msg_in, synd, err_pos): # err_pos is a list of the positions of the errors/erasures/errata
    '''Forney algorithm, computes the values (error magnitudes) to correct the input message.'''
    # calculate the errata locator polynomial to correct both errors and erasures (by combining the errors positions given by the error locator polynomial found by BM with the erasures positions given by the caller)
    coef_pos = [len(msg_in) - 1 - p for p in err_pos] # need to convert the positions to coefficient degrees for the errata locator algo to work (eg: instead of [0, 1, 2] it will become [len(msg)-1, len(msg)-2, len(msg)-3])
    err_loc = rs_find_errata_locator(coef_pos)
    # calculate the errata evaluator polynomial (often called Omega or Gamma in academic papers)
    err_eval = rs_find_error_evaluator(synd[::-1], err_loc, len(err_loc)-1)[::-1]

    # Second part of the Chien search to get the error location polynomial X from the error positions in err_pos (the roots of the error locator polynomial, ie, where it evaluates to 0)
    X = [] # will store the position of the errors
    for i in range(0, len(coef_pos)):
        l = 255 - coef_pos[i]
        X.append( gf_pow(2, -l) )

    # Forney algorithm: compute the magnitudes
    E = [0] * (len(msg_in)) # will store the values that need to be corrected (subtracted) from the message containing errors. This is sometimes called the error magnitude polynomial.
    Xlength = len(X)
    for i, Xi in enumerate(X):

        Xi_inv = gf_inverse(Xi)

        # Compute the formal derivative of the error locator polynomial (see Blahut, Algebraic codes for data transmission, pp 196-197).
        # The formal derivative of the errata locator is used as the denominator of the Forney Algorithm, which simply says that the ith error value is given by error_evaluator(gf_inverse(Xi)) / error_locator_derivative(gf_inverse(Xi)).
        err_loc_prime_tmp = []
        for j in range(0, Xlength):
            if j != i:
                err_loc_prime_tmp.append( gf_sub(1, gf_mul(Xi_inv, X[j])) )
        # compute the product, which is the denominator of the Forney algorithm (errata locator derivative)
        err_loc_prime = 1
        for coef in err_loc_prime_tmp:
            err_loc_prime = gf_mul(err_loc_prime, coef)
        # equivalent to: err_loc_prime = functools.reduce(gf_mul, err_loc_prime_tmp, 1)

        # Compute y (evaluation of the errata evaluator polynomial)
        # This is a more faithful translation of the theoretical equation, contrary to the old Forney method. Here it is an exact reproduction:
        # Yl = omega(Xl.inverse()) / prod(1 - Xj*Xl.inverse()) for j in len(X)
        y = gf_poly_eval(err_eval[::-1], Xi_inv) # numerator of the Forney algorithm (errata evaluator evaluated)
        y = gf_mul(gf_pow(Xi, 1), y)

        # Check: err_loc_prime (the divisor) should not be zero.
        if err_loc_prime == 0:
            raise ReedSolomonError("Could not find error magnitude")

        # Compute the magnitude
        magnitude = gf_div(y, err_loc_prime) # magnitude value of the error, calculated by the Forney algorithm (an equation in fact): dividing the errata evaluator by the errata locator derivative gives us the errata magnitude (ie, the value to repair) of the ith symbol
        E[err_pos[i]] = magnitude # store the magnitude for this error into the magnitude polynomial

    # Apply the correction of values to get our message corrected! (note that the ecc bytes also get corrected!)
    # (this isn't the Forney algorithm, we just apply the result of decoding here)
    msg_in = gf_poly_add(msg_in, E) # equivalent to Ci = Ri - Ei where Ci is the correct message, Ri the received (senseword) message, and Ei the errata magnitudes (minus is replaced by XOR since it's equivalent in GF(2^p)). So in fact here we subtract from the received message the errata magnitudes, which logically corrects the values to what they should be.
    return msg_in


Python note: this function uses `[::-1]` to reverse the order of the elements in a list. This is necessary because the functions do not all use the same ordering convention (some take the lowest-degree term first, others the highest). It also uses a list comprehension, which is simply a more concise way to write a for loop that appends entries to a list, and which the Python interpreter can optimize better than an explicit loop.

>>> msg[0] = 0
>>> synd = rs_calc_syndromes(msg, 10)
>>> msg = rs_correct_errata(msg, synd, [0]) # [0] is the list of the erasures locations, here it's the first character, at position 0
>>> print(hex(msg[0]))
0x40


#### Error correction

def rs_find_error_locator(synd, nsym, erase_loc=None, erase_count=0):
    '''Find the error/errata locator and evaluator polynomials with the Berlekamp-Massey algorithm'''
    # The idea is that BM will iteratively estimate the error locator polynomial.
    # To do this, it will compute a Discrepancy term called Delta, which will tell us if the error locator polynomial needs an update or not
    # (hence why it's called discrepancy: it tells us when we are getting off track from the correct value).

    # Init the polynomials
    if erase_loc: # if the erasure locator polynomial is supplied, we init with its value, so that we include erasures in the final locator polynomial
        err_loc = list(erase_loc)
        old_loc = list(erase_loc)
    else:
        err_loc = [1] # This is the main variable we want to fill, also called Sigma in other notations, or more formally the errors/errata locator polynomial.
        old_loc = [1] # BM is an iterative algorithm, and we need the errata locator polynomial of the previous iteration in order to update other necessary variables.
    #L = 0 # update flag variable, not needed here because we use an alternative equivalent way of checking if an update is needed (but using the flag could potentially be faster, depending on whether computing the length of a list takes linear time in your language; in Python it's constant, so it's as fast).

    # Fix the syndrome shifting: when computing the syndrome, some implementations may prepend a 0 coefficient for the lowest degree term (the constant). This is a case of syndrome shifting, thus the syndrome will be bigger than the number of ecc symbols. If that's the case, then we need to account for the syndrome shifting when we use the syndrome such as inside BM, by skipping those prepended coefficients.
    # Another way to detect the shifting is to detect the 0 coefficients: by definition, a syndrome does not contain any 0 coefficient (except if there are no errors/erasures, in which case they are all 0). This however doesn't work with the modified Forney syndromes, which set to 0 the coefficients corresponding to erasures, leaving only the coefficients corresponding to errors.
    synd_shift = 0
    if len(synd) > nsym: synd_shift = len(synd) - nsym

    for i in range(0, nsym-erase_count): # generally: nsym-erase_count == len(synd), except when you input a partial erase_loc and use the full syndrome instead of the Forney syndromes, in which case nsym-erase_count is more correct (len(synd) will fail badly with IndexError).
        if erase_loc: # if an erasures locator polynomial was provided to init the errors locator polynomial, then we must skip the FIRST erase_count iterations (not the last iterations, this is very important!)
            K = erase_count+i+synd_shift
        else: # if an erasures locator is not provided, then either there are no erasures to account for, or we use the Forney syndromes, so we don't need erase_count nor erase_loc (the erasures have been trimmed out of the Forney syndromes).
            K = i+synd_shift

        # Compute the discrepancy Delta
        # Here is the close-to-the-books operation to compute the discrepancy Delta: it's a simple polynomial multiplication of the error locator with the syndromes, and then we take the Kth element.
        #delta = gf_poly_mul(err_loc[::-1], synd)[K] # theoretically it should be gf_poly_add(synd[::-1], [1])[::-1] instead of just synd, but it seems it's not absolutely necessary to correctly decode.
        # But this can be optimized: since we only need the Kth element, we don't need to compute the polynomial multiplication for any other element but the Kth. Thus to optimize, we compute the polymul only at the item we need, skipping the rest (avoiding a nested loop, thus we are linear time instead of quadratic).
        # This optimization is actually described in several figures of the book "Algebraic codes for data transmission", Blahut, Richard E., 2003, Cambridge university press.
        delta = synd[K]
        for j in range(1, len(err_loc)):
            delta ^= gf_mul(err_loc[-(j+1)], synd[K - j]) # delta is also called the discrepancy. Here we do a partial polynomial multiplication (ie, we compute the polynomial multiplication only for the term of degree K). Should be equivalent to brownanrs.polynomial.mul_at().
        #print("delta", K, delta, list(gf_poly_mul(err_loc[::-1], synd))) # debugline

        # Shift polynomials to compute the next degree
        old_loc = old_loc + [0]

        # Iteratively estimate the errata locator and evaluator polynomials
        if delta != 0: # Update only if there's a discrepancy
            if len(old_loc) > len(err_loc): # Rule B (rule A is implicitly defined because rule A just says that we skip any modification for this iteration)
            #if 2*L <= K+erase_count: # equivalent to len(old_loc) > len(err_loc), as long as L is correctly computed
                # Computing the errata locator polynomial Sigma
                new_loc = gf_poly_scale(old_loc, delta)
                old_loc = gf_poly_scale(err_loc, gf_inverse(delta)) # effectively we are doing err_loc * 1/delta = err_loc // delta
                err_loc = new_loc
                # Update the update flag
                #L = K - L # the update flag L is tricky: in Blahut's schema, it's mandatory to use L = K - L - erase_count (and indeed in a previous draft of this function, if you forgot to do - erase_count it would lead to correcting only 2*(errors+erasures) <= (n-k) instead of 2*errors+erasures <= (n-k)), but in this latest draft, this will lead to a wrong decoding in some cases where it should correctly decode! Thus you should try with and without - erase_count to update L on your own implementation and see which one works OK without producing wrong decoding failures.

            # Update with the discrepancy
            err_loc = gf_poly_add(err_loc, gf_poly_scale(old_loc, delta))

    # Check if the result is correct, ie, that there are not too many errors to correct
    while len(err_loc) and err_loc[0] == 0: del err_loc[0] # drop leading 0s, else errs will not be of the correct size
    errs = len(err_loc) - 1
    if (errs-erase_count) * 2 + erase_count > nsym:
        raise ReedSolomonError("Too many errors to correct")

    return err_loc


def rs_find_errors(err_loc, nmess): # nmess is len(msg_in)
    '''Find the roots (ie, where evaluation = zero) of the error polynomial by brute-force trial; this is a sort of Chien search
    (but less efficient: Chien's search is a way to evaluate the polynomial such that each evaluation only takes constant time).'''
    errs = len(err_loc) - 1
    err_pos = []
    for i in range(nmess): # normally we should try all 2^8 possible values, but here we optimize to just check the interesting symbols
        if gf_poly_eval(err_loc, gf_pow(2, i)) == 0: # It's a 0? Bingo, it's a root of the error locator polynomial,
                                                     # in other terms this is the location of an error
            err_pos.append(nmess - 1 - i)
    # Sanity check: the number of errors/errata positions found should be exactly the same as the length of the errata locator polynomial
    if len(err_pos) != errs:
        # couldn't find error locations
        raise ReedSolomonError("Too many (or few) errors found by Chien Search for the errata locator polynomial!")
    return err_pos


>>> print(hex(msg[10]))
0x96
>>> msg[0] = 6
>>> msg[10] = 7
>>> msg[20] = 8
>>> synd = rs_calc_syndromes(msg, 10)
>>> err_loc = rs_find_error_locator(synd, 10)
>>> pos = rs_find_errors(err_loc[::-1], len(msg)) # find the errors locations
>>> print(pos)
[20, 10, 0]
>>> msg = rs_correct_errata(msg, synd, pos)
>>> print(hex(msg[10]))
0x96


#### Error and erasure correction

A Reed–Solomon decoder can decode both erasures and errors at the same time, up to a limit (called the Singleton bound) of $2e + v \leq (n-k)$, where $e$ is the number of errors, $v$ the number of erasures, and $(n-k)$ the number of RS code characters (`nsym` in the code). Basically, this means that for every erasure you only need one RS code character to repair it, while for every error you need two (since you need to find the position in addition to the value/magnitude). Such a decoder is called an errors-and-erasures decoder, or an errata decoder.
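As a quick numeric illustration of the bound (the helper name and parameters are ours, for illustration only):

```python
def correctable(n, k, errors, erasures):
    '''Check the errata bound 2*e + v <= n - k for an RS(n, k) code.'''
    return 2 * errors + erasures <= n - k

# With 10 ecc symbols (n - k = 10) we can fix, for example, 5 errors,
# 10 erasures, or a mix such as 3 errors plus 4 erasures:
print(correctable(26, 16, 5, 0))   # True  (2*5 + 0 = 10)
print(correctable(26, 16, 3, 4))   # True  (2*3 + 4 = 10)
print(correctable(26, 16, 4, 3))   # False (2*4 + 3 = 11 > 10)
```

This is why telling the decoder where the corrupted characters are (erasures) doubles its repair capacity compared to letting it find them on its own (errors).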

def rs_forney_syndromes(synd, pos, nmess):
    # Compute the Forney syndromes, a modified syndrome which accounts only for errors (erasures are trimmed out). Do not confuse this with the Forney algorithm, which allows to correct the message based on the locations of the errors.
    erase_pos_reversed = [nmess-1-p for p in pos] # prepare the coefficient degree positions (instead of the erasures positions)

    # Optimized method, all operations are inlined
    fsynd = list(synd[1:]) # make a copy and trim the first coefficient, which is always 0 by definition
    for i in range(0, len(pos)):
        x = gf_pow(2, erase_pos_reversed[i])
        for j in range(0, len(fsynd) - 1):
            fsynd[j] = gf_mul(fsynd[j], x) ^ fsynd[j + 1]

    # Equivalent, theoretical way of computing the modified Forney syndromes: fsynd = (erase_loc * synd) % x^(n-k)
    # See Shao, H. M., Truong, T. K., Deutsch, L. J., & Reed, I. S. (1986, April). A single chip VLSI Reed-Solomon decoder. In Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP'86. (Vol. 11, pp. 2151-2154). IEEE.
    #erase_loc = rs_find_errata_locator(erase_pos_reversed) # computing the erasures locator polynomial
    #fsynd = gf_poly_mul(erase_loc[::-1], synd[1:]) # then multiply with the syndrome to get the untrimmed forney syndrome
    #fsynd = fsynd[len(pos):] # then trim the first erase_pos coefficients which are useless. Seems to be not necessary, but this reduces the computation time later in BM (thus it's an optimization).

    return fsynd


def rs_correct_msg(msg_in, nsym, erase_pos=None):
    '''Reed-Solomon main decoding function'''
    if len(msg_in) > 255: # can't decode, message is too big
        raise ValueError("Message is too long (%i when max is 255)" % len(msg_in))

    msg_out = list(msg_in) # copy of message
    # erasures: set them to null bytes for easier decoding (but this is not necessary, they will be corrected anyway, but debugging will be easier with null bytes because the error locator polynomial values will only depend on the errors locations, not their values)
    if erase_pos is None:
        erase_pos = []
    else:
        for e_pos in erase_pos:
            msg_out[e_pos] = 0
    # check if there are too many erasures to correct (beyond the Singleton bound)
    if len(erase_pos) > nsym: raise ReedSolomonError("Too many erasures to correct")
    # prepare the syndrome polynomial using only errors (ie: errors = characters that were either replaced by null byte
    # or changed to another character, but we don't know their positions)
    synd = rs_calc_syndromes(msg_out, nsym)
    # check if there's any error/erasure in the input codeword. If not (all syndromes coefficients are 0), then just return the message as-is.
    if max(synd) == 0:
        return msg_out[:-nsym], msg_out[-nsym:]  # no errors

    # compute the Forney syndromes, which hide the erasures from the original syndrome (so that BM will just have to deal with errors, not erasures)
    fsynd = rs_forney_syndromes(synd, erase_pos, len(msg_out))
    # compute the error locator polynomial using Berlekamp-Massey
    err_loc = rs_find_error_locator(fsynd, nsym, erase_count=len(erase_pos))
    # locate the message errors using Chien search (or brute-force search)
    err_pos = rs_find_errors(err_loc[::-1], len(msg_out))
    if err_pos is None:
        raise ReedSolomonError("Could not locate error")

    # Find the errors values and apply them to correct the message
    # compute the errata evaluator and errata magnitude polynomials, then correct errors and erasures
    msg_out = rs_correct_errata(msg_out, synd, (erase_pos + err_pos)) # note that we here use the original syndrome, not the forney syndrome
                                                                     # (because we will correct both errors and erasures, so we need the full syndrome)
    # check if the final message is fully repaired
    synd = rs_calc_syndromes(msg_out, nsym)
    if max(synd) > 0:
        raise ReedSolomonError("Could not correct message")
    # return the successfully decoded message
    return msg_out[:-nsym], msg_out[-nsym:] # also return the corrected ecc block so that the user can check()


Python note: the lists `erase_pos` and `err_pos` are concatenated with the `+` operator.

### A complete example

# Configuration of the parameters and input message
prim = 0x11d
n = 20 # set the size you want, it must be > k, the remaining n-k symbols will be the ECC code (more is better)
k = 11 # k = len(message)
message = "hello world" # input message

# Initializing the log/antilog tables
init_tables(prim)

# Encoding the input message
mesecc = rs_encode_msg([ord(x) for x in message], n-k)
print("Original: %s" % mesecc)

# Tampering 6 characters of the message (over 9 ecc symbols, so we are above the Singleton Bound)
mesecc[0] = 0
mesecc[1] = 2
mesecc[2] = 2
mesecc[3] = 2
mesecc[4] = 2
mesecc[5] = 2
print("Corrupted: %s" % mesecc)

# Decoding/repairing the corrupted message, by providing the locations of a few erasures, we get below the Singleton Bound
# Remember that the Singleton Bound is: 2*e+v <= (n-k)
corrected_message, corrected_ecc = rs_correct_msg(mesecc, n-k, erase_pos=[0, 1, 2])
print("Repaired: %s" % (corrected_message+corrected_ecc))
print(''.join([chr(x) for x in corrected_message]))


Original:  [104, 101, 108, 108, 111,  32, 119, 111, 114, 108, 100, 145, 124, 96, 105, 94, 31, 179, 149, 163]
Corrupted: [  0,   2,   2,   2,   2,   2, 119, 111, 114, 108, 100, 145, 124, 96, 105, 94, 31, 179, 149, 163]
Repaired:  [104, 101, 108, 108, 111,  32, 119, 111, 114, 108, 100, 145, 124, 96, 105, 94, 31, 179, 149, 163]
hello world


## Conclusion and going further

• Use a larger Galois field, for example $2^{16}$ (allowing 65536 different characters), or $2^{32}$, $2^{64}$, $2^{128}$, etc. The problem here is that the polynomial computations required to encode and decode Reed–Solomon become very costly with big polynomials (most algorithms take quadratic time; the most efficient run in $n \log n$, for example those based on number-theoretic transforms).
• Use "chunking": simply encode a large data stream in blocks of 256 characters.
• Use a variant algorithm that includes a packet-size parameter, such as Cauchy Reed–Solomon (see below).
