编译原理 第四次实验 自下而上语法分析2(LR分析法)

 

校名


校徽


《编译原理实验》

实验报告



题  目 第四次实验 LR分析法
姓  名 高星杰
学  号 2021307220712
专  业 计算机科学与技术
上课时间 2024春
授课教师 刘善梅


2024年6月13日

[TOC]

编译原理 第四次实验 自下而上语法分析2(LR分析法)

实验目的

  • 能采用LR分析法对一个算术表达式(b+9)*a做自下而上的语法分析;
  • 可自行设计一个LR文法,能识别含有句子(b+9)*a的语言;
  • 也可基于PL/0语言的文法(完整文法参见本文档最后的附录)来做。若基于PL/0语言文法,需重点关注以下几条文法的EBNF;若不习惯看文法的巴科斯范式EBNF,可先将文法改写成常规的产生式形式(教材P75)。

分析对象〈算术表达式〉的EBNF定义如下:

<表达式> ::= [+|-]<项>{<加法运算符> <项>}
<项> ::= <因子>{<乘法运算符> <因子>}
<因子> ::= <标识符>|<无符号整数>| ‘(’<表达式>‘)’
<加法运算符> ::= +|-
<乘法运算符> ::= *|/
<关系运算符> ::= =|#|<|<=|>|>=
<标识符> ::= <字母>{<字母>|<数字>}
<无符号整数> ::= <数字>{<数字>}
<字母> ::= a|b|…|X|Y|Z
<数字> ::= 0|1|…|8|9

实验要求

  • 编程基础较扎实的同学,建议用程序构造文法的LR(0)项目集规范族(构造算法见教材P107);判断文法是LR(0)、SLR和LR(1)中的哪一种,然后用程序构造LR分析表(LR(0)分析表构造算法P109;SLR分析表构造算法P112;LR(1)构造算法P115)
  • 编程基础非常非常薄弱的同学,可以人工求解LR(0)项目集规范族,判断文法是LR(0)、SLR和LR(1)中的哪一种,构造LR分析表,然后直接在程序中给出手工设计好的LR分析表。
  • 编程基础尚可的同学,可根据自身情况编程构造文法的LR(0)项目集规范族、构造LR分析表(书上都有算法,建议同学们尽量用程序实现)

实验完成程度

由于本次实验给出的样例文法直接使用LR(0)分析会产生冲突,不能正确判断句子,所以在LR(0)的基础上实现了SLR(1)分析,最终解决了LR(0)分析中的冲突问题,成功完成了对句子的判断。

| 实现的内容 | 实现的方式 |
| --- | --- |
| 文法的SLR(1)分析 | 程序实现 |
| 求解闭包 | 程序实现 |
| 求解转移goto | 程序实现 |
| 构建项目集规范族 | 程序实现 |
| 求解FIRST和FOLLOW集合 | 程序实现 |
| 构建分析表 | 程序实现 |
| 语法分析过程 | 程序实现 |
| 是否仅支持PL/0文法 | 否,可以输入其他文法和句子进行判断 |

设计思想

1. 文法扩展与预处理

1.1 文法表示

文法由以下部分组成:

  • q0:起始符号。
  • vn:非终结符集合。
  • vt:终结符集合。
  • prods:产生式集合,每个产生式表示为 (head, body) 的形式。
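
例如,本实验使用的表达式文法可以用 Python 元组直接表示如下(与后文主程序中的定义一致):

     prods = [
         ("E", ["E", "+", "T"]),
         ("E", ["T"]),
         ("T", ["T", "*", "F"]),
         ("T", ["F"]),
         ("F", ["(", "E", ")"]),
         ("F", ["id"]),
     ]
     q0 = "E"
     vn = ["E", "T", "F"]
     vt = ["+", "*", "(", ")", "id"]
     grammar = (q0, vn, vt, prods)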

1.2 增广文法

为了使分析过程有唯一的接受状态,我们引入一个新的开始符号 FAKE_S,并添加相应的产生式 FAKE_S -> 原始起始符号。同时,增加终结符 EOF 作为输入结束符号。

1.3 生成初始项目

初始项目集从新引入的开始符号的产生式出发,初始项目记为 (0, 0):第一个 0 表示产生式在产生式列表中的索引(增广后 FAKE_S -> 原始起始符号 位于索引 0),第二个 0 表示点在产生式右部中的位置。
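
结合增广后的产生式列表,项目的表示方式示例如下(产生式索引与后文单元测试一致):

     # 索引 0:FAKE_S -> E(增广时新加入)
     # 项目 (0, 0) 表示 FAKE_S -> ·E,即初始项目
     # 项目 (0, 1) 表示 FAKE_S -> E·,即接受项目
     # 项目 (1, 2) 表示 E -> E + ·T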

2. 闭包与转移函数

2.1 闭包函数 (closure)

对一个项目集进行扩展,包含所有可能的后续项目。

  • 从当前项目集中的每个项目出发,找到点后面的符号。
  • 对于每个非终结符,将其所有产生式作为新项目加入集合,直到不再有新的项目加入。

2.2 转移函数 (goto)

计算从一个项目集经过某个符号转移后到达的下一个项目集。

  • 从当前项目集中选取点后符号为给定符号的项目,将点后移一位。
  • 对这些新项目集合应用闭包操作,得到下一个项目集。

3. 构建项目集规范族

3.1 项目集规范族

通过不断应用转移函数,从初始项目集出发,构建整个项目集规范族。

  • 使用集合保存所有已处理和未处理的项目集。
  • 对每个未处理的项目集,计算其通过所有符号的转移,得到新的项目集,直到所有项目集都处理完毕。

4. FIRST和FOLLOW集合

4.1 FIRST集合

计算每个非终结符和终结符的FIRST集合。

  • 对终结符,其FIRST集合是其本身。
  • 对非终结符,通过迭代计算其产生式右部各符号的FIRST集合,并考虑空字 LAMBDA 的情况。

4.2 FOLLOW集合

计算每个非终结符的FOLLOW集合,用于规约时确定接下来的符号。

  • 起始符号的FOLLOW集合包含 EOF。
  • 对每个产生式的右部,更新后续符号的FOLLOW集合,直到集合不再变化。
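
以本实验的表达式文法为例,按上述方法可求得(与后文单元测试中的期望值一致):

    FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
    FOLLOW(E) = { EOF, +, ) }
    FOLLOW(T) = FOLLOW(F) = { EOF, +, *, ) }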

5. 构建分析表

5.1 动作表 (action_table)

用于存储移入、规约、接受等操作。

  • 对于每个项目集中的项目,如果是移入项目,添加移入操作。
  • 如果是规约项目,添加规约操作。
  • 特殊处理接受项目,添加接受操作。

5.2 Goto表 (goto_table)

用于存储状态间的转移关系。

  • 对于每个项目集和每个符号,记录其转移后的项目集。
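
两张表都以 Python 字典实现,键为 (项目集, 符号)。下面是条目形式的示意(其中 I、J 表示项目集,a 表示终结符,X 表示文法符号,k 表示产生式索引,仅为示意):

    # action_table[(I, a)] = [("shift", J)]      移入,并转移到项目集 J
    # action_table[(I, a)] = [("reduce", k)]     用第 k 条产生式归约
    # action_table[(I, "EOF")] = [("accept",)]   接受
    # goto_table[(I, X)] = J                     经文法符号 X 从 I 转移到 J(归约后按非终结符查询)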

6. 语法分析过程

6.1 分析过程

通过移入和规约操作,对输入的令牌序列进行分析,输出分析结果。

  • 使用栈维护当前状态。
  • 根据当前状态和输入符号,从动作表中查找相应操作。
    • 移入操作:将符号移入栈,并转移到下一个状态。
    • 规约操作:根据产生式进行规约,弹出符号并转移到下一个状态。
    • 接受操作:成功匹配输入序列,结束分析。
    • 错误处理:遇到无法处理的输入,报告错误。
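
以输入 id + id 为例,上述过程产生的动作序列大致如下(仅为示意,归约所用产生式见前文文法):

    移入 id → 按 F -> id 归约 → 按 T -> F 归约 → 按 E -> T 归约
    → 移入 + → 移入 id → 按 F -> id 归约 → 按 T -> F 归约
    → 按 E -> E + T 归约 → 接受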

算法流程

SLR(1)分析程序实现了对输入令牌序列的有效解析,能够判断输入是否符合给定的文法规则。

整个流程可以分为六步

  1. 文法扩展
  2. 闭包与转移函数
  3. 项目集规范族构建
  4. FIRST和FOLLOW集合计算
  5. 分析表构建
  6. 语法分析过程

(算法整体流程图)

1. 文法扩展与预处理

  1. 扩展文法

    • 增加一个新的开始符号 FAKE_S 和相应的产生式 FAKE_S -> 原始起始符号
    • 在终结符集合中添加特殊符号 EOF 表示输入结束。
     def grammar_with_fake_start(grammar):
         (prev_q0, vn, vt, prods) = deepcopy(grammar)
         q0 = FAKE_S
         vn.insert(0, q0)
         vt.append(EOF)
         prods.insert(0, (q0, [prev_q0]))
         return (q0, vn, vt, prods)
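
     # 增广后的结果示例(与后文单元测试 test_grammar_with_fake_start 的期望一致):
     #   q0    = "FAKE_S"
     #   vn    = ["FAKE_S", "E", "T", "F"]
     #   vt    = ["+", "*", "(", ")", "id", "EOF"]
     #   prods = [("FAKE_S", ["E"]), ("E", ["E", "+", "T"]), ("E", ["T"]),
     #            ("T", ["T", "*", "F"]), ("T", ["F"]), ("F", ["(", "E", ")"]), ("F", ["id"])]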
    

2. 闭包与转移函数

  1. 闭包函数 (closure)
    • 对给定项目集,添加所有可能的后续项目,直到不再有新的项目加入为止。
     def closure(items, grammar):
         (q0, vn, vt, prods) = grammar
         clo = items
         while True:
             new_items = set([])
             for item in clo:
                 stacktop = item_to_stacktop(item, grammar)
                 if stacktop != None and stacktop not in vn:
                     continue
                 for prod_index, (head, body) in get_prods_with_head(stacktop, grammar):
                     new_item = (prod_index, 0)
                     new_items.add(new_item)
             if new_items <= clo:
                 break
             else:
                 clo = clo.union(new_items)
         return frozenset(clo)
    
  2. 转移函数 (goto)
    • 计算从一个项目集经过某个符号转移后到达的下一个项目集。
     def goto(cc_i, symbol, grammar):
         (q0, vn, vt, prods) = grammar
         items = set([])
         for item in cc_i:
             stacktop = item_to_stacktop(item, grammar)
             if symbol == stacktop:
                 new_item = (item[0], item[1] + 1)
                 items.add(new_item)
         return closure(items, grammar)
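
     # 一个简单用例(直接使用未增广的示例文法,结果与后文单元测试 test_closure、test_goto 一致):
     cc0 = closure(set([(0, 0)]), grammar)
     # cc0 == frozenset({(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)})
     cc1 = goto(cc0, "E", grammar)
     # cc1 == frozenset({(0, 1)})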
    

3. 构建项目集规范族

  1. 项目集规范族
    • 从初始项目集开始,反复应用转移函数,构建整个项目集规范族。
     def canonical_collection(grammar):
         (q0, vn, vt, prods) = grammar
         grammar_symbols = vn + vt
         goto_table = {}
         q0_item = (0, 0)
         cc0 = closure(set([q0_item]), grammar)
         cc = set([cc0])
         done = set([])
         while True:
             new_ccs = set([])
             for cc_i in cc:
                 if cc_i in done:
                     continue
                 for symbol in grammar_symbols:
                     cc_next = goto(cc_i, symbol, grammar)
                     if len(cc_next) > 0 :
                         new_ccs.add(cc_next)
                         goto_table[(cc_i, symbol)] = cc_next
                 done.add(cc_i)
             if new_ccs <= cc:
                 break
             else:
                 cc = cc.union(new_ccs)
         return (cc0, cc, goto_table)
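
     # 一个简单用例(未增广的示例文法,结果与后文单元测试 test_canonical_collection 一致):
     cc0, cc, goto_table = canonical_collection(grammar)
     # 本文法共得到 12 个项目集,即 len(cc) == 12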
    

4. FIRST和FOLLOW集合

  1. 计算FIRST集合
    • 对每个非终结符和终结符计算其FIRST集合。
     def first(grammar):
         (q0, vn, vt, prods) = grammar
         first_table = {}
         first_table_snapshot = {}
         for t in vt:
             first_table[t] = set([t])
         for nt in vn:
             first_table[nt] = set([])
         while first_table != first_table_snapshot:
             first_table_snapshot = deepcopy(first_table)
             for head, body in prods:
                 rhs = set([])
                 for i, b in enumerate(body):
                     if i == 0 or LAMBDA in first_table[body[i - 1]]:
                         if i == len(body) - 1:
                             rhs = rhs.union(first_table[b])
                         else:
                             rhs = rhs.union(first_table[b] - set([LAMBDA]))
                 first_table[head] = first_table[head].union(rhs)
         return first_table
    
  2. 计算FOLLOW集合
    • 对每个非终结符计算其FOLLOW集合,用于在规约时确定接下来的符号。
     def follow(grammar, first_table):
         (q0, vn, vt, prods) = grammar
         follow_table = {}
         follow_table_snapshot = {}
         for nt in vn:
             follow_table[nt] = set([])
         follow_table[q0] = set([EOF])
         while follow_table != follow_table_snapshot:
             follow_table_snapshot = deepcopy(follow_table)
             for head, body in prods:
                 trailer = follow_table[head]
                 for b in reversed(body):
                     if b in vt:
                         trailer = set([b])
                     else:
                         follow_table[b] = follow_table[b].union(trailer)
                         if LAMBDA in first_table[b]:
                             trailer = trailer.union(first_table[b] - set([LAMBDA]))
                         else:
                             trailer = first_table[b]
         return follow_table
    

5. 构建分析表

  1. 构建动作表 (action_table) 与 Goto表 (goto_table)
    • 动作表用于存储移入、规约、接受等操作。
    • Goto表用于存储状态间的转移。
     def build_action_table(cc, goto_table, follow_table, grammar):
         (_, _, vt, _) = grammar
         action_table = {}
         for cc_i in cc:
             for item in cc_i:
                 stacktop = item_to_stacktop(item, grammar)
                 if not (item_is_complete(item, grammar) or stacktop in vt):
                     continue
                 head, body = item_to_prod(item, grammar)
                 if head == FAKE_S:
                     action_table.setdefault((cc_i, EOF), []).append((ACCEPT,))
                 elif item_is_complete(item, grammar):
                     for a in follow_table[head]:
                         action_table.setdefault((cc_i, a), []).append((REDUCE, item[0]))
                 else:
                     next_state = goto_table.get((cc_i, stacktop))
                     action_table.setdefault((cc_i, stacktop), []).append((SHIFT, next_state))
         return action_table
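
     # 构造完成后,可顺带检查分析表中是否存在冲突(假设性的辅助检查片段,不属于原程序):
     action_table = build_action_table(cc, goto_table, follow_table, grammar)
     for (state, symbol), actions in action_table.items():
         if len(actions) > 1:
             print("conflict on {0}: {1}".format(symbol, actions))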
    

6. 语法分析过程

  1. 分析输入串
    • 使用栈维护当前状态,根据当前状态和输入符号,从动作表中查找相应操作,并执行移入、规约或接受操作。
     def parse(grammar, tokens):
         grammar = grammar_with_fake_start(grammar)
         cc0, cc, goto_table = canonical_collection(grammar)
         first_table = first(grammar)
         follow_table = follow(grammar, first_table)
         action_table = build_action_table(cc, goto_table, follow_table, grammar)
         stack = [cc0]
         token_index = 0
         ok = True
         iter = 0
         while True:
             iter += 1
             token = tokens[token_index]
             stacktop_state = stack[-1]
             actions = action_table.get((stacktop_state, token[0]))
             if actions == None or len(actions) == 0:
                 return (False, "No actions")
             if len(actions) > 1:
                 print("Conflicts in the action table")
             action = actions[0]
             action_str = action[0]
             if len(action) == 2:
                 action_str += " {0}".format(action[1])
             if action[0] == SHIFT:
                 next_state = action[1]
                 stack.append(next_state)
                 token_index += 1
             elif action[0] == REDUCE:
                 prod_index= action[1]
                 prods = grammar[3]
                 (head, body) = prods[prod_index]
                 for _ in range(len(body)):
                     stack.pop()
                 stacktop_state = stack[-1]
                 next_state = goto_table.get((stacktop_state, head), "DEFAULT2")
                 stack.append(next_state)
             elif action[0] == ACCEPT:
                 break
             else:
                 ok = False
                 break
         return (ok,)
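
     # 调用示例:令牌序列对应句子 (b+9)*a,其中标识符与无符号整数都映射为 id(文法取前文定义的表达式文法):
     tokens = [("(",), ("id",), ("+",), ("id",), (")",), ("*",), ("id",), (EOF,)]
     result = parse(grammar, tokens)
     # result[0] == True,说明句子符合文法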
    

源程序

代码分为两个程序:

  1. 主程序
  2. 测试程序(单元测试)(由于程序的功能比较复杂,在编写过程中就需要一部分一部分地测试,这样汇总起来才不会有难以定位的bug。)
  1. 主程序
from copy import deepcopy

# 定义一些常量
FAKE_S = "FAKE_S"  # 假的开始符号
EOF = "EOF"  # 输入结束符
SHIFT = "shift"  # 移进操作
REDUCE = "reduce"  # 规约操作
ACCEPT = "accept"  # 接受操作
LAMBDA = "lambda"  # 空字

# 增广文法:为文法增加一个假的开始符号和结束符
def grammar_with_fake_start(grammar):
    (prev_q0, vn, vt, prods) = deepcopy(grammar)
    q0 = FAKE_S
    vn.insert(0, q0)  # 将假开始符号加入非终结符集合
    vt.append(EOF)  # 将结束符加入终结符集合
    prods.insert(0, (q0, [prev_q0]))  # 增加新的产生式:FAKE_S -> 原始起始符号
    return (q0, vn, vt, prods)

# 构建规范LR(1)项目集族
def canonical_collection(grammar):
    (q0, vn, vt, prods) = grammar
    grammar_symbols = vn + vt  # 文法符号 = 非终结符 + 终结符
    goto_table = {}

    q0_item = (0, 0)  # 初始项目
    cc0 = closure(set([q0_item]), grammar)  # 求初始项目的闭包
    cc = set([cc0])
    done = set([])  # 已处理的项目集

    while True:
        new_ccs = set([])
        for cc_i in cc:
            if cc_i in done:
                continue
            for symbol in grammar_symbols:
                cc_next = goto(cc_i, symbol, grammar)  # 计算转移
                if len(cc_next) > 0 :
                    new_ccs.add(cc_next)
                    goto_table[(cc_i, symbol)] = cc_next
            done.add(cc_i)

        if new_ccs <= cc:  # 如果没有新的项目集,跳出循环
            break
        else:
            cc = cc.union(new_ccs)  # 合并新的项目集

    return (cc0, cc, goto_table)

# 闭包函数:对一个项目集进行闭包操作
def closure(items, grammar):
    (q0, vn, vt, prods) = grammar
    clo = items
    while True:
        new_items = set([])
        for item in clo:
            stacktop = item_to_stacktop(item, grammar)
            if stacktop != None and stacktop not in vn:
                continue
            for prod_index, (head, body) in get_prods_with_head(stacktop, grammar):
                new_item = (prod_index, 0)
                new_items.add(new_item)

        if new_items <= clo:
            break
        else:
            clo = clo.union(new_items)

    return frozenset(clo)

# 转移函数:计算从一个项目集经过某个符号转移后得到的项目集
def goto(cc_i, symbol, grammar):
    (q0, vn, vt, prods) = grammar
    items = set([])

    for item in cc_i:
        stacktop = item_to_stacktop(item, grammar)
        if symbol == stacktop:
            new_item = (item[0], item[1] + 1)
            items.add(new_item)

    return closure(items, grammar)

# 构建分析表
def build_action_table(cc, goto_table, follow_table, grammar):
    (_, _, vt, _) = grammar
    action_table = {}

    for cc_i in cc:
        for item in cc_i:
            stacktop = item_to_stacktop(item, grammar)

            if not (item_is_complete(item, grammar) or stacktop in vt):
                continue

            head, body = item_to_prod(item, grammar)

            if head == FAKE_S:
                action_table.setdefault((cc_i, EOF), []).append((ACCEPT,))
            elif item_is_complete(item, grammar):
                for a in follow_table[head]:
                    action_table.setdefault((cc_i, a), []).append((REDUCE, item[0]))
            else:
                next_state = goto_table.get((cc_i, stacktop))
                action_table.setdefault((cc_i, stacktop), []).append((SHIFT, next_state))

    return action_table

# 计算FIRST集合
def first(grammar):
    (q0, vn, vt, prods) = grammar
    first_table = {}
    first_table_snapshot = {}

    for t in vt:
        first_table[t] = set([t])

    for nt in vn:
        first_table[nt] = set([])

    while first_table != first_table_snapshot:
        first_table_snapshot = deepcopy(first_table)
        for head, body in prods:
            rhs = set([])
            for i, b in enumerate(body):
                if i == 0 or LAMBDA in first_table[body[i - 1]]:
                    if i == len(body) - 1:
                        rhs = rhs.union(first_table[b])
                    else:
                        rhs = rhs.union(first_table[b] - set([LAMBDA]))

            first_table[head] = first_table[head].union(rhs)

    return first_table

# 计算FOLLOW集合
def follow(grammar, first_table):
    (q0, vn, vt, prods) = grammar
    follow_table = {}
    follow_table_snapshot = {}

    for nt in vn:
        follow_table[nt] = set([])

    follow_table[q0] = set([EOF])

    while follow_table != follow_table_snapshot:
        follow_table_snapshot = deepcopy(follow_table)
        for head, body in prods:
            trailer = follow_table[head]
            for b in reversed(body):
                if b in vt:
                    trailer = set([b])
                else:
                    follow_table[b] = follow_table[b].union(trailer)
                    if LAMBDA in first_table[b]:
                        trailer = trailer.union(first_table[b] - set([LAMBDA]))
                    else:
                        trailer = first_table[b]

    return follow_table

# 解析输入串
def parse(grammar, tokens):
    print()
    print("++++++++++++++++++++++++")
    print("++++++++++++++++++++++++")
    print("PARSE: {0}".format(tokens))
    print("++++++++++++++++++++++++")
    print("++++++++++++++++++++++++")
    grammar = grammar_with_fake_start(grammar)
    cc0, cc, goto_table = canonical_collection(grammar)
    first_table = first(grammar)
    follow_table = follow(grammar, first_table)
    action_table = build_action_table(cc, goto_table, follow_table, grammar)
    print_action_table(action_table, cc, grammar)
    stack = [cc0]
    token_index = 0
    ok = True
    iter = 0
    while True:
        print("")
        print("============")
        print("ITER: {0}".format(iter))
        print("============")
        iter += 1

        print_stack(stack, cc)

        token = tokens[token_index]
        print("Token: {0}".format(token[0]))

        stacktop_state = stack[-1]
        actions = action_table.get((stacktop_state, token[0]))
        if actions == None or len(actions) == 0:
            return (False, "No actions")
        if len(actions) > 1:
            print("Conflicts in the action table")
        action = actions[0]
        action_str = action[0]
        if len(action) == 2:
            action_str += " {0}".format(action[1])
        print("Action: {0}".format(action_str))

        if action[0] == SHIFT:
            next_state = action[1]
            stack.append(next_state)
            token_index += 1

        elif action[0] == REDUCE:
            prod_index= action[1]
            prods = grammar[3]
            (head, body) = prods[prod_index]
            for _ in range(len(body)):
                stack.pop()
            stacktop_state = stack[-1]
            next_state = goto_table.get((stacktop_state, head), "DEFAULT2")
            stack.append(next_state)
            print("reducing by {0} -> {1}".format(head, body))

        elif action[0] == ACCEPT:
            break
        else:
            print("ERROR")
            print_stack(stack, cc, True)
            ok = False
            break

    return (ok,)

##################
# 辅助函数
###################

# 获取项目的栈顶符号
def item_to_stacktop(item, grammar):
    (_, _, _, prods) = grammar
    (prod_index, stacktop_index) = item
    (head, body) = prods[prod_index]
    if stacktop_index >= len(body):
        # 完成的项目
        return None
    else:
        return body[stacktop_index]

# 获取项目对应的产生式
def item_to_prod(item, grammar):
    (_, _, _, prods) = grammar
    (prod_index, _) = item
    return prods[prod_index]

# 判断项目是否完成
def item_is_complete(item, grammar):
    return item_to_stacktop(item, grammar) == None

# 根据产生式头部获取产生式列表
def get_prods_with_head(desired_head, grammar):
    if desired_head == None:
        return []
    (_, _, _, prods) = grammar
    result = []
    for prod_index, (current_head, body) in enumerate(prods):
        if current_head != desired_head:
            continue
        result.append((prod_index, (current_head, body)))
    return result

# 打印goto表
def print_goto_table(goto_table, cc):
    id_map = {}
    cc_list = list(cc)

    print("INDEX")
    for i, cc_i in enumerate(cc_list):
        id_map[cc_i] = i
        print("{0} -> {1}".format(i, cc_i))

    print("GOTO")
    for key, next_state in goto_table.items():
        state, symbol = key
        print("{0:<10} {1:<10} -> {2}".format(id_map.get(state), symbol, id_map.get(next_state)))

# 打印action表
def print_action_table(action_table, cc, grammar):
    id_map = {}
    cc_list = list(cc)

    print("INDEX")
    for i, cc_i in enumerate(cc_list):
        id_map[cc_i] = i
        print("{0} -> {1}".format(i, cc_i))

    print("ACTION")
    for key, actions in action_table.items():
        state, symbol = key
        action_str = []
        for action in actions:
            if action[0] == ACCEPT:
                action_str.append("ACCEPT")
            elif action[0] == REDUCE:
                action_str.append("Reduce {0}".format(prod_to_string(grammar[3][action[1]])))
            elif action[0] == SHIFT:
                action_str.append("Shift {0}".format(id_map.get(action[1])))

        print("{0:<10} {1:<10} -> {2}".format(id_map.get(state), symbol, "".join(action_str)))

# 打印栈内容
def print_stack(stack, cc, index=False):
    id_map = {}
    cc_list = list(cc)

    if index:
        print("INDEX")
    for i, cc_i in enumerate(cc_list):
        id_map[cc_i] = i
        if index:
            print("{0} -> {1}".format(i, cc_i))

    print("STACK: {0}".format(" ".join(map(lambda x: "{0}".format(id_map.get(x)), stack))))
    print("=======")
    print(" ".join(map(lambda x: "{0}".format(id_map.get(x)), stack)))
    for state in reversed(stack):
        print("{0}".format(id_map.get(state)))
        print(state)

    print("=======")

# 将产生式转换为字符串
def prod_to_string(prod):
    head, body = prod
    return "{0} -> {1}".format(head, " ".join(body))

# 主函数
if __name__ == '__main__':
    prods = [
    ("E", ["E", "+", "T"]),
    ("E", ["T"]),
    ("T", ["T", "*", "F"]),
    ("T", ["F"]),
    ("F", ["(", "E", ")"]),
    ("F", ["id"]),
    ]
    q0 = "E"
    vn = ["E", "T", "F"]
    vt = ["+", "*", "(", ")", "id"]

    lines = []
    tokens = []

    while True:
        try:
            line = input()
            if line == "":
                break
            lines.append(line)
        except EOFError:
            break

    for line in lines:
        type = line[1:line.index(",")]
        if type == "lparen":
            tokens.append("(")
        elif type == "rparen" :
            tokens.append(")")
        elif type == "ident" or type == "number":
            tokens.append("id")
        elif type == "plus":
            tokens.append("+")
        elif type == "times" :
            tokens.append("*")
        else :
            print("Error")
            break
    tokens = [(element,) for element in tokens] + [(EOF,)]
    grammar = (q0, vn, vt, prods)
    if(parse(grammar, tokens)[0]):
        print("Yes,it is correct.")
    else:
        print("No,it is wrong.")
  1. 测试程序
import unittest
from main import *


prods = [
    ("E", ["E", "+", "T"]),
    ("E", ["T"]),
    ("T", ["T", "*", "F"]),
    ("T", ["F"]),
    ("F", ["(", "E", ")"]),
    ("F", ["id"]),
]
q0 = "E"
vn = ["E", "T", "F"]
vt = ["+", "*", "(", ")", "id"]

grammar = (q0, vn, vt, prods)

class Test(unittest.TestCase):
    def test_grammar_with_fake_start(self):
        grammar_ext = grammar_with_fake_start(grammar)

        expected = (
            FAKE_S,
            [FAKE_S, "E", "T", "F"],
            ["+", "*", "(", ")", "id", EOF],
            [
                (FAKE_S, ["E"]),
                ("E", ["E", "+", "T"]),
                ("E", ["T"]),
                ("T", ["T", "*", "F"]),
                ("T", ["F"]),
                ("F", ["(", "E", ")"]),
                ("F", ["id"]),
            ]
        )
        self.assertEqual(grammar_ext, expected)

    def test_closure(self):
        items = closure(set([(0, 0)]), grammar)
        expected = frozenset([
            (0, 0),
            (1, 0),
            (2, 0),
            (3, 0),
            (4, 0),
            (5, 0)
        ])
        self.assertEqual(items, expected)

    def test_goto(self):
        cc0 = closure(set([(0, 0)]), grammar)
        cc_i = goto(cc0, "E", grammar)
        expected = frozenset([
            (0, 1),
        ])

        self.assertEqual(cc_i, expected)

    def test_first(self):
        (q0, vn, vt, prods) = grammar
        first_table = first(grammar)
        for t in vt:
            self.assertEqual(first_table[t], set([t]))

        self.assertEqual(first_table["E"], set(["(", "id"]))
        self.assertEqual(first_table["T"], set(["(", "id"]))
        self.assertEqual(first_table["F"], set(["(", "id"]))

    def test_follow(self):
        (q0, vn, vt, prods) = grammar
        first_table = first(grammar)
        follow_table = follow(grammar, first_table)

        self.assertEqual(follow_table["E"], set([EOF, "+", ")"]))
        self.assertEqual(follow_table["T"], set([EOF, "+", "*", ")"]))
        self.assertEqual(follow_table["F"], set([EOF, "+", "*", ")"]))

    def test_parse(self):
        ok = parse(grammar, [("id", ), ("+", ), ("id",), (EOF, )])
        self.assertTrue(ok[0])

        ok = parse(grammar, [("id", ), (EOF, )])
        self.assertTrue(ok[0])

        # 仅含结束符的输入不是合法句子,应当分析失败
        ok = parse(grammar, [(EOF, )])
        self.assertFalse(ok[0])




    def test_canonical_collection(self):
        q0, cc, goto_table = canonical_collection(grammar)

        self.assertEqual(q0, frozenset([
            (0, 0),
            (1, 0),
            (2, 0),
            (3, 0),
            (4, 0),
            (5, 0)
        ]))

        self.assertEqual(
            cc,
            set([
                frozenset([(2, 3)]),
                frozenset([
                    (2, 0),
                    (0, 0),
                    (5, 0),
                    (3, 0),
                    (1, 0),
                    (4, 0),
                    ]),
                frozenset([
                    (2, 0),
                    (0, 0),
                    (5, 0),
                    (3, 0),
                    (1, 0),
                    (4, 1),
                    (4, 0),
                    ]),
                frozenset([(0, 1)]),
                frozenset([(0, 1), (4, 2)]),
                frozenset([(1, 1), (2, 1)]),
                frozenset([(2, 2), (5, 0), (4, 0)]),
                frozenset([(3, 0), (2, 0), (0, 2), (5, 0), (4, 0)]),
                frozenset([(3, 1)]),
                frozenset([(5, 1)]),
                frozenset([(4, 3)]),
                frozenset([(0, 3), (2, 1)]),
            ])
        )

        self.assertEqual(
            goto_table,
            {
                (frozenset([(0, 1)]), '+'): frozenset([(3, 0), (2, 0), (0, 2), (5,
                        0), (4, 0)]),
                (frozenset([(3, 0), (2, 0), (0, 2), (5, 0), (4, 0)]), 'id'
                ): frozenset([(5, 1)]),
                (frozenset([
                    (2, 0),
                    (0, 0),
                    (5, 0),
                    (3, 0),
                    (1, 0),
                    (4, 0),
                    ]), 'E'): frozenset([(0, 1)]),
                (frozenset([
                    (2, 0),
                    (0, 0),
                    (5, 0),
                    (3, 0),
                    (1, 0),
                    (4, 1),
                    (4, 0),
                    ]), '('): frozenset([
                    (2, 0),
                    (0, 0),
                    (5, 0),
                    (3, 0),
                    (1, 0),
                    (4, 1),
                    (4, 0),
                    ]),
                (frozenset([(2, 2), (5, 0), (4, 0)]), 'id'): frozenset([(5, 1)]),
                (frozenset([
                    (2, 0),
                    (0, 0),
                    (5, 0),
                    (3, 0),
                    (1, 0),
                    (4, 0),
                    ]), 'T'): frozenset([(1, 1), (2, 1)]),
                (frozenset([(3, 0), (2, 0), (0, 2), (5, 0), (4, 0)]), 'T'
                ): frozenset([(0, 3), (2, 1)]),
                (frozenset([
                    (2, 0),
                    (0, 0),
                    (5, 0),
                    (3, 0),
                    (1, 0),
                    (4, 0),
                    ]), 'F'): frozenset([(3, 1)]),
                (frozenset([(0, 1), (4, 2)]), ')'): frozenset([(4, 3)]),
                (frozenset([(3, 0), (2, 0), (0, 2), (5, 0), (4, 0)]), 'F'
                ): frozenset([(3, 1)]),
                (frozenset([
                    (2, 0),
                    (0, 0),
                    (5, 0),
                    (3, 0),
                    (1, 0),
                    (4, 1),
                    (4, 0),
                    ]), 'E'): frozenset([(0, 1), (4, 2)]),
                (frozenset([(3, 0), (2, 0), (0, 2), (5, 0), (4, 0)]), '('
                ): frozenset([
                    (2, 0),
                    (0, 0),
                    (5, 0),
                    (3, 0),
                    (1, 0),
                    (4, 1),
                    (4, 0),
                    ]),
                (frozenset([
                    (2, 0),
                    (0, 0),
                    (5, 0),
                    (3, 0),
                    (1, 0),
                    (4, 0),
                    ]), '('): frozenset([
                    (2, 0),
                    (0, 0),
                    (5, 0),
                    (3, 0),
                    (1, 0),
                    (4, 1),
                    (4, 0),
                    ]),
                (frozenset([(0, 3), (2, 1)]), '*'): frozenset([(2, 2), (5, 0), (4,
                        0)]),
                (frozenset([
                    (2, 0),
                    (0, 0),
                    (5, 0),
                    (3, 0),
                    (1, 0),
                    (4, 1),
                    (4, 0),
                    ]), 'T'): frozenset([(1, 1), (2, 1)]),
                (frozenset([
                    (2, 0),
                    (0, 0),
                    (5, 0),
                    (3, 0),
                    (1, 0),
                    (4, 0),
                    ]), 'id'): frozenset([(5, 1)]),
                (frozenset([(2, 2), (5, 0), (4, 0)]), '('): frozenset([
                    (2, 0),
                    (0, 0),
                    (5, 0),
                    (3, 0),
                    (1, 0),
                    (4, 1),
                    (4, 0),
                    ]),
                (frozenset([(2, 2), (5, 0), (4, 0)]), 'F'): frozenset([(2, 3)]),
                (frozenset([
                    (2, 0),
                    (0, 0),
                    (5, 0),
                    (3, 0),
                    (1, 0),
                    (4, 1),
                    (4, 0),
                    ]), 'F'): frozenset([(3, 1)]),
                (frozenset([(1, 1), (2, 1)]), '*'): frozenset([(2, 2), (5, 0), (4,
                        0)]),
                (frozenset([(0, 1), (4, 2)]), '+'): frozenset([(3, 0), (2, 0), (0,
                        2), (5, 0), (4, 0)]),
                (frozenset([
                    (2, 0),
                    (0, 0),
                    (5, 0),
                    (3, 0),
                    (1, 0),
                    (4, 1),
                    (4, 0),
                    ]), 'id'): frozenset([(5, 1)]),
                }
        )

    def test_build_action_table(self):
        grammar_ext = grammar_with_fake_start(grammar)
        cc0, cc, goto_table = canonical_collection(grammar_ext)
        first_table = first(grammar_ext)
        follow_table = follow(grammar_ext, first_table)
        action_table = build_action_table(cc, goto_table, follow_table, grammar_ext)

        print_action_table(action_table, cc, grammar_ext)

        self.assertEqual(
            action_table,
            {
                (frozenset([(4, 1)]), ')'): [('reduce', 4)],
                (frozenset([(5, 3)]), '*'): [('reduce', 5)],
                (frozenset([(1, 2), (3, 0), (6, 0), (5, 0), (4, 0)]), 'id'
                ): [('shift', frozenset([(6, 1)]))],
                (frozenset([(5, 2), (1, 1)]), '+'): [('shift', frozenset([(1, 2),
                        (3, 0), (6, 0), (5, 0), (4, 0)]))],
                (frozenset([(0, 1), (1, 1)]), 'EOF'): [('accept', )],
                (frozenset([(3, 3)]), ')'): [('reduce', 3)],
                (frozenset([(3, 2), (6, 0), (5, 0)]), 'id'): [('shift',
                        frozenset([(6, 1)]))],
                (frozenset([(3, 1), (2, 1)]), '+'): [('reduce', 2)],
                (frozenset([(1, 3), (3, 1)]), '*'): [('shift', frozenset([(3, 2),
                        (6, 0), (5, 0)]))],
                (frozenset([(5, 3)]), ')'): [('reduce', 5)],
                (frozenset([(3, 3)]), 'EOF'): [('reduce', 3)],
                (frozenset([(5, 3)]), '+'): [('reduce', 5)],
                (frozenset([(6, 1)]), '+'): [('reduce', 6)],
                (frozenset([(4, 1)]), 'EOF'): [('reduce', 4)],
                (frozenset([
                    (2, 0),
                    (0, 0),
                    (5, 0),
                    (3, 0),
                    (1, 0),
                    (6, 0),
                    (4, 0),
                    ]), '('): [('shift', frozenset([
                    (2, 0),
                    (5, 0),
                    (3, 0),
                    (5, 1),
                    (1, 0),
                    (6, 0),
                    (4, 0),
                    ]))],
                (frozenset([(5, 3)]), 'EOF'): [('reduce', 5)],
                (frozenset([(3, 3)]), '*'): [('reduce', 3)],
                (frozenset([(1, 3), (3, 1)]), ')'): [('reduce', 1)],
                (frozenset([(3, 2), (6, 0), (5, 0)]), '('): [('shift', frozenset([
                    (2, 0),
                    (5, 0),
                    (3, 0),
                    (5, 1),
                    (1, 0),
                    (6, 0),
                    (4, 0),
                    ]))],
                (frozenset([(0, 1), (1, 1)]), '+'): [('shift', frozenset([(1, 2),
                        (3, 0), (6, 0), (5, 0), (4, 0)]))],
                (frozenset([(6, 1)]), '*'): [('reduce', 6)],
                (frozenset([(4, 1)]), '+'): [('reduce', 4)],
                (frozenset([(6, 1)]), 'EOF'): [('reduce', 6)],
                (frozenset([(3, 3)]), '+'): [('reduce', 3)],
                (frozenset([(1, 3), (3, 1)]), 'EOF'): [('reduce', 1)],
                (frozenset([(3, 1), (2, 1)]), 'EOF'): [('reduce', 2)],
                (frozenset([(5, 2), (1, 1)]), ')'): [('shift', frozenset([(5,
                        3)]))],
                (frozenset([
                    (2, 0),
                    (5, 0),
                    (3, 0),
                    (5, 1),
                    (1, 0),
                    (6, 0),
                    (4, 0),
                    ]), '('): [('shift', frozenset([
                    (2, 0),
                    (5, 0),
                    (3, 0),
                    (5, 1),
                    (1, 0),
                    (6, 0),
                    (4, 0),
                    ]))],
                (frozenset([(3, 1), (2, 1)]), ')'): [('reduce', 2)],
                (frozenset([(4, 1)]), '*'): [('reduce', 4)],
                (frozenset([(6, 1)]), ')'): [('reduce', 6)],
                (frozenset([
                    (2, 0),
                    (0, 0),
                    (5, 0),
                    (3, 0),
                    (1, 0),
                    (6, 0),
                    (4, 0),
                    ]), 'id'): [('shift', frozenset([(6, 1)]))],
                (frozenset([(1, 2), (3, 0), (6, 0), (5, 0), (4, 0)]), '('
                ): [('shift', frozenset([
                    (2, 0),
                    (5, 0),
                    (3, 0),
                    (5, 1),
                    (1, 0),
                    (6, 0),
                    (4, 0),
                    ]))],
                (frozenset([(3, 1), (2, 1)]), '*'): [('shift', frozenset([(3, 2),
                        (6, 0), (5, 0)]))],
                (frozenset([(1, 3), (3, 1)]), '+'): [('reduce', 1)],
                (frozenset([
                    (2, 0),
                    (5, 0),
                    (3, 0),
                    (5, 1),
                    (1, 0),
                    (6, 0),
                    (4, 0),
                    ]), 'id'): [('shift', frozenset([(6, 1)]))],
                }
        )



if __name__ == '__main__':
    unittest.main()
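
单元测试的运行方式示意(假设测试代码保存为 test_main.py,与 main.py 位于同一目录,文件名仅为示例):

python test_main.py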

调试数据

样例输入

(lparen,()

(ident,b)

(plus,+)

(number,9)

(rparen,))

(times,*)

(ident,a)

样例输出

Yes,it is correct.

运行结果

  1. 分析产生action表的过程:

(运行截图:构建action表的过程)

  2. 语法分析过程

    (运行截图:语法分析过程)

实验调试情况及体会

在这次实验中,尽管题目仅仅要求编写一个分析程序,但我主动挑战自己,完成了SLR(1)分析表的构建。这无疑加大了实验的难度,但也为我提供了难得的挑战和成长机会。选择实现SLR(1)分析,不仅需要在归约时利用FOLLOW集合考虑向前符号,更是对自己能力的一次全面考验。

本次实验得益于以往良好的编程习惯:每写完一部分就调试一部分、排查一部分bug。虽然整个程序很长,但中间的bug都被一一排除,最终得到了正确的结果,实现了SLR(1)分析;程序也适用于其他文法,可以自行输入产生式进行分析。实验过程中,当设计尚未明确时,我选择分阶段完成各个环节:先生成一个状态的所有项目,然后考虑如何求该状态的转移,再进一步求转移后的状态;在这个过程中逐步深化细节,例如如何判断两个状态集是否相同等问题。最终,我成功地求出了所有状态,并记录了状态转移情况,这样编写分析表和分析程序的过程就变得更加顺利。

这次实验带给我巨大的收获,与之前编写的LL(1)分析程序形成了鲜明对比。例如,LL(1)分析需要先消除左递归,而一开始编写时我没有注意到这个问题,这让我在完成实验后对此印象更加深刻。通过这次实验,我对每一个过程和步骤都有了更深的理解。

这次实验不仅让我提升了技术能力,更让我体会到了从挑战中获得成长的喜悦。实验过程中的每一个小细节,都让我对编译原理有了更深入的理解。尽管实验过程充满挑战,但最终的成功让我感到无比的满足和自豪。感谢这次实验,让我在编程的道路上迈出了重要的一步,未来我将继续努力,不断提升自己的技能,迎接更多的挑战。