将多个MSA连超级高铁网络，如何用最少的轨道连接所有MSA？

假设我们要把所有15个最大的MSA都连入超级高铁网络，目标是要最大限度地降低网络的铺设成本，于是就意味着所用的轨道数量要最少。于是问题就成了：如何用最少的轨道连接所有MSA？

4.4.1　权重的处理

要了解建造某条边所需的轨道数量，就需要知道这条边表示的距离。现在是再次引入权重概念的时候了。在超级高铁网络中，边的权重是两个所连MSA之间的距离。图4-5与图4-2几乎相同，差别只是每条边多了权重，表示边所连的两个顶点之间的距离（以英里为单位）。

图4-5　美国15个最大的MSA的加权图，权重代表两个MSA之间的距离，单位英里（1英里≈1.6093 km）

为了处理权重，需要建立Edge的子类WeightedEdge和Graph的子类WeightedGraph。每个WeightedEdge都带有一个与其关联的表示其权重的float类型数据。下面马上就会介绍Jarník算法，它能够比较两条边并确定哪条边权重较低。采用数值型的权重就很容易进行比较了。具体代码如代码清单4-6所示。

代码清单4-6　weighted_edge.py

from __future__ import annotations
from dataclasses import dataclass
from edge import Edge

@dataclass
class WeightedEdge(Edge):
    weight: float

    def reversed(self) -> WeightedEdge:
        return WeightedEdge(self.v, self.u, self.weight)

    # so that we can order edges by weight to find the minimum weight edge
    def __lt__(self, other: WeightedEdge) -> bool:
        return self.weight str:
        return f"{self.u} {self.weight}> {self.v}"

WeightedEdge的实现代码与Edge的实现代码并没有太大的区别，只是添加了一个weight属性，并通过__lt__()实现了“<”操作符，这样两个WeightedEdge就可以相互比较了。“<”操作符只涉及权重（而不涉及继承而来的属性u和v），因为Jarník的算法只关注如何找到权重最小的边。

如代码清单4-7所示，WeightedGraph从Graph继承了大部分功能，此外，它还包含了__init__方法和添加WeightedEdge的便捷方法，并且实现了自己的__str__()方法。它还有一个新方法neighbors_for_index_with_weights()，这一方法不仅会返回每一位邻居，还会返回到达这位邻居的边的权重。这一方法对其__str__()十分有用。

代码清单4-7　weighted_graph.py

from typing import TypeVar, Generic, List, Tuple
from graph import Graph
from weighted_edge import WeightedEdge

V = TypeVar('V') # type of the vertices in the graph

class WeightedGraph(Generic[V], Graph[V]):
    def __init__(self, vertices: List[V] = []) -> None:
        self._vertices: List[V] = vertices
        self._edges: List[List[WeightedEdge]] = [[] for _ in vertices]

    def add_edge_by_indices(self, u: int, v: int, weight: float) -> None:
        edge: WeightedEdge = WeightedEdge(u, v, weight)
        self.add_edge(edge) # call superclass version

    def add_edge_by_vertices(self, first: V, second: V, weight: float) -> None:
        u: int = self._vertices.index(first)
        v: int = self._vertices.index(second)
        self.add_edge_by_indices(u, v, weight)

    def neighbors_for_index_with_weights(self, index: int) -> List[Tuple[V, float]]:
        distance_tuples: List[Tuple[V, float]] = []
        for edge in self.edges_for_index(index):
            distance_tuples.append((self.vertex_at(edge.v), edge.weight))
        return distance_tuples

    def __str__(self) -> str:
        desc: str = ""
        for i in range(self.vertex_count):
            desc += f"{self.vertex_at(i)} -> {self.neighbors_for_index_with_weights(i)}
"
        return desc

现在可以实际定义加权图了。这里将会用到图4-5表示的加权图，名为city_graph2。具体代码如代码清单4-8所示。

代码清单4-8　weighted_graph.py（续）

if __name__ == "__main__":
    city_graph2: WeightedGraph[str] = WeightedGraph(["Seattle", "San Francisco", 
      "Los Angeles", "Riverside", "Phoenix", "Chicago", "Boston", "New York", "Atlanta",
      "Miami", "Dallas", "Houston", "Detroit", "Philadelphia", "Washington"])
    city_graph2.add_edge_by_vertices("Seattle", "Chicago", 1737)
    city_graph2.add_edge_by_vertices("Seattle", "San Francisco", 678)
    city_graph2.add_edge_by_vertices("San Francisco", "Riverside", 386)
    city_graph2.add_edge_by_vertices("San Francisco", "Los Angeles", 348)
    city_graph2.add_edge_by_vertices("Los Angeles", "Riverside", 50)
    city_graph2.add_edge_by_vertices("Los Angeles", "Phoenix", 357)
    city_graph2.add_edge_by_vertices("Riverside", "Phoenix", 307)
    city_graph2.add_edge_by_vertices("Riverside", "Chicago", 1704)
    city_graph2.add_edge_by_vertices("Phoenix", "Dallas", 887)
    city_graph2.add_edge_by_vertices("Phoenix", "Houston", 1015)
    city_graph2.add_edge_by_vertices("Dallas", "Chicago", 805)
    city_graph2.add_edge_by_vertices("Dallas", "Atlanta", 721)
    city_graph2.add_edge_by_vertices("Dallas", "Houston", 225)
    city_graph2.add_edge_by_vertices("Houston", "Atlanta", 702)
    city_graph2.add_edge_by_vertices("Houston", "Miami", 968)
    city_graph2.add_edge_by_vertices("Atlanta", "Chicago", 588)
    city_graph2.add_edge_by_vertices("Atlanta", "Washington", 543)
    city_graph2.add_edge_by_vertices("Atlanta", "Miami", 604)
    city_graph2.add_edge_by_vertices("Miami", "Washington", 923)
    city_graph2.add_edge_by_vertices("Chicago", "Detroit", 238)
    city_graph2.add_edge_by_vertices("Detroit", "Boston", 613)
    city_graph2.add_edge_by_vertices("Detroit", "Washington", 396)
    city_graph2.add_edge_by_vertices("Detroit", "New York", 482)
    city_graph2.add_edge_by_vertices("Boston", "New York", 190)
    city_graph2.add_edge_by_vertices("New York", "Philadelphia", 81)
    city_graph2.add_edge_by_vertices("Philadelphia", "Washington", 123)

    print(city_graph2)

因为WeightedGraph实现了__str__()，所以我们可以美观打印出city_graph2。在输出结果中会同时显示每个顶点连接的所有顶点及这些连接的权重。

Seattle -> [('Chicago', 1737), ('San Francisco', 678)]
San Francisco -> [('Seattle', 678), ('Riverside', 386), ('Los Angeles', 348)]
Los Angeles -> [('San Francisco', 348), ('Riverside', 50), ('Phoenix', 357)]
Riverside -> [('San Francisco', 386), ('Los Angeles', 50), ('Phoenix', 307), ('Chicago',
   1704)]
Phoenix -> [('Los Angeles', 357), ('Riverside', 307), ('Dallas', 887), ('Houston', 1015)]
Chicago -> [('Seattle', 1737), ('Riverside', 1704), ('Dallas', 805), ('Atlanta', 588), 
    ('Detroit', 238)]
Boston -> [('Detroit', 613), ('New York', 190)]
New York -> [('Detroit', 482), ('Boston', 190), ('Philadelphia', 81)] 
Atlanta -> [('Dallas', 721), ('Houston', 702), ('Chicago', 588), ('Washington', 543),
    ('Miami', 604)]
Miami -> [('Houston', 968), ('Atlanta', 604), ('Washington', 923)]
Dallas -> [('Phoenix', 887), ('Chicago', 805), ('Atlanta', 721), ('Houston', 225)]
Houston -> [('Phoenix', 1015), ('Dallas', 225), ('Atlanta', 702), ('Miami', 968)]
Detroit -> [('Chicago', 238), ('Boston', 613), ('Washington', 396), ('New York', 482)]
Philadelphia -> [('New York', 81), ('Washington', 123)]
Washington -> [('Atlanta', 543), ('Miami', 923), ('Detroit', 396), ('Philadelphia', 123)]

4.4.2　查找最小生成树

树是一种特殊的图，它在任意两个顶点之间只存在一条路径，这意味着树中没有环路（cycle），有时被称为无环（acyclic）。环路可以被视作循环。如果可以从一个起始顶点开始遍历图，不会重复经过任何边，并返回到起始顶点，则称存在一条环路。任何不是树的图都可以通过修剪边而成为树。图4-6演示了通过修剪边把图转换为树的过程。

图4-6　在左图中，在顶点B、C和D之间存在一个环路，因此它不是树。在右图中，
连通C和D的边已被修剪掉了，因此它是一棵树

连通图（connected graph）是指从图的任一顶点都能以某种路径到达其他任何顶点的图。本章中的所有图都是连通图。生成树（spanning tree）是把图所有顶点都连接起来的树。最小生成树（minimum spanning tree）是以最小总权重把加权图的每个顶点都连接起来的树（相对于其他的生成树而言）。对于每张加权图，我们都能高效地找到其最小生成树。

这里出现了一大堆术语！“查找最小生成树”和“以权重最小的方式连接加权图中的所有顶点”的意思相同，这是关键点。对任何设计网络（交通网络、计算机网络等）的人来说，这都是一个重要而实际的问题：如何能以最低的成本连接网络中的所有节点呢？这里的成本可能是电线、轨道、道路或其他任何东西。以电话网络来说，这个问题的另一种提法就是：连通每个电话机所需要的最短电缆长度是多少？

1．重温优先队列

优先队列在第2章中已经介绍过了。Jarník算法将需要用到优先队列。我们可以从第2章的程序包中导入PriorityQueue类，要获得详情请参阅紧挨着代码清单4-5之前的注意事项，也可以把该类复制为一个新文件并放入本章的程序包中。为完整起见，在代码清单4-9中，我们将重新创建第2章中的PriorityQueue，这里假定import语句会被放入单独的文件中。

代码清单4-9　priority_queue.py

from typing import TypeVar, Generic, List
from heapq import heappush, heappop

T = TypeVar('T')

class PriorityQueue(Generic[T]):
    def __init__(self) -> None:
         self._container: List[T] = []

    @property
    def empty(self) -> bool:
        return not self._container  # not is true for empty container

    def push(self, item: T) -> None:
        heappush(self._container, item)  # in by priority

    def pop(self) -> T:
        return heappop(self._container)  # out by priority

    def __repr__(self) -> str:
        return repr(self._container)

2．计算加权路径的总权重

在开发查找最小生成树的方法之前，我们需要开发一个用于检测某个解的总权重的函数。最小生成树问题的解将由组成树的加权边列表构成。首先，我们会将WeightedPath定义为WeightedEdge的列表，然后会定义一个total_weight()函数，该函数以WeightedPath的列表为参数并把所有边的权重相加，以便得到总权重。具体代码如代码清单4-10所示。

代码清单4-10　mst.py

from typing import TypeVar, List, Optional
from weighted_graph import WeightedGraph
from weighted_edge import WeightedEdge
from priority_queue import PriorityQueue

V = TypeVar('V') # type of the vertices in the graph
WeightedPath = List[WeightedEdge] # type alias for paths

def total_weight(wp: WeightedPath) -> float:
    return sum([e.weight for e in wp])

3．Jarník算法

查找最小生成树的Jarník算法把图分为两部分：正在生成的最小生成树的顶点和尚未加入最小生成树的顶点。其工作步骤如下所示。

（1）选择要被包含于最小生成树中的任一顶点。

（2）找到连通最小生成树与尚未加入树的顶点的权重最小的边。

（3）将权重最小边末端的顶点添加到最小生成树中。

（4）重复第2步和第3步，直到图中的每个顶点都加入了最小生成树。

注意　Jarník算法常被称为Prim算法。在20世纪20年代末，两位捷克数学家OtakarBorůvka和VojtěchJarník致力于尽量降低铺设电线的成本，提出了解决最小生成树问题的算法。他们提出的算法在几十年后又被其他人“重新发现”[3]。

为了高效地运行Jarník算法，需要用到优先队列。每次将新的顶点加入最小生成树时，所有连接到树外顶点的出边都会被加入优先队列中。从优先队列中弹出的一定是权重最小的边，算法将持续运行直至优先队列为空为止。这样就确保了权重最小的边一定会优先加入树中。如果被弹出的边与树中的已有顶点相连，则它将被忽略。

代码清单4-11中的mst()完整实现了Jarník算法[4]，它还带了一个用来打印WeightedPath的实用函数。

警告　Jarník算法在有向图中不一定能正常工作，它也不适用于非连通图。

代码清单4-11　mst.py（续）

def mst(wg: WeightedGraph[V], start: int = 0) -> Optional[WeightedPath]:
    if start > (wg.vertex_count - 1) or start < 0:
        return None
    result: WeightedPath = [] # holds the final MST
    pq: PriorityQueue[WeightedEdge] = PriorityQueue()
    visited: [bool] = [False] * wg.vertex_count # where we've been

    def visit(index: int):
        visited[index] = True # mark as visited
        for edge in wg.edges_for_index(index): 
            # add all edges coming from here to pq
            if not visited[edge.v]:
                pq.push(edge)

    visit(start) # the first vertex is where everything begins

    while not pq.empty: # keep going while there are edges to process
        edge = pq.pop()
        if visited[edge.v]:
            continue # don't ever revisit
        # this is the current smallest, so add it to solution
        result.append(edge) 
        visit(edge.v) # visit where this connects

    return result

def print_weighted_path(wg: WeightedGraph, wp: WeightedPath) -> None:
    for edge in wp:
        print(f"{wg.vertex_at(edge.u)} {edge.weight}> {wg.vertex_at(edge.v)}")
    print(f"Total Weight: {total_weight(wp)}")

下面逐行过一遍mst()。

def mst(wg: WeightedGraph[V], start: int = 0) -> Optional[WeightedPath]:
    if start > (wg.vertex_count - 1) or start < 0:
        return None

本算法将返回某一个代表最小生成树的WeightedPath对象。运算本算法的起始位置无关紧要（假定图是连通和无向的），因此默认设为索引为0的顶点。如果start无效，则mst()返回None。

result: WeightedPath = [] # holds the final MST
pq: PriorityQueue[WeightedEdge] = PriorityQueue()
visited: [bool] = [False] * wg.vertex_count # where we've been

result将是最终存放加权路径的地方，也即包含了最小生成树。随着权重最小的边不断被弹出以及图中新的区域不断被遍历，WeightedEdge会不断被添加到result中。因为Jarník算法总是选择权重最小的边，所以被视为贪婪算法（greedy algorithm）之一。pq用于存储新发现的边并弹出次低权重的边。visited用于记录已经到过的顶点索引，这用Set也可以实现，类似于bfs()中的explored。

def visit(index: int):
    visited[index] = True # mark as visited
    for edge in wg.edges_for_index(index): 
        # add all edges coming from here
        if not visited[edge.v]:
            pq.push(edge)

visit()是一个便于内部使用的函数，用于把顶点标记为已访问，并把尚未访问过的顶点所连的边都加入pq中。不妨注意一下，使用邻接表模型能够轻松地找到属于某个顶点的边。

visit(start) # the first vertex is where everything begins

除非图是非连通的，否则先访问哪个顶点是无所谓的。如果图是非连通的，是由多个不相连的部分组成的，那么mst()返回的树只会涵盖图的某一部分，也就是起始节点所属的那部分图。

while not pq.empty: # keep going while there are edges to process
    edge = pq.pop()
    if visited[edge.v]:
        continue # don't ever revisit
    # this is the current smallest, so add it
    result.append(edge) 
    visit(edge.v) # visit where this connects

return result

只要优先队列中还有边存在，我们就将它们弹出并检查它们是否会引出尚未加入树的顶点。因为优先队列是以升序排列的，所以会先弹出权重最小的边。这就确保了结果确实具有最小总权重。如果弹出的边不会引出未探索过的顶点，那么就会被忽略，否则，因为该条边是目前为止权重最小的边，所以会被添加到结果集中，并且对其引出的新顶点进行探索。如果已没有边可供探索了，则返回结果。

最后再回到用轨道最少的超级高铁网络连接美国15个最大的MSA的问题吧。结果路径就是city_graph2的最小生成树。下面尝试对city_graph2运行一下mst()，具体代码如代码清单4-12所示。

代码清单4-12　mst.py（续）

if __name__ == "__main__":
    city_graph2: WeightedGraph[str] = WeightedGraph(["Seattle", "San Francisco", "Los 
       Angeles", "Riverside", "Phoenix", "Chicago", "Boston", "New York", "Atlanta",
      "Miami", "Dallas", "Houston", "Detroit", "Philadelphia", "Washington"])

    city_graph2.add_edge_by_vertices("Seattle", "Chicago", 1737)
    city_graph2.add_edge_by_vertices("Seattle", "San Francisco", 678)
    city_graph2.add_edge_by_vertices("San Francisco", "Riverside", 386)
    city_graph2.add_edge_by_vertices("San Francisco", "Los Angeles", 348)
    city_graph2.add_edge_by_vertices("Los Angeles", "Riverside", 50)
    city_graph2.add_edge_by_vertices("Los Angeles", "Phoenix", 357)
    city_graph2.add_edge_by_vertices("Riverside", "Phoenix", 307)
    city_graph2.add_edge_by_vertices("Riverside", "Chicago", 1704)
    city_graph2.add_edge_by_vertices("Phoenix", "Dallas", 887)
    city_graph2.add_edge_by_vertices("Phoenix", "Houston", 1015)
    city_graph2.add_edge_by_vertices("Dallas", "Chicago", 805)
    city_graph2.add_edge_by_vertices("Dallas", "Atlanta", 721)
    city_graph2.add_edge_by_vertices("Dallas", "Houston", 225)
    city_graph2.add_edge_by_vertices("Houston", "Atlanta", 702)
    city_graph2.add_edge_by_vertices("Houston", "Miami", 968)
    city_graph2.add_edge_by_vertices("Atlanta", "Chicago", 588)
    city_graph2.add_edge_by_vertices("Atlanta", "Washington", 543)
    city_graph2.add_edge_by_vertices("Atlanta", "Miami", 604)
    city_graph2.add_edge_by_vertices("Miami", "Washington", 923)
    city_graph2.add_edge_by_vertices("Chicago", "Detroit", 238)
    city_graph2.add_edge_by_vertices("Detroit", "Boston", 613)
    city_graph2.add_edge_by_vertices("Detroit", "Washington", 396)
    city_graph2.add_edge_by_vertices("Detroit", "New York", 482)
    city_graph2.add_edge_by_vertices("Boston", "New York", 190)
    city_graph2.add_edge_by_vertices("New York", "Philadelphia", 81)
    city_graph2.add_edge_by_vertices("Philadelphia", "Washington", 123)

    result: Optional[WeightedPath] = mst(city_graph2)
    if result is None:
        print("No solution found!")
    else:
        print_weighted_path(city_graph2, result)

好在有美观打印方法printWeightedPath()，最小生成树的可读性很不错。

Seattle 678> San Francisco
San Francisco 348> Los Angeles
Los Angeles 50> Riverside
Riverside 307> Phoenix
Phoenix 887> Dallas
Dallas 225> Houston
Houston 702> Atlanta
Atlanta 543> Washington
Washington 123> Philadelphia
Philadelphia 81> New York
New York 190> Boston
Washington 396> Detroit
Detroit 238> Chicago
Atlanta 604> Miami
Total Weight: 5372

换句话说，这是加权图中连通所有MSA的总边长最短的组合，至少需要轨道8645 km。图4-7呈现了这棵最小生成树。

图4-7　高亮的边代表连通全部15个MSA的最小生成树

本文摘自《算法精粹：经典计算机科学问题的Python实现》，大卫·科帕克（David Kopec）著，戴旭译。

本书是一本面向中高级程序员的算法教程，借助Python语言，用经典的算法、编码技术和原理来求解计算机科学的一些经典问题。全书共9章，不仅介绍了递归、结果缓存和位操作等基本编程组件，还讲述了常见的搜索算法、常见的图算法、神经网络、遗传算法、k均值聚类算法、对抗搜索算法等，运用了类型提示等Python高级特性，并通过各级方案、示例和习题展开具体实践。

本书将计算机科学与应用程序、数据、性能等现实问题深度关联，定位独特，示例经典，适合有一定编程经验的中高级Python程序员提升用Python解决实际问题的技术、编程和应用能力。

展开阅读全文

页面更新：2024-05-16

标签：轨道环路网络权重队列顶点算法函数清单最小成本距离两个代码经典科技

1 2 3 4 5

将多个MSA连超级高铁网络，如何用最少的轨道连接所有MSA？

4.4.1　权重的处理

4.4.2　查找最小生成树

1．重温优先队列

2．计算加权路径的总权重

3．Jarník算法

这个月最受程序员喜欢的好书都在这了

自然语言处理实战：机器学习常见工具与技术

有趣的Python和正则表达式

NLP:词中的数学

医学研究人员为什么要学习R语言？哪本书最适合学习？

致敬Evi，UNIX/Linux 系统管理技术手册第5版

Python入门篇：处理Python中的异常

Python开发人员入门自然语言处理必备的参考指南

C++之父Bjarne Stroustrup不可替代的经典书

10月重磅程序员新书上架7本，每一本都很专业

算法中最关键的术语表

ASP.NET Core MVC中的两种404错误

R语言中包括哪些数据结构？

9月程序员新书：每一本拿出来都堪称经典，如图灵奖获得者经典书

学习计算机除了编程之外应该看哪些书？

C++之父Bjarne Stroustrup不可替代的经典书

算法中最关键的术语表

9月程序员新书：每一本拿出来都堪称经典，如图灵奖获得者

全能科技旗舰荣耀Magic3系列赋活《千里江山图》大揭秘

共建网上丝绸之路发展数字新基建丨中国电信揭牌两个

大牛眼中的好代码是什么样子的？

Python拾珍：用这些功能写出更简洁、更可读或更高效的代

NLP：循环网络的记忆功能

从感知机到人工神经网络

CGTN张施磊：新闻界要主动拥抱新科技主动进行媒体融合

将多个MSA连超级高铁网络，如何用最少的轨道连接所有MSA？

4.4.1 权重的处理

4.4.2 查找最小生成树

1．重温优先队列

2．计算加权路径的总权重

3．Jarník算法

4.4.1　权重的处理

4.4.2　查找最小生成树