Analyzing Apache Commons Text with Joern

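Apache Commons Text recently disclosed an RCE vulnerability, CVE-2022-42889. I happened to be using Joern at the time, so I wanted to see whether Joern could find this CVE. I ran into quite a few problems along the way, and this post records them.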

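1. How the vulnerability works

Plenty of write-ups already cover the root cause, and it is fairly simple, so I won't go through it in detail here.

In short, the vulnerability closely resembles the one in log4j: the flaw is in the string substitution feature. During substitution the library calls org.apache.commons.text.lookup.StringLookup.lookup:java.lang.String(java.lang.String), which dispatches to a different Lookup implementation depending on the content of the string. Reaching ScriptStringLookup this way leads to RCE.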
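To illustrate the dispatch described above, here is a toy sketch in plain Java. It is not the Commons Text implementation: the class ToyInterpolator, the lookup table, and the prefixes upper, sys, and script are all made up for illustration. It only shows the general shape of the problem, where a prefix inside ${prefix:key} selects which lookup function receives the attacker-controlled key.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model only: NOT the Commons Text implementation.
final class ToyInterpolator {
    // Hypothetical lookup table: prefix -> function that resolves the key.
    private static final Map<String, Function<String, String>> LOOKUPS = Map.of(
        "upper", key -> key.toUpperCase(),
        "sys", key -> String.valueOf(System.getProperty(key)),
        // Stand-in for a dangerous lookup; a real script lookup would evaluate the key as code.
        "script", key -> "<would evaluate: " + key + ">"
    );

    // Matches ${prefix:key} with a lowercase prefix.
    private static final Pattern VAR = Pattern.compile("\\$\\{([a-z]+):([^}]*)\\}");

    static String replace(String template) {
        Matcher m = VAR.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Function<String, String> lookup = LOOKUPS.get(m.group(1));
            // Unknown prefixes are left untouched.
            String value = lookup == null ? m.group(0) : lookup.apply(m.group(2));
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replace("hello ${upper:world}"));
        System.out.println(replace("${script:1 + 1}"));
    }
}
```

The point of the sketch is that the caller of replace never names a lookup explicitly. Whoever controls the template string picks the prefix, and therefore picks the code that runs.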

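2. First attempt

We already know that the vulnerability is triggered at StringLookup.lookup. The next question is: besides the replace method used in the PoCs circulating online, can other methods reach it? Finding other entry points would help both when writing white-box detection rules and during penetration testing.

The sink is straightforward, so we define it directly: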

def sink = cpg.method.fullName(".+\\.StringLookup.lookup.*")

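For the source, the target is a standalone library, so for now we assume that every public method outside an abstract class or interface is reachable and can be called by other applications as an entry point. The query below also skips constructors and methods with an empty parameter list.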

def source = cpg.method.where(_.isPublic).whereNot(_.isConstructor).whereNot(_.typeDecl.isAbstract).filter(_.parameter.size != 0)
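For readers less familiar with Joern's query steps, the intent of this filter can be approximated with plain Java reflection. This is only a sketch: EntryPointFilter is a hypothetical helper, it inspects loaded classes rather than a CPG, and it assumes the intent of the parameter check is "takes at least one argument". Joern may count parameters differently (for example, depending on how the receiver is modeled), so the two are not guaranteed to select exactly the same methods.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that mirrors the intent of the source query with reflection.
final class EntryPointFilter {
    static boolean isCandidate(Method m) {
        Class<?> owner = m.getDeclaringClass();
        return Modifier.isPublic(m.getModifiers())          // public method
            && !owner.isInterface()                          // not declared in an interface
            && !Modifier.isAbstract(owner.getModifiers())    // not declared in an abstract class
            && m.getParameterCount() != 0;                   // takes at least one argument
    }

    // Names of the candidate entry points declared directly in the given class.
    static List<String> candidates(Class<?> type) {
        List<String> names = new ArrayList<>();
        // getDeclaredMethods() never returns constructors, so no extra check is needed.
        for (Method m : type.getDeclaredMethods()) {
            if (isCandidate(m)) {
                names.add(m.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(candidates(StringBuilder.class));
    }
}
```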

Now we chain the two together. We won't use Joern's built-in data-flow analysis for now; let's first see what the call chains alone reveal:

joern> importCode("commons-text-1.9.jar", "commons-text")

joern> sink.repeat(_.caller)(_.until(_.where(_.isPublic).whereNot(_.isConstructor).whereNot(_.typeDecl.isAbstract).filter(_.parameter.size != 0))).l

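Unfortunately, this query does not return the result we want. It hangs indefinitely instead.

3. Analyzing the problem

At first I assumed that too many iterations were causing a path explosion, so that Joern could never finish the query. I tried narrowing down the sources, but however I tuned the filter, at least 700 sources still matched. I also can't hook into Joern's built-in repeat, so pruning during iteration wasn't really an option either.

Since the full query hangs, let's go one step at a time and look at the result of each iteration. We modify the query to print the paths: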

joern> def findStep(idx: Int) : Traversal[Vector[Long]] = {
    return cpg.method.fullName(".+\\.StringLookup.lookup.*").enablePathTracking.repeat(_.caller.dedup)(_.times(idx)).path.map(
        path => path.filter(n => n.isInstanceOf[Method]).map(
            n => {
                val nn = n.asInstanceOf[Method];
                nn.id
            }
        )
    )
}

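This defines a function whose parameter is the number of iterations and which returns the IDs of all method nodes on each path. Let's run it: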

joern> findStep(1).l
res483: List[Vector[Long]] = List(
    Vector(47685L, 24708L), 
    Vector(47685L, 45583L)
)

joern> findStep(2).l
res484: List[Vector[Long]] = List(
    Vector(47685L, 24708L, 23750L)
)

joern> findStep(3).l
res485: List[Vector[Long]] = List(
  Vector(47685L, 24708L, 23750L, 23725L),
  Vector(47685L, 24708L, 23750L, 23750L)
)

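The third iteration is where things go wrong. The second path clearly contains recursion (23750 calls itself), and Joern's built-in repeat does not handle this case. It simply keeps calling caller, so this chain never ends. Let's keep iterating: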

joern> findStep(4).l
res486: List[Vector[Long]] = List(
  Vector(47685L, 24708L, 23750L, 23725L, 23750L),
  Vector(47685L, 24708L, 23750L, 23725L, 24790L),
  Vector(47685L, 24708L, 23750L, 23725L, 24871L),
  Vector(47685L, 24708L, 23750L, 23725L, 24927L),
  Vector(47685L, 24708L, 23750L, 23725L, 24952L),
  Vector(47685L, 24708L, 23750L, 23725L, 25086L),
  Vector(47685L, 24708L, 23750L, 23725L, 25131L),
  Vector(47685L, 24708L, 23750L, 23725L, 25195L),
  Vector(47685L, 24708L, 23750L, 23725L, 25239L),
  Vector(47685L, 24708L, 23750L, 23725L, 25278L),
  Vector(47685L, 24708L, 23750L, 23725L, 25322L),
  Vector(47685L, 24708L, 23750L, 23725L, 25380L),
  Vector(47685L, 24708L, 23750L, 23725L, 25425L),
  Vector(47685L, 24708L, 23750L, 23725L, 25464L),
  Vector(47685L, 24708L, 23750L, 23725L, 25509L),
  Vector(47685L, 24708L, 23750L, 23750L, 23725L),
  Vector(47685L, 24708L, 23750L, 23750L, 23750L)
)

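As expected, the recursive function keeps recursing. On top of that, cycles appear: the first path contains 23750 -> 23725 -> 23750, so that chain never ends either, and further iterations run into path explosion.

A quick look shows that these are mostly different overloads of replace calling one another. This pattern is quite common in Java code, so the problem needs solving.

4. Writing a script

A simple query won't cut it, so let's write a small script: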
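To make the failure mode concrete, here is a small self-contained Java sketch on a made-up reverse call graph. The method names are hypothetical and only mimic the shape seen in the output above: one self-recursive method plus a two-method cycle. It is not Joern code. It compares blindly extending every path by its callers with the same expansion when callers already on the path are skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class CallerPaths {
    // Hypothetical reverse call graph: method -> methods that call it.
    static final Map<String, List<String>> CALLERS = Map.of(
        "lookup", List.of("resolve"),
        "resolve", List.of("substitute"),
        "substitute", List.of("substitute", "replaceA"), // self-recursion
        "replaceA", List.of("substitute", "replaceB"),   // cycle with substitute
        "replaceB", List.of()
    );

    // One expansion step: extend each path by every caller of its last method.
    static List<List<String>> step(List<List<String>> paths, boolean skipVisited) {
        List<List<String>> next = new ArrayList<>();
        for (List<String> path : paths) {
            String last = path.get(path.size() - 1);
            for (String caller : CALLERS.getOrDefault(last, List.of())) {
                if (skipVisited && path.contains(caller)) {
                    continue; // caller already on this path: recursion or cycle
                }
                List<String> extended = new ArrayList<>(path);
                extended.add(caller);
                next.add(extended);
            }
        }
        return next;
    }

    // Repeat the expansion step a fixed number of times, starting from one method.
    static List<List<String>> expand(String start, int times, boolean skipVisited) {
        List<List<String>> paths = List.of(List.of(start));
        for (int i = 0; i < times; i++) {
            paths = step(paths, skipVisited);
        }
        return paths;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 8; n++) {
            System.out.println(n + " steps: naive=" + expand("lookup", n, false).size()
                + " cycle-aware=" + expand("lookup", n, true).size());
        }
    }
}
```

On this graph the naive count keeps growing once the recursive method is reached, while the cycle-aware count drops to zero after the longest cycle-free chain has been walked.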

// Emulates repeat(_.caller.dedup)(_.times(maxIdx)), but drops any path that would revisit a method
def findTimes(initStep: Traversal[Method], maxIdx: Int) : List[Vector[Method]] = {
    var nextBuffer: List[Vector[Method]] = List()
    var finalResult: List[Vector[Method]] = List()
    for (idx <- 1 to maxIdx) {

        // First iteration: seed the paths with the start nodes
        if (idx == 1) {
            for (it <- initStep) {
                finalResult = finalResult :+ Vector(it)
            }
        }

        // Extend every path in finalResult by the callers of its last method
        for (eachPath <- finalResult) {

            // IDs of the methods already on this path, used for cycle detection
            val eachPathIdList = eachPath.map(_.id)

            val newNodes = eachPath.last.caller.dedup
            for (newNode <- newNodes) {
                // Skip the caller if it is already on the path (recursion or cycle); otherwise extend the path
                if (!eachPathIdList.contains(newNode.id)) {
                    val newPath = eachPath :+ newNode
                    nextBuffer = nextBuffer :+ newPath
                }
            }
        }

        // All paths have been processed; the extended paths are in nextBuffer
        finalResult = nextBuffer
        nextBuffer = List()
    }
    return finalResult
}

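findTimes emulates the original repeat(_.caller.dedup)(_.times(x)): give it the start nodes and the number of repetitions. It is still not quite what we need, so we modify it slightly to emulate repeat(_.caller.dedup)(_.until(x)). Note that, unlike until, this version keeps extending a path after it reaches a stop node, so chains that pass through one public method into another are reported too: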

def findUntil(initStep: Traversal[Method], stopStep: Traversal[Method], maxIdx: Int) : List[Vector[Method]] = {
    var nextBuffer: List[Vector[Method]] = List()
    var finalResult: List[Vector[Method]] = List()
    var results: List[Vector[Method]] = List()
    val stopList = stopStep.l
    val stopIdList = stopList.map(n => n.id)
    println("stopList.size:" + stopList.size)
    println("stopIdList: " + stopIdList)

    for (idx <- 1 to maxIdx) {
        // First iteration: seed the paths with the start nodes
        if (idx == 1) {
            for (it <- initStep) {
                finalResult = finalResult :+ Vector(it)
            }
        }

        // Extend every path in finalResult by the callers of its last method
        for (eachPath <- finalResult) {

            // IDs of the methods already on this path, used for cycle detection
            val eachPathIdList = eachPath.map(_.id)

            val newNodes = eachPath.last.caller.dedup
            for (newNode <- newNodes) {
                // Skip the caller if it is already on the path (recursion or cycle); otherwise extend the path
                if (!eachPathIdList.contains(newNode.id)) {
                    val newPath = eachPath :+ newNode
                    nextBuffer = nextBuffer :+ newPath

                    // If the new node satisfies the stop condition, record the path in results
                    if (stopIdList.contains(newNode.id)) {
                        results = results :+ newPath
                    }
                }
            }
        }

        // All paths have been processed; the extended paths are in nextBuffer
        finalResult = nextBuffer
        nextBuffer = List()
    }
    return results
}

This time we only need to supply the start condition, the stop condition, and the maximum number of iterations. Let's test it:

joern> def initStep = cpg.method.fullName("org.apache.commons.text.lookup.StringLookup.lookup.*")

joern> def stopStep = cpg.method.where(_.isPublic).whereNot(_.isConstructor).whereNot(_.typeDecl.isAbstract).filter(_.parameter.size != 0)

joern> findUntil(initStep, stopStep, 10).map(path => path.map(node => (node.id, node.name, node.fullName))).l

It prints 26 paths in total.

5. Analyzing the results

A quick analysis shows that the 26 paths fall into three main groups:

org.apache.commons.text.StringSubstitutor.replace
org.apache.commons.text.StringSubstitutor.replaceIn
org.apache.commons.text.io.StringSubstitutorReader.read

The first group, replace, is the one used in the PoCs circulating online:

String x = StringSubstitutor.createInterpolator().replace("${java:version}");
System.out.println(x);

The second group, replaceIn, is used slightly differently from replace but triggers the lookup as well:

StringBuilder sb = new StringBuilder();
sb.append("${java:version}");
StringSubstitutor.createInterpolator().replaceIn(sb);
System.out.println(sb.toString());

The third group, the Reader, is more involved, but it triggers the lookup too:

StringSubstitutor ss = StringSubstitutor.createInterpolator();
String template = "${java:version}${a}";
StringReader sr = new StringReader(template);
StringSubstitutorReader ssr = new StringSubstitutorReader(sr, ss);

while (true) {
    int x = ssr.read();
    if (x == -1) {
        break;
    } else {
        System.out.print((char)x);
    }
}

lightless blog