尝试测量Java对象的销毁时间

抛砖引玉

May 21, 2021 · 66 mins read

Oh, and one more thing: there is a severe performance penalty for using finalizers.

On my machine, the time to create and destroy a simple object is about 5.6 ns. Adding a finalizer increases the time to 2,400 ns.

In other words, it is about 430 times slower to create and destroy objects with finalizers.

——Effective Java Second Edition

很好奇大佬是如何测量时间的。

创建一个简单对象还比较容易精确测量,销毁一个对象的时间就没那么容易精确测量了。因为JVM屏蔽了底层细节,程序员并不知道它什么时候真正的销毁对象并回收资源。你可以提建议,但什么时候听就看GC心情了:sunglasses:

想了一下,大概有这两条路:

  • JVM入手。这应该是我所能想到的最精确的方法,比如在对象创建后或销毁后JVM有个Callback什么的。以上纯属YY,如有雷同,绝非巧合:joy:。时间、精力、脑力确实有限,没有近一步去研究。
  • 回到Java语言本身提供的功能。不求精确的度量时间,只是粗略的测量然后比较一下,能表明有finalizer的对象创建和销毁相对比较耗时就行。比较差异的话,很多因素都可以忽略,比如GC的延迟时间、对象销毁的真正终点,因为比较的双方都是一样的。下面我们尝试走这一条路。

实验环境Win8.1:

java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)

V1

首先不能依赖finalize()方法来打卡时间点,没有这个方法的简单对象表示臣妾~做不到啊:sob:

后来刚好看到java.lang.ref包,PhantomReference代表最接近生命周期终点的phantom reachable,当GC认为referent,也就是我们的简单对象,变成phantom reachable的时候,它就会将该reference添加到queue中。

由于在Java语言里和GC的交互程度非常有限,那么我们就以这个时间点作为销毁对象的打卡时间点。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
public class WithoutFinalizationV1 {

    public static void main(String[] args) throws Exception {
        ReferenceQueue<WithoutFinalizationV1> phantomQueue = new ReferenceQueue<>();
        long start = System.nanoTime();
        
        PhantomReference<WithoutFinalizationV1> rf = new PhantomReference<>(
                new WithoutFinalizationV1(), phantomQueue);
        System.gc();//advise JVM to do GC
        phantomQueue.remove();//wait it to enqueue
        //only need this time point
        long time = System.nanoTime() - start;
        System.out.println("WithoutV1 "+time);
    }
}

在首次尝试中,创建和销毁的时间合在一起测量,结果也比较出乎意料,完全就不在一个数量级啊。后面想了想,因为包含了GC可能的磨蹭时间,这个时间也在情理之中。

运行了几次,最好记录是:5248174 ns:sweat:

下面给类加上finalize()方法:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
public class WithFinalizationV1 {
    
    private static Object lock = new Object();
    private static boolean finalized = false;
    
    @Override
    protected void finalize() throws Throwable{
        synchronized(lock){
            finalized = true;
            lock.notify();
        }
    }

    public static void main(String[] args) throws Exception {
        ReferenceQueue<WithFinalizationV1> phantomQueue = new ReferenceQueue<>();
        long start = System.nanoTime();
        PhantomReference<WithFinalizationV1> rf = new PhantomReference<>(
                new WithFinalizationV1(), phantomQueue);
        System.gc();//advise JVM to do GC
        synchronized(lock){
            while(!finalized){
                lock.wait();
            }
        }
        System.gc();//advise JVM to do GC
        phantomQueue.remove();//wait it to enqueue
        //only need this time point
        long time = System.nanoTime() - start;
        System.out.println("WithV1 "+time);
    }
}

需要注意的是,finalize()的方法体不能为空白,否则就不算finalizable类。

首先简要说明下finalization的流程:当构造finalizable对象的时候,JVM把这个对象包装成java.lang.ref.Finalizer,这个对象是一个双向链表。GC后,JVMFinalizer对象放入它的queue里,如果是第一次调用,FinalizerThread先把该对象从双向链表里剥离,然后调用它的finalize()方法。

这个时候,我们的finalizable对象就是漂浮的垃圾了,在下次GC的时候进入到我们的phantomQueue里。

下次调用gc()的时机通过wait-notify机制实现,这样就不可避免有额外的开销。最好记录是11789575 ns,耗时大概是非finalizable版本的两倍。

V2

为了更贴近一点,我们分开测量创建和销毁时间。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
public class WithoutFinalizationV2 {
    
    public static void main(String[] args) throws Exception {
        long createTime = System.nanoTime();
        WithoutFinalizationV2 o = new WithoutFinalizationV2();
        createTime = System.nanoTime() - createTime;
        
        ReferenceQueue<WithoutFinalizationV2> phantomQueue = new ReferenceQueue<>();
        PhantomReference<WithoutFinalizationV2> rf = new PhantomReference<>(
                o, phantomQueue);
        
        o = null;//explicitly clear reference
        long destroyTime = System.nanoTime();
        System.gc();//advise JVM to do GC
        phantomQueue.remove();//wait it to enqueue
        //only need this time point
        destroyTime = System.nanoTime() - destroyTime;
        System.out.println("WithoutV2 " 
                +" "+createTime +" "+ destroyTime);
    }
}

这次的结果稍微精确点儿,但也差不太多:5987+5291795=5297782 ns

finalize()方法的:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
public class WithFinalizationV2 {
    
    private static Object lock = new Object();
    private static boolean finalized = false;
    
    @Override
    protected void finalize() throws Throwable{
        synchronized(lock){
            finalized = true;
            lock.notify();
        }
    }

    public static void main(String[] args) throws Exception {
        ReferenceQueue<WithFinalizationV2> phantomQueue = new ReferenceQueue<>();
        long createTime;
        long destroyTime;
        
        long createStart = System.nanoTime();
        WithFinalizationV2 o = new WithFinalizationV2();
        createTime = System.nanoTime() - createStart;
        
        PhantomReference<WithFinalizationV2> rf = new PhantomReference<>(o, phantomQueue);
        o = null;//explicitly clear reference
        long destroyStart = System.nanoTime();
        System.gc();//advise JVM to do GC
        synchronized(lock){
            while(!finalized){
                lock.wait();
            }
        }
        System.gc();//advise JVM to do GC
        phantomQueue.remove();//wait it to enqueue
        //only need this time point
        destroyTime = System.nanoTime() - destroyStart;
        System.out.println("WithV2 " 
                +" "+createTime +" "+ destroyTime);
    }
}

最快结果是:6415+12250587=12257002 ns,是非finalizable版本的2.5倍左右。

V3?

在程序上已经做到尽量贴近测量了,要想进一步提高成绩,只有利用JVM的HotSpot来优化性能。

在官方文档只找到了方法调用次数的阈值,Server VM是一万次调用。网上搜了下,OSR也差不多是这个次数。

这个版本采用先批量构造对象,然后挨个销毁,最后计算平均时间。由于需要循环很多次,我们能用多线程吗?我觉得不能,因为一次GC就可能回收多个对象,测量就更不准了。

然后又尝试运行在-Xcomp模式,不幸的是程序会卡在等phantomQueue入列的地方,具体原因就不清楚了,可能优化太激进了吧。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
public class WithoutFinalizationV3 {

    public static void main(String[] args) throws Exception {
        int times = 30*1000;
        long createTime = 0;
        long destroyTime= 0;
        
        WithoutFinalizationV3[] oarray = new WithoutFinalizationV3[times];
        for(int i=0; i<times; i++){//On-Stack Replacement
            long createStart = System.nanoTime();
            WithoutFinalizationV3 o = new WithoutFinalizationV3();
            createTime += System.nanoTime() - createStart;
            oarray[i] = o;
        }
        
        for(int i=0; i<times; i++){//On-Stack Replacement
            ReferenceQueue<WithoutFinalizationV3> phantomQueue = new ReferenceQueue<>();
            PhantomReference<WithoutFinalizationV3> rf = new PhantomReference<>(
                    oarray[i], phantomQueue);
            oarray[i] = null;//explicitly clear reference
            
            long destroyStart = System.nanoTime();
            System.gc();//advise JVM to do GC
            phantomQueue.remove();//wait it to enqueue
            //only need this time point
            destroyTime += System.nanoTime() - destroyStart;
            //indicator that code is running instead of stuck
            System.out.println("WithoutV3 "+i);
        }
        
        System.out.println("WithoutV3 createTime="+createTime/times);
        System.out.println("WithoutV3 destroyTime="+destroyTime/times);
    }
}

60+7094570=7094630 ns,在大量对象中回收一个确实比较费时。

finalize()方法的:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
public class WithFinalizationV3 {
    
    private static Object lock = new Object();
    private static boolean finalized = false;
    
    @Override
    protected void finalize() throws Throwable{
        synchronized(lock){
            finalized = true;
            lock.notify();
        }
    }

    public static void main(String[] args) throws Exception {
        int times = 30*1000;
        long createTime = 0;
        long destroyTime= 0;
        
        WithFinalizationV3[] oarray = new WithFinalizationV3[times];
        for(int i=0; i<times; i++){//On-Stack Replacement
            long createStart = System.nanoTime();
            WithFinalizationV3 o = new WithFinalizationV3();
            createTime += System.nanoTime() - createStart;
            oarray[i] = o;
        }
        
        for(int i=0; i<times; i++){//On-Stack Replacement
            ReferenceQueue<WithFinalizationV3> phantomQueue = new ReferenceQueue<>();
            PhantomReference<WithFinalizationV3> rf = new PhantomReference<>(
                    oarray[i], phantomQueue);
            oarray[i] = null;//explicitly clear reference
            
            long destroyStart = System.nanoTime();
            System.gc();//advise JVM to do GC
            synchronized(lock){
                while(!finalized){
                    lock.wait();
                }
            }
            System.gc();//advise JVM to do GC
            phantomQueue.remove();//wait it to enqueue
            //only need this time point
            destroyTime += System.nanoTime() - destroyStart;
            finalized = false;
            //indicator that code is running instead of stuck
            System.out.println("WithV3 "+i);
        }
        
        System.out.println("WithV3 createTime="+createTime/times);
        System.out.println("WithV3 destroyTime="+destroyTime/times);
    }
}

运行这个版本的程序遇到个问题,当循环到一万五六的时候,程序卡在等待入列的地方Line 41,就好像GC不再把对象放入phantomQueue一样。

刚开始以为是时序的问题,因为我们能同步main线程和FinalizerThread,但无法同步FinalizerThreadGC线程。如果第二次GC发生在FinalizerThread做完工作之前,这个对象就不会进入phantomQueue了。

但我在第二次GC之前尝试调用了Thread.yield()Thread.sleep(100),还是同样的结果。

为什么会这样,难道说持续构建大量finalizable类的对象,GC就不回收了:question:还没想到怎么查,有点儿束手无策了。

V4

这个版本改成创建一个对象销毁一个的方式,避免批量创建过多对象。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
public class WithoutFinalizationV4 {

    public static void main(String[] args) throws Exception {
        
        int times = 30*1000;
        long createTime = 0;
        long destroyTime = 0;
        
        for(int i=0; i<times; i++){//On-Stack Replacement
            long createStart = System.nanoTime();
            WithoutFinalizationV4 o = new WithoutFinalizationV4();
            createTime += System.nanoTime() - createStart;
            
            ReferenceQueue<WithoutFinalizationV4> phantomQueue = new ReferenceQueue<>();
            PhantomReference<WithoutFinalizationV4> rf = new PhantomReference<>(
                    o, phantomQueue);
            
            o = null;//explicitly clear reference
            long destroyStart = System.nanoTime();
            System.gc();//advise JVM to do GC
            phantomQueue.remove();//wait it to enqueue
            //only need this time point
            destroyTime += System.nanoTime() - destroyStart;
            //indicator that code is running instead of stuck
            System.out.println("WithoutV4 "+i);
        }
        
        System.out.println("WithoutV4 createTime="+createTime/times);
        System.out.println("WithoutV4 destroyTime="+destroyTime/times);
    }
}

这次的成绩相对V2版的提高不少:545+2515831=2516376 ns

finalize()方法的:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
public class WithFinalizationV4 {
    
    private static Object lock = new Object();
    private static boolean finalized = false;
    
    @Override
    protected void finalize() throws Throwable{
        synchronized(lock){
            finalized = true;
            lock.notify();
        }
    }

    public static void main(String[] args) throws Exception {
        int times = 30*1000;
        
        long createTime = 0;
        long destroyTime= 0;
        
        for(int i=0; i<times; i++){//On-Stack Replacement
            long createStart = System.nanoTime();
            WithFinalizationV4 o = new WithFinalizationV4();
            createTime += System.nanoTime() - createStart;
            
            ReferenceQueue<WithFinalizationV4> phantomQueue = new ReferenceQueue<>();
            PhantomReference<WithFinalizationV4> rf = new PhantomReference<>(o, phantomQueue);
            
            o = null;//explicitly clear reference
            long destroyStart = System.nanoTime();
            System.gc();//advise JVM to do GC
            synchronized(lock){
                while(!finalized){
                    lock.wait();
                }
            }
            System.gc();//advise JVM to do GC
            phantomQueue.remove();//wait it to enqueue
            //only need this time point
            destroyTime += System.nanoTime() - destroyStart;
            finalized = false;
            //indicator that code is running instead of stuck
            System.out.println("WithV4 "+i);
        }
        
        System.out.println("WithV4 createTime="+createTime/times);
        System.out.println("WithV4 destroyTime="+destroyTime/times);
    }

}

奇怪的是,这个版本的循环就能跑完:sweat_smile:结果是:3829+4378470=4382299 ns

甩开一切包袱,加上所有Buff,看能跑多快!以这个版本为蓝本,先注释掉销毁对象的部分,然后运行的时候加-Xcomp参数,测量了一下创建一个简单对象最快需要多少时间,得到的最好结果是21 ns:tada:

结论

finalizable类的实例需要额外的一道手续才能被回收,显然比回收普通类的实例更加耗时。

如果你有更好的办法,请留言赐教:watermelon:


:clock9:上次更新: 2021-12-13