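1. Abnormal behavior in production and initial analysis
Recently I noticed that the application's memory consumption kept growing. At first I did not pay much attention and simply tuned the JVM parameters. A few days ago, however, memory filled up and the JVM went into continuous Full GC, so there had to be a memory leak.
I assumed some fairly basic mistake had crept in, so my first thought was to see which code the program was actually running. Output of jstack -l against the process: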
    "I/O dispatcher 125" #739 prio=5 os_prio=0 tid=0x0000000002394800 nid=0x1e2a runnable [0x00007f5c2125b000]
       java.lang.Thread.State: RUNNABLE
            at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
            at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
            at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
            at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
            - locked <0x00000007273401d0> (a sun.nio.ch.Util$2)
            - locked <0x00000007273401c0> (a java.util.Collections$UnmodifiableSet)
            - locked <0x00000007273401e0> (a sun.nio.ch.EPollSelectorImpl)
            at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
            at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:257)
            at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:106)
            at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:590)
            at java.lang.Thread.run(Thread.java:745)
       Locked ownable synchronizers:
            - None

    "pool-224-thread-1" #738 prio=5 os_prio=0 tid=0x00007f5c463f4000 nid=0x1e29 runnable [0x00007f5c2024b000]
       java.lang.Thread.State: RUNNABLE
            at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
            at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
            at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
            at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
            - locked <0x0000000727340478> (a sun.nio.ch.Util$2)
            - locked <0x0000000727340468> (a java.util.Collections$UnmodifiableSet)
            - locked <0x0000000727340488> (a sun.nio.ch.EPollSelectorImpl)
            at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
            at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:342)
            at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:191)
            at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64)
            at java.lang.Thread.run(Thread.java:745)
       Locked ownable synchronizers:
            - None
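None of the following lines of investigation solved the problem (I am recording them for my own reference; readers can skip them...)
Look at the thread stacks and check whether the call sites are at fault. This usually solves the problem, but the stacks of the suspicious threads above carry very little information and did not let me locate anything.
I Googled for reports of large numbers of these threads sitting in epollWait and found that the symptom matched the well-known epoll NIO bug, so I thought I had run into an advanced problem I could not deal with. My first reaction was to check the bug log on the HttpClient website, and a recent release had indeed fixed a similar issue, but after upgrading to the latest version the problem was unchanged. On reflection that explanation was unlikely anyway, since our use case is quite ordinary.
jmap -histo
I checked the call stacks and the packages of the suspicious objects and found that they belonged to HttpClient. I went through every related call in our own code, and they all looked correct.
I brought out the jvisualvm profiler, but it only showed the leak itself and could not pinpoint the cause.
At this point I was stumped. I happened to be busy with other things, so I set the problem aside for a few days and kept it in the back of my mind. Fortunately the leak was slow, so there was no rush...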
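The symptom described above (many threads with names like "I/O dispatcher 125" or "pool-224-thread-1", all parked in epollWait) can also be watched from inside the JVM instead of reading a full dump. The following is a minimal sketch under assumptions: the class name `ThreadCensus` and its methods are invented for illustration, and it relies only on the standard `Thread.getAllStackTraces()`. It masks the digits in thread names so that numbered siblings fall into one bucket; a bucket whose count keeps growing between runs is a candidate for a thread leak.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not part of the original post): counts live threads
// per "family", where a family is the thread name with digits masked out.
final class ThreadCensus {

    // "pool-224-thread-1" -> "pool-#-thread-#"
    static String family(String threadName) {
        return threadName.replaceAll("\\d+", "#");
    }

    // Counts the live threads of the current JVM per family.
    static Map<String, Integer> census() {
        Map<String, Integer> counts = new TreeMap<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(family(t.getName()), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Print one line per family: count, then the masked name.
        census().forEach((name, n) -> System.out.println(n + "\t" + name));
    }
}
```

Logging this census periodically and comparing the numbers over time is one way to see which family of threads is growing; it does not say who created them, which is what the heap analysis in the next section is for.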
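2. How to analyze a thread leak
The key to this problem: you must know exactly what is leaking the threads!
While Googling I suddenly realized that the tools shipped with the JDK ought to be able to analyze references. In the end jhat (Java Heap Analysis Tool) turned out to be exactly what I needed.
The approach that finally worked (jmap takes the target process ID as its last argument, and jhat takes the dump file that jmap wrote):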
jmap -F -dump:format=b,file=tomcat.bin
jhat -J-Xmx4g
Then inspect the references of the relevant objects. OQL can be used as well, but clicking through the links in jhat's web interface is enough.
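On HotSpot JVMs the dump step can also be triggered from inside the process, which may help when attaching with jmap is slow or not permitted. The sketch below assumes a HotSpot-based JDK, because `com.sun.management.HotSpotDiagnosticMXBean` is HotSpot-specific; the class name `HeapDumper` and the file name `app.hprof` are illustrative only. The file it writes is a binary heap dump of the same kind that `jmap -dump:format=b` produces, so the same analysis tools should be able to open it.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not part of the original post).
final class HeapDumper {

    // Writes a heap dump of the current JVM to target.
    // liveOnly = true dumps only objects that are still reachable.
    // The target file must not exist yet; a ".hprof" suffix is used because
    // newer JDKs reject other suffixes.
    static void dump(Path target, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(target.toString(), liveOnly);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("heapdump");
        Path file = dir.resolve("app.hprof");
        dump(file, true);
        System.out.println("wrote " + file + " (" + Files.size(file) + " bytes)");
    }
}
```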
3. Pinpointing the cause and fixing it
The heap histogram taken earlier from the misbehaving process showed the following suspicious objects:
    $ cat histo | grep org.apache.http. | grep 1944 | less
    197:  1944  217728  org.apache.http.impl.nio.conn.ManagedNHttpClientConnectionImpl
    232:  1944  171072  org.apache.http.impl.nio.conn.CPool
    233:  1944  171072  org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor
    248:  1944  155520  org.apache.http.impl.nio.reactor.BaseIOReactor
    249:  1944  155520  org.apache.http.impl.nio.reactor.IOSessionImpl
    276:  1944  139968  org.apache.http.impl.nio.client.InternalHttpAsyncClient
    277:  1944  139968  org.apache.http.impl.nio.conn.CPoolEntry
    323:  1944  108864  org.apache.http.impl.nio.client.MainClientExec
    363:  1944   93312  org.apache.http.impl.nio.codecs.DefaultHttpResponseParser
    401:  1944   77760  org.apache.http.impl.nio.reactor.SessionInputBufferImpl
    402:  1944   77760  org.apache.http.impl.nio.reactor.SessionOutputBufferImpl
    403:  1944   77760  org.apache.http.nio.protocol.HttpAsyncRequestExecutor$State
    442:  1944   62208  org.apache.http.impl.cookie.DefaultCookieSpecProvider
    443:  1944   62208  org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager
    444:  1944   62208  org.apache.http.nio.conn.ssl.SSLIOSessionStrategy
    445:  1944   62208  org.apache.http.nio.pool.AbstractNIOConnPool$2
    511:  1944   46656  [Lorg.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker;
    512:  1944   46656  [Lorg.apache.http.impl.nio.reactor.BaseIOReactor;
    513:  1944   46656  org.apache.http.conn.ssl.DefaultHostnameVerifier
    514:  1944   46656  org.apache.http.impl.cookie.DefaultCookieSpec
    515:  1944   46656  org.apache.http.impl.cookie.NetscapeDraftSpecProvider
    516:  1944   46656  org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1
    517:  1944   46656  org.apache.http.impl.nio.client.InternalIODispatch
    518:  1944   46656  org.apache.http.impl.nio.codecs.DefaultHttpRequestWriter
    519:  1944   46656  org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager$ConfigData
    520:  1944   46656  org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager$InternalAddressResolver
    521:  1944   46656  org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager$InternalConnectionFactory
    522:  1944   46656  org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker
    523:  1944   46656  org.apache.http.nio.protocol.HttpAsyncRequestExecutor
    603:  1944   31104  org.apache.http.client.protocol.RequestExpectContinue
    604:  1944   31104  org.apache.http.conn.routing.BasicRouteDirector
    605:  1944   31104  org.apache.http.impl.auth.HttpAuthenticator
    606:  1944   31104  org.apache.http.impl.conn.DefaultRoutePlanner
    607:  1944   31104  org.apache.http.impl.cookie.IgnoreSpecProvider
    608:  1944   31104  org.apache.http.impl.nio.SessionHttpContext
    609:  1944   31104  org.apache.http.impl.nio.reactor.AbstractIOReactor$1
    610:  1944   31104  org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReacto
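The `grep 1944` filter works because classes that belong to the same leaked object graph tend to show exactly the same instance count. When that count is not known in advance, the grouping can be automated. The following is a rough sketch under assumptions: `HistoGroups` is an invented name, the input is assumed to be saved `jmap -histo` output with rows shaped like `num: instances bytes class`, and the threshold of 10 classes is arbitrary.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not part of the original post).
final class HistoGroups {

    // Groups class names by instance count. Lines that do not look like
    // "197: 1944 217728 some.Class" (header, separator, total) are skipped.
    static Map<Long, List<String>> byInstanceCount(List<String> lines) {
        Map<Long, List<String>> groups = new TreeMap<>();
        for (String line : lines) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length < 4 || !cols[0].endsWith(":")) {
                continue;
            }
            long instances;
            try {
                instances = Long.parseLong(cols[1]);
            } catch (NumberFormatException e) {
                continue;
            }
            groups.computeIfAbsent(instances, k -> new ArrayList<>()).add(cols[3]);
        }
        return groups;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java HistoGroups <saved-histo-file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        byInstanceCount(lines).forEach((count, classes) -> {
            if (classes.size() >= 10) { // arbitrary threshold for "suspiciously many"
                System.out.println(count + " instances shared by " + classes.size() + " classes");
                classes.forEach(c -> System.out.println("    " + c));
            }
        });
    }
}
```

Small counts such as 1 or 2 are shared by many unrelated classes, so in practice only the larger counts in the output are worth a look.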
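Next I had to find out who actually created these objects. Many of the suspicious objects are internal fields of other objects, so the outermost object has to be found first. This was simply a matter of guessing and looking, and it turned out to be InternalHttpAsyncClient. Clicking into it showed a pile of instances, and from there I finally found the object responsible for the leak. OQL can be used for this too:
    select referrers(c) from org.apache.http.impl.nio.client.InternalHttpAsyncClient c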
    instance of org.apache.http.impl.nio.client.InternalHttpAsyncClient@0x932be638 (128 bytes)
    Class: class org.apache.http.impl.nio.client.InternalHttpAsyncClient
    Instance data members: ...
    References to this object:
    org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1@0x932be6c8 (40 bytes) : field this$0
    com.aliyun.mqs.common.http.DefaultServiceClient@0x931cc588 (32 bytes) : field httpClient
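What this tells us is that Aliyun's MQS client created these objects. I looked at our code: it appeared to be written correctly, but in fact the connection was simply never closed. The faulty example comes from the Aliyun MQS documentation. The latest version of the official documentation has switched to org.eclipse.jetty.client.HttpClient, and it also does not call stop explicitly; hopefully that library does not suffer from the same problem.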
    @Service
    public class AliyunService implements IAliyunService {
        private static Logger logger = Logger.getLogger(AliyunService.class.getName());

        @Autowired
        private AliyunConfig aliyunConfig;

        @Override
        public void sendMessage(String content) {
            // A new client is created on every call.
            MQSClient client = new DefaultMQSClient(aliyunConfig.mqEndpoint, aliyunConfig.mqAccessId, aliyunConfig.mqAccessKey);
            String queueName = aliyunConfig.mqQueue;
            try {
                CloudQueue queue = client.getQueueRef(queueName);
                // The queue is never closed. A finally block should be added at the end:
                // finally { queue.close(); }
                // (queue must then be declared before the try so the finally block can see it)
                Message message = new Message();
                message.setMessageBody(content);
                queue.putMessage(message);
            } catch (Exception e) {
                logger.warning(e.getMessage());
            }
        }
    }
Below is the corresponding close method from the source in the jar provided by MQS:
    public final class CloudQueue {
        private ServiceClient serviceClient;
        ...
        public void close() {
            if (this.serviceClient != null) {
                this.serviceClient.close();
            }
        }
    }
Mystery solved! After making this change, the problem went away.
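To make the mechanism concrete, here is a small self-contained sketch of the leak and of the try/finally fix. It deliberately does not use the real MQS classes: `DemoClient` is a hypothetical stand-in for any client that starts a background I/O thread in its constructor and only stops it in close(), and `CloseDemo` with its two methods is likewise invented for illustration.

```java
// Hypothetical stand-in for a client that owns a background I/O thread.
final class DemoClient implements AutoCloseable {
    private final Thread worker;

    DemoClient() {
        worker = new Thread(() -> {
            try {
                Thread.sleep(Long.MAX_VALUE); // parked, much like a selector loop
            } catch (InterruptedException stop) {
                // close() interrupted us: fall through and let the thread end
            }
        }, "demo-io-dispatcher");
        worker.setDaemon(true); // daemon only so that this demo JVM can exit
        worker.start();
    }

    void send(String body) {
        if (body == null) {
            throw new IllegalArgumentException("body must not be null");
        }
    }

    boolean workerAlive() {
        return worker.isAlive();
    }

    @Override
    public void close() {
        worker.interrupt();
        try {
            worker.join(1000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

final class CloseDemo {

    // Leaky shape: a new client per call, never closed.
    // The client is returned only so that the demo can inspect it.
    static DemoClient sendLeaky(String body) {
        DemoClient client = new DemoClient();
        client.send(body);
        return client;
    }

    // Fixed shape: close in finally, so the thread stops even if send throws.
    static DemoClient sendClosed(String body) {
        DemoClient client = new DemoClient();
        try {
            client.send(body);
        } finally {
            client.close();
        }
        return client;
    }

    public static void main(String[] args) {
        DemoClient leaked = sendLeaky("hello");
        DemoClient closed = sendClosed("hello");
        System.out.println("leaky call left a thread running: " + leaked.workerAlive());
        System.out.println("closed call left a thread running: " + closed.workerAlive());
        leaked.close();
    }
}
```

Each call to the leaky variant leaves one more live thread behind, together with everything that thread references, which matches the picture of thousands of identical HttpClient objects in the histogram.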
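4. Summary
First, solving this problem really came down to knowing the JDK tools well and putting them to good use. I had not understood jhat properly before, which is why this approach did not occur to me straight away. The next time a memory problem comes up, I will have a sharper way to tackle it.
Second, I am now familiar with what a thread leak looks like. The way to solve it is still to go and find the objects behind the threads; in the end it is still an object leak.
Finally, I really have to complain about Alibaba. Integrating Alibaba Baichuan earlier was already a thoroughly unpleasant experience, and now there is another basic mistake. I have always regarded Alibaba as the company with the best software technology in China, but the level of its rank-and-file developers is honestly mediocre, and there is a problem with its development quality control.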
