Model INT8 Conversion Notes

1. Background

To evaluate inference performance and resource consumption on a T4 GPU, the model needs to be quantized. FP16 quantization is simple and only requires changing a few API parameters; the INT8 path is more involved, so this document records the procedure to make it easy to repeat later.

2. Environment

Server: 192.168.1.103
CPU: 48 x E5-2650 v4 @ 2.20GHz
RAM: 62 GB
GPU: Tesla T4
GPU memory: 15 GB
TensorRT: 5.1.5.0
CUDA: 10.1
GPU driver: 418.116.00
cuDNN: 7.5.0
cuBLAS: 10.1
Model: googlenet

3. Approach

The conversion follows the non-Caffe method described in TensorRT-5.1.5.0/samples/sampleINT8/README.md. The relevant passage:

Generating batch files for non-Caffe users

For developers that are not using Caffe, or cannot easily convert to Caffe, the batch files can be generated via the following sequence of steps on the input training data.
1. Subtract out the normalized mean from the dataset.
2. Crop all of the input data to the same dimensions.
3. Split the data into batch files where each batch file has N preprocessed images and labels.
4. Generate the batch files based on the format specified in "Batch files for calibration".

4. Steps

4.1. Prepare the calibration file (Calibration File)

This step saves the preprocessed image data as the calibration set for the later steps. The original demo listing was corrupted in this copy of the document; the code below is a reconstruction of its structure from the surviving comments, so treat it as a sketch rather than the original verbatim:

// Operating the calibrator (Calibrator):
// it first needs a calibration cache, i.e. a buffer holding the image
// data after preprocessing, which can be obtained as follows.
// input_ptr must be written out to the file cacheFile.
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>
#include <cstdio>

// Store the buffer as a binary file, appending on repeated runs
void saveCache(const float* data, int count, const char* cacheFile)
{
    FILE* f = fopen(cacheFile, "ab");
    if (f)
    {
        fwrite(data, sizeof(float), count, f);
        fclose(f);
    }
}

// Entry point
int main(int argc, char* argv[])
{
    if (argc < 3)
    {
        printf("Usage: ./exe cacheFile imageFile\n");
        return 1;
    }
    const int W = 224;
    const int H = 224;
    const int C = 3;
    cv::Mat img = cv::imread(argv[2]);
    cv::resize(img, img, cv::Size(W, H));   // crop/resize to the network input size

    float* input_ptr = new float[C * H * W];
    for (int c = 0; c < C; ++c)             // HWC uint8 -> CHW float
        for (int y = 0; y < H; ++y)
            for (int x = 0; x < W; ++x)
                input_ptr[c * H * W + y * W + x] = img.at<cv::Vec3b>(y, x)[c];

    saveCache(input_ptr, C * H * W, argv[1]);
    delete[] input_ptr;
    return 0;
}

Compile command:

g++ createCalibrationCache.cpp -lopencv_imgproc -lopencv_core -lopencv_imgcodecs -o ccc -g

Run command:

./ccc cacheImgs.calib ILSVRC2012_val_00001234.JPEG

Notes:
1. Here ccc is the test program, cacheImgs.calib is the saved calibration file, and the third argument is the image to use.
2. The command can be run repeatedly with different images to grow the calibration set; for this test only a single image was used.

4.2. Convert the model with the calibration set and serialize the CudaEngine

This uses the trtexec sample program shipped with TensorRT:

cd /home/reny/downloads/TensorRT-5.1.5.0/samples/trtexec
./trtexec --deploy=data/googlenet/deploy.prototxt --output="prob" --model=data/googlenet/googlenet.caffemodel --int8 --batch=4 --saveEngine=googlenet_int8_bs4.engine --calib=CalibratorcreateCache/cacheImgs.calib

4.3. Load the CudaEngine file and deserialize it

This uses the SampleGoogleNet sample program shipped with TensorRT. Main change: a function that loads the CudaEngine file, adapted from the trtexec sample code.

//add by reny 2019/12/27
static nvinfer1::ICudaEngine* LoadEngine(const string& sSavedEngineFile)
{
    ICudaEngine*
engine;
    std::vector<char> trtModelStream;
    size_t size{0};
    std::ifstream file(sSavedEngineFile.data(), std::ios::binary);
    if (file.good())
    {
        file.seekg(0, file.end);
        size = file.tellg();
        file.seekg(0, file.beg);
        trtModelStream.resize(size);
        file.read(trtModelStream.data(), size);
        file.close();
    }
    IRuntime* infer = createInferRuntime(gLogger.getTRTLogger());
    /*
    if (gParams.useDLACore >= 0)
    {
        infer->setDLACore(gParams.useDLACore);
    }
    */
    engine = infer->deserializeCudaEngine(trtModelStream.data(), size, nullptr);
    gLogInfo << sSavedEngineFile << " has been successfully loaded." << std::endl;
    infer->destroy();
    return engine;
}

Path of the modified test program: /home/reny/downloads/TensorRT-5.1.5.0/samples/sampleGoogleNet_ry

4.4. Run inference through the normal steps

cd /home/reny/downloads/TensorRT-5.1.5.0
./bin/sample_googlenet --loadEngine /home/reny/nvidia/googlenet_int8_bs4.engine

5. Screenshot

[Terminal screenshot, badly garbled by OCR: running ./bin/sample_googlenet --loadEngine /home/reny/nvidia/googlenet_int8_bs4.engine builds and runs the GPU inference engine for GoogleNet, logs a warning that TensorRT was compiled against cuBLAS 10.2.0 but is linked against cuBLAS 10.1.0, reports that googlenet_int8_bs4.engine has been successfully loaded, times 10000 inferences, and ends with PASSED.]

6. References

Code: TensorRT-5.1.5.0/samples/trtexec
Code: TensorRT-5.1.5.0/samples/sampleGoogleNet
Doc:  TensorRT-5.1.5.0/doc/TensorRT-Developer-Guide.pdf
Doc:  TensorRT-5.1.5.0/samples/sampleINT8/README.md