Finite element core calculations and stream processing

Krzysztof Banaś, Jan Bielański, Kazimierz Chłoń

AGH University of Science and Technology, al. A. Mickiewicza 30, 30-059 Kraków, Poland.

DOI:

https://doi.org/10.7494/cmms.2016.4.0591

Abstract:

We present an execution model and performance analysis for an important phase of finite element calculations: the creation of systems of linear equations. We assume that the process is carried out on a set of CPU cores and GPU multiprocessors, with CPU and GPU memories connected by PCIe links for data transfer. We analyse the use of linear data structures designed specifically for GPU processing. We present example calculations for the standard first order FEM approximation on typical contemporary hardware, and draw conclusions on the feasibility of the proposed approach.

Cite as:

Banaś, K., Bielański, J., Chłoń, K. (2016). Finite element core calculations and stream processing. Computer Methods in Materials Science, 16(4), 213-223. https://doi.org/10.7494/cmms.2016.4.0591

Keywords:

Finite element method, Solvers of linear equations, GPU, OpenCL, High performance computing, Technical simulations
