# How To Introduce a New Operation Into Runtime

**ONE**'s runtime has three main modules: **core**, **frontend** and **backend**. This document provides lightweight guidance on how to introduce a new operation into these modules so that onert supports the operation.

## Index

- [How To Introduce a New Operation Into Runtime](#how-to-introduce-a-new-operation-into-runtime)
  - [Index](#index)
  - [Core](#core)
  - [Frontend](#frontend)
    - [Loaders](#loaders)
      - [Base Loader](#base-loader)
      - [TFLite Loader](#tflite-loader)
      - [Circle Loader](#circle-loader)
    - [NNAPI](#nnapi)
  - [Backend](#backend)
    - [ShapeFixer](#shapefixer)
      - [acl_cl](#acl_cl)
      - [acl_neon](#acl_neon)
      - [cpu](#cpu)
    - [KernelGenerator](#kernelgenerator)
      - [acl_cl](#acl_cl-1)
      - [acl_neon](#acl_neon-1)
      - [cpu](#cpu-1)
    - [ConstantInitializer (in some cases)](#constantinitializer-in-some-cases)
      - [cpu](#cpu-2)
  - [Samples (to be updated)](#samples-to-be-updated)

## Core

This module has a graph-based IR (intermediate representation). You have to add IR for the new operation.

1. Add the name of the new operation to [Operations.lst](/runtime/onert/core/include/ir/Operations.lst)

```cpp
OP(Select)
```

2. 
Create a class for the node of the new operation in [here](/runtime/onert/core/include/ir/operation/)

```cpp
#include "ir/Operation.h"

namespace onert
{
namespace ir
{
namespace operation
{

class Select : public Operation
{
public:
  enum Input
  {
    COND = 0,
    INPUT1 = 1,
    INPUT2 = 2
  };

  enum Output
  {
    OUTPUT = 0,
  };

public:
  Select(const OperandIndexSequence &inputs, const OperandIndexSequence &outputs);

public:
  void accept(OperationVisitor &v) const override;
  OpCode opcode() const final { return OpCode::Select; }
};

} // namespace operation
} // namespace ir
} // namespace onert
```

You can also define the member functions of the class in a separate source file, as below

```cpp
#include "ir/operation/Select.h"

#include "ir/OperationVisitor.h"

namespace onert
{
namespace ir
{
namespace operation
{

// Double dispatch: forwards this node to the visitor's Select overload
void Select::accept(OperationVisitor &v) const { v.visit(*this); }

// Select takes exactly three inputs: the condition and the two candidates
Select::Select(const OperandIndexSequence &inputs, const OperandIndexSequence &outputs)
    : Operation{OperandConstraint::createExact(3u), inputs, outputs}
{
}

} // namespace operation
} // namespace ir
} // namespace onert
```

  - Include the new header in [Operations.Include.h](/runtime/onert/core/include/ir/Operations.Include.h)

```cpp
#include "ir/operation/Select.h"
```

3. Add to the OperationValidator to check if the node is valid.
  - [OperationValidator.h](/runtime/onert/core/src/compiler/OperationValidator.h)

```cpp
void visit(const operation::Select &node) override;
```

  - [OperationValidator.cc](/runtime/onert/core/src/compiler/OperationValidator.cc)

```cpp
void OperationValidator::visit(const ir::operation::Select &node)
{
  const auto output_index{node.getOutputs().at(ir::operation::Select::Output::OUTPUT)};
  const auto cond_index{node.getInputs().at(ir::operation::Select::Input::COND)};
  const auto input1_index{node.getInputs().at(ir::operation::Select::Input::INPUT1)};
  const auto input2_index{node.getInputs().at(ir::operation::Select::Input::INPUT2)};

  UNUSED_RELEASE(output_index);
  UNUSED_RELEASE(cond_index);
  UNUSED_RELEASE(input1_index);
  UNUSED_RELEASE(input2_index);

  const auto output_type = _ctx.at(output_index).typeInfo();
  const auto cond_type = _ctx.at(cond_index).typeInfo();
  const auto input1_type = _ctx.at(input1_index).typeInfo();
  const auto input2_type = _ctx.at(input2_index).typeInfo();

  UNUSED_RELEASE(output_type);
  UNUSED_RELEASE(cond_type);
  UNUSED_RELEASE(input1_type);
  UNUSED_RELEASE(input2_type);

  assert(cond_type.type() == ir::DataType::BOOL8);
  assert(output_type.type() == ir::DataType::FLOAT32 || output_type.type() == ir::DataType::INT32 ||
         output_type.type() == ir::DataType::QUANT8_ASYMM);
  assert(output_type.type() == input1_type.type());
  assert(output_type.type() == input2_type.type());

  const auto output_shape = _ctx.at(output_index).shape();
  const auto cond_shape = _ctx.at(cond_index).shape();
  const auto input1_shape = _ctx.at(input1_index).shape();
  const auto input2_shape = _ctx.at(input2_index).shape();

  UNUSED_RELEASE(output_shape);
  UNUSED_RELEASE(cond_shape);
  UNUSED_RELEASE(input1_shape);
  UNUSED_RELEASE(input2_shape);

  assert(output_shape == input1_shape);
  assert(cond_shape == input1_shape);
  assert(input2_shape == input1_shape);
}
```

4. Add to the Dumper to dump IR information of new operation.
  - [Dumper.cc](/runtime/onert/core/src/ir/dumper/Dumper.cc)

```cpp
void Dumper::visit(const Select &node)
{
  VERBOSE(LIR) << "* Select" << std::endl;
  VERBOSE(LIR) << "  - Inputs : Cond(" << node.getInputs().at(Select::Input::COND).value()
               << ") Input1(" << node.getInputs().at(Select::Input::INPUT1).value() << ") Input2("
               << node.getInputs().at(Select::Input::INPUT2).value() << ")" << std::endl;
  VERBOSE(LIR) << "  - Output : Output(" << node.getOutputs().at(Select::Output::OUTPUT).value()
               << ")" << std::endl;
}
```

5. Add code for shape inference
  - ONE runtime tries to calculate shapes and allocate memory at compilation time. For output shapes that cannot be calculated at compilation time, ONE runtime calculates shapes and allocates memory at execution time.
  - Calculation of shapes at compilation time is called _static shape inference_, and calculation of shapes at execution time is called _dynamic shape inference_.
  - [`StaticShapeInferer.h`](/runtime/onert/compiler/StaticShapeInferer.h)

```CPP
  void visit(const ir::operation::Select &op) override;
```

  - [`StaticShapeInferer.cc`](/runtime/onert/core/src/compiler/StaticShapeInferer.cc)

```CPP
void StaticShapeInferer::visit(const ir::operation::Select &op)
{
  const auto input_cond_idx{op.getInputs().at(ir::operation::Select::Input::CONDITION)};
  const auto &input_cond = _operands.at(input_cond_idx);

  const auto &input_true = ...
  const auto &input_false = ...
  ir::Operand &output = ...
  // Select output shape
  ir::Shape new_shape = shape_inference::inferSelectShape(
    input_cond.info().shape(), input_true.info().shape(), input_false.info().shape());
  output.info().shape(new_shape);
}
```

  - [`DynamicShapeInference.h`](/runtime/onert/core/include/exec/DynamicShapeInference.h)

```CPP
  void visit(const ir::operation::Select &op) override;
```

  - [`DynamicShapeInference.cc`](/runtime/onert/core/src/exec/DynamicShapeInference.cc)

```CPP
void DynamicShapeInferer::visit(const ir::operation::Select &op)
{
  const auto input_cond_idx = op.getInputs().at(ir::operation::Select::Input::CONDITION);
  const auto &input_cond = _tensor_registry->getITensor(input_cond_idx);

  const auto &input_true = ...
  const auto &input_false = ...
  auto output = ...

  // Nothing to do when no input is dynamic: the static shape is already valid
  if ((!input_cond->is_dynamic()) && (!input_true->is_dynamic()) && (!input_false->is_dynamic()))
  {
    return;
  }

  auto input_cond_shape = input_cond->getShape();
  auto input_true_shape = input_true->getShape();
  auto input_false_shape = input_false->getShape();

  // Select output shape
  ir::Shape new_shape =
    shape_inference::inferSelectShape(input_cond_shape, input_true_shape, input_false_shape);

  output->applyShape(new_shape);
}
```

## Frontend

This module generates IR from a model. There are two kinds of frontend: Loader and NNAPI. A Loader loads a model file and generates IR from it. NNAPI generates IR from a model set via the [Neural Networks API of android](https://developer.android.com/ndk/guides/neuralnetworks).

### Loaders

#### Base Loader

This is where the common parts of loaders are implemented.

1. 
Add to base_loader to load the new operation and to generate IR from it

  - [base_loader](/runtime/onert/core/src/loader/base_loader.h)

```cpp
    case BuiltinOperator::BuiltinOperator_SELECT:
      loadSelect(op);
      return;
```

```cpp
template <typename LoaderDomain, typename SpecificLoader>
void BaseLoader<LoaderDomain, SpecificLoader>::loadSelect(const Operator *op)
{
  ir::OperandIndexSequence inputs;
  ir::OperandIndexSequence outputs;

  loadOperationIO(op, inputs, outputs);

  std::unique_ptr<ir::Operation> new_op{new ir::operation::Select{inputs, outputs}};
  _graph.addOperation(std::move(new_op));
}
```

#### TFLite Loader

This loads a tflite file.
If the new operation should be loaded only by the TFLite Loader, you only need to implement loading the operation here.

#### Circle Loader

This loads a circle file generated by the compiler.
If the new operation should be loaded only by the Circle Loader, you only need to implement loading the operation here.

### NNAPI

1. Add to the OperationFactory to generate IR of the new operation

  - [OperationFactory](/runtime/onert/api/nnapi/wrapper/OperationFactory.cc)

```cpp
  _map[ANEURALNETWORKS_SELECT] = [](const OperationFactory::Param &init_param, Operands &) {
    assert(init_param.input_count == 3 && init_param.output_count == 1);

    OperandIndexSequence outputs{init_param.outputs[0]};

    // Each input should be interpreted as follows:
    //
    //  0 -> Cond Tensor Index
    //  1 -> Input1 Tensor Index
    //  2 -> Input2 Tensor Index
    OperandIndexSequence inputs;
    for (uint32_t n = 0; n < init_param.input_count; ++n)
    {
      inputs.append(OperandIndex{init_param.inputs[n]});
    }

    return new operation::Select{inputs, outputs};
  };
```

2. 
If you want NNAPI to support the new operation for TFLite models, you need to update the code related to the operation in [nnapi_delegate](/runtime/libs/tflite/port/1.13.1/src/nnapi_delegate.cpp), as below

```cpp
      case tflite::BuiltinOperator_SELECT:
        nnapi_version = 12; // require NNAPI 1.2
        nn_op_type = ANEURALNETWORKS_SELECT;
        break;
```

## Backend

This module generates kernels and tensors of a backend such as [ComputeLibrary](https://github.com/ARM-software/ComputeLibrary/) from the generated graph-based IR. The runtime handles much of this internally, but some parts depend on the backend, so several components require additional implementation for each backend.

### ShapeFixer

Even for tensors of the same operation, the shape required by each backend can be different. Therefore, this component modifies and fixes the shape of the backend's tensors.

#### acl_cl

The ACL kernel for the Add operation requires both inputs to have the same rank in order to support broadcasting.

- [ShapeFixer.h](/runtime/onert/backend/acl_cl/ShapeFixer.h)

```cpp
void visit(const ir::operation::Add &) override;
```

- [ShapeFixer.cc](/runtime/onert/backend/acl_cl/ShapeFixer.cc)

```cpp
void ShapeFixer::visit(const ir::operation::Add &node)
{
  const auto lhs_index{node.getInputs().at(ir::operation::Add::Input::LHS)};
  const auto rhs_index{node.getInputs().at(ir::operation::Add::Input::RHS)};

  // When the shapes differ, extend both operands to the larger rank
  if (!(_ctx.at(lhs_index).shape() == _ctx.at(rhs_index).shape()))
  {
    const auto broadcast_rank =
      std::max(_ctx.at(lhs_index).shape().rank(), _ctx.at(rhs_index).shape().rank());
    const_cast<ir::Shape &>(_ctx.at(lhs_index).shape()).extendRank(broadcast_rank);
    const_cast<ir::Shape &>(_ctx.at(rhs_index).shape()).extendRank(broadcast_rank);
  }
}
```

#### acl_neon

The same implementation as acl_cl is required.

#### cpu

This backend doesn't usually require a change of shape.
- [ShapeFixer.h](/runtime/onert/backend/cpu/ShapeFixer.h)

```cpp
void visit(const ir::operation::Select &) override;
```

- [ShapeFixer.cc](/runtime/onert/backend/cpu/ShapeFixer.cc)

```cpp
void ShapeFixer::visit(const ir::operation::Select &) { /* DO NOTHING */}
```

### KernelGenerator

This component generates the kernels of a backend. You have to generate the kernel of the new operation and then append it to the execution builder. You can obtain information about the node from the IR and the necessary tensors from the tensor builder.

#### acl_cl

- [KernelGenerator.h](/runtime/onert/backend/acl_cl/KernelGenerator.h)

```cpp
void visit(const ir::operation::Select &) override;
```

- [KernelGenerator.cc](/runtime/onert/backend/acl_cl/KernelGenerator.cc)

```cpp
void KernelGenerator::visit(const ir::operation::Select &node)
{
  const auto output_index{node.getOutputs().at(ir::operation::Select::Output::OUTPUT)};
  const auto cond_index{node.getInputs().at(ir::operation::Select::Input::COND)};
  const auto input1_index{node.getInputs().at(ir::operation::Select::Input::INPUT1)};
  const auto input2_index{node.getInputs().at(ir::operation::Select::Input::INPUT2)};

  // Look up the backend tensors for each operand
  auto output_alloc = _tensor_builder->at(output_index).get();
  auto cond_alloc = _tensor_builder->at(cond_index).get();
  auto input1_alloc = _tensor_builder->at(input1_index).get();
  auto input2_alloc = _tensor_builder->at(input2_index).get();

  auto fn = std::make_unique<::arm_compute::CLSelect>();

  fn->configure(cond_alloc->handle(), input1_alloc->handle(), input2_alloc->handle(),
                output_alloc->handle());

  auto acl_fn = asAclFunction(std::move(fn));

  _execution_builder->append(std::move(acl_fn));
}
```

#### acl_neon

An implementation similar to acl_cl is required.
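
The `visit` overrides shown for the backends are reached through the node's `accept` method, which forwards the node to the matching `visit` overload (double dispatch). The following is only a minimal, self-contained sketch of that mechanism. Every type and function in it (`ToyKernelGenerator`, `generateKernels`, and the simplified `Operation`, `OperationVisitor`, `Select` and `Add`) is a hypothetical stand-in for illustration, not the real onert classes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the IR node and visitor types (illustration only).
struct Select;
struct Add;

struct OperationVisitor
{
  virtual ~OperationVisitor() = default;
  virtual void visit(const Select &) = 0;
  virtual void visit(const Add &) = 0;
};

struct Operation
{
  virtual ~Operation() = default;
  virtual void accept(OperationVisitor &v) const = 0;
};

struct Select : Operation
{
  // The static type of *this selects the Select overload of visit
  void accept(OperationVisitor &v) const override { v.visit(*this); }
};

struct Add : Operation
{
  void accept(OperationVisitor &v) const override { v.visit(*this); }
};

// A toy generator that only records the name of the kernel it would build.
struct ToyKernelGenerator : OperationVisitor
{
  std::vector<std::string> kernels;

  void visit(const Select &) override { kernels.push_back("SelectKernel"); }
  void visit(const Add &) override { kernels.push_back("AddKernel"); }
};

// Visits the operations in order and returns the recorded kernel names.
inline std::vector<std::string> generateKernels(const std::vector<const Operation *> &ops)
{
  ToyKernelGenerator gen;
  for (const Operation *op : ops)
  {
    assert(op != nullptr);
    op->accept(gen);
  }
  return gen.kernels;
}
```

In this sketch, supporting a new operation means adding one node type with an `accept` method and one `visit` overload per visitor, which mirrors the per-backend steps in this section.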
#### cpu

- [KernelGenerator.h](/runtime/onert/backend/cpu/KernelGenerator.h)

```cpp
void visit(const ir::operation::Select &) override;
```

- [KernelGenerator.cc](/runtime/onert/backend/cpu/KernelGenerator.cc)

```cpp
void KernelGenerator::visit(const ir::operation::Select &node)
{
  const auto output_index{node.getOutputs().at(0)};
  const auto condition_index{node.getInputs().at(ir::operation::Select::Input::CONDITION)};
  const auto true_index{node.getInputs().at(ir::operation::Select::Input::INPUT_TRUE)};
  const auto false_index{node.getInputs().at(ir::operation::Select::Input::INPUT_FALSE)};

  // Look up the tensors for each operand in the tensor registry
  auto output_tensor = _tensor_reg->getPortableTensor(output_index);
  auto condition_tensor = _tensor_reg->getPortableTensor(condition_index);
  auto true_tensor = _tensor_reg->getPortableTensor(true_index);
  auto false_tensor = _tensor_reg->getPortableTensor(false_index);

  auto fn = std::make_unique<ops::SelectLayer>();

  fn->configure(condition_tensor, true_tensor, false_tensor, output_tensor);

  _return_fn = std::move(fn);
}
```

### ConstantInitializer (in some cases)

This component registers functions that initialize constant tensors, and initializes the constant tensor layer. Most tensors are registered automatically by the runtime, but there are some exceptions.

#### cpu

- [ConstantInitializer.h](/runtime/onert/backend/cpu/ConstantInitializer.h)

```cpp
void visit(const ir::operation::Conv2D &) override;
```

- [ConstantInitializer.cc](/runtime/onert/backend/cpu/ConstantInitializer.cc)

```cpp
void ConstantInitializer::visit(const ir::operation::Conv2D &node)
{
  const auto &kernel_index = node.getInputs().at(ir::operation::Conv2D::KERNEL);
  const auto &kernel_obj = _operands.at(kernel_index);
  registerCopyInitializer(kernel_index, kernel_obj);

  const auto &bias_index = node.getInputs().at(ir::operation::Conv2D::BIAS);
  const auto &bias_obj = _operands.at(bias_index);
  registerCopyInitializer(bias_index, bias_obj);
}
```

## Samples (to be updated)

- `Select` operation
  - Simple explanation : `Output[i] = Condition[i] ? input1[i] : input2[i]`
  - PR : https://github.com/Samsung/ONE/pull/XXX
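
To make the `Select` explanation concrete, here is a small reference sketch of the element-wise rule over flat buffers of equal size. The function `selectElementwise` is a hypothetical helper written for this document, not an onert API, and it assumes that the condition is stored as one byte per element (in the spirit of the `BOOL8` check in the validator) and that no broadcasting is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference helper (not part of onert): element-wise select over
// flat buffers that all hold the same number of elements.
template <typename T>
std::vector<T> selectElementwise(const std::vector<uint8_t> &condition,
                                 const std::vector<T> &input1, const std::vector<T> &input2)
{
  // Assumption: same shape for all operands, so no broadcasting is handled here
  assert(condition.size() == input1.size());
  assert(input1.size() == input2.size());

  std::vector<T> output(input1.size());
  for (std::size_t i = 0; i < output.size(); ++i)
  {
    // A non-zero condition picks input1[i]; zero picks input2[i]
    output[i] = condition[i] ? input1[i] : input2[i];
  }
  return output;
}
```

For example, with condition `{1, 0, 1, 0}`, input1 `{1, 2, 3, 4}` and input2 `{10, 20, 30, 40}`, the helper returns `{1, 20, 3, 40}`.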