Practical Work No. 4. Automatic Optimization of a Neural Network with Apache TVM
1. Goals and Objectives
The goal of this work is to study the programming interface for automatic optimization of neural networks with Apache TVM on x86-64 processors.
Achieving this goal involves the following tasks:
- Training a logistic regression model and a fully connected neural network on the MNIST dataset on an x86 device. Saving the models in the Apache TVM format, and saving the quality metrics and the dataset in the NumPy format for subsequent testing.
- Installing LLVM and building Apache TVM with LLVM.
- Optimizing the logistic regression model.
  - Loading the logistic regression model. Running it, checking correctness, and measuring inference time without optimization.
  - Optimizing the logistic regression model with AutoTVM, Auto-scheduler, and MetaScheduler.
  - Analyzing the optimization results for logistic regression.
- Optimizing the fully connected neural network.
  - Loading the fully connected neural network model. Running it, checking correctness, and measuring inference time without optimization.
  - Optimizing the fully connected neural network model with AutoTVM, Auto-scheduler, and MetaScheduler.
  - Analyzing the optimization results for the fully connected neural network.
- Optimizing the convolutional neural network.
  - Loading the convolutional neural network model. Running it, checking correctness, and measuring inference time without optimization.
  - Optimizing the convolutional neural network model with AutoTVM, Auto-scheduler, and MetaScheduler.
  - Analyzing the optimization results for the convolutional neural network.
Note: Apache TVM has not yet been fully ported to the RISC-V architecture, so a number of limitations prevent a full demonstration of the available functionality for automatic network optimization on RISC-V devices. Below is an approximate list of the problems the authors encountered while preparing the materials for this practical work.
- Critical errors occur during the optimization of convolutional neural networks, which make it impossible to optimize these architectures on RISC-V devices. For this reason, only fully connected neural networks are considered in this practical work. Optimization via RPC is not covered in this work.
- When running on RISC-V devices, opt_level=2 is used. A higher optimization level causes model compilation errors.
- At present, the Auto-scheduler implementation works with serious limitations, so in this work Auto-scheduler is used only to optimize logistic regression.
- Using MetaScheduler for optimization leads to critical errors related to the computation graph and to compilation errors.
2. Training Deep Learning Models
The models are trained on the x86 architecture using the PyTorch library.
2.1 Installing Dependencies for Model Training
2.1.1 Installing Apache TVM
LLVM must be installed and Apache TVM built from source, in the same way as in the previous practical work. The corresponding sequence of commands is given below.
sudo apt install clang-17 llvm-17*
git clone --recursive https://github.com/apache/tvm
cd tvm
mkdir build
cd build
cmake -DUSE_LLVM=ON ..
make
2.1.2 Setting Up the Python Environment
In what follows, we assume that Miniconda is installed on the x86 node and create a virtual environment for preparing the test models. The models are trained with the PyTorch library on the MNIST dataset. This requires the torch package, which provides the functionality for training and testing neural networks, and the torchvision package, which contains helper functions, in particular for loading widely used datasets. An example sequence of commands for creating and configuring the environment is given below.
conda create -n torch_train python==3.10
conda activate torch_train
pip install numpy matplotlib torchmetrics
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu118
pip install notebook
2.2 Training the Models
Before training a model, activate the virtual environment created in the previous step and set the path to Apache TVM.
conda activate torch_train
export PYTHONPATH=<PATH TO TVM>/python:${PYTHONPATH}
The training process is implemented in the file 04_train_model_x86.ipynb. The capabilities of the PyTorch library for model training were covered in more detail in the second practical work. Run this file. When it finishes, the architecture and weights of the trained neural networks are saved to the model/ directory. A file with the accuracy metrics of the models is saved to the same directory. In addition, the script saves the test data (images and their labels) to simplify loading them on RISC-V devices. This data is saved to the data/ directory.
3. Building and Installing LLVM and Apache TVM
3.1. Building LLVM
Apache TVM must be built with LLVM. An LLVM version in the range 15 <= LLVM <= 17 is recommended.
3.1.1. Installation with a Package Manager
sudo apt install clang-17 llvm-17*
3.1.2. Building LLVM from Source (Version llvmorg-17.0.6)
To build LLVM from source, download the required version of LLVM from the GitHub repository (version llvmorg-17.0.6 is used in this work), then generate the makefiles with CMake and run the build. The corresponding sequence of commands is given below.
git clone https://github.com/llvm/llvm-project.git -b llvmorg-17.0.6
cd llvm-project
mkdir _build
cd _build
cmake -DCMAKE_BUILD_TYPE="Release" \
-DLLVM_ENABLE_PROJECTS=clang \
-DBUILD_SHARED_LIBS=True \
-DLLVM_USE_SPLIT_DWARF=True \
-DCMAKE_INSTALL_PREFIX="../../_install" ../llvm
make
Note: if LLVM is built from source, then before building Apache TVM you must add the path to LLVM to the PATH environment variable and create the LLVM_CONFIG environment variable. An example is shown below.
PATH="<PATH TO LLVM>/_build/bin:$PATH"
export LLVM_CONFIG=<PATH TO LLVM>/_build/bin/llvm-config
3.2. Installing OpenBLAS
Next, install OpenBLAS using the package manager.
sudo apt-get install libopenblas-dev
3.3. Setting Up the Python Environment
For this practical work, create and configure a Python virtual environment as shown below:
python3 -m venv ~/tvm_cpu/
source ~/tvm_cpu/bin/activate
pip install scipy numpy matplotlib pandas
pip install cloudpickle traitlets typing-extensions psutil pybind11 decorator attrs
pip install notebook
3.4. Building Apache TVM
To build Apache TVM, we use the main branch of the GitHub repository, since critically important fixes 1 and 2 were merged recently. The Python virtual environment created above does not have to be used for building Apache TVM.
git clone --recursive https://github.com/apache/tvm
cd tvm
mkdir build
cd build
cmake -DUSE_LLVM=ON -DUSE_BLAS=openblas ..
make
3.5. Activating the Environment for the Practical Work
To activate the virtual environment for the tasks of this practical work, run the following commands:
source ~/tvm_cpu/bin/activate
export PYTHONPATH=<PATH TO TVM>/python:${PYTHONPATH}
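Before moving on, it can help to confirm that the interpreter in the activated environment actually sees the TVM Python package. The following is a minimal sketch that uses only the standard library; the helper name is_importable is an illustrative assumption and is not part of TVM.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if the module can be located on the current sys.path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "tvm" is found only if PYTHONPATH points to <PATH TO TVM>/python.
    status = "found" if is_importable("tvm") else "NOT found, check PYTHONPATH"
    print(f"tvm: {status}")
```

The check only locates the package; it does not verify that the compiled TVM libraries load correctly.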
4. Implementation of Helper Functions
4.1. Importing Packages
To use the functionality of Apache TVM and other supporting libraries, we import the required packages. We also define a variable holding the data type of tensor elements, float32, and set the CPU as the target device.
import os
from time import time
import matplotlib.pyplot as plt
import numpy as np
import tvm
from tvm import autotvm
from tvm import auto_scheduler
from tvm import meta_schedule as ms
from tvm import relay
from tvm.autotvm.tuner import XGBTuner
from tvm.contrib import graph_executor
dtype = 'float32'
dev = tvm.cpu()
global_trial = 96
4.2. Compilation Target String
At this stage, we define the compilation target string, target. The neural networks are compiled on a device with the x86-64 architecture. To simplify testing and debugging, the ability to run on x86_64 has been added.
To determine the device architecture, create the default target object for LLVM, tvm.target.Target('llvm'), and then select the target string based on its mtriple attribute:
- If the mtriple attribute is absent or contains the substring x86_64, the standard target string llvm is used.
- If mtriple contains the substring riscv64, the target string llvm -jit=orcjit -mtriple=riscv64-unknown-linux-gnu -mcpu=generic-rv64 -mabi=lp64d -mattr=+64bit,+m,+a,+f,+d is used.
- Otherwise, an exception is raised.
Note 1: if TVM was installed from PyPI, mtriple is empty.
Note 2: TVM supports various backends, such as llvm, opencl, cuda, and others. In this case, to generate machine code, TIR is translated into LLVM IR, from which the machine code is then generated. A brief description of the target string parameters is given below.
- -jit=orcjit selects the ORC (On-Request Compilation) JIT compiler. TVM requires this option when compiling on RISC-V.
- -mtriple=riscv64-unknown-linux-gnu defines the target triple: a 64-bit RISC-V platform running Linux with an unspecified vendor.
- -mcpu=generic-rv64 specifies the target processor type.
- -mabi=lp64d defines the ABI (Application Binary Interface) in use. lp64d denotes an ABI in which long integers (long) and pointers are 64 bits wide and double-precision floating-point support (d) is enabled.
- -mattr=+64bit,+m,+a,+f,+d sets the attributes of the target architecture:
  - +64bit: support for the 64-bit architecture.
  - +m: support for multiplication and division.
  - +a: support for atomic operations.
  - +f: support for single-precision floating-point operations.
  - +d: support for double-precision floating-point operations.
4.3. Graph Optimization Level
opt_level=2 is used when running on RISC-V devices, and opt_level=3 is used when running on the x86-64 architecture.
def is_x86():
    # An absent mtriple (e.g. TVM installed from PyPI) is treated as x86.
    mtriple = tvm.target.Target('llvm').attrs.get('mtriple')
    if mtriple is None:
        return True
    return 'x86_64' in mtriple

def is_riscv():
    mtriple = tvm.target.Target('llvm').attrs.get('mtriple')
    return mtriple is not None and 'riscv64' in mtriple

print(f"Device mtriple: {tvm.target.Target('llvm').attrs.get('mtriple')}")
if is_x86():
    target = tvm.target.Target('llvm')
    opt_level = 3
elif is_riscv():
    target = tvm.target.Target(
        'llvm -jit=orcjit -mtriple=riscv64-unknown-linux-gnu '
        '-mcpu=generic-rv64 -mabi=lp64d -mattr=+64bit,+m,+a,+f,+d'
    )
    opt_level = 2
else:
    raise ValueError("Unsupported architecture")
print(f'{target = }')
Device mtriple: x86_64-pc-linux-gnu
target = llvm -keys=cpu -mtriple=x86_64-pc-linux-gnu
4.4 Helper Functions
We implement the load_model function for loading a model in the TVM format and the load_images_and_labels function for loading images and labels from the MNIST dataset.
def load_model(mod_file, params_file):
    # The module is stored as JSON text, the parameters as a binary blob.
    with open(mod_file, "r") as fo:
        mod = fo.read()
    mod = tvm.ir.load_json(mod)
    with open(params_file, "rb") as fo:
        params = relay.load_param_dict(fo.read())
    return mod, params

def load_images_and_labels(images_path, labels_path):
    images = np.load(images_path)
    labels = np.load(labels_path)
    return images, labels
Next, we implement the timeit_inference function for measuring inference time and the get_accuracy function for evaluating prediction quality.
- The timeit_inference function. Inference time is measured on the MNIST dataset. Inference is run separately for each image in the dataset. The function returns the predictions (the index of the class with the highest confidence) and the execution times.
- The get_accuracy function. Prediction quality is evaluated on the whole MNIST dataset by comparing the predictions with the labels. Accuracy is computed as the ratio of the number of matches between predicted and labeled classes to the total number of images in the dataset.
def timeit_inference(mod, lib, images):
    # The input name and shape are taken from the first parameter of main.
    input_name = mod['main'].params[0].name_hint
    input_shape = mod['main'].params[0].type_annotation.shape
    input_shape = [int(s) for s in input_shape]
    dev = tvm.cpu()
    module = graph_executor.GraphModule(lib["default"](dev))
    predict = []
    times = []
    for i in range(len(images)):
        img = np.array(images[i:i+1], dtype=np.float32).reshape(input_shape)
        module.set_input(input_name, img)
        ts = time()
        module.run()
        tf = time()
        times.append((tf - ts) * 1000)  # milliseconds
        output = module.get_output(0).numpy()
        predict.append(np.argmax(output))
    return np.array(predict), np.array(times)

def get_accuracy(labels, predict):
    return np.mean(labels == predict)
At this stage, load the images, the labels, and the accuracy values of the neural networks obtained after training on the x86-64 system. In the subsequent tasks of this practical work, the accuracy of each network must be compared with these loaded values.
images, labels = load_images_and_labels('data/test_images.npy', 'data/test_labels.npy')
metric = np.load('model/metric.npy', allow_pickle=True).item()
print(metric)
{'logreg': array(0.9264, dtype=float32), 'fcnn': array(0.9804, dtype=float32), 'cnn': array(0.985, dtype=float32)}
5. General Information on Automatic Layer Optimization Methods in Apache TVM
The interfaces of the automatic optimization methods in Apache TVM share similar elements. First, tasks are extracted, where a task is a layer or a subgraph of the neural network. Each task is then optimized. The optimization results are logged either to a file or to a separate directory.
A key parameter of the optimization process is the number of optimization iterations per task. Choosing this parameter is non-trivial:
- If the value is too small, an efficient solution may not be found.
- Too large a value leads to significant time costs.
- The optimal number of iterations depends on the characteristics of the neural network, the target device, and the optimization method used.
At each iteration, the quality of a particular implementation is checked several times. The parameters of these checks are set through autotvm.measure_option, auto_scheduler.LocalRunner, and ms.runner.LocalRunner, which provide an interface for specifying the number of performance measurements:
- number: the number of code runs averaged within a single measurement.
- repeat: the number of measurements. In total, (1 + number x repeat) runs are performed, where the first run is a warm-up and is not counted.
- enable_cpu_cache_flush: flushes the CPU cache between consecutive measurements for a more accurate latency estimate.
Thus, the larger the value of number x repeat, the more accurate the estimate of the running time of the schedules, but this also lengthens the automatic optimization process.
Note: pseudocode for the way Apache TVM estimates execution time is given below.
total_times = []
for r in range(repeat):
    time_start = now()
    for n in range(number):
        func_name()
    time_end = now()
    total_times.append((time_end - time_start) / number)
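The measurement scheme can also be written as a small runnable sketch. It is a simplified illustration in plain Python, not TVM's implementation: the names measure and CountingWorkload are assumptions introduced here, and the workload merely stands in for a compiled kernel. It confirms that the function is called 1 + number x repeat times, including the warm-up run.

```python
from time import perf_counter


def measure(func, number: int, repeat: int) -> list[float]:
    """Return `repeat` timings in seconds, each averaged over `number` calls."""
    func()  # warm-up run, not included in the timings
    timings = []
    for _ in range(repeat):
        start = perf_counter()
        for _ in range(number):
            func()
        end = perf_counter()
        timings.append((end - start) / number)
    return timings


class CountingWorkload:
    """Stand-in for a compiled kernel that counts its invocations."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self) -> None:
        self.calls += 1
        sum(range(1000))


workload = CountingWorkload()
timings = measure(workload, number=3, repeat=1)
print(len(timings), workload.calls)  # 1 timing, 1 + 3 * 1 = 4 calls
```

With number=3 and repeat=1, the values used later in this work, each candidate schedule is therefore executed four times per measurement.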
default_logreg_time, autotvm_logreg_time, autoscheduler_logreg_time, ms_logreg_time = 0, 0, 0, 0
mod, params = load_model('model/logreg.json', 'model/logreg.params')
print(mod['main'])
fn (%input0: Tensor[(1, 784), float32] /* span=aten::linear_0.input0:0:0 */, %aten::linear_0.weight: Tensor[(10, 784), float32] /* span=aten::linear_0.weight:0:0 */, %aten::linear_0.bias: Tensor[(10), float32] /* span=aten::linear_0.bias:0:0 */) {
  %0 = nn.dense(%input0, %aten::linear_0.weight, units=None) /* span=aten::linear_0:0:0 */;
  nn.bias_add(%0, %aten::linear_0.bias, axis=-1) /* span=aten::linear_0:0:0 */
}
The next step is to compile the model without layer optimization.
with tvm.transform.PassContext(opt_level=opt_level):
    lib = relay.build(mod, target=target, params=params)
One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.
After compilation, we can run inference and measure its execution time with the timeit_inference function, evaluate the quality of the logistic regression with the get_accuracy function, and compare the resulting classification accuracy with the loaded value obtained on x86-64.
default_logreg_predict, default_logreg_times = timeit_inference(mod, lib, images)
default_logreg_accuracy = get_accuracy(labels, default_logreg_predict)
assert np.allclose(metric['logreg'], default_logreg_accuracy, rtol=1e-5)
default_logreg_time = np.median(default_logreg_times)
print(f'Median running time of the unoptimized model: {default_logreg_time:.4f} ms')
Median running time of the unoptimized model: 0.0107 ms
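A likely reason for reporting the median rather than the mean is that per-image timings usually contain occasional outliers, for example the first call or delays caused by OS scheduling. The sketch below illustrates the difference with the standard library only; the timings are made-up numbers chosen for illustration and are not measurements from this work.

```python
from statistics import mean, median

# Hypothetical per-image latencies in ms; the first call is an outlier.
times_ms = [0.50, 0.011, 0.010, 0.011, 0.010, 0.012, 0.010]

print(f"mean   = {mean(times_ms):.4f} ms")
print(f"median = {median(times_ms):.4f} ms")
```

A single slow call shifts the mean by a large factor, while the median stays at the typical per-image latency.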
6.2. Using AutoTVM
We define the get_autotvm_task function for extracting tasks and printing information about them (the task number and task.workload). To do this, we use the autotvm.task.extract_from_program method, passing it the model, the target device, and the trained model parameters. In this case, two types of tasks are considered: fully connected layers without weight transformation and with weight transformation to improve memory access.
For the x86 and RISC-V architectures, the tasks are named dense_*.x86. At present, Apache TVM has no schedule implementations for RISC-V. Because Apache TVM relies on LLVM during compilation and the architectures are similar, the tool successfully uses the schedules developed for x86 platforms on RISC-V devices.
def get_autotvm_task(
    mod: tvm.ir.module.IRModule,
    target: tvm.target.target.Target,
    params: tvm.ir.container.Map
) -> list[tvm.autotvm.task.task.Task]:
    """
    Parameters:
        mod: The IRModule.
        target: Compilation target.
        params: Neural network weights.
    Returns:
        List of tasks.
    """
    print("Extracting tasks\n")
    tasks = autotvm.task.extract_from_program(
        mod, target=target, params=params,
    )
    for idx, task in enumerate(tasks):
        print(f"Task number: {idx}\nTask information: {task.workload}\n")
    return tasks
We call the get_autotvm_task function to extract the AutoTVM tasks from the computation graph.
tasks = get_autotvm_task(mod, target, params)
Extracting tasks

Task number: 0
Task information: ('dense_nopack.x86', ('TENSOR', (1, 784), 'float32'), ('TENSOR', (10, 784), 'float32'), None, 'float32')

Task number: 1
Task information: ('dense_pack.x86', ('TENSOR', (1, 784), 'float32'), ('TENSOR', (10, 784), 'float32'), None, 'float32')
The next stage after task extraction is the optimization of each task. For this, we implement the tune_autotvm function, which sets the optimization parameters and starts the optimization.
First, the parameters for measuring the execution time of each schedule are defined with autotvm.measure_option and autotvm.LocalRunner.
Then a cost model is defined for each task. The cost model used to estimate the execution time of a layer is gradient-boosted trees, implemented on top of XGBoost. Apache TVM provides an interface for several optimization methods. Here we initialize the XGBTuner class for each task.
Each task is optimized min(n_trial, len(task.config_space)) times, where n_trial is the specified number of trials and len(task.config_space) is the number of distinct configurations in the schedule for the given tensor expression.
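As a worked example of this rule, the following sketch computes the number of measured configurations per task in plain Python. The helper trials_per_task and the configuration-space sizes are assumptions chosen for illustration; the actual sizes depend on the layer shapes and the schedule templates.

```python
def trials_per_task(n_trial: int, config_space_sizes: dict[str, int]) -> dict[str, int]:
    """Number of configurations actually measured for each task."""
    return {name: min(n_trial, size) for name, size in config_space_sizes.items()}


# Hypothetical configuration-space sizes, chosen only for illustration.
sizes = {"dense_nopack.x86": 60, "dense_pack.x86": 1024}
print(trials_per_task(96, sizes))
# {'dense_nopack.x86': 60, 'dense_pack.x86': 96}
```

A task whose configuration space is smaller than n_trial is exhausted early, so increasing n_trial further does not help for that task.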
Once all parameters are defined, the optimization is started with the tuner_obj.tune method, which takes the number of optimization trials for the task, the measure_option object, and the name of the log file, passed through autotvm.callback.log_to_file(log_file).
def tune_autotvm(
    tasks: list[tvm.autotvm.task.task.Task],
    n_trial: int,
    log_file: str
):
    """
    Parameters:
        tasks: List of tasks.
        n_trial: Number of trials for each task.
        log_file: File for logging the optimization results.
    """
    measure_option = autotvm.measure_option(
        builder=autotvm.LocalBuilder(),
        runner=autotvm.LocalRunner(repeat=1, number=3, enable_cpu_cache_flush=True),
    )
    for i, task in enumerate(tasks):
        prefix = "[Task %2d/%2d] " % (i + 1, len(tasks))
        tuner_obj = XGBTuner(task)
        # Never request more trials than there are configurations.
        n = min(n_trial, len(task.config_space))
        tuner_obj.tune(
            n_trial=n,
            measure_option=measure_option,
            callbacks=[
                autotvm.callback.progress_bar(n_trial, prefix=prefix),
                autotvm.callback.log_to_file(log_file),
            ],
        )
To start optimization with AutoTVM, define the log_file file for logging the optimization results, set the number of optimization trials, and then call the tune_autotvm function.
os.makedirs('autotvm/', exist_ok=True)
log_file = 'autotvm/autotvm_logreg.log'
n_trial = global_trial
tune_autotvm(tasks, n_trial, log_file)
[Task 1/ 2] Current/Best: 0.64/ 16.04 GFLOPS | Progress: (60/96) | 26.38 s Done.
[Task 2/ 2] Current/Best: 3.41/ 14.86 GFLOPS | Progress: (96/96) | 34.58 s Done.
Before the optimized model can be used, it must be compiled with the tuning history that was saved to the log_file file.
with autotvm.apply_history_best(log_file):
    with tvm.transform.PassContext(opt_level=opt_level):
        lib = relay.build(mod, target=target, params=params)
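Conceptually, applying the tuning history means that, for each workload, the configuration with the lowest measured cost among the successful records is selected. The sketch below illustrates this idea on a simplified, hypothetical record format (the Record type and its fields are assumptions made here); it does not read real AutoTVM log files and is not TVM's implementation.

```python
from typing import NamedTuple


class Record(NamedTuple):
    workload: str      # task identifier
    config_index: int  # which configuration was measured
    cost_s: float      # mean run time in seconds
    error_no: int      # 0 means the measurement succeeded


def best_per_workload(records: list[Record]) -> dict[str, Record]:
    """Keep the fastest successful record for every workload."""
    best: dict[str, Record] = {}
    for rec in records:
        if rec.error_no != 0:
            continue  # skip failed builds or runs
        current = best.get(rec.workload)
        if current is None or rec.cost_s < current.cost_s:
            best[rec.workload] = rec
    return best


# Hypothetical tuning history, for illustration only.
history = [
    Record("dense_nopack.x86", 3, 4.1e-6, 0),
    Record("dense_nopack.x86", 7, 2.9e-6, 0),
    Record("dense_nopack.x86", 9, 1.0e-6, 1),  # failed, ignored
    Record("dense_pack.x86", 2, 3.3e-6, 0),
]
best = best_per_workload(history)
print({w: r.config_index for w, r in best.items()})
# {'dense_nopack.x86': 7, 'dense_pack.x86': 2}
```

The failed record is ignored even though its reported cost is the lowest, which is why the error code has to be checked before costs are compared.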
At this stage, we can measure inference time with the timeit_inference function, check the quality of the optimized model with the get_accuracy function, and compare the classification accuracy with the reference value obtained after training the model.
autotvm_logreg_predict, autotvm_logreg_times = timeit_inference(mod, lib, images)
autotvm_logreg_accuracy = get_accuracy(labels, autotvm_logreg_predict)
assert np.allclose(metric['logreg'], autotvm_logreg_accuracy, rtol=1e-5)
autotvm_logreg_time = np.median(autotvm_logreg_times)
print(f'Median running time after layer optimization with AutoTVM: {autotvm_logreg_time:.4f} ms')
Median running time after layer optimization with AutoTVM: 0.0033 ms
6.3. Using Auto-scheduler
We define the get_auto_scheduler_task function for extracting tasks and printing information about them (the task number and task.desc).
As with AutoTVM, the tasks are first extracted with the auto_scheduler.extract_tasks method, which takes the model, the target device for inference, and the trained model parameters. Auto-scheduler also allows the graph optimization level to be controlled through the opt_level parameter. This value must match the graph optimization level used when compiling the model. Note that in this case the computation graph consists of a single layer, so no layer fusion is performed. The method also returns task_weights, which defines the weight of each subgraph. By default, the weight is $1$. If there are $N$ identical subgraphs, they are represented as a single task with weight $N$.
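The role of the weights can be illustrated with a small plain-Python sketch. It is a simplified model under stated assumptions: the subgraph keys and latencies are hypothetical, the helpers dedupe_subgraphs and weighted_latency are introduced here for illustration, and the total latency is assumed to be estimated as the weighted sum of per-task latencies.

```python
from collections import Counter


def dedupe_subgraphs(subgraph_keys: list[str]) -> tuple[list[str], list[int]]:
    """Collapse identical subgraphs into unique tasks with integer weights."""
    counts = Counter(subgraph_keys)
    tasks = list(counts)  # unique keys in first-seen order
    weights = [counts[t] for t in tasks]
    return tasks, weights


def weighted_latency(latencies_ms: list[float], weights: list[int]) -> float:
    """Total latency estimate: each task's latency times its weight."""
    return sum(lat * w for lat, w in zip(latencies_ms, weights))


# Hypothetical network: the same dense+relu subgraph occurs three times.
keys = ["dense_relu_256", "dense_relu_256", "dense_relu_256", "dense_add_10"]
tasks, weights = dedupe_subgraphs(keys)
print(tasks, weights)  # ['dense_relu_256', 'dense_add_10'] [3, 1]
print(weighted_latency([0.02, 0.008], weights))
```

Under this model, speeding up a task with weight 3 reduces the total estimate three times as much as the same speedup on a task with weight 1.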
def get_auto_scheduler_task(
    mod: tvm.ir.module.IRModule,
    target: tvm.target.target.Target,
    params: tvm.ir.container.Map,
    opt_level: int
) -> tuple[list[tvm.auto_scheduler.search_task.SearchTask], list[int]]:
    """
    Parameters:
        mod: The IRModule.
        target: Compilation target.
        params: Neural network weights.
        opt_level: Optimization level of the computation graph.
    Returns:
        List of tasks and list of task weights.
    """
    tasks, task_weights = auto_scheduler.extract_tasks(mod, target=target, params=params, opt_level=opt_level)
    for idx, task in enumerate(tasks):
        print(f"Task number: {idx}\nTask information: {task.desc}\n")
    return tasks, task_weights
We extract the tasks for Auto-scheduler by calling the get_auto_scheduler_task function.
tasks, task_weights = get_auto_scheduler_task(mod, target, params, opt_level)
Task number: 0
Task information: vm_mod_fused_nn_dense_add
Next, we implement the tune_auto_scheduler function for automatically tuning the optimization parameters of the neural network.
Here, an object of the auto_scheduler.TaskScheduler class describing the tasks must be created, and the optimization parameters must be defined with auto_scheduler.TuningOptions. After that, the tune method of the created auto_scheduler.TaskScheduler object can be called.
Notes:
- The num_measures_per_round parameter is used when defining the optimization parameters. It sets the number of annotated sketch configurations whose running time is measured before the results database is updated. After the update, the cost model is retrained and a new iteration of the evolutionary algorithm is started to generate new sketches.
- The num_measure_trials parameter in Auto-scheduler sets the total number of measurements across all subgraphs.
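The relationship between the two parameters can be sketched as simple arithmetic. The code below is a deliberately simplified model of a round-robin schedule and not the actual TaskScheduler logic (which may include, for example, early stopping); the function plan_rounds is an assumption introduced for illustration.

```python
import math


def plan_rounds(num_measure_trials: int, num_measures_per_round: int,
                num_tasks: int) -> list[tuple[int, int]]:
    """Return (task_id, trials_in_round) pairs for a round-robin schedule."""
    schedule = []
    remaining = num_measure_trials
    n_rounds = math.ceil(num_measure_trials / num_measures_per_round)
    for round_idx in range(n_rounds):
        trials = min(num_measures_per_round, remaining)
        schedule.append((round_idx % num_tasks, trials))
        remaining -= trials
    return schedule


schedule = plan_rounds(num_measure_trials=96, num_measures_per_round=8, num_tasks=1)
print(len(schedule))                # 12 rounds
print(sum(t for _, t in schedule))  # 96 measurements in total
```

With 96 trials and 8 measurements per round, the cost model is retrained 12 times in this model, which matches the trial counter growing in steps of 8 (8, 16, 24, ...) in the tuning log.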
def tune_auto_scheduler(
    tasks: list[tvm.auto_scheduler.search_task.SearchTask],
    task_weights: list[int],
    log_file: str,
    n_trials: int
):
    """
    Parameters:
        tasks: List of tasks.
        task_weights: List of task weights.
        log_file: File for logging the optimization results.
        n_trials: Total number of measurements across all tasks.
    """
    tuner = auto_scheduler.TaskScheduler(tasks, task_weights, strategy='round-robin')
    tune_option = auto_scheduler.TuningOptions(
        num_measure_trials=n_trials,
        num_measures_per_round=8,
        runner=auto_scheduler.LocalRunner(repeat=1, number=3, enable_cpu_cache_flush=True),
        measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
        verbose=1,
    )
    tuner.tune(tune_option)
At this stage, optimization with Auto-scheduler can be started. We define the log_file file for logging the optimization results, set the number of optimization trials to N * len(tasks), and start the optimization by calling the tune_auto_scheduler function.
os.makedirs('auto_schedule/', exist_ok=True)
log_file = 'auto_schedule/auto-schedule_logreg.log'
n_trial_per_task = global_trial
tune_auto_scheduler(tasks, task_weights, log_file, n_trial_per_task * len(tasks))
| ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials |---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- ----------------------------------------------------------------------------------------------------------------- | 0 | vm_mod_fused_nn_dense_add | - | - | 0 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: - ms Trials: 0 Used time : 0 s Next ID: 0 ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Generate Sketches #s: 5 Sample Initial Population #s: 905 fail_ct: 694 Time elapsed: 0.81 GA Iter: 0 Max score: 0.9990 Min score: 0.9849 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 1.0000 Min score: 0.9986 #Pop: 16 #M+: 1383 #M-: 71 EvolutionarySearch #s: 16 Time elapsed: 3.28 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ....E.E.E.E.E*** Time elapsed for measurement: 2.38 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.03 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- | 0 | vm_mod_fused_nn_dense_add | 0.008 | 1.88 | 8 | 
----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.008 ms Trials: 8 Used time : 7 s Next ID: 0 ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 953 fail_ct: 701 Time elapsed: 0.76 GA Iter: 0 Max score: 0.9981 Min score: 0.9893 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 0.9998 Min score: 0.9980 #Pop: 16 #M+: 1380 #M-: 78 EvolutionarySearch #s: 16 Time elapsed: 3.38 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ...E..E.E.E.E*** Time elapsed for measurement: 2.44 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.03 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- | 0 | vm_mod_fused_nn_dense_add | 0.008 | 2.00 | 16 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.008 ms Trials: 16 Used time : 13 s Next ID: 0 ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 933 fail_ct: 666 Time elapsed: 0.75 GA Iter: 0 Max 
score: 0.9854 Min score: 0.9375 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 0.9854 Min score: 0.9375 #Pop: 16 #M+: 1376 #M-: 69 EvolutionarySearch #s: 16 Time elapsed: 3.50 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 2.45 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.04 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- | 0 | vm_mod_fused_nn_dense_add | 0.008 | 2.00 | 24 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.008 ms Trials: 24 Used time : 20 s Next ID: 0 ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 936 fail_ct: 666 Time elapsed: 0.76 GA Iter: 0 Max score: 1.0220 Min score: 0.9424 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 1.0220 Min score: 0.9872 #Pop: 16 #M+: 1384 #M-: 76 EvolutionarySearch #s: 16 Time elapsed: 3.51 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 2.50 s 
---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.04 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- | 0 | vm_mod_fused_nn_dense_add | 0.008 | 2.00 | 32 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.008 ms Trials: 32 Used time : 27 s Next ID: 0 ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 915 fail_ct: 673 Time elapsed: 0.76 GA Iter: 0 Max score: 0.9778 Min score: 0.9376 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 0.9993 Min score: 0.9681 #Pop: 16 #M+: 1376 #M-: 76 EvolutionarySearch #s: 16 Time elapsed: 3.52 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 2.45 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.04 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- 
---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- | 0 | vm_mod_fused_nn_dense_add | 0.007 | 2.10 | 40 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.007 ms Trials: 40 Used time : 34 s Next ID: 0 ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 933 fail_ct: 713 Time elapsed: 0.76 GA Iter: 0 Max score: 0.9421 Min score: 0.8828 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 0.9439 Min score: 0.9176 #Pop: 16 #M+: 1386 #M-: 82 EvolutionarySearch #s: 16 Time elapsed: 3.52 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 2.43 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.04 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- | 0 | vm_mod_fused_nn_dense_add | 0.007 | 2.10 | 48 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.007 ms Trials: 48 Used time : 40 s Next ID: 0 
---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 925 fail_ct: 700 Time elapsed: 0.76 GA Iter: 0 Max score: 0.9296 Min score: 0.8383 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 0.9296 Min score: 0.9032 #Pop: 16 #M+: 1383 #M-: 77 EvolutionarySearch #s: 16 Time elapsed: 3.49 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 2.48 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.04 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- | 0 | vm_mod_fused_nn_dense_add | 0.007 | 2.10 | 56 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.007 ms Trials: 56 Used time : 47 s Next ID: 0 ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 951 fail_ct: 693 Time elapsed: 0.76 GA Iter: 0 Max score: 0.9323 Min score: 0.8621 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 0.9323 Min score: 0.8974 #Pop: 16 #M+: 1383 #M-: 76 EvolutionarySearch #s: 16 Time elapsed: 3.50 
---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 2.43 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.04 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- | 0 | vm_mod_fused_nn_dense_add | 0.005 | 3.18 | 64 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.005 ms Trials: 64 Used time : 54 s Next ID: 0 ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 927 fail_ct: 680 Time elapsed: 0.75 GA Iter: 0 Max score: 0.6080 Min score: 0.5780 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 0.9661 Min score: 0.8379 #Pop: 16 #M+: 1378 #M-: 74 EvolutionarySearch #s: 16 Time elapsed: 3.46 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 2.38 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] 
---------------------------------------------------------------------- Time elapsed for training: 0.04 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- | 0 | vm_mod_fused_nn_dense_add | 0.005 | 3.18 | 72 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.005 ms Trials: 72 Used time : 60 s Next ID: 0 ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 922 fail_ct: 696 Time elapsed: 0.75 GA Iter: 0 Max score: 0.6066 Min score: 0.5840 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 0.9249 Min score: 0.7680 #Pop: 16 #M+: 1385 #M-: 74 EvolutionarySearch #s: 16 Time elapsed: 3.45 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 2.54 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.04 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] 
---------------------------------------------------------------------- | 0 | vm_mod_fused_nn_dense_add | 0.005 | 3.22 | 80 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.005 ms Trials: 80 Used time : 67 s Next ID: 0 ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 942 fail_ct: 669 Time elapsed: 0.78 GA Iter: 0 Max score: 0.6339 Min score: 0.5862 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 0.9170 Min score: 0.7797 #Pop: 16 #M+: 1374 #M-: 65 EvolutionarySearch #s: 16 Time elapsed: 3.48 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 2.46 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.04 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- | 0 | vm_mod_fused_nn_dense_add | 0.005 | 3.22 | 88 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.005 ms Trials: 88 Used time : 74 s Next ID: 0 ---------------------------------------------------------------------- ------------------------------ [ Search ] 
---------------------------------------------------------------------- Sample Initial Population #s: 918 fail_ct: 734 Time elapsed: 0.77 GA Iter: 0 Max score: 0.6298 Min score: 0.6084 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 0.9234 Min score: 0.7981 #Pop: 16 #M+: 1387 #M-: 61 EvolutionarySearch #s: 16 Time elapsed: 3.48 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 2.53 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.04 s
Once the optimization has finished, the model must be compiled using the tuning history.
with auto_scheduler.ApplyHistoryBest(log_file):
    with tvm.transform.PassContext(
        opt_level=opt_level, config={"relay.backend.use_auto_scheduler": True},
    ):
        lib = relay.build(mod, target=target, params=params)
Next, the execution time of the compiled model can be measured with the timeit_inference function, its accuracy determined with the get_accuracy function, and its correctness checked by comparing the resulting accuracy with the reference value.
autoscheduler_logreg_predict, autoscheduler_logreg_times = timeit_inference(mod, lib, images)
autoscheduler_logreg_accuracy = get_accuracy(labels, autoscheduler_logreg_predict)
assert np.allclose(metric['logreg'], autoscheduler_logreg_accuracy, rtol=1e-5)
autoscheduler_logreg_time = np.median(autoscheduler_logreg_times)
print(f'Median run time after layer optimization with Auto-scheduler: {autoscheduler_logreg_time:.4f} ms')
Median run time after layer optimization with Auto-scheduler: 0.0062 ms
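The timeit_inference and get_accuracy helpers are defined earlier in this work. The measure-then-verify pattern they support can be sketched in plain Python as follows. Everything in this block (time_calls, accuracy, the stand-in predict function, and the toy data) is hypothetical and only illustrates the idea; it is not the implementation used in this lab.

```python
import math
import statistics
import time


def time_calls(fn, inputs, n_repeats=5):
    """Return predictions from an untimed pass and per-call times in milliseconds."""
    predictions = [fn(x) for x in inputs]
    times_ms = []
    for _ in range(n_repeats):
        for x in inputs:
            start = time.perf_counter()
            fn(x)
            times_ms.append((time.perf_counter() - start) * 1e3)
    return predictions, times_ms


def accuracy(labels, predictions):
    """Fraction of predictions that match the labels."""
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


# Toy stand-in for a compiled model: classify a number by its sign.
def predict(x):
    return 1 if x >= 0 else 0


inputs = [-2.0, -1.0, 0.5, 3.0]
labels = [0, 0, 1, 0]        # the last label deliberately disagrees with the model
reference_accuracy = 0.75    # plays the role of the value saved at training time

predictions, times_ms = time_calls(predict, inputs)
acc = accuracy(labels, predictions)

# Tuning must not change the model's outputs, so accuracy has to match the reference.
assert math.isclose(acc, reference_accuracy, rel_tol=1e-5)
print(f"accuracy = {acc:.2f}, median time = {statistics.median(times_ms):.4f} ms")
```

The median is used rather than the mean because individual timings of such a short call are easily distorted by outliers.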
6.4. Using MetaScheduler
MetaScheduler requires the number of cores to be specified in the target string, for example -num-cores 4. This parameter can be set to the number of physical cores on the device. Let us make the corresponding changes to the source code.
print(f"Device mtriple: {tvm.target.Target('llvm').attrs.get('mtriple')}")
if is_x86():
    target = tvm.target.Target('llvm -num-cores 6')
elif is_riscv():
    target = tvm.target.Target(
        'llvm -jit=orcjit -mtriple=riscv64-unknown-linux-gnu '
        '-mcpu=generic-rv64 -mabi=lp64d -mattr=+64bit,+m,+a,+f,+d -num-cores 4'
    )
else:
    raise ValueError("Unsupported architecture")
print(f'{target = }')
Device mtriple: x86_64-pc-linux-gnu
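The -num-cores value is meant to reflect physical cores, whereas os.cpu_count() reports logical CPUs and can therefore be larger on machines with simultaneous multithreading. A minimal sketch for Linux is shown below, assuming /proc/cpuinfo exposes the physical id and core id fields (this is not the case on every platform). The helper names count_physical_cores, detect_num_cores, and llvm_target_string are illustrative and are not part of TVM.

```python
import os


def count_physical_cores(cpuinfo_text):
    """Count unique (physical id, core id) pairs in /proc/cpuinfo-style text.

    Returns 0 if the text does not contain both fields.
    """
    cores = set()
    physical_id = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "physical id":
            physical_id = value
        elif key == "core id" and physical_id is not None:
            cores.add((physical_id, value))
    return len(cores)


def detect_num_cores(path="/proc/cpuinfo"):
    """Physical cores if detectable, otherwise the logical CPU count (at least 1)."""
    try:
        with open(path, encoding="utf-8") as f:
            n = count_physical_cores(f.read())
    except OSError:
        n = 0
    return n or os.cpu_count() or 1


def llvm_target_string(num_cores):
    """Build an LLVM target string with an explicit core count."""
    return f"llvm -num-cores {num_cores}"


# Two hardware threads that share one physical core.
sample = (
    "processor : 0\nphysical id : 0\ncore id : 0\n\n"
    "processor : 1\nphysical id : 0\ncore id : 0\n"
)
print(count_physical_cores(sample))  # 1
print(llvm_target_string(detect_num_cores()))
```

The resulting string can be passed to tvm.target.Target in the same way as the hard-coded strings above.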
Let us define the get_ms_task function, which extracts the tasks and prints information about them (the task number and task.task_name). As with the previous optimization methods, the tasks are extracted using ms.relay_integration.extract_tasks and ms.relay_integration.extracted_tasks_to_tune_contexts.
def get_ms_task(
    mod: tvm.ir.module.IRModule,
    target: tvm.target.target.Target,
    params: tvm.ir.container.Map,
    opt_level: int,
    work_dir: str
) -> tuple[list[tvm.meta_schedule.tune_context.TuneContext], list[float]]:
    """
    Extract the tuning tasks from the model and print their names.

    Parameters:
        mod: IRModule of the model.
        target: Compilation target.
        params: Neural network weights.
        opt_level: Optimization level of the computation graph.
        work_dir: Directory for logging the optimization results.
    Returns:
        A list of tasks and a list of task weights.
    """
    extracted_tasks = ms.relay_integration.extract_tasks(
        mod, target=target, params=params, opt_level=opt_level,
    )
    tasks, task_weights = ms.relay_integration.extracted_tasks_to_tune_contexts(
        extracted_tasks, work_dir
    )
    for idx, task in enumerate(tasks):
        print(f"Task number: {idx}\nTask info: {task.task_name}\n")
    return tasks, task_weights
Call the get_ms_task function defined above, first specifying the work_dir directory for logging the optimization results.
work_dir = "meta_schedule_logreg"
if is_x86():
    tasks, task_weights = get_ms_task(mod, target, params, opt_level, work_dir)
2024-11-06 18:02:07 [INFO] Logging directory: meta_schedule_logreg/logs
Task number: 0
Task info: fused_nn_dense_add
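The single extracted task, fused_nn_dense_add, is a dense layer fused with a bias addition. As a rough size check, assume that the logistic regression maps a flattened 28×28 MNIST image to 10 classes and runs with a batch size of 1 (both are assumptions about the model prepared earlier). Counting one multiply and one add per weight plus one add per bias element gives the estimate below, which can be compared with the FLOP and Speed (GFLOPS) columns that MetaScheduler prints in the tuning log in this section. The helper names are illustrative.

```python
def dense_add_flop(batch, in_features, out_features):
    """Operation count of a dense layer (2 FLOP per weight) plus the bias add."""
    matmul = 2 * batch * in_features * out_features
    bias = batch * out_features
    return matmul + bias


def gflops(flop, latency_us):
    """Throughput in GFLOP/s for a kernel that takes latency_us microseconds."""
    return flop / (latency_us * 1e-6) / 1e9


flop = dense_add_flop(batch=1, in_features=28 * 28, out_features=10)
print(flop)                             # 15690
# 1.09715 us is the latency reported in the log after the first eight trials.
print(round(gflops(flop, 1.09715), 2))  # 14.3
```

Under these assumptions the estimate reproduces the reported FLOP value, and dividing it by the reported latency gives the reported speed up to rounding.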
By analogy with the other methods considered, let us implement the tune_ms function, which automatically tunes the parameters of neural network inference. This function should call ms.tune.tune_tasks, which accepts a set of tasks, the weights of these tasks, and the optimization parameters.
Note: to set the number of runs used when evaluating the quality of a sketch, an ms.runner.LocalRunner object configured with ms.runner.config.EvaluatorConfig is passed to ms.tune.tune_tasks.
def tune_ms(
    tasks: list[tvm.meta_schedule.tune_context.TuneContext],
    task_weights: list[float],
    work_dir: str,
    n_trials: int
) -> None:
    """
    Parameters:
        tasks: List of tasks.
        task_weights: List of task weights.
        work_dir: Directory for logging the optimization results.
        n_trials: Total number of trials across all tasks (passed as max_trials_global).
    """
    # Create the logging directory if it does not exist yet.
    os.makedirs(work_dir, exist_ok=True)
    ms.tune.tune_tasks(
        tasks=tasks,
        task_weights=task_weights,
        work_dir=work_dir,
        max_trials_global=n_trials,
        num_trials_per_iter=8,
        builder=ms.builder.LocalBuilder(),
        runner=ms.runner.LocalRunner(
            evaluator_config=ms.runner.config.EvaluatorConfig(
                repeat=1, number=3, enable_cpu_cache_flush=True
            )
        ),
    )
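The repeat and number settings passed to EvaluatorConfig above control how each candidate is timed. The plain-Python sketch below illustrates the convention, assuming it matches the usual TVM time-evaluator behaviour: each of the repeat measurements is the average over number consecutive runs, preceded by one discarded warm-up run. The measure helper and the workload are hypothetical, and the sketch does not flush CPU caches.

```python
import time


def measure(fn, number=3, repeat=1):
    """Return `repeat` measurements, each the mean time in seconds of `number` calls."""
    fn()  # warm-up call, not timed
    results = []
    for _ in range(repeat):
        start = time.perf_counter()
        for _ in range(number):
            fn()
        results.append((time.perf_counter() - start) / number)
    return results


calls = 0


def workload():
    """Toy workload that also counts how many times it was invoked."""
    global calls
    calls += 1
    sum(range(1000))


timings = measure(workload, number=3, repeat=1)
print(len(timings), calls)  # 1 4
```

With repeat=1 and number=3, every candidate therefore yields a single averaged measurement, which keeps tuning fast at the cost of noisier estimates.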
Next, start the MetaScheduler optimization by calling the tune_ms function, setting the total number of trials to N * len(tasks), where N is the number of trials per task.
n_trial_per_task = global_trial
if is_x86():
    tune_ms(tasks, task_weights, work_dir, n_trial_per_task * len(tasks))
2024-11-06 18:02:15 [INFO] LocalBuilder: max_workers = 12
2024-11-06 18:02:16 [INFO] LocalRunner: max_workers = 1
2024-11-06 18:02:17 [INFO] [task_scheduler.cc:159] Initializing Task #0: "fused_nn_dense_add"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add | 15690 | 1 | N/A | N/A | N/A | 0 |
Total trials: 0 Total latency (us): 0 2024-11-06 18:02:17 [DEBUG] [task_scheduler.cc:318] ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done ------------------------------------------------------------------------------------------------------------------ 0 | fused_nn_dense_add | 15690 | 1 | N/A | N/A | N/A | 0 | ------------------------------------------------------------------------------------------------------------------ Total trials: 0 Total latency (us): 0 2024-11-06 18:02:17 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #0: "fused_nn_dense_add" 2024-11-06 18:02:18 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder 2024-11-06 18:02:20 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner 2024-11-06 18:02:22 [DEBUG] XGB iter 0: tr-p-rmse: 0.599053 tr-a-peak@32: 0.810757 tr-rmse: 0.332563 tr-rmse: 0.332563 2024-11-06 18:02:22 [DEBUG] XGB iter 25: tr-p-rmse: 0.047430 tr-a-peak@32: 1.000000 tr-rmse: 0.368312 tr-rmse: 0.368312 2024-11-06 18:02:22 [DEBUG] XGB iter 50: tr-p-rmse: 0.047438 tr-a-peak@32: 1.000000 tr-rmse: 0.368301 tr-rmse: 0.368301 2024-11-06 18:02:22 [DEBUG] XGB stopped. Best iteration: [18] tr-p-rmse:0.04720 tr-a-peak@32:1.00000 tr-rmse:0.36865 tr-rmse:0.36865 2024-11-06 18:02:22 [INFO] [task_scheduler.cc:237] [Updated] Task #0: "fused_nn_dense_add"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add | 15690 | 1 | 14.3007 | 1.0971 | 1.0971 | 8 |
Total trials: 8 Total latency (us): 1.09715 2024-11-06 18:02:22 [DEBUG] [task_scheduler.cc:318] ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done ------------------------------------------------------------------------------------------------------------------ 0 | fused_nn_dense_add | 15690 | 1 | 14.3007 | 1.0971 | 1.0971 | 8 | ------------------------------------------------------------------------------------------------------------------ Total trials: 8 Total latency (us): 1.09715 2024-11-06 18:02:22 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #0: "fused_nn_dense_add" 2024-11-06 18:02:24 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder 2024-11-06 18:02:25 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner 2024-11-06 18:02:28 [DEBUG] XGB validation: p-rmse: 0.238527 a-peak@32: 0.947573 2024-11-06 18:02:28 [DEBUG] XGB iter 0: tr-p-rmse: 0.616919 tr-a-peak@32: 0.735797 tr-rmse: 0.295998 tr-rmse: 0.295998 2024-11-06 18:02:28 [DEBUG] XGB iter 25: tr-p-rmse: 0.063694 tr-a-peak@32: 1.000000 tr-rmse: 0.322048 tr-rmse: 0.322048 2024-11-06 18:02:28 [DEBUG] XGB iter 50: tr-p-rmse: 0.061487 tr-a-peak@32: 1.000000 tr-rmse: 0.322399 tr-rmse: 0.322399 2024-11-06 18:02:28 [DEBUG] XGB iter 75: tr-p-rmse: 0.061487 tr-a-peak@32: 1.000000 tr-rmse: 0.322399 tr-rmse: 0.322399 2024-11-06 18:02:28 [DEBUG] XGB stopped. Best iteration: [39] tr-p-rmse:0.06149 tr-a-peak@32:1.00000 tr-rmse:0.32240 tr-rmse:0.32240 2024-11-06 18:02:28 [INFO] [task_scheduler.cc:237] [Updated] Task #0: "fused_nn_dense_add"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add | 15690 | 1 | 18.3318 | 0.8559 | 0.8559 | 16 |
Total trials: 16 Total latency (us): 0.85589 2024-11-06 18:02:28 [DEBUG] [task_scheduler.cc:318] ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done ------------------------------------------------------------------------------------------------------------------ 0 | fused_nn_dense_add | 15690 | 1 | 18.3318 | 0.8559 | 0.8559 | 16 | ------------------------------------------------------------------------------------------------------------------ Total trials: 16 Total latency (us): 0.85589 2024-11-06 18:02:28 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #0: "fused_nn_dense_add" 2024-11-06 18:02:29 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder 2024-11-06 18:02:31 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner 2024-11-06 18:02:33 [DEBUG] XGB validation: p-rmse: 0.143498 a-peak@32: 0.985518 2024-11-06 18:02:33 [DEBUG] XGB iter 0: tr-p-rmse: 0.558818 tr-a-peak@32: 0.863130 tr-rmse: 0.303502 tr-rmse: 0.303502 2024-11-06 18:02:33 [DEBUG] XGB iter 25: tr-p-rmse: 0.044666 tr-a-peak@32: 1.000000 tr-rmse: 0.339051 tr-rmse: 0.339051 2024-11-06 18:02:33 [DEBUG] XGB iter 50: tr-p-rmse: 0.044602 tr-a-peak@32: 1.000000 tr-rmse: 0.339225 tr-rmse: 0.339225 2024-11-06 18:02:33 [DEBUG] XGB iter 75: tr-p-rmse: 0.044602 tr-a-peak@32: 1.000000 tr-rmse: 0.339225 tr-rmse: 0.339225 2024-11-06 18:02:33 [DEBUG] XGB stopped. Best iteration: [36] tr-p-rmse:0.04460 tr-a-peak@32:1.00000 tr-rmse:0.33922 tr-rmse:0.33922 2024-11-06 18:02:33 [INFO] [task_scheduler.cc:237] [Updated] Task #0: "fused_nn_dense_add"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add | 15690 | 1 | 18.3318 | 0.8559 | 0.8559 | 24 |
2024-11-06 18:02:33 [DEBUG] [task_scheduler.cc:318] ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done ------------------------------------------------------------------------------------------------------------------ 0 | fused_nn_dense_add | 15690 | 1 | 18.3318 | 0.8559 | 0.8559 | 24 | ------------------------------------------------------------------------------------------------------------------ Total trials: 24 Total latency (us): 0.85589 Total trials: 24 Total latency (us): 0.85589 2024-11-06 18:02:33 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #0: "fused_nn_dense_add" 2024-11-06 18:02:35 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder 2024-11-06 18:02:37 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner 2024-11-06 18:02:39 [DEBUG] XGB validation: p-rmse: 0.075066 a-peak@32: 1.000000 2024-11-06 18:02:39 [DEBUG] XGB iter 0: tr-p-rmse: 0.552131 tr-a-peak@32: 0.832467 tr-rmse: 0.305779 tr-rmse: 0.305779 2024-11-06 18:02:39 [DEBUG] XGB iter 25: tr-p-rmse: 0.035881 tr-a-peak@32: 1.000000 tr-rmse: 0.339852 tr-rmse: 0.339852 2024-11-06 18:02:39 [DEBUG] XGB iter 50: tr-p-rmse: 0.035777 tr-a-peak@32: 1.000000 tr-rmse: 0.340078 tr-rmse: 0.340078 2024-11-06 18:02:39 [DEBUG] XGB iter 75: tr-p-rmse: 0.035777 tr-a-peak@32: 1.000000 tr-rmse: 0.340078 tr-rmse: 0.340078 2024-11-06 18:02:39 [DEBUG] XGB stopped. Best iteration: [36] tr-p-rmse:0.03578 tr-a-peak@32:1.00000 tr-rmse:0.34008 tr-rmse:0.34008 2024-11-06 18:02:39 [INFO] [task_scheduler.cc:237] [Updated] Task #0: "fused_nn_dense_add"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add | 15690 | 1 | 18.3318 | 0.8559 | 0.8559 | 32 |
2024-11-06 18:02:39 [DEBUG] [task_scheduler.cc:318] ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done ------------------------------------------------------------------------------------------------------------------ 0 | fused_nn_dense_add | 15690 | 1 | 18.3318 | 0.8559 | 0.8559 | 32 | ------------------------------------------------------------------------------------------------------------------ Total trials: 32 Total latency (us): 0.85589 Total trials: 32 Total latency (us): 0.85589 2024-11-06 18:02:39 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #0: "fused_nn_dense_add" 2024-11-06 18:02:40 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder 2024-11-06 18:02:42 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner 2024-11-06 18:02:44 [DEBUG] XGB validation: p-rmse: 0.190006 a-peak@32: 0.911841 2024-11-06 18:02:44 [DEBUG] XGB iter 0: tr-p-rmse: 0.540138 tr-a-peak@32: 0.804785 tr-rmse: 0.294483 tr-rmse: 0.294483 2024-11-06 18:02:44 [DEBUG] XGB iter 25: tr-p-rmse: 0.045466 tr-a-peak@32: 1.000000 tr-rmse: 0.331657 tr-rmse: 0.331657 2024-11-06 18:02:44 [DEBUG] XGB iter 50: tr-p-rmse: 0.045316 tr-a-peak@32: 1.000000 tr-rmse: 0.332103 tr-rmse: 0.332103 2024-11-06 18:02:44 [DEBUG] XGB iter 75: tr-p-rmse: 0.045316 tr-a-peak@32: 1.000000 tr-rmse: 0.332103 tr-rmse: 0.332103 2024-11-06 18:02:44 [DEBUG] XGB stopped. Best iteration: [35] tr-p-rmse:0.04532 tr-a-peak@32:1.00000 tr-rmse:0.33210 tr-rmse:0.33210 2024-11-06 18:02:44 [INFO] [task_scheduler.cc:237] [Updated] Task #0: "fused_nn_dense_add"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add | 15690 | 1 | 18.3318 | 0.8559 | 0.8559 | 40 |
Total trials: 40 Total latency (us): 0.85589 2024-11-06 18:02:44 [DEBUG] [task_scheduler.cc:318] ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done ------------------------------------------------------------------------------------------------------------------ 0 | fused_nn_dense_add | 15690 | 1 | 18.3318 | 0.8559 | 0.8559 | 40 | ------------------------------------------------------------------------------------------------------------------ Total trials: 40 Total latency (us): 0.85589 2024-11-06 18:02:44 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #0: "fused_nn_dense_add" 2024-11-06 18:02:46 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder 2024-11-06 18:02:47 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner 2024-11-06 18:02:49 [DEBUG] XGB validation: p-rmse: 0.130328 a-peak@32: 1.000000 2024-11-06 18:02:49 [DEBUG] XGB iter 0: tr-p-rmse: 0.552521 tr-a-peak@32: 0.779963 tr-rmse: 0.294587 tr-rmse: 0.294587 2024-11-06 18:02:50 [DEBUG] XGB iter 25: tr-p-rmse: 0.041127 tr-a-peak@32: 1.000000 tr-rmse: 0.329823 tr-rmse: 0.329823 2024-11-06 18:02:50 [DEBUG] XGB iter 50: tr-p-rmse: 0.041115 tr-a-peak@32: 1.000000 tr-rmse: 0.329842 tr-rmse: 0.329842 2024-11-06 18:02:50 [DEBUG] XGB iter 75: tr-p-rmse: 0.041115 tr-a-peak@32: 1.000000 tr-rmse: 0.329842 tr-rmse: 0.329842 2024-11-06 18:02:50 [DEBUG] XGB stopped. Best iteration: [33] tr-p-rmse:0.04111 tr-a-peak@32:1.00000 tr-rmse:0.32984 tr-rmse:0.32984 2024-11-06 18:02:50 [INFO] [task_scheduler.cc:237] [Updated] Task #0: "fused_nn_dense_add"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add | 15690 | 1 | 18.3318 | 0.8559 | 0.8559 | 48 |
Total trials: 48 Total latency (us): 0.85589 2024-11-06 18:02:50 [DEBUG] [task_scheduler.cc:318] ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done ------------------------------------------------------------------------------------------------------------------ 0 | fused_nn_dense_add | 15690 | 1 | 18.3318 | 0.8559 | 0.8559 | 48 | ------------------------------------------------------------------------------------------------------------------ Total trials: 48 Total latency (us): 0.85589 2024-11-06 18:02:50 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #0: "fused_nn_dense_add" 2024-11-06 18:02:51 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder 2024-11-06 18:02:53 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner 2024-11-06 18:02:56 [DEBUG] XGB validation: p-rmse: 0.088329 a-peak@32: 1.000000 2024-11-06 18:02:56 [INFO] [task_scheduler.cc:237] [Updated] Task #0: "fused_nn_dense_add"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add | 15690 | 1 | 18.3318 | 0.8559 | 0.8559 | 56 |
2024-11-06 18:02:56 [DEBUG] [task_scheduler.cc:318] ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done ------------------------------------------------------------------------------------------------------------------ 0 | fused_nn_dense_add | 15690 | 1 | 18.3318 | 0.8559 | 0.8559 | 56 | ------------------------------------------------------------------------------------------------------------------ Total trials: 56 Total latency (us): 0.85589 Total trials: 56 Total latency (us): 0.85589 2024-11-06 18:02:56 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #0: "fused_nn_dense_add" 2024-11-06 18:02:57 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder 2024-11-06 18:02:59 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner 2024-11-06 18:03:01 [DEBUG] XGB validation: p-rmse: 0.203209 a-peak@32: 1.000000 2024-11-06 18:03:01 [DEBUG] XGB iter 0: tr-p-rmse: 0.547631 tr-a-peak@32: 0.705767 tr-rmse: 0.273523 tr-rmse: 0.273523 2024-11-06 18:03:01 [DEBUG] XGB iter 25: tr-p-rmse: 0.039272 tr-a-peak@32: 1.000000 tr-rmse: 0.312109 tr-rmse: 0.312109 2024-11-06 18:03:01 [DEBUG] XGB iter 50: tr-p-rmse: 0.039263 tr-a-peak@32: 1.000000 tr-rmse: 0.312125 tr-rmse: 0.312125 2024-11-06 18:03:01 [DEBUG] XGB iter 75: tr-p-rmse: 0.039263 tr-a-peak@32: 1.000000 tr-rmse: 0.312125 tr-rmse: 0.312125 2024-11-06 18:03:01 [DEBUG] XGB stopped. Best iteration: [31] tr-p-rmse:0.03926 tr-a-peak@32:1.00000 tr-rmse:0.31212 tr-rmse:0.31212 2024-11-06 18:03:01 [INFO] [task_scheduler.cc:237] [Updated] Task #0: "fused_nn_dense_add"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add | 15690 | 1 | 18.3318 | 0.8559 | 0.8559 | 64 |
Total trials: 64 Total latency (us): 0.85589 2024-11-06 18:03:01 [DEBUG] [task_scheduler.cc:318] ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done ------------------------------------------------------------------------------------------------------------------ 0 | fused_nn_dense_add | 15690 | 1 | 18.3318 | 0.8559 | 0.8559 | 64 | ------------------------------------------------------------------------------------------------------------------ Total trials: 64 Total latency (us): 0.85589 2024-11-06 18:03:01 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #0: "fused_nn_dense_add" 2024-11-06 18:03:03 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder 2024-11-06 18:03:04 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner 2024-11-06 18:03:06 [DEBUG] XGB validation: p-rmse: 0.057320 a-peak@32: 1.000000 2024-11-06 18:03:06 [INFO] [task_scheduler.cc:237] [Updated] Task #0: "fused_nn_dense_add"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add | 15690 | 1 | 18.3318 | 0.8559 | 0.8559 | 72 |
2024-11-06 18:03:06 [DEBUG] [task_scheduler.cc:318] ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done ------------------------------------------------------------------------------------------------------------------ 0 | fused_nn_dense_add | 15690 | 1 | 18.3318 | 0.8559 | 0.8559 | 72 | ------------------------------------------------------------------------------------------------------------------ Total trials: 72 Total latency (us): 0.85589 Total trials: 72 Total latency (us): 0.85589 2024-11-06 18:03:06 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #0: "fused_nn_dense_add" 2024-11-06 18:03:08 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder 2024-11-06 18:03:10 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner 2024-11-06 18:03:12 [DEBUG] XGB validation: p-rmse: 0.123057 a-peak@32: 1.000000 2024-11-06 18:03:12 [DEBUG] XGB iter 0: tr-p-rmse: 0.530334 tr-a-peak@32: 1.000000 tr-rmse: 0.274288 tr-rmse: 0.274288 2024-11-06 18:03:12 [DEBUG] XGB iter 25: tr-p-rmse: 0.038213 tr-a-peak@32: 1.000000 tr-rmse: 0.312104 tr-rmse: 0.312104 2024-11-06 18:03:12 [DEBUG] XGB iter 50: tr-p-rmse: 0.038206 tr-a-peak@32: 1.000000 tr-rmse: 0.312117 tr-rmse: 0.312117 2024-11-06 18:03:12 [DEBUG] XGB iter 75: tr-p-rmse: 0.038206 tr-a-peak@32: 1.000000 tr-rmse: 0.312117 tr-rmse: 0.312117 2024-11-06 18:03:12 [DEBUG] XGB stopped. Best iteration: [31] tr-p-rmse:0.03821 tr-a-peak@32:1.00000 tr-rmse:0.31212 tr-rmse:0.31212 2024-11-06 18:03:12 [INFO] [task_scheduler.cc:237] [Updated] Task #0: "fused_nn_dense_add"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add | 15690 | 1 | 18.3318 | 0.8559 | 0.8559 | 80 |
Total trials: 80 Total latency (us): 0.85589 2024-11-06 18:03:12 [DEBUG] [task_scheduler.cc:318] ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done ------------------------------------------------------------------------------------------------------------------ 0 | fused_nn_dense_add | 15690 | 1 | 18.3318 | 0.8559 | 0.8559 | 80 | ------------------------------------------------------------------------------------------------------------------ Total trials: 80 Total latency (us): 0.85589 2024-11-06 18:03:12 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #0: "fused_nn_dense_add" 2024-11-06 18:03:14 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder 2024-11-06 18:03:15 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner 2024-11-06 18:03:17 [DEBUG] XGB validation: p-rmse: 0.073607 a-peak@32: 0.950096 2024-11-06 18:03:17 [INFO] [task_scheduler.cc:237] [Updated] Task #0: "fused_nn_dense_add"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add | 15690 | 1 | 18.3318 | 0.8559 | 0.8559 | 88 |
2024-11-06 18:03:17 [DEBUG] [task_scheduler.cc:318] ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done ------------------------------------------------------------------------------------------------------------------ 0 | fused_nn_dense_add | 15690 | 1 | 18.3318 | 0.8559 | 0.8559 | 88 | ------------------------------------------------------------------------------------------------------------------ Total trials: 88 Total latency (us): 0.85589 Total trials: 88 Total latency (us): 0.85589 2024-11-06 18:03:17 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #0: "fused_nn_dense_add" 2024-11-06 18:03:19 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder 2024-11-06 18:03:21 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner 2024-11-06 18:03:23 [DEBUG] XGB validation: p-rmse: 0.036131 a-peak@32: 1.000000 2024-11-06 18:03:23 [DEBUG] XGB iter 0: tr-p-rmse: 0.536892 tr-a-peak@32: 1.000000 tr-rmse: 0.264214 tr-rmse: 0.264214 2024-11-06 18:03:23 [DEBUG] XGB iter 25: tr-p-rmse: 0.035004 tr-a-peak@32: 1.000000 tr-rmse: 0.298392 tr-rmse: 0.298392 2024-11-06 18:03:23 [DEBUG] XGB iter 50: tr-p-rmse: 0.035002 tr-a-peak@32: 1.000000 tr-rmse: 0.298396 tr-rmse: 0.298396 2024-11-06 18:03:23 [DEBUG] XGB iter 75: tr-p-rmse: 0.035002 tr-a-peak@32: 1.000000 tr-rmse: 0.298396 tr-rmse: 0.298396 2024-11-06 18:03:23 [DEBUG] XGB stopped. Best iteration: [27] tr-p-rmse:0.03500 tr-a-peak@32:1.00000 tr-rmse:0.29839 tr-rmse:0.29839 2024-11-06 18:03:23 [INFO] [task_scheduler.cc:237] [Updated] Task #0: "fused_nn_dense_add"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add | 15690 | 1 | 18.3318 | 0.8559 | 0.8559 | 96 |
2024-11-06 18:03:23 [DEBUG] [task_scheduler.cc:318] ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done ------------------------------------------------------------------------------------------------------------------ 0 | fused_nn_dense_add | 15690 | 1 | 18.3318 | 0.8559 | 0.8559 | 96 | ------------------------------------------------------------------------------------------------------------------ Total trials: 96 Total latency (us): 0.85589 Total trials: 96 Total latency (us): 0.85589 2024-11-06 18:03:23 [INFO] [task_scheduler.cc:260] Task #0 has finished. Remaining task(s): 0
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add | 15690 | 1 | 18.3318 | 0.8559 | 0.8559 | 96 | Y |
Total trials: 96 Total latency (us): 0.85589 2024-11-06 18:03:23 [DEBUG] [task_scheduler.cc:318] ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done ------------------------------------------------------------------------------------------------------------------ 0 | fused_nn_dense_add | 15690 | 1 | 18.3318 | 0.8559 | 0.8559 | 96 | Y ------------------------------------------------------------------------------------------------------------------ Total trials: 96 Total latency (us): 0.85589
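The FLOP, Latency and Speed columns of the scheduler table in the log above are related by simple arithmetic. The sketch below is an illustration and not part of the original notebook. It assumes that the FLOP count of the fused dense + add kernel is two operations per multiply-accumulate plus one per bias addition, a rule that happens to reproduce the 15690 reported for the 784-to-10 logistic regression layer. The function names are hypothetical.

```python
def dense_add_flop(in_features: int, out_features: int) -> int:
    """FLOP estimate for y = x @ W.T + b with batch size 1 (assumed counting rule)."""
    multiply_add = 2 * in_features * out_features  # one multiply and one add per weight
    bias_add = out_features                        # one add per output element
    return multiply_add + bias_add


def gflops(flop: int, latency_us: float) -> float:
    """Convert a FLOP count and a latency in microseconds to GFLOPS."""
    return flop / (latency_us * 1e-6) / 1e9


flop = dense_add_flop(784, 10)
speed = gflops(flop, 0.85589)  # best latency reported in the log above
print(flop, round(speed, 2))
```

Under these assumptions the sketch gives 15690 FLOP and roughly 18.33 GFLOPS, in line with the values in the table.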
After tuning, the neural network can be compiled with the schedules found during optimization, using the MetaScheduler interface ms.relay_integration.compile_relay.
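if is_x86():
    # Open the tuning database written to work_dir during tuning.
    database = ms.database.JSONDatabase(
        f"{work_dir}/database_workload.json",
        f"{work_dir}/database_tuning_record.json",
        allow_missing=False
    )
    # Compile the Relay module using the best schedules stored in the database.
    lib = ms.relay_integration.compile_relay(
        database, mod, target, params,
        opt_level=opt_level,
    )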
Finally, we measure inference time with the timeit_inference function, evaluate model quality with the get_accuracy function, and verify that the optimized model works correctly by comparing the resulting accuracy with the reference value.
if is_x86():
    ms_logreg_predict, ms_logreg_times = timeit_inference(mod, lib, images)
    ms_logreg_accuracy = get_accuracy(labels, ms_logreg_predict)
    assert np.allclose(metric['logreg'], ms_logreg_accuracy, rtol=1e-5)
    ms_logreg_time = np.median(ms_logreg_times)
    print(f'Median inference time after layer optimization with MetaScheduler: {ms_logreg_time:.4f} ms')
Median inference time after layer optimization with MetaScheduler: 0.0036 ms
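The timing is summarized with the median rather than the mean. A minimal sketch of one reason for that choice, using made-up timings (not measurements from this work): a single slow run, for example one delayed by OS scheduling, shifts the mean noticeably but leaves the median unchanged.

```python
import statistics

# Hypothetical per-run inference times in milliseconds; the last run is an outlier.
times_ms = [0.0036, 0.0035, 0.0037, 0.0036, 0.0500]

mean_ms = statistics.mean(times_ms)
median_ms = statistics.median(times_ms)

print(f"mean:   {mean_ms:.4f} ms")
print(f"median: {median_ms:.4f} ms")
```

In this made-up sample the mean is more than three times the median, although four of the five runs are nearly identical.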
6.5. Analysis of the results
To analyze the results of optimizing the neural network with the different methods, we plot the median inference time.
fig, ax = plt.subplots()
name = ['No layer\noptimization', 'AutoTVM', 'Auto-scheduler', 'MetaScheduler']
times = [default_logreg_time, autotvm_logreg_time, autoscheduler_logreg_time, ms_logreg_time]
bar_colors = ['tab:blue', 'tab:red', 'tab:green', 'tab:orange']
bars = ax.bar(name, times, color=bar_colors)
# Annotate each bar with its value; the label of the first (unoptimized)
# bar is placed at mid-height instead of on top of the bar.
for bar, n, t in zip(bars, name, times):
    h = bar.get_height()
    if n == name[0]:
        h = h / 2
    if h != 0:
        ax.text(
            bar.get_x() + bar.get_width() / 2,
            h,
            f'{round(t, 4)} ms',
            ha='center',
            va='bottom',
            fontsize=15,
        )
# The plotted values are medians; the unit follows the printed timings (ms).
ax.set_title('Median inference\ntime (ms)', fontsize=18)
plt.grid()
Conclusion: optimization significantly reduces the network's inference time.
7. Running and optimizing the fully connected neural network
7.1. Compiling and running the model
First, load the fully connected neural network model. Note that this model has more layers, so the intermediate and final results of the methods will differ, for example the number and weights of the extracted tasks and the tuning results.
default_fcnn_time, autotvm_fcnn_time, ms_fcnn_time = 0, 0, 0
mod, params = load_model('model/fcnn.json', 'model/fcnn.params')
print(mod['main'])
fn (%input0: Tensor[(1, 784), float32] /* span=aten::linear_0.input0:0:0 */,
    %aten::linear_0.weight: Tensor[(300, 784), float32] /* span=aten::linear_0.weight:0:0 */,
    %aten::linear_0.bias: Tensor[(300), float32] /* span=aten::linear_0.bias:0:0 */,
    %aten::linear_1.weight: Tensor[(300, 300), float32] /* span=aten::linear_1.weight:0:0 */,
    %aten::linear_1.bias: Tensor[(300), float32] /* span=aten::linear_1.bias:0:0 */,
    %aten::linear_2.weight: Tensor[(300, 300), float32] /* span=aten::linear_2.weight:0:0 */,
    %aten::linear_2.bias: Tensor[(300), float32] /* span=aten::linear_2.bias:0:0 */,
    %aten::linear_3.weight: Tensor[(10, 300), float32] /* span=aten::linear_3.weight:0:0 */,
    %aten::linear_3.bias: Tensor[(10), float32] /* span=aten::linear_3.bias:0:0 */) {
  %0 = nn.dense(%input0, %aten::linear_0.weight, units=None) /* span=aten::linear_0:0:0 */;
  %1 = nn.bias_add(%0, %aten::linear_0.bias, axis=-1) /* span=aten::linear_0:0:0 */;
  %2 = nn.relu(%1) /* span=aten::relu_0:0:0 */;
  %3 = nn.dense(%2, %aten::linear_1.weight, units=None) /* span=aten::linear_1:0:0 */;
  %4 = nn.bias_add(%3, %aten::linear_1.bias, axis=-1) /* span=aten::linear_1:0:0 */;
  %5 = nn.relu(%4) /* span=aten::relu_1:0:0 */;
  %6 = nn.dense(%5, %aten::linear_2.weight, units=None) /* span=aten::linear_2:0:0 */;
  %7 = nn.bias_add(%6, %aten::linear_2.bias, axis=-1) /* span=aten::linear_2:0:0 */;
  %8 = nn.relu(%7) /* span=aten::relu_2:0:0 */;
  %9 = nn.dense(%8, %aten::linear_3.weight, units=None) /* span=aten::linear_3:0:0 */;
  nn.bias_add(%9, %aten::linear_3.bias, axis=-1) /* span=aten::linear_3:0:0 */
}
The next step is to compile the model without layer optimization.
with tvm.transform.PassContext(opt_level=opt_level):
    lib = relay.build(mod, target=target, params=params)
After compilation, we can run inference and measure execution time with the timeit_inference function developed earlier, evaluate the quality of the loaded fully connected network with the get_accuracy function, and compare the resulting classification accuracy with the loaded reference value obtained on x86-64.
default_fcnn_predict, default_fcnn_times = timeit_inference(mod, lib, images)
default_fcnn_accuracy = get_accuracy(labels, default_fcnn_predict)
assert np.allclose(metric['fcnn'], default_fcnn_accuracy, rtol=1e-5)
default_fcnn_time = np.median(default_fcnn_times)
print(f'Median inference time of the unoptimized model: {default_fcnn_time:.4f} ms')
Median inference time of the unoptimized model: 0.0401 ms
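The correctness check compares the accuracy of the compiled model with a stored reference value. The sketch below illustrates that pattern in plain Python. The `accuracy` function is a hypothetical stand-in, not the `get_accuracy` helper used in this work, and the labels are made up.

```python
import math


def accuracy(labels, predictions):
    """Fraction of predictions that match the ground-truth labels."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


# Made-up data: 4 of 5 predictions are correct.
labels = [7, 2, 1, 0, 4]
predictions = [7, 2, 1, 0, 9]
reference_accuracy = 0.8  # stands in for the value stored at training time

acc = accuracy(labels, predictions)
# Relative-tolerance comparison, similar in spirit to np.allclose(..., rtol=1e-5).
assert math.isclose(acc, reference_accuracy, rel_tol=1e-5)
print(f"accuracy = {acc:.2f}")
```

A tolerance-based comparison is used instead of strict equality because the accuracy is a floating-point value that may be stored and reloaded between runs.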
7.2. Using AutoTVM
We call the get_autotvm_task function developed earlier to extract AutoTVM tasks from the computation graph.
One might expect 8 tasks here, since there are 4 layers and each layer yields two tasks. There are only 6, however: 3 with data transformation and 3 without (the dense_pack and dense_nopack templates in the output below, respectively). Two of the layers have identical parameters, so there is no need to duplicate their tasks. The other layer optimization methods behave in the same way.
tasks = get_autotvm_task(mod, target, params)
Extracting tasks
Task number: 0
Task info: ('dense_nopack.x86', ('TENSOR', (1, 784), 'float32'), ('TENSOR', (300, 784), 'float32'), None, 'float32')
Task number: 1
Task info: ('dense_pack.x86', ('TENSOR', (1, 784), 'float32'), ('TENSOR', (300, 784), 'float32'), None, 'float32')
Task number: 2
Task info: ('dense_nopack.x86', ('TENSOR', (1, 300), 'float32'), ('TENSOR', (300, 300), 'float32'), None, 'float32')
Task number: 3
Task info: ('dense_pack.x86', ('TENSOR', (1, 300), 'float32'), ('TENSOR', (300, 300), 'float32'), None, 'float32')
Task number: 4
Task info: ('dense_nopack.x86', ('TENSOR', (1, 300), 'float32'), ('TENSOR', (10, 300), 'float32'), None, 'float32')
Task number: 5
Task info: ('dense_pack.x86', ('TENSOR', (1, 300), 'float32'), ('TENSOR', (10, 300), 'float32'), None, 'float32')
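The count of 6 tasks can be reproduced with a small sketch. It assumes, as the output above suggests, that a task is identified by the template name together with the input and weight shapes, so layers with identical shapes share tasks. The code is an illustration only and does not call TVM.

```python
# Weight shapes (out_features, in_features) of the four dense layers in the model.
layer_weight_shapes = [(300, 784), (300, 300), (300, 300), (10, 300)]
# The two x86 dense templates that appear in the task list.
templates = ["dense_nopack.x86", "dense_pack.x86"]

tasks = []
for out_features, in_features in layer_weight_shapes:
    for template in templates:
        workload = (template, (1, in_features), (out_features, in_features))
        if workload not in tasks:  # identical workloads are extracted only once
            tasks.append(workload)

print(f"{len(layer_weight_shapes) * len(templates)} candidate tasks, {len(tasks)} unique")
for i, task in enumerate(tasks):
    print(i, task)
```

The two 300x300 layers collapse into one pair of tasks, which leaves 6 of the 8 candidates.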
To run AutoTVM tuning, define the log_file used to record the tuning results, set the number of trials, and then call the tune_autotvm function developed earlier.
log_file = 'autotvm/autotvm_fcnn.log'
n_trial = global_trial
tune_autotvm(tasks, n_trial, log_file)
[Task 2/ 6] Current/Best: 19.99/ 30.68 GFLOPS | Progress: (60/96) | 30.37 s Done.
[Task 2/ 6] Current/Best: 25.21/ 31.85 GFLOPS | Progress: (96/96) | 47.14 s Done.
[Task 3/ 6] Current/Best: 1.83/ 22.75 GFLOPS | Progress: (96/96) | 42.73 s Done.
[Task 4/ 6] Current/Best: 11.52/ 22.01 GFLOPS | Progress: (96/96) | 41.10 s Done.
[Task 5/ 6] Current/Best: 0.89/ 10.69 GFLOPS | Progress: (72/96) | 24.93 s Done.
[Task 6/ 6] Current/Best: 4.44/ 13.13 GFLOPS | Progress: (96/96) | 31.13 s Done.
Before the optimized model can be used, it must be compiled taking into account the tuning history saved in log_file.
with autotvm.apply_history_best(log_file):
    with tvm.transform.PassContext(opt_level=opt_level):
        lib = relay.build(mod, target=target, params=params)
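Conceptually, apply_history_best looks up the best-performing configuration recorded in the log for each workload and applies it during compilation. The sketch below mimics that selection on made-up records. The record layout, workload names and config names are hypothetical and do not reflect the actual AutoTVM log format.

```python
# Hypothetical tuning records: (workload id, config id, measured latency in microseconds).
records = [
    ("dense_300x784", "cfg_a", 14.2),
    ("dense_300x784", "cfg_b", 9.8),
    ("dense_300x300", "cfg_a", 6.1),
    ("dense_300x300", "cfg_c", 5.0),
    ("dense_10x300", "cfg_b", 0.9),
]

best = {}
for workload, config, latency_us in records:
    # Keep the fastest configuration seen so far for each workload.
    if workload not in best or latency_us < best[workload][1]:
        best[workload] = (config, latency_us)

for workload, (config, latency_us) in best.items():
    print(f"{workload}: {config} ({latency_us} us)")
```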
At this point we can measure execution time with the timeit_inference function, evaluate the optimized model with the get_accuracy function, and compare the classification accuracy with the reference value obtained when the model was trained.
autotvm_fcnn_predict, autotvm_fcnn_times = timeit_inference(mod, lib, images)
autotvm_fcnn_accuracy = get_accuracy(labels, autotvm_fcnn_predict)
assert np.allclose(metric['fcnn'], autotvm_fcnn_accuracy, rtol=1e-5)
autotvm_fcnn_time = np.median(autotvm_fcnn_times)
print(f'Median inference time after layer optimization with AutoTVM: {autotvm_fcnn_time:.4f} ms')
Median inference time after layer optimization with AutoTVM: 0.0243 ms
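With both medians available, the gain from AutoTVM can be expressed as a speedup factor. A small standalone sketch using the two values printed above (the helper name is hypothetical):

```python
def speedup(baseline: float, optimized: float) -> float:
    """Return how many times faster the optimized run is than the baseline."""
    if optimized <= 0:
        raise ValueError("optimized time must be positive")
    return baseline / optimized


default_fcnn_time = 0.0401  # ms, unoptimized model (printed above)
autotvm_fcnn_time = 0.0243  # ms, after AutoTVM tuning (printed above)

factor = speedup(default_fcnn_time, autotvm_fcnn_time)
print(f"AutoTVM speedup: {factor:.2f}x")
```

For these measurements the factor is about 1.65.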
7.3. Applying MetaScheduler
We call the get_ms_task function developed earlier, having first defined the work_dir directory where tuning results are logged.
In this case the target string used for compilation already specifies the number of threads, so it does not need to be modified.
if is_x86():
    work_dir = "meta_schedule_fcnn"
    tasks, task_weights = get_ms_task(mod, target, params, opt_level, work_dir)
2024-11-06 18:07:51 [INFO] Logging directory: meta_schedule_fcnn/logs
Task number: 0
Task info: fused_nn_dense_add_nn_relu
Task number: 1
Task info: fused_nn_dense_add_nn_relu_1
Task number: 2
Task info: fused_nn_dense_add
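Each extracted task comes with a weight. Judging by the scheduler tables in the log below, the weight appears to be the number of times the fused subgraph occurs in the network: fused_nn_dense_add_nn_relu_1 has weight 2, which matches the two identical 300x300 layers. The tables report Weighted Latency as latency times weight and sum it into Total latency. The sketch below reproduces that arithmetic with values copied from one of the intermediate tables; it is an illustration, not TVM code.

```python
# (task name, weight, latency in microseconds) from an intermediate scheduler table.
tasks = [
    ("fused_nn_dense_add_nn_relu", 1, 6.8956),
    ("fused_nn_dense_add_nn_relu_1", 2, 4.7707),
    ("fused_nn_dense_add", 1, 0.3868),
]

total_latency_us = 0.0
for name, weight, latency_us in tasks:
    weighted = weight * latency_us  # contribution of this task to one inference
    total_latency_us += weighted
    print(f"{name:30s} weight={weight} weighted latency={weighted:.4f} us")

print(f"Total latency: {total_latency_us:.4f} us")
```

The result agrees with the 16.8238 us total that the scheduler reports for the same table, up to rounding of the displayed latencies.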
Next, we run MetaScheduler tuning by calling the tune_ms function, setting the number of trials to N * len(tasks), where N is n_trial_per_task.
n_trial_per_task = global_trial
if is_x86():
    tune_ms(tasks, task_weights, work_dir, n_trial_per_task * len(tasks))
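The value passed to tune_ms is a single budget for all tasks together. The log that follows suggests the scheduler does not split it evenly: after 72 trials the three tasks had received 24, 32 and 16 trials. A minimal sketch of the budget arithmetic follows. The value 96 for global_trial is an assumption inferred from the 96-trial runs in the logs, not a value stated in this section.

```python
global_trial = 96  # assumption: inferred from the "96/96" progress lines in the logs
task_names = [
    "fused_nn_dense_add_nn_relu",
    "fused_nn_dense_add_nn_relu_1",
    "fused_nn_dense_add",
]

# Global budget shared by all tasks, as passed to the tuning call.
total_budget = global_trial * len(task_names)

# Trials per task after 72 total trials, copied from a scheduler table in the log.
observed_trials = {
    "fused_nn_dense_add_nn_relu": 24,
    "fused_nn_dense_add_nn_relu_1": 32,
    "fused_nn_dense_add": 16,
}

print(f"total budget: {total_budget} trials")
print(f"spent so far: {sum(observed_trials.values())} trials")
```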
2024-11-06 18:07:51 [INFO] LocalBuilder: max_workers = 12
2024-11-06 18:07:51 [INFO] LocalRunner: max_workers = 1
2024-11-06 18:07:52 [INFO] [task_scheduler.cc:159] Initializing Task #0: "fused_nn_dense_add_nn_relu"
2024-11-06 18:07:52 [INFO] [task_scheduler.cc:159] Initializing Task #1: "fused_nn_dense_add_nn_relu_1"
2024-11-06 18:07:52 [INFO] [task_scheduler.cc:159] Initializing Task #2: "fused_nn_dense_add"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add_nn_relu | 471000 | 1 | N/A | N/A | N/A | 0 | |
1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | N/A | N/A | N/A | 0 | |
2 | fused_nn_dense_add | 6010 | 1 | N/A | N/A | N/A | 0 |
Total trials: 0 Total latency (us): 0 2024-11-06 18:07:52 [DEBUG] [task_scheduler.cc:318] ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done ----------------------------------------------------------------------------------------------------------------------------- 0 | fused_nn_dense_add_nn_relu | 471000 | 1 | N/A | N/A | N/A | 0 | 1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | N/A | N/A | N/A | 0 | 2 | fused_nn_dense_add | 6010 | 1 | N/A | N/A | N/A | 0 | ----------------------------------------------------------------------------------------------------------------------------- Total trials: 0 Total latency (us): 0 2024-11-06 18:07:52 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #0: "fused_nn_dense_add_nn_relu" 2024-11-06 18:07:54 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder 2024-11-06 18:07:56 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner 2024-11-06 18:07:57 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_dense_add_nn_relu_1" 2024-11-06 18:07:59 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder 2024-11-06 18:08:01 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner 2024-11-06 18:08:03 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #2: "fused_nn_dense_add" 2024-11-06 18:08:04 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder 2024-11-06 18:08:06 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner 2024-11-06 18:08:08 [DEBUG] XGB iter 0: tr-p-rmse: 0.351465 tr-a-peak@32: 0.991026 tr-rmse: 0.366215 tr-rmse: 0.366215 2024-11-06 18:08:08 [DEBUG] XGB iter 25: tr-p-rmse: 0.040920 tr-a-peak@32: 0.997009 tr-rmse: 0.405606 tr-rmse: 0.405606 2024-11-06 18:08:08 [DEBUG] XGB iter 50: tr-p-rmse: 0.040911 tr-a-peak@32: 0.997009 tr-rmse: 0.405618 tr-rmse: 0.405618 2024-11-06 18:08:08 [DEBUG] XGB iter 75: tr-p-rmse: 0.040911 tr-a-peak@32: 0.997009 tr-rmse: 0.405618 tr-rmse: 0.405618 2024-11-06 18:08:08 [DEBUG] XGB 
stopped. Best iteration: [34] tr-p-rmse:0.04091 tr-a-peak@32:0.99701 tr-rmse:0.40562 tr-rmse:0.40562 2024-11-06 18:08:08 [INFO] [task_scheduler.cc:237] [Updated] Task #0: "fused_nn_dense_add_nn_relu"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 37.8599 | 12.4406 | 12.4406 | 8 | |
1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | N/A | N/A | N/A | 0 | |
2 | fused_nn_dense_add | 6010 | 1 | N/A | N/A | N/A | 0 |
2024-11-06 18:08:08 [DEBUG] [task_scheduler.cc:318] ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done ----------------------------------------------------------------------------------------------------------------------------- 0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 37.8599 | 12.4406 | 12.4406 | 8 | 1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | N/A | N/A | N/A | 0 | 2 | fused_nn_dense_add | 6010 | 1 | N/A | N/A | N/A | 0 | ----------------------------------------------------------------------------------------------------------------------------- Total trials: 8 Total latency (us): 12.4406 Total trials: 8 Total latency (us): 12.4406 2024-11-06 18:08:08 [DEBUG] XGB iter 0: tr-p-rmse: 0.307720 tr-a-peak@32: 1.000000 tr-rmse: 0.334265 tr-rmse: 0.334265 2024-11-06 18:08:09 [DEBUG] XGB iter 25: tr-p-rmse: 0.033375 tr-a-peak@32: 1.000000 tr-rmse: 0.386839 tr-rmse: 0.386839 2024-11-06 18:08:09 [DEBUG] XGB iter 50: tr-p-rmse: 0.033347 tr-a-peak@32: 1.000000 tr-rmse: 0.386890 tr-rmse: 0.386890 2024-11-06 18:08:09 [DEBUG] XGB iter 75: tr-p-rmse: 0.033347 tr-a-peak@32: 1.000000 tr-rmse: 0.386890 tr-rmse: 0.386890 2024-11-06 18:08:09 [DEBUG] XGB stopped. Best iteration: [34] tr-p-rmse:0.03335 tr-a-peak@32:1.00000 tr-rmse:0.38689 tr-rmse:0.38689 2024-11-06 18:08:09 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_dense_add_nn_relu_1"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 37.8599 | 12.4406 | 12.4406 | 8 | |
1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 36.1861 | 4.9909 | 9.9817 | 8 | |
2 | fused_nn_dense_add | 6010 | 1 | N/A | N/A | N/A | 0 |
Total trials: 16 Total latency (us): 22.4224 2024-11-06 18:08:09 [DEBUG] [task_scheduler.cc:318] ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done ----------------------------------------------------------------------------------------------------------------------------- 0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 37.8599 | 12.4406 | 12.4406 | 8 | 1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 36.1861 | 4.9909 | 9.9817 | 8 | 2 | fused_nn_dense_add | 6010 | 1 | N/A | N/A | N/A | 0 | ----------------------------------------------------------------------------------------------------------------------------- Total trials: 16 Total latency (us): 22.4224 2024-11-06 18:08:09 [DEBUG] XGB iter 0: tr-p-rmse: 0.333048 tr-a-peak@32: 0.997774 tr-rmse: 0.385003 tr-rmse: 0.385003 2024-11-06 18:08:09 [DEBUG] XGB iter 25: tr-p-rmse: 0.028149 tr-a-peak@32: 1.000000 tr-rmse: 0.435107 tr-rmse: 0.435107 2024-11-06 18:08:09 [DEBUG] XGB iter 50: tr-p-rmse: 0.028147 tr-a-peak@32: 1.000000 tr-rmse: 0.435111 tr-rmse: 0.435111 2024-11-06 18:08:09 [DEBUG] XGB iter 75: tr-p-rmse: 0.028147 tr-a-peak@32: 1.000000 tr-rmse: 0.435111 tr-rmse: 0.435111 2024-11-06 18:08:09 [DEBUG] XGB stopped. Best iteration: [28] tr-p-rmse:0.02815 tr-a-peak@32:1.00000 tr-rmse:0.43511 tr-rmse:0.43511 2024-11-06 18:08:09 [INFO] [task_scheduler.cc:237] [Updated] Task #2: "fused_nn_dense_add"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 37.8599 | 12.4406 | 12.4406 | 8 | |
1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 36.1861 | 4.9909 | 9.9817 | 8 | |
2 | fused_nn_dense_add | 6010 | 1 | 1.7964 | 3.3456 | 3.3456 | 8 |
Total trials: 24 Total latency (us): 25.768 2024-11-06 18:08:09 [DEBUG] [task_scheduler.cc:318] ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done ----------------------------------------------------------------------------------------------------------------------------- 0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 37.8599 | 12.4406 | 12.4406 | 8 | 1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 36.1861 | 4.9909 | 9.9817 | 8 | 2 | fused_nn_dense_add | 6010 | 1 | 1.7964 | 3.3456 | 3.3456 | 8 | ----------------------------------------------------------------------------------------------------------------------------- Total trials: 24 Total latency (us): 25.768 2024-11-06 18:08:09 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #0: "fused_nn_dense_add_nn_relu" 2024-11-06 18:08:10 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder 2024-11-06 18:08:12 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner 2024-11-06 18:08:14 [DEBUG] XGB validation: p-rmse: 0.610446 a-peak@32: 0.814860 2024-11-06 18:08:14 [DEBUG] XGB iter 0: tr-p-rmse: 0.386757 tr-a-peak@32: 0.968726 tr-rmse: 0.359414 tr-rmse: 0.359414 2024-11-06 18:08:14 [DEBUG] XGB iter 25: tr-p-rmse: 0.045522 tr-a-peak@32: 1.000000 tr-rmse: 0.410471 tr-rmse: 0.410471 2024-11-06 18:08:14 [DEBUG] XGB iter 50: tr-p-rmse: 0.045518 tr-a-peak@32: 1.000000 tr-rmse: 0.410478 tr-rmse: 0.410478 2024-11-06 18:08:14 [DEBUG] XGB iter 75: tr-p-rmse: 0.045518 tr-a-peak@32: 1.000000 tr-rmse: 0.410478 tr-rmse: 0.410478 2024-11-06 18:08:14 [DEBUG] XGB stopped. Best iteration: [29] tr-p-rmse:0.04552 tr-a-peak@32:1.00000 tr-rmse:0.41048 tr-rmse:0.41048 2024-11-06 18:08:14 [INFO] [task_scheduler.cc:237] [Updated] Task #0: "fused_nn_dense_add_nn_relu"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 68.3047 | 6.8956 | 6.8956 | 16 | |
1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 36.1861 | 4.9909 | 9.9817 | 8 | |
2 | fused_nn_dense_add | 6010 | 1 | 1.7964 | 3.3456 | 3.3456 | 8 |
2024-11-06 18:08:14 [DEBUG] [task_scheduler.cc:318] ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done ----------------------------------------------------------------------------------------------------------------------------- 0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 68.3047 | 6.8956 | 6.8956 | 16 | 1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 36.1861 | 4.9909 | 9.9817 | 8 | 2 | fused_nn_dense_add | 6010 | 1 | 1.7964 | 3.3456 | 3.3456 | 8 | ----------------------------------------------------------------------------------------------------------------------------- Total trials: 32 Total latency (us): 20.2229 Total trials: 32 Total latency (us): 20.2229 2024-11-06 18:08:14 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_dense_add_nn_relu_1" 2024-11-06 18:08:16 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder 2024-11-06 18:08:18 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner 2024-11-06 18:08:20 [DEBUG] XGB validation: p-rmse: 0.115511 a-peak@32: 1.000000 2024-11-06 18:08:20 [DEBUG] XGB iter 0: tr-p-rmse: 0.379134 tr-a-peak@32: 0.968726 tr-rmse: 0.337476 tr-rmse: 0.337476 2024-11-06 18:08:20 [DEBUG] XGB iter 25: tr-p-rmse: 0.046934 tr-a-peak@32: 1.000000 tr-rmse: 0.392350 tr-rmse: 0.392350 2024-11-06 18:08:21 [DEBUG] XGB iter 50: tr-p-rmse: 0.046930 tr-a-peak@32: 1.000000 tr-rmse: 0.392357 tr-rmse: 0.392357 2024-11-06 18:08:21 [DEBUG] XGB iter 75: tr-p-rmse: 0.046930 tr-a-peak@32: 1.000000 tr-rmse: 0.392357 tr-rmse: 0.392357 2024-11-06 18:08:21 [DEBUG] XGB stopped. Best iteration: [29] tr-p-rmse:0.04693 tr-a-peak@32:1.00000 tr-rmse:0.39236 tr-rmse:0.39236 2024-11-06 18:08:21 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_dense_add_nn_relu_1"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 68.3047 | 6.8956 | 6.8956 | 16 | |
1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 36.1861 | 4.9909 | 9.9817 | 16 | |
2 | fused_nn_dense_add | 6010 | 1 | 1.7964 | 3.3456 | 3.3456 | 8 |
2024-11-06 18:08:21 [DEBUG] [task_scheduler.cc:318] ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done ----------------------------------------------------------------------------------------------------------------------------- 0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 68.3047 | 6.8956 | 6.8956 | 16 | 1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 36.1861 | 4.9909 | 9.9817 | 16 | 2 | fused_nn_dense_add | 6010 | 1 | 1.7964 | 3.3456 | 3.3456 | 8 | ----------------------------------------------------------------------------------------------------------------------------- Total trials: 40 Total latency (us): 20.2229 Total trials: 40 Total latency (us): 20.2229 2024-11-06 18:08:21 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_dense_add_nn_relu_1" 2024-11-06 18:08:22 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder 2024-11-06 18:08:24 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner 2024-11-06 18:08:26 [DEBUG] XGB validation: p-rmse: 0.254907 a-peak@32: 0.979337 2024-11-06 18:08:26 [DEBUG] XGB iter 0: tr-p-rmse: 0.383945 tr-a-peak@32: 0.968726 tr-rmse: 0.335318 tr-rmse: 0.335318 2024-11-06 18:08:26 [DEBUG] XGB iter 25: tr-p-rmse: 0.046007 tr-a-peak@32: 1.000000 tr-rmse: 0.387972 tr-rmse: 0.387972 2024-11-06 18:08:26 [DEBUG] XGB iter 50: tr-p-rmse: 0.045991 tr-a-peak@32: 1.000000 tr-rmse: 0.388001 tr-rmse: 0.388001 2024-11-06 18:08:26 [DEBUG] XGB iter 75: tr-p-rmse: 0.045991 tr-a-peak@32: 1.000000 tr-rmse: 0.388001 tr-rmse: 0.388001 2024-11-06 18:08:26 [DEBUG] XGB stopped. Best iteration: [32] tr-p-rmse:0.04599 tr-a-peak@32:1.00000 tr-rmse:0.38800 tr-rmse:0.38800 2024-11-06 18:08:26 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_dense_add_nn_relu_1"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 68.3047 | 6.8956 | 6.8956 | 16 | |
1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 36.1861 | 4.9909 | 9.9817 | 24 | |
2 | fused_nn_dense_add | 6010 | 1 | 1.7964 | 3.3456 | 3.3456 | 8 |
Total trials: 48 Total latency (us): 20.2229 2024-11-06 18:08:26 [DEBUG] [task_scheduler.cc:318] ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done ----------------------------------------------------------------------------------------------------------------------------- 0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 68.3047 | 6.8956 | 6.8956 | 16 | 1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 36.1861 | 4.9909 | 9.9817 | 24 | 2 | fused_nn_dense_add | 6010 | 1 | 1.7964 | 3.3456 | 3.3456 | 8 | ----------------------------------------------------------------------------------------------------------------------------- Total trials: 48 Total latency (us): 20.2229 2024-11-06 18:08:26 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #0: "fused_nn_dense_add_nn_relu" 2024-11-06 18:08:27 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder 2024-11-06 18:08:29 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner 2024-11-06 18:08:31 [DEBUG] XGB validation: p-rmse: 0.149560 a-peak@32: 0.985629 2024-11-06 18:08:31 [INFO] [task_scheduler.cc:237] [Updated] Task #0: "fused_nn_dense_add_nn_relu"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 68.3047 | 6.8956 | 6.8956 | 24 | |
1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 36.1861 | 4.9909 | 9.9817 | 24 | |
2 | fused_nn_dense_add | 6010 | 1 | 1.7964 | 3.3456 | 3.3456 | 8 |
2024-11-06 18:08:32 [DEBUG] [task_scheduler.cc:318] ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done ----------------------------------------------------------------------------------------------------------------------------- 0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 68.3047 | 6.8956 | 6.8956 | 24 | 1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 36.1861 | 4.9909 | 9.9817 | 24 | 2 | fused_nn_dense_add | 6010 | 1 | 1.7964 | 3.3456 | 3.3456 | 8 | ----------------------------------------------------------------------------------------------------------------------------- Total trials: 56 Total latency (us): 20.2229 Total trials: 56 Total latency (us): 20.2229 2024-11-06 18:08:32 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #2: "fused_nn_dense_add" 2024-11-06 18:08:33 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder 2024-11-06 18:08:35 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner 2024-11-06 18:08:37 [DEBUG] XGB validation: p-rmse: 3.034833 a-peak@32: 0.684020 2024-11-06 18:08:37 [DEBUG] XGB iter 0: tr-p-rmse: 0.564327 tr-a-peak@32: 0.861572 tr-rmse: 0.289912 tr-rmse: 0.289912 2024-11-06 18:08:37 [DEBUG] XGB iter 25: tr-p-rmse: 0.064114 tr-a-peak@32: 1.000000 tr-rmse: 0.336602 tr-rmse: 0.336602 2024-11-06 18:08:37 [DEBUG] XGB iter 50: tr-p-rmse: 0.064079 tr-a-peak@32: 1.000000 tr-rmse: 0.336649 tr-rmse: 0.336649 2024-11-06 18:08:37 [DEBUG] XGB iter 75: tr-p-rmse: 0.064079 tr-a-peak@32: 1.000000 tr-rmse: 0.336649 tr-rmse: 0.336649 2024-11-06 18:08:37 [DEBUG] XGB stopped. Best iteration: [33] tr-p-rmse:0.06408 tr-a-peak@32:1.00000 tr-rmse:0.33665 tr-rmse:0.33665 2024-11-06 18:08:37 [INFO] [task_scheduler.cc:237] [Updated] Task #2: "fused_nn_dense_add"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 68.3047 | 6.8956 | 6.8956 | 24 | |
1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 36.1861 | 4.9909 | 9.9817 | 24 | |
2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 |
2024-11-06 18:08:37 [DEBUG] [task_scheduler.cc:318] ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done ----------------------------------------------------------------------------------------------------------------------------- 0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 68.3047 | 6.8956 | 6.8956 | 24 | 1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 36.1861 | 4.9909 | 9.9817 | 24 | 2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 | ----------------------------------------------------------------------------------------------------------------------------- Total trials: 64 Total latency (us): 17.2641 Total trials: 64 Total latency (us): 17.2641 2024-11-06 18:08:37 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_dense_add_nn_relu_1" 2024-11-06 18:08:39 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder 2024-11-06 18:08:40 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner 2024-11-06 18:08:42 [DEBUG] XGB validation: p-rmse: 0.239612 a-peak@32: 1.000000 2024-11-06 18:08:42 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_dense_add_nn_relu_1"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 68.3047 | 6.8956 | 6.8956 | 24 | |
1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 37.8558 | 4.7707 | 9.5415 | 32 | |
2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 |
Total trials: 72 Total latency (us): 16.8238 2024-11-06 18:08:42 [DEBUG] [task_scheduler.cc:318] ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done ----------------------------------------------------------------------------------------------------------------------------- 0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 68.3047 | 6.8956 | 6.8956 | 24 | 1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 37.8558 | 4.7707 | 9.5415 | 32 | 2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 | ----------------------------------------------------------------------------------------------------------------------------- Total trials: 72 Total latency (us): 16.8238 2024-11-06 18:08:42 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_dense_add_nn_relu_1" 2024-11-06 18:08:44 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder 2024-11-06 18:08:46 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner 2024-11-06 18:08:48 [DEBUG] XGB validation: p-rmse: 0.117211 a-peak@32: 1.000000 2024-11-06 18:08:48 [DEBUG] XGB iter 0: tr-p-rmse: 0.515154 tr-a-peak@32: 0.859875 tr-rmse: 0.298764 tr-rmse: 0.298764 2024-11-06 18:08:48 [DEBUG] XGB iter 25: tr-p-rmse: 0.059394 tr-a-peak@32: 0.999572 tr-rmse: 0.347265 tr-rmse: 0.347265 2024-11-06 18:08:48 [DEBUG] XGB iter 50: tr-p-rmse: 0.059326 tr-a-peak@32: 0.999572 tr-rmse: 0.347359 tr-rmse: 0.347359 2024-11-06 18:08:48 [DEBUG] XGB iter 75: tr-p-rmse: 0.059326 tr-a-peak@32: 0.999572 tr-rmse: 0.347359 tr-rmse: 0.347359 2024-11-06 18:08:48 [DEBUG] XGB stopped. Best iteration: [37] tr-p-rmse:0.05933 tr-a-peak@32:0.99957 tr-rmse:0.34736 tr-rmse:0.34736 2024-11-06 18:08:48 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_dense_add_nn_relu_1"
2024-11-06 18:08:48 [DEBUG] [task_scheduler.cc:318]

ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 68.3047 | 6.8956 | 6.8956 | 24 | |
1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 37.8558 | 4.7707 | 9.5415 | 40 | |
2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 | |

Total trials: 80
Total latency (us): 16.8238
2024-11-06 18:08:48 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #0: "fused_nn_dense_add_nn_relu"
2024-11-06 18:08:50 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:08:51 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:08:53 [DEBUG] XGB validation: p-rmse: 0.129843 a-peak@32: 1.000000
2024-11-06 18:08:53 [INFO] [task_scheduler.cc:237] [Updated] Task #0: "fused_nn_dense_add_nn_relu"
2024-11-06 18:08:53 [DEBUG] [task_scheduler.cc:318]

ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 68.3047 | 6.8956 | 6.8956 | 32 | |
1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 37.8558 | 4.7707 | 9.5415 | 40 | |
2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 | |

Total trials: 88
Total latency (us): 16.8238
2024-11-06 18:08:53 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #0: "fused_nn_dense_add_nn_relu"
2024-11-06 18:08:55 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:08:57 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:08:59 [DEBUG] XGB validation: p-rmse: 0.124433 a-peak@32: 1.000000
2024-11-06 18:08:59 [DEBUG] XGB iter 0: tr-p-rmse: 0.504882 tr-a-peak@32: 0.861572 tr-rmse: 0.291846 tr-rmse: 0.291846
2024-11-06 18:08:59 [DEBUG] XGB iter 25: tr-p-rmse: 0.055972 tr-a-peak@32: 0.999786 tr-rmse: 0.342515 tr-rmse: 0.342515
2024-11-06 18:08:59 [DEBUG] XGB iter 50: tr-p-rmse: 0.055959 tr-a-peak@32: 0.999786 tr-rmse: 0.342533 tr-rmse: 0.342533
2024-11-06 18:08:59 [DEBUG] XGB iter 75: tr-p-rmse: 0.055959 tr-a-peak@32: 0.999786 tr-rmse: 0.342533 tr-rmse: 0.342533
2024-11-06 18:08:59 [DEBUG] XGB stopped. Best iteration: [34] tr-p-rmse:0.05596 tr-a-peak@32:0.99979 tr-rmse:0.34253 tr-rmse:0.34253
2024-11-06 18:08:59 [INFO] [task_scheduler.cc:237] [Updated] Task #0: "fused_nn_dense_add_nn_relu"
2024-11-06 18:08:59 [DEBUG] [task_scheduler.cc:318]

ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 68.3047 | 6.8956 | 6.8956 | 40 | |
1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 37.8558 | 4.7707 | 9.5415 | 40 | |
2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 | |

Total trials: 96
Total latency (us): 16.8238
2024-11-06 18:08:59 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_dense_add_nn_relu_1"
2024-11-06 18:09:00 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:09:02 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:09:05 [DEBUG] XGB validation: p-rmse: 0.104376 a-peak@32: 1.000000
2024-11-06 18:09:05 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_dense_add_nn_relu_1"
2024-11-06 18:09:05 [DEBUG] [task_scheduler.cc:318]

ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 68.3047 | 6.8956 | 6.8956 | 40 | |
1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 37.8558 | 4.7707 | 9.5415 | 48 | |
2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 | |

Total trials: 104
Total latency (us): 16.8238
2024-11-06 18:09:05 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_dense_add_nn_relu_1"
2024-11-06 18:09:08 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:09:16 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:09:18 [DEBUG] XGB validation: p-rmse: 0.182499 a-peak@32: 0.992085
2024-11-06 18:09:18 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_dense_add_nn_relu_1"
2024-11-06 18:09:18 [DEBUG] [task_scheduler.cc:318]

ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 68.3047 | 6.8956 | 6.8956 | 40 | |
1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 37.8558 | 4.7707 | 9.5415 | 56 | |
2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 | |

Total trials: 112
Total latency (us): 16.8238
2024-11-06 18:09:18 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #0: "fused_nn_dense_add_nn_relu"
2024-11-06 18:09:22 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:09:23 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:09:26 [DEBUG] XGB validation: p-rmse: 0.115199 a-peak@32: 0.998358
2024-11-06 18:09:26 [DEBUG] XGB iter 0: tr-p-rmse: 0.466295 tr-a-peak@32: 0.861331 tr-rmse: 0.324392 tr-rmse: 0.324392
2024-11-06 18:09:26 [DEBUG] XGB iter 25: tr-p-rmse: 0.054985 tr-a-peak@32: 1.000000 tr-rmse: 0.368779 tr-rmse: 0.368779
2024-11-06 18:09:26 [DEBUG] XGB iter 50: tr-p-rmse: 0.054976 tr-a-peak@32: 1.000000 tr-rmse: 0.368793 tr-rmse: 0.368793
2024-11-06 18:09:26 [DEBUG] XGB iter 75: tr-p-rmse: 0.054976 tr-a-peak@32: 1.000000 tr-rmse: 0.368793 tr-rmse: 0.368793
2024-11-06 18:09:26 [DEBUG] XGB stopped. Best iteration: [33] tr-p-rmse:0.05498 tr-a-peak@32:1.00000 tr-rmse:0.36879 tr-rmse:0.36879
2024-11-06 18:09:26 [INFO] [task_scheduler.cc:237] [Updated] Task #0: "fused_nn_dense_add_nn_relu"
2024-11-06 18:09:26 [DEBUG] [task_scheduler.cc:318]

ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 69.7673 | 6.7510 | 6.7510 | 48 | |
1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 37.8558 | 4.7707 | 9.5415 | 56 | |
2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 | |

Total trials: 120
Total latency (us): 16.6793
2024-11-06 18:09:26 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_dense_add_nn_relu_1"
2024-11-06 18:09:29 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:09:31 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:09:33 [DEBUG] XGB validation: p-rmse: 0.055297 a-peak@32: 0.986657
2024-11-06 18:09:33 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_dense_add_nn_relu_1"
2024-11-06 18:09:33 [DEBUG] [task_scheduler.cc:318]

ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 69.7673 | 6.7510 | 6.7510 | 48 | |
1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 39.2438 | 4.6020 | 9.2040 | 64 | |
2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 | |

Total trials: 128
Total latency (us): 16.3418
2024-11-06 18:09:33 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_dense_add_nn_relu_1"
2024-11-06 18:09:37 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:09:39 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:09:41 [DEBUG] XGB validation: p-rmse: 0.095030 a-peak@32: 0.996017
2024-11-06 18:09:41 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_dense_add_nn_relu_1"
2024-11-06 18:09:41 [DEBUG] [task_scheduler.cc:318]

ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 69.7673 | 6.7510 | 6.7510 | 48 | |
1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 39.2438 | 4.6020 | 9.2040 | 72 | |
2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 | |

Total trials: 136
Total latency (us): 16.3418
2024-11-06 18:09:41 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #0: "fused_nn_dense_add_nn_relu"
2024-11-06 18:09:44 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:09:46 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:09:48 [DEBUG] XGB validation: p-rmse: 0.121401 a-peak@32: 0.954176
2024-11-06 18:09:48 [DEBUG] XGB iter 0: tr-p-rmse: 0.428651 tr-a-peak@32: 0.861572 tr-rmse: 0.349399 tr-rmse: 0.349399
2024-11-06 18:09:48 [DEBUG] XGB iter 25: tr-p-rmse: 0.052485 tr-a-peak@32: 1.000000 tr-rmse: 0.389504 tr-rmse: 0.389504
2024-11-06 18:09:48 [DEBUG] XGB iter 50: tr-p-rmse: 0.052480 tr-a-peak@32: 1.000000 tr-rmse: 0.389515 tr-rmse: 0.389515
2024-11-06 18:09:48 [DEBUG] XGB iter 75: tr-p-rmse: 0.052480 tr-a-peak@32: 1.000000 tr-rmse: 0.389515 tr-rmse: 0.389515
2024-11-06 18:09:48 [DEBUG] XGB stopped. Best iteration: [29] tr-p-rmse:0.05248 tr-a-peak@32:1.00000 tr-rmse:0.38951 tr-rmse:0.38951
2024-11-06 18:09:48 [INFO] [task_scheduler.cc:237] [Updated] Task #0: "fused_nn_dense_add_nn_relu"
2024-11-06 18:09:48 [DEBUG] [task_scheduler.cc:318]

ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 72.1406 | 6.5289 | 6.5289 | 56 | |
1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 39.2438 | 4.6020 | 9.2040 | 72 | |
2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 | |

Total trials: 144
Total latency (us): 16.1197
2024-11-06 18:09:48 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_dense_add_nn_relu_1"
2024-11-06 18:09:52 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:09:54 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:09:56 [DEBUG] XGB validation: p-rmse: 0.128095 a-peak@32: 0.918648
2024-11-06 18:09:56 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_dense_add_nn_relu_1"
2024-11-06 18:09:56 [DEBUG] [task_scheduler.cc:318]

ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 72.1406 | 6.5289 | 6.5289 | 56 | |
1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 41.2211 | 4.3812 | 8.7625 | 80 | |
2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 | |

Total trials: 152
Total latency (us): 15.6782
2024-11-06 18:09:56 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #0: "fused_nn_dense_add_nn_relu"
2024-11-06 18:10:00 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:10:01 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:10:04 [DEBUG] XGB validation: p-rmse: 0.144623 a-peak@32: 1.000000
2024-11-06 18:10:04 [INFO] [task_scheduler.cc:237] [Updated] Task #0: "fused_nn_dense_add_nn_relu"
2024-11-06 18:10:04 [DEBUG] [task_scheduler.cc:318]

ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 72.1406 | 6.5289 | 6.5289 | 64 | |
1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 41.2211 | 4.3812 | 8.7625 | 80 | |
2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 | |

Total trials: 160
Total latency (us): 15.6782
2024-11-06 18:10:04 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_dense_add_nn_relu_1"
2024-11-06 18:10:07 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:10:09 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:10:11 [DEBUG] XGB validation: p-rmse: 0.205111 a-peak@32: 1.000000
2024-11-06 18:10:11 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_dense_add_nn_relu_1"
2024-11-06 18:10:11 [DEBUG] [task_scheduler.cc:318]

ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 72.1406 | 6.5289 | 6.5289 | 64 | |
1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 41.2211 | 4.3812 | 8.7625 | 88 | |
2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 | |

Total trials: 168
Total latency (us): 15.6782
2024-11-06 18:10:11 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #0: "fused_nn_dense_add_nn_relu"
2024-11-06 18:10:15 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:10:16 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:10:19 [DEBUG] XGB validation: p-rmse: 0.095756 a-peak@32: 0.990469
2024-11-06 18:10:19 [DEBUG] XGB iter 0: tr-p-rmse: 0.400105 tr-a-peak@32: 0.861572 tr-rmse: 0.353001 tr-rmse: 0.353001
2024-11-06 18:10:19 [DEBUG] XGB iter 25: tr-p-rmse: 0.052631 tr-a-peak@32: 0.997970 tr-rmse: 0.395233 tr-rmse: 0.395233
2024-11-06 18:10:19 [DEBUG] XGB iter 50: tr-p-rmse: 0.052628 tr-a-peak@32: 0.997970 tr-rmse: 0.395241 tr-rmse: 0.395241
2024-11-06 18:10:19 [DEBUG] XGB iter 75: tr-p-rmse: 0.052628 tr-a-peak@32: 0.997970 tr-rmse: 0.395241 tr-rmse: 0.395241
2024-11-06 18:10:19 [DEBUG] XGB stopped. Best iteration: [29] tr-p-rmse:0.05263 tr-a-peak@32:0.99797 tr-rmse:0.39524 tr-rmse:0.39524
2024-11-06 18:10:19 [INFO] [task_scheduler.cc:237] [Updated] Task #0: "fused_nn_dense_add_nn_relu"
2024-11-06 18:10:19 [DEBUG] [task_scheduler.cc:318]

ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 73.0748 | 6.4455 | 6.4455 | 72 | |
1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 41.2211 | 4.3812 | 8.7625 | 88 | |
2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 | |

Total trials: 176
Total latency (us): 15.5947
2024-11-06 18:10:19 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_dense_add_nn_relu_1"
2024-11-06 18:10:22 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:10:24 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:10:26 [DEBUG] XGB validation: p-rmse: 0.079819 a-peak@32: 0.998800
2024-11-06 18:10:26 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_dense_add_nn_relu_1"
2024-11-06 18:10:26 [DEBUG] [task_scheduler.cc:318]

ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 73.0748 | 6.4455 | 6.4455 | 72 | |
1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 41.5548 | 4.3461 | 8.6921 | 96 | |
2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 | |

Total trials: 184
Total latency (us): 15.5244
2024-11-06 18:10:26 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_dense_add_nn_relu_1"
2024-11-06 18:10:30 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:10:32 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:10:34 [DEBUG] XGB validation: p-rmse: 0.165400 a-peak@32: 0.995687
2024-11-06 18:10:34 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_dense_add_nn_relu_1"
2024-11-06 18:10:34 [DEBUG] [task_scheduler.cc:318]

ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 73.0748 | 6.4455 | 6.4455 | 72 | |
1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 41.5548 | 4.3461 | 8.6921 | 104 | |
2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 | |

Total trials: 192
Total latency (us): 15.5244
2024-11-06 18:10:34 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #0: "fused_nn_dense_add_nn_relu"
2024-11-06 18:10:38 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:10:39 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:10:42 [DEBUG] XGB validation: p-rmse: 0.038273 a-peak@32: 0.999526
2024-11-06 18:10:42 [INFO] [task_scheduler.cc:237] [Updated] Task #0: "fused_nn_dense_add_nn_relu"
2024-11-06 18:10:42 [DEBUG] [task_scheduler.cc:318]

ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 73.4099 | 6.4160 | 6.4160 | 80 | |
1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 41.5548 | 4.3461 | 8.6921 | 104 | |
2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 | |

Total trials: 200
Total latency (us): 15.4949
2024-11-06 18:10:42 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_dense_add_nn_relu_1"
2024-11-06 18:10:45 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:10:47 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:10:49 [DEBUG] XGB validation: p-rmse: 0.049444 a-peak@32: 1.000000
2024-11-06 18:10:49 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_dense_add_nn_relu_1"
2024-11-06 18:10:49 [DEBUG] [task_scheduler.cc:318]

ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 73.4099 | 6.4160 | 6.4160 | 80 | |
1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 41.5548 | 4.3461 | 8.6921 | 112 | |
2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 | |

Total trials: 208
Total latency (us): 15.4949
2024-11-06 18:10:49 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #0: "fused_nn_dense_add_nn_relu"
2024-11-06 18:10:53 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:10:54 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:10:57 [DEBUG] XGB validation: p-rmse: 0.174322 a-peak@32: 0.973278
2024-11-06 18:10:57 [DEBUG] XGB iter 0: tr-p-rmse: 0.366311 tr-a-peak@32: 0.861155 tr-rmse: 0.375179 tr-rmse: 0.375179
2024-11-06 18:10:57 [DEBUG] XGB iter 25: tr-p-rmse: 0.055858 tr-a-peak@32: 0.998764 tr-rmse: 0.412615 tr-rmse: 0.412615
2024-11-06 18:10:57 [DEBUG] XGB iter 50: tr-p-rmse: 0.055856 tr-a-peak@32: 0.998764 tr-rmse: 0.412620 tr-rmse: 0.412620
2024-11-06 18:10:57 [DEBUG] XGB iter 75: tr-p-rmse: 0.055856 tr-a-peak@32: 0.998764 tr-rmse: 0.412620 tr-rmse: 0.412620
2024-11-06 18:10:57 [DEBUG] XGB stopped. Best iteration: [27] tr-p-rmse:0.05586 tr-a-peak@32:0.99876 tr-rmse:0.41262 tr-rmse:0.41262
2024-11-06 18:10:57 [INFO] [task_scheduler.cc:237] [Updated] Task #0: "fused_nn_dense_add_nn_relu"
2024-11-06 18:10:57 [DEBUG] [task_scheduler.cc:318]

ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 73.4099 | 6.4160 | 6.4160 | 88 | |
1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 41.5548 | 4.3461 | 8.6921 | 112 | |
2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 | |

Total trials: 216
Total latency (us): 15.4949
2024-11-06 18:10:57 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_dense_add_nn_relu_1"
2024-11-06 18:11:00 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:11:03 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:11:06 [DEBUG] XGB validation: p-rmse: 0.131449 a-peak@32: 0.963567
2024-11-06 18:11:06 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_dense_add_nn_relu_1"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 73.4099 | 6.4160 | 6.4160 | 88 | |
1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 41.5548 | 4.3461 | 8.6921 | 120 | |
2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 |
Total trials: 224 Total latency (us): 15.4949 2024-11-06 18:11:06 [DEBUG] [task_scheduler.cc:318] ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done ----------------------------------------------------------------------------------------------------------------------------- 0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 73.4099 | 6.4160 | 6.4160 | 88 | 1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 41.5548 | 4.3461 | 8.6921 | 120 | 2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 | ----------------------------------------------------------------------------------------------------------------------------- Total trials: 224 Total latency (us): 15.4949 2024-11-06 18:11:06 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #0: "fused_nn_dense_add_nn_relu" 2024-11-06 18:11:09 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder 2024-11-06 18:11:11 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner 2024-11-06 18:11:13 [DEBUG] XGB validation: p-rmse: 0.032458 a-peak@32: 0.999632 2024-11-06 18:11:13 [INFO] [task_scheduler.cc:237] [Updated] Task #0: "fused_nn_dense_add_nn_relu"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 73.4099 | 6.4160 | 6.4160 | 96 | |
1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 41.5548 | 4.3461 | 8.6921 | 120 | |
2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 |
Total trials: 232 Total latency (us): 15.4949 2024-11-06 18:11:13 [DEBUG] [task_scheduler.cc:318] ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done ----------------------------------------------------------------------------------------------------------------------------- 0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 73.4099 | 6.4160 | 6.4160 | 96 | 1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 41.5548 | 4.3461 | 8.6921 | 120 | 2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 | ----------------------------------------------------------------------------------------------------------------------------- Total trials: 232 Total latency (us): 15.4949 2024-11-06 18:11:13 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_dense_add_nn_relu_1" 2024-11-06 18:11:17 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder 2024-11-06 18:11:19 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner 2024-11-06 18:11:20 [DEBUG] XGB validation: p-rmse: 0.171026 a-peak@32: 0.999270 2024-11-06 18:11:20 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_dense_add_nn_relu_1"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 73.4099 | 6.4160 | 6.4160 | 96 | |
1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 41.5548 | 4.3461 | 8.6921 | 128 | |
2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 |
2024-11-06 18:11:21 [DEBUG] [task_scheduler.cc:318] ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done ----------------------------------------------------------------------------------------------------------------------------- 0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 73.4099 | 6.4160 | 6.4160 | 96 | 1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 41.5548 | 4.3461 | 8.6921 | 128 | 2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 | ----------------------------------------------------------------------------------------------------------------------------- Total trials: 240 Total latency (us): 15.4949 Total trials: 240 Total latency (us): 15.4949 2024-11-06 18:11:21 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_dense_add_nn_relu_1" 2024-11-06 18:11:24 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder 2024-11-06 18:11:26 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner 2024-11-06 18:11:28 [DEBUG] XGB validation: p-rmse: 0.121831 a-peak@32: 1.000000 2024-11-06 18:11:28 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_dense_add_nn_relu_1"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 73.4099 | 6.4160 | 6.4160 | 96 | |
1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 41.5548 | 4.3461 | 8.6921 | 136 | |
2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 |
2024-11-06 18:11:28 [DEBUG] [task_scheduler.cc:318] ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done ----------------------------------------------------------------------------------------------------------------------------- 0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 73.4099 | 6.4160 | 6.4160 | 96 | 1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 41.5548 | 4.3461 | 8.6921 | 136 | 2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 | ----------------------------------------------------------------------------------------------------------------------------- Total trials: 248 Total latency (us): 15.4949 Total trials: 248 Total latency (us): 15.4949 2024-11-06 18:11:28 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #0: "fused_nn_dense_add_nn_relu" 2024-11-06 18:11:32 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder 2024-11-06 18:11:33 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner 2024-11-06 18:11:36 [DEBUG] XGB validation: p-rmse: 0.037096 a-peak@32: 0.986682 2024-11-06 18:11:36 [INFO] [task_scheduler.cc:237] [Updated] Task #0: "fused_nn_dense_add_nn_relu"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 73.4099 | 6.4160 | 6.4160 | 104 | |
1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 41.5548 | 4.3461 | 8.6921 | 136 | |
2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 |
Total trials: 256 Total latency (us): 15.4949 2024-11-06 18:11:36 [DEBUG] [task_scheduler.cc:318] ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done ----------------------------------------------------------------------------------------------------------------------------- 0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 73.4099 | 6.4160 | 6.4160 | 104 | 1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 41.5548 | 4.3461 | 8.6921 | 136 | 2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 | ----------------------------------------------------------------------------------------------------------------------------- Total trials: 256 Total latency (us): 15.4949 2024-11-06 18:11:36 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_dense_add_nn_relu_1" 2024-11-06 18:11:40 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder 2024-11-06 18:11:42 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner 2024-11-06 18:11:44 [DEBUG] XGB validation: p-rmse: 0.127470 a-peak@32: 0.969532 2024-11-06 18:11:44 [DEBUG] XGB iter 0: tr-p-rmse: 0.335610 tr-a-peak@32: 0.858004 tr-rmse: 0.392716 tr-rmse: 0.392716 2024-11-06 18:11:44 [DEBUG] XGB iter 25: tr-p-rmse: 0.061349 tr-a-peak@32: 0.999599 tr-rmse: 0.426219 tr-rmse: 0.426219 2024-11-06 18:11:44 [DEBUG] XGB iter 50: tr-p-rmse: 0.061347 tr-a-peak@32: 0.999599 tr-rmse: 0.426224 tr-rmse: 0.426224 2024-11-06 18:11:44 [DEBUG] XGB iter 75: tr-p-rmse: 0.061347 tr-a-peak@32: 0.999599 tr-rmse: 0.426224 tr-rmse: 0.426224 2024-11-06 18:11:44 [DEBUG] XGB stopped. Best iteration: [34] tr-p-rmse:0.06135 tr-a-peak@32:0.99960 tr-rmse:0.42622 tr-rmse:0.42622 2024-11-06 18:11:44 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_dense_add_nn_relu_1"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 73.4099 | 6.4160 | 6.4160 | 104 | |
1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 41.5548 | 4.3461 | 8.6921 | 144 | |
2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 |
2024-11-06 18:11:44 [DEBUG] [task_scheduler.cc:318] ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done ----------------------------------------------------------------------------------------------------------------------------- 0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 73.4099 | 6.4160 | 6.4160 | 104 | 1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 41.5548 | 4.3461 | 8.6921 | 144 | 2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 | ----------------------------------------------------------------------------------------------------------------------------- Total trials: 264 Total latency (us): 15.4949 Total trials: 264 Total latency (us): 15.4949 2024-11-06 18:11:44 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #0: "fused_nn_dense_add_nn_relu" 2024-11-06 18:11:48 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder 2024-11-06 18:11:50 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner 2024-11-06 18:11:52 [DEBUG] XGB validation: p-rmse: 0.028990 a-peak@32: 0.999826 2024-11-06 18:11:52 [INFO] [task_scheduler.cc:237] [Updated] Task #0: "fused_nn_dense_add_nn_relu"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 73.4099 | 6.4160 | 6.4160 | 112 | |
1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 41.5548 | 4.3461 | 8.6921 | 144 | |
2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 |
2024-11-06 18:11:52 [DEBUG] [task_scheduler.cc:318] ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done ----------------------------------------------------------------------------------------------------------------------------- 0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 73.4099 | 6.4160 | 6.4160 | 112 | 1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 41.5548 | 4.3461 | 8.6921 | 144 | 2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 | ----------------------------------------------------------------------------------------------------------------------------- Total trials: 272 Total latency (us): 15.4949 Total trials: 272 Total latency (us): 15.4949 2024-11-06 18:11:52 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_dense_add_nn_relu_1" 2024-11-06 18:11:55 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder 2024-11-06 18:11:57 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner 2024-11-06 18:12:00 [DEBUG] XGB validation: p-rmse: 0.029809 a-peak@32: 1.000000 2024-11-06 18:12:00 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_dense_add_nn_relu_1"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 73.4099 | 6.4160 | 6.4160 | 112 | |
1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 41.5548 | 4.3461 | 8.6921 | 152 | |
2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 |
Total trials: 280 Total latency (us): 15.4949 2024-11-06 18:12:00 [DEBUG] [task_scheduler.cc:318] ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done ----------------------------------------------------------------------------------------------------------------------------- 0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 73.4099 | 6.4160 | 6.4160 | 112 | 1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 41.5548 | 4.3461 | 8.6921 | 152 | 2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 | ----------------------------------------------------------------------------------------------------------------------------- Total trials: 280 Total latency (us): 15.4949 2024-11-06 18:12:00 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #0: "fused_nn_dense_add_nn_relu" 2024-11-06 18:12:03 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder 2024-11-06 18:12:05 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner 2024-11-06 18:12:07 [DEBUG] XGB validation: p-rmse: 0.069612 a-peak@32: 0.975988 2024-11-06 18:12:07 [INFO] [task_scheduler.cc:237] [Updated] Task #0: "fused_nn_dense_add_nn_relu"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 73.4099 | 6.4160 | 6.4160 | 120 | |
1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 41.5548 | 4.3461 | 8.6921 | 152 | |
2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 |
Total trials: 288 Total latency (us): 15.4949 2024-11-06 18:12:07 [DEBUG] [task_scheduler.cc:318] ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done ----------------------------------------------------------------------------------------------------------------------------- 0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 73.4099 | 6.4160 | 6.4160 | 120 | 1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 41.5548 | 4.3461 | 8.6921 | 152 | 2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 | ----------------------------------------------------------------------------------------------------------------------------- Total trials: 288 Total latency (us): 15.4949 2024-11-06 18:12:07 [INFO] [task_scheduler.cc:260] Task #0 has finished. Remaining task(s): 2
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 73.4099 | 6.4160 | 6.4160 | 120 | Y |
1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 41.5548 | 4.3461 | 8.6921 | 152 | |
2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 |
2024-11-06 18:12:07 [DEBUG] [task_scheduler.cc:318] ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done ----------------------------------------------------------------------------------------------------------------------------- 0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 73.4099 | 6.4160 | 6.4160 | 120 | Y 1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 41.5548 | 4.3461 | 8.6921 | 152 | 2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 | ----------------------------------------------------------------------------------------------------------------------------- Total trials: 288 Total latency (us): 15.4949 Total trials: 288 Total latency (us): 15.4949 2024-11-06 18:12:07 [INFO] [task_scheduler.cc:260] Task #1 has finished. Remaining task(s): 1
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 73.4099 | 6.4160 | 6.4160 | 120 | Y |
1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 41.5548 | 4.3461 | 8.6921 | 152 | Y |
2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 |
Total trials: 288 Total latency (us): 15.4949 2024-11-06 18:12:07 [DEBUG] [task_scheduler.cc:318] ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done ----------------------------------------------------------------------------------------------------------------------------- 0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 73.4099 | 6.4160 | 6.4160 | 120 | Y 1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 41.5548 | 4.3461 | 8.6921 | 152 | Y 2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 | ----------------------------------------------------------------------------------------------------------------------------- Total trials: 288 Total latency (us): 15.4949 2024-11-06 18:12:07 [INFO] [task_scheduler.cc:260] Task #2 has finished. Remaining task(s): 0
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 73.4099 | 6.4160 | 6.4160 | 120 | Y |
1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 41.5548 | 4.3461 | 8.6921 | 152 | Y |
2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 | Y |
2024-11-06 18:12:07 [DEBUG] [task_scheduler.cc:318] ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done ----------------------------------------------------------------------------------------------------------------------------- 0 | fused_nn_dense_add_nn_relu | 471000 | 1 | 73.4099 | 6.4160 | 6.4160 | 120 | Y 1 | fused_nn_dense_add_nn_relu_1 | 180600 | 2 | 41.5548 | 4.3461 | 8.6921 | 152 | Y 2 | fused_nn_dense_add | 6010 | 1 | 15.5394 | 0.3868 | 0.3868 | 16 | Y ----------------------------------------------------------------------------------------------------------------------------- Total trials: 288 Total latency (us): 15.4949 Total trials: 288 Total latency (us): 15.4949
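The columns of the final task-scheduler table are related by simple arithmetic, and the short check below recomputes them. It is a minimal sketch, not part of the original notebook: the rows are copied from the last table, and it assumes that Weight is the number of times a fused subgraph occurs in the network, that Weighted Latency is Weight times Latency, and that Speed is FLOP divided by Latency.

```python
# Rows copied from the final task-scheduler table:
# (name, FLOP, weight, latency in microseconds).
tasks = [
    ("fused_nn_dense_add_nn_relu", 471000, 1, 6.4160),
    ("fused_nn_dense_add_nn_relu_1", 180600, 2, 4.3461),
    ("fused_nn_dense_add", 6010, 1, 0.3868),
]


def gflops(flop, latency_us):
    """Convert a FLOP count and a latency in microseconds to GFLOPS."""
    return flop / (latency_us * 1e-6) / 1e9


total_latency_us = 0.0
for name, flop, weight, latency_us in tasks:
    weighted_us = weight * latency_us  # assumed meaning of "Weighted Latency"
    total_latency_us += weighted_us
    print(f"{name:30s} {gflops(flop, latency_us):8.2f} GFLOPS, weighted {weighted_us:.4f} us")

print(f"Total latency: {total_latency_us:.4f} us")
```

The recomputed values agree with the log up to the rounding of the displayed latencies, which suggests that "Total latency" is the weighted sum the task scheduler tries to reduce.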
After tuning, the neural network can be compiled with the tuned schedules applied, using the MetaScheduler interface ms.relay_integration.compile_relay.
if is_x86():
    database = ms.database.JSONDatabase(
        f"{work_dir}/database_workload.json",
        f"{work_dir}/database_tuning_record.json",
        allow_missing=False
    )
    lib = ms.relay_integration.compile_relay(
        database, mod, target, params,
        opt_level=opt_level,
    )
Finally, we measure inference time with the timeit_inference function, evaluate model quality with the get_accuracy function, and check that the tuned model is correct by comparing the resulting accuracy with the reference value.
if is_x86():
    ms_fcnn_predict, ms_fcnn_times = timeit_inference(mod, lib, images)
    ms_fcnn_accuracy = get_accuracy(labels, ms_fcnn_predict)
    assert np.allclose(metric['fcnn'], ms_fcnn_accuracy, rtol=1e-5)
    ms_fcnn_time = np.median(ms_fcnn_times)
    print(f'Median time after layer tuning with MetaScheduler: {ms_fcnn_time:.4f} ms')
Median time after layer tuning with MetaScheduler: 0.0238 ms
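The notebook reports the median of the measured times rather than the mean. The sketch below illustrates that measurement pattern using only the standard library. The names `time_calls` and `workload` are hypothetical stand-ins introduced here; this is not the notebook's timeit_inference function, whose implementation is defined elsewhere.

```python
import statistics
import time


def time_calls(fn, n_repeat=50):
    """Call fn() n_repeat times and return per-call wall-clock times in milliseconds."""
    times_ms = []
    for _ in range(n_repeat):
        start = time.perf_counter()
        fn()
        times_ms.append((time.perf_counter() - start) * 1e3)
    return times_ms


def workload():
    """Hypothetical stand-in for a single inference call."""
    return sum(i * i for i in range(1000))


times_ms = time_calls(workload)
median_ms = statistics.median(times_ms)
mean_ms = statistics.mean(times_ms)
print(f"median: {median_ms:.4f} ms, mean: {mean_ms:.4f} ms")
```

The median is less sensitive than the mean to occasional slow runs, which matters when a single call takes tens of microseconds.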
7.4. Analysis of results
To compare the tuning methods, we plot the median execution time of the network for each of them.
fig, ax = plt.subplots()
name = ['No layer\ntuning', 'AutoTVM', 'MetaScheduler']
times = [default_fcnn_time, autotvm_fcnn_time, ms_fcnn_time]
bars = ax.bar(name, times, label=name, color=bar_colors)
# Annotate each bar with its median time; the baseline label is placed at mid-height.
for bar, n, t in zip(bars, name, times):
    h = bar.get_height()
    if n == 'No layer\ntuning':
        h = h / 2
    if h != 0:
        ax.text(
            bar.get_x() + bar.get_width() / 2,
            h,
            f'{round(t, 4)} ms',
            ha='center',
            va='bottom',
            fontsize=15,
        )
ax.xaxis.label.set_size(40)
# The plotted values are medians in milliseconds, so the title and labels say so.
ax.set_title('Median execution\ntime (ms)', fontsize=18)
plt.grid()
Conclusion: tuning significantly reduces the inference time of the network.
default_cnn_time, autotvm_cnn_time, autoscheduler_cnn_time, ms_cnn_time = 0, 0, 0, 0
mod, params = load_model('model/cnn.json', 'model/cnn.params')
print(mod['main'])
fn (%input0: Tensor[(1, 1, 28, 28), float32] /* span=aten::_convolution_0.input0:0:0 */, %aten::_convolution_0.weight: Tensor[(64, 1, 3, 3), float32] /* span=aten::_convolution_0.weight:0:0 */, %aten::_convolution_0.bias: Tensor[(64), float32] /* span=aten::_convolution_0.bias:0:0 */, %aten::linear_0.weight: Tensor[(10, 12544), float32] /* span=aten::linear_0.weight:0:0 */, %aten::linear_0.bias: Tensor[(10), float32] /* span=aten::linear_0.bias:0:0 */) {
  %0 = nn.conv2d(%input0, %aten::_convolution_0.weight, padding=[1, 1, 1, 1], channels=64, kernel_size=[3, 3]) /* span=aten::_convolution_0:0:0 */;
  %1 = nn.bias_add(%0, %aten::_convolution_0.bias) /* span=aten::_convolution_0:0:0 */;
  %2 = nn.relu(%1) /* span=aten::relu_0:0:0 */;
  %3 = nn.max_pool2d(%2, pool_size=[2, 2], strides=[2, 2], padding=[0, 0, 0, 0]) /* span=aten::max_pool2d_0:0:0 */;
  %4 = reshape(%3, newshape=[1, -1]) /* span=aten::view_0:0:0 */;
  %5 = nn.dense(%4, %aten::linear_0.weight, units=None) /* span=aten::linear_0:0:0 */;
  nn.bias_add(%5, %aten::linear_0.bias, axis=-1) /* span=aten::linear_0:0:0 */
}
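The dense layer in the printed graph expects 12544 input features. The sketch below checks where that number comes from by following the tensor shape through the convolution and pooling layers. The helper `window_out` is introduced here for illustration, and the convolution stride of 1 is an assumption based on the printed IR not listing a stride.

```python
def window_out(size, window, stride, padding):
    """Output length of a sliding-window op (convolution or pooling) along one axis."""
    return (size + 2 * padding - window) // stride + 1


height = width = 28  # input image, Tensor[(1, 1, 28, 28)]
channels = 64        # channels=64 in nn.conv2d

# nn.conv2d: 3x3 kernel, padding 1, stride 1 (assumed default).
height = window_out(height, window=3, stride=1, padding=1)
width = window_out(width, window=3, stride=1, padding=1)

# nn.max_pool2d: 2x2 window, stride 2, no padding.
height = window_out(height, window=2, stride=2, padding=0)
width = window_out(width, window=2, stride=2, padding=0)

flat_features = channels * height * width
print(height, width, flat_features)  # 14 14 12544
```

The result matches the second dimension of the `aten::linear_0.weight` tensor, `(10, 12544)`.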
The next step is to compile the model without layer tuning.
with tvm.transform.PassContext(opt_level=opt_level):
    lib = relay.build(mod, target=target, params=params)
After compilation, we can run inference and measure execution time with the timeit_inference function, check the quality of the loaded convolutional neural network with the get_accuracy function, and compare the resulting classification accuracy with the loaded reference value obtained on x86-64.
default_cnn_predict, default_cnn_times = timeit_inference(mod, lib, images)
default_cnn_accuracy = get_accuracy(labels, default_cnn_predict)
assert np.allclose(metric['cnn'], default_cnn_accuracy, rtol=1e-5)
default_cnn_time = np.median(default_cnn_times)
print(f'Median time of the untuned model: {default_cnn_time:.4f} ms')
Median time of the untuned model: 0.0644 ms
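The correctness check used throughout this work compares a recomputed accuracy with a stored reference under a relative tolerance. The sketch below shows the idea on toy data with the standard library only. Both `accuracy` and `allclose_scalar` are hypothetical helpers written for this illustration; they are not the notebook's get_accuracy, and the scores and labels are made up.

```python
def accuracy(labels, scores):
    """Share of samples whose highest-scoring class equals the true label."""
    hits = 0
    for label, row in zip(labels, scores):
        predicted = max(range(len(row)), key=row.__getitem__)  # argmax over classes
        hits += int(predicted == label)
    return hits / len(labels)


def allclose_scalar(a, b, rtol=1e-5, atol=1e-8):
    """Scalar form of the test documented for np.allclose: |a - b| <= atol + rtol * |b|."""
    return abs(a - b) <= atol + rtol * abs(b)


# Toy data: 4 samples, 3 classes (illustrative values only).
labels = [2, 0, 1, 1]
scores = [
    [0.1, 0.2, 0.7],
    [0.9, 0.05, 0.05],
    [0.3, 0.4, 0.3],
    [0.6, 0.3, 0.1],
]

reference_accuracy = 0.75
measured_accuracy = accuracy(labels, scores)
print(measured_accuracy, allclose_scalar(reference_accuracy, measured_accuracy))
```

With `rtol=1e-5`, the check passes only if compilation and tuning leave the accuracy practically unchanged; a tuned model that changed even one prediction in a test set of a few thousand images would fail it.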
8.2. Using AutoTVM
We call the previously developed get_autotvm_task function to extract AutoTVM tasks from the computation graph. In this case, a task for the convolutional layer is added to the tasks for the dense layer.
if is_x86():
    tasks = get_autotvm_task(mod, target, params)
Extracting tasks
Task number: 0 Task info: ('conv2d_NCHWc.x86', ('TENSOR', (1, 1, 28, 28), 'float32'), ('TENSOR', (64, 1, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'NCHW', 'NCHW', 'float32')
Task number: 1 Task info: ('dense_nopack.x86', ('TENSOR', (1, 12544), 'float32'), ('TENSOR', (10, 12544), 'float32'), None, 'float32')
Task number: 2 Task info: ('dense_pack.x86', ('TENSOR', (1, 12544), 'float32'), ('TENSOR', (10, 12544), 'float32'), None, 'float32')
To start tuning with AutoTVM, define the log_file file for logging the tuning results, set the number of tuning trials, and then call the previously developed tune_autotvm function.
log_file = 'autotvm/autotvm_cnn.log'
n_trial = global_trial
if is_x86():
    tune_autotvm(tasks, n_trial, log_file)
[Task 1/ 3] Current/Best: 71.28/ 80.80 GFLOPS | Progress: (96/96) | 35.12 s Done.
[Task 3/ 3] Current/Best: 9.54/ 16.49 GFLOPS | Progress: (60/96) | 60.29 s Done.
[Task 3/ 3] Current/Best: 12.25/ 16.49 GFLOPS | Progress: (96/96) | 85.73 s Done.
Before the tuned model can be used, it must be compiled with the tuning history saved in the log_file file applied.
if is_x86():
    with autotvm.apply_history_best(log_file):
        with tvm.transform.PassContext(opt_level=opt_level):
            lib = relay.build(mod, target=target, params=params)
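Conceptually, applying the tuning history means choosing, for each tuned task, the configuration with the lowest measured cost among the trials that ran without errors. The sketch below illustrates only that selection step. The record layout (`task`, `config_id`, `cost_s`, `error`) and all values are hypothetical simplifications introduced here; the real AutoTVM log has a richer format and is read by TVM itself.

```python
# Hypothetical, simplified tuning records; values are illustrative, not measured.
records = [
    {"task": "conv2d_NCHWc.x86", "config_id": 3, "cost_s": 1.9e-5, "error": 0},
    {"task": "conv2d_NCHWc.x86", "config_id": 8, "cost_s": 1.4e-5, "error": 0},
    {"task": "conv2d_NCHWc.x86", "config_id": 5, "cost_s": 0.9e-5, "error": 1},  # failed trial
    {"task": "dense_pack.x86", "config_id": 2, "cost_s": 2.6e-5, "error": 0},
    {"task": "dense_pack.x86", "config_id": 7, "cost_s": 2.2e-5, "error": 0},
]


def best_per_task(records):
    """Return {task: record} holding the cheapest error-free record for each task."""
    best = {}
    for rec in records:
        if rec["error"] != 0:
            continue  # a failed measurement cannot be the best configuration
        current = best.get(rec["task"])
        if current is None or rec["cost_s"] < current["cost_s"]:
            best[rec["task"]] = rec
    return best


best = best_per_task(records)
for task, rec in best.items():
    print(f"{task}: config {rec['config_id']}, {rec['cost_s'] * 1e3:.4f} ms")
```

Tasks with no record in the history would fall back to a default schedule, which is one reason a task missing from the log can limit the overall speedup.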
At this point we can measure execution time with the timeit_inference function, check the quality of the tuned model with the get_accuracy function, and compare the classification accuracy with the reference value obtained when the model was trained.
if is_x86():
    autotvm_cnn_predict, autotvm_cnn_times = timeit_inference(mod, lib, images)
    autotvm_cnn_accuracy = get_accuracy(labels, autotvm_cnn_predict)
    assert np.allclose(metric['cnn'], autotvm_cnn_accuracy, rtol=1e-5)
    autotvm_cnn_time = np.median(autotvm_cnn_times)
    print(f'Median time after layer tuning with AutoTVM: {autotvm_cnn_time:.4f} ms')
Median time after layer tuning with AutoTVM: 0.0508 ms
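The effect of tuning is easier to judge as a ratio than as two raw timings. The short calculation below uses the two median times printed in this section for the convolutional network; the variable names are introduced here for illustration.

```python
default_cnn_ms = 0.0644  # median time without layer tuning (printed above)
autotvm_cnn_ms = 0.0508  # median time after AutoTVM tuning (printed above)

speedup = default_cnn_ms / autotvm_cnn_ms
saved_pct = (1 - autotvm_cnn_ms / default_cnn_ms) * 100
print(f"speedup: {speedup:.2f}x, time saved: {saved_pct:.1f}%")
# speedup: 1.27x, time saved: 21.1%
```

These figures come from a single tuning run with a limited trial budget, so they are best read as indicative rather than as a stable benchmark.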
8.3. Using Auto-scheduler
We call the previously developed get_auto_scheduler_task function to extract Auto-scheduler tasks from the computation graph.
if is_x86():
    tasks, task_weights = get_auto_scheduler_task(mod, target, params, opt_level)
Task number: 0 Task info: vm_mod_fused_nn_contrib_conv2d_NCHWc_add_nn_relu
Task number: 1 Task info: vm_mod_fused_nn_dense_add
Task number: 2 Task info: vm_mod_fused_nn_max_pool2d
To start tuning with Auto-scheduler, define the log_file file for logging the tuning results, set the number of tuning trials, and then call the previously developed tune_auto_scheduler function.
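Unlike AutoTVM, where the trial count applies to each task separately, the Auto-scheduler call in this section receives one shared budget equal to `n_trial_per_task * len(tasks)`. The sketch below estimates that budget. It assumes `global_trial` is 96, a value inferred from the "(96/96)" AutoTVM progress output rather than stated in this section, and it takes the batch size of 8 from the "Get 8 programs to measure" lines in the tuning log.

```python
n_trial_per_task = 96       # assumed value of global_trial (inferred, not stated here)
n_tasks = 3                 # tasks extracted for the convolutional network
programs_per_round = 8      # "Get 8 programs to measure" in the tuning log

total_trials = n_trial_per_task * n_tasks
rounds = total_trials // programs_per_round
print(f"total trials: {total_trials}, measurement rounds: {rounds}")
# total trials: 288, measurement rounds: 36
```

Because the budget is shared, the task scheduler decides how many rounds each task receives, so tasks need not end up with equal trial counts.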
os.makedirs('auto_schedule/', exist_ok=True)
log_file = 'auto_schedule/auto-schedule_cnn.log'
n_trial_per_task = global_trial
if is_x86():
    tune_auto_scheduler(tasks, task_weights, log_file, n_trial_per_task * len(tasks))
| ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials |---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- ----------------------------------------------------------------------------------------------------------------- | 0 | vm_mod_fused_nn_contrib_conv2d_NCHWc_add_nn_relu | - | - | 0 | | 1 | vm_mod_fused_nn_dense_add | - | - | 0 | | 2 | vm_mod_fused_nn_max_pool2d | - | - | 0 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: - ms Trials: 0 Used time : 0 s Next ID: 0 ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Generate Sketches #s: 3 Sample Initial Population #s: 1785 fail_ct: 0 Time elapsed: 3.63 GA Iter: 0 Max score: 0.9987 Min score: 0.9908 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 0.9996 Min score: 0.9986 #Pop: 16 #M+: 1385 #M-: 36 EvolutionarySearch #s: 16 Time elapsed: 17.56 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 3.67 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.04 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] 
---------------------------------------------------------------------- | 0 | vm_mod_fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 0.014 | 71.28 | 8 | | 1 | vm_mod_fused_nn_dense_add | - | - | 0 | | 2 | vm_mod_fused_nn_max_pool2d | - | - | 0 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: - ms Trials: 8 Used time : 25 s Next ID: 1 ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Generate Sketches #s: 5 Sample Initial Population #s: 989 fail_ct: 626 Time elapsed: 0.75 GA Iter: 0 Max score: 0.9995 Min score: 0.9803 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 0.9995 Min score: 0.9982 #Pop: 16 #M+: 1373 #M-: 69 EvolutionarySearch #s: 16 Time elapsed: 3.35 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: .....E.E.E.E**** Time elapsed for measurement: 2.41 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.04 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- | 0 | vm_mod_fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 0.014 | 71.28 | 8 | | 1 | vm_mod_fused_nn_dense_add | 0.022 | 11.17 | 8 | | 2 | vm_mod_fused_nn_max_pool2d | - | - | 0 | 
----------------------------------------------------------------------------------------------------------------- Estimated total latency: - ms Trials: 16 Used time : 32 s Next ID: 2 ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Generate Sketches #s: 1 Sample Iter: 5 #Pop: 4 #Target: 50 fail_ct: 10236 Time elapsed: 2.63 #Target has been reduced to 25 due to too many failures or duplications Sample Iter: 10 #Pop: 4 #Target: 25 fail_ct: 20476 Time elapsed: 5.26 #Target has been reduced to 12 due to too many failures or duplications Sample Iter: 15 #Pop: 4 #Target: 12 fail_ct: 30716 Time elapsed: 7.88 #Target has been reduced to 6 due to too many failures or duplications Sample Iter: 20 #Pop: 4 #Target: 6 fail_ct: 40956 Time elapsed: 10.51 #Target has been reduced to 3 due to too many failures or duplications Sample Initial Population #s: 4 fail_ct: 43004 Time elapsed: 11.04 GA Iter: 0 Max score: 0.8313 Min score: 0.4427 #Pop: 4 #M+: 0 #M-: 0 GA Iter: 4 Max score: 0.9996 Min score: 0.9047 #Pop: 16 #M+: 354 #M-: 6816 EvolutionarySearch #s: 16 Time elapsed: 1.93 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 2.67 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.04 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Task 
Scheduler ] ---------------------------------------------------------------------- | 0 | vm_mod_fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 0.014 | 71.28 | 8 | | 1 | vm_mod_fused_nn_dense_add | 0.022 | 11.17 | 8 | | 2 | vm_mod_fused_nn_max_pool2d | 0.004 | 12.12 | 8 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.041 ms Trials: 24 Used time : 47 s Next ID: 0 ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 1796 fail_ct: 0 Time elapsed: 3.66 GA Iter: 0 Max score: 0.9998 Min score: 0.9923 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 1.0000 Min score: 0.9991 #Pop: 16 #M+: 1383 #M-: 40 EvolutionarySearch #s: 16 Time elapsed: 17.65 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 3.24 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.05 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- | 0 | vm_mod_fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 0.014 | 71.28 | 16 | | 1 | vm_mod_fused_nn_dense_add | 0.022 | 11.17 | 8 | | 2 | vm_mod_fused_nn_max_pool2d | 0.004 | 12.12 | 8 | 
----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.041 ms Trials: 32 Used time : 72 s Next ID: 1 ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 993 fail_ct: 613 Time elapsed: 0.75 GA Iter: 0 Max score: 0.9746 Min score: 0.8838 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 1.6343 Min score: 1.3069 #Pop: 16 #M+: 1374 #M-: 78 EvolutionarySearch #s: 16 Time elapsed: 3.36 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 2.50 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.04 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- | 0 | vm_mod_fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 0.014 | 71.28 | 16 | | 1 | vm_mod_fused_nn_dense_add | 0.021 | 12.15 | 16 | | 2 | vm_mod_fused_nn_max_pool2d | 0.004 | 12.12 | 8 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.039 ms Trials: 40 Used time : 79 s Next ID: 2 ---------------------------------------------------------------------- ------------------------------ [ Search ] 
---------------------------------------------------------------------- Sample Initial Population #s: 4 fail_ct: 2044 Time elapsed: 0.54 GA Iter: 0 Max score: 0.5743 Min score: 0.5743 #Pop: 2 #M+: 0 #M-: 0 GA Iter: 4 Max score: 0.9231 Min score: 0.2326 #Pop: 12 #M+: 341 #M-: 7000 EvolutionarySearch #s: 12 Time elapsed: 2.07 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 2.69 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.04 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- | 0 | vm_mod_fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 0.014 | 71.28 | 16 | | 1 | vm_mod_fused_nn_dense_add | 0.021 | 12.15 | 16 | | 2 | vm_mod_fused_nn_max_pool2d | 0.004 | 13.49 | 16 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.038 ms Trials: 48 Used time : 84 s Next ID: 0 ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 1760 fail_ct: 0 Time elapsed: 3.65 GA Iter: 0 Max score: 0.9608 Min score: 0.8305 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 1.1595 Min score: 1.0566 #Pop: 16 #M+: 1385 #M-: 36 EvolutionarySearch #s: 16 Time 
elapsed: 17.91 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 2.91 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.05 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- | 0 | vm_mod_fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 0.014 | 71.28 | 24 | | 1 | vm_mod_fused_nn_dense_add | 0.021 | 12.15 | 16 | | 2 | vm_mod_fused_nn_max_pool2d | 0.004 | 13.49 | 16 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.038 ms Trials: 56 Used time : 108 s Next ID: 1 ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 1011 fail_ct: 589 Time elapsed: 0.75 GA Iter: 0 Max score: 0.9508 Min score: 0.8637 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 0.9968 Min score: 0.9424 #Pop: 16 #M+: 1390 #M-: 74 EvolutionarySearch #s: 16 Time elapsed: 3.59 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 2.45 s 
---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.04 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- | 0 | vm_mod_fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 0.014 | 71.28 | 24 | | 1 | vm_mod_fused_nn_dense_add | 0.021 | 12.15 | 24 | | 2 | vm_mod_fused_nn_max_pool2d | 0.004 | 13.49 | 16 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.038 ms Trials: 64 Used time : 115 s Next ID: 2 ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 4 fail_ct: 2044 Time elapsed: 0.55 GA Iter: 0 Max score: N/A Min score: N/A #Pop: 0 #M+: 0 #M-: 0 GA Iter: 4 Max score: 0.5409 Min score: 0.3118 #Pop: 4 #M+: 350 #M-: 6927 EvolutionarySearch #s: 4 Time elapsed: 2.12 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 4 programs to measure: ....**** Time elapsed for measurement: 1.72 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.04 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | 
----------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- | 0 | vm_mod_fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 0.014 | 71.28 | 24 | | 1 | vm_mod_fused_nn_dense_add | 0.021 | 12.15 | 24 | | 2 | vm_mod_fused_nn_max_pool2d | 0.004 | 13.49 | 24 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.038 ms Trials: 68 Used time : 120 s Next ID: 0 ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 1790 fail_ct: 0 Time elapsed: 3.82 GA Iter: 0 Max score: 0.8568 Min score: 0.7690 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 0.9531 Min score: 0.8350 #Pop: 16 #M+: 1378 #M-: 44 EvolutionarySearch #s: 16 Time elapsed: 18.60 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 2.54 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.05 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- | 0 | 
vm_mod_fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 0.012 | 82.61 | 32 | | 1 | vm_mod_fused_nn_dense_add | 0.021 | 12.15 | 24 | | 2 | vm_mod_fused_nn_max_pool2d | 0.004 | 13.49 | 24 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.037 ms Trials: 76 Used time : 145 s Next ID: 1 ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 1015 fail_ct: 605 Time elapsed: 0.77 GA Iter: 0 Max score: 0.9435 Min score: 0.8916 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 0.9504 Min score: 0.9210 #Pop: 16 #M+: 1388 #M-: 69 EvolutionarySearch #s: 16 Time elapsed: 3.59 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 2.58 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.05 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- | 0 | vm_mod_fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 0.012 | 82.61 | 32 | | 1 | vm_mod_fused_nn_dense_add | 0.021 | 12.15 | 32 | | 2 | vm_mod_fused_nn_max_pool2d | 0.004 | 13.49 | 24 | ----------------------------------------------------------------------------------------------------------------- Estimated total 
latency: 0.037 ms Trials: 84 Used time : 152 s Next ID: 2 ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 4 fail_ct: 2044 Time elapsed: 0.57 GA Iter: 0 Max score: N/A Min score: N/A #Pop: 0 #M+: 0 #M-: 0 GA Iter: 4 Max score: N/A Min score: N/A #Pop: 0 #M+: 356 #M-: 6909 EvolutionarySearch #s: 0 Time elapsed: 2.05 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 0 programs to measure: Time elapsed for measurement: 0.00 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.00 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- | 0 | vm_mod_fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 0.012 | 82.61 | 32 | | 1 | vm_mod_fused_nn_dense_add | 0.021 | 12.15 | 32 | | 2 | vm_mod_fused_nn_max_pool2d | 0.004 | 13.49 | 32 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.037 ms Trials: 84 Used time : 154 s Next ID: 0 ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 1778 fail_ct: 0 Time elapsed: 3.64 GA Iter: 0 Max score: 0.8297 Min score: 
0.6632 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 0.8921 Min score: 0.8259 #Pop: 16 #M+: 1397 #M-: 45 EvolutionarySearch #s: 16 Time elapsed: 17.75 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 2.58 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.05 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- | 0 | vm_mod_fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 0.011 | 95.29 | 40 | | 1 | vm_mod_fused_nn_dense_add | 0.021 | 12.15 | 32 | | 2 | vm_mod_fused_nn_max_pool2d | 0.004 | 13.49 | 32 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.035 ms Trials: 92 Used time : 179 s Next ID: 1 ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 1045 fail_ct: 592 Time elapsed: 0.75 GA Iter: 0 Max score: 0.9027 Min score: 0.8365 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 0.9494 Min score: 0.9037 #Pop: 16 #M+: 1385 #M-: 73 EvolutionarySearch #s: 16 Time elapsed: 3.41 ---------------------------------------------------------------------- ------------------------------ [ Measure ] 
---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 2.48 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.04 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- | 0 | vm_mod_fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 0.011 | 95.29 | 40 | | 1 | vm_mod_fused_nn_dense_add | 0.021 | 12.15 | 40 | | 2 | vm_mod_fused_nn_max_pool2d | 0.004 | 13.49 | 32 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.035 ms Trials: 100 Used time : 185 s Next ID: 0 ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 1799 fail_ct: 0 Time elapsed: 3.69 GA Iter: 0 Max score: 0.7539 Min score: 0.6363 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 0.8545 Min score: 0.7825 #Pop: 16 #M+: 1381 #M-: 46 EvolutionarySearch #s: 16 Time elapsed: 17.88 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 2.76 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] 
---------------------------------------------------------------------- Time elapsed for training: 0.06 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- | 0 | vm_mod_fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 0.011 | 95.29 | 48 | | 1 | vm_mod_fused_nn_dense_add | 0.021 | 12.15 | 40 | | 2 | vm_mod_fused_nn_max_pool2d | 0.004 | 13.49 | 32 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.035 ms Trials: 108 Used time : 210 s Next ID: 1 ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 1017 fail_ct: 608 Time elapsed: 0.76 GA Iter: 0 Max score: 0.8832 Min score: 0.8387 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 0.9849 Min score: 0.8709 #Pop: 16 #M+: 1377 #M-: 69 EvolutionarySearch #s: 16 Time elapsed: 3.48 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 2.49 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.05 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- | 0 | 
vm_mod_fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 0.011 | 95.29 | 48 | | 1 | vm_mod_fused_nn_dense_add | 0.021 | 12.15 | 48 | | 2 | vm_mod_fused_nn_max_pool2d | 0.004 | 13.49 | 32 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.035 ms Trials: 116 Used time : 217 s Next ID: 0 ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 1793 fail_ct: 0 Time elapsed: 3.68 GA Iter: 0 Max score: 0.7573 Min score: 0.6526 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 0.9031 Min score: 0.7777 #Pop: 16 #M+: 1382 #M-: 46 EvolutionarySearch #s: 16 Time elapsed: 18.27 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 2.61 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.06 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- | 0 | vm_mod_fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 0.010 | 98.69 | 56 | | 1 | vm_mod_fused_nn_dense_add | 0.021 | 12.15 | 
48 | | 2 | vm_mod_fused_nn_max_pool2d | 0.004 | 13.49 | 32 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.035 ms Trials: 124 Used time : 241 s Next ID: 1 ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 1014 fail_ct: 609 Time elapsed: 0.84 GA Iter: 0 Max score: 0.8629 Min score: 0.8298 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 0.9486 Min score: 0.8564 #Pop: 16 #M+: 1374 #M-: 72 EvolutionarySearch #s: 16 Time elapsed: 3.47 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 2.51 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.05 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- | 0 | vm_mod_fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 0.010 | 98.69 | 56 | | 1 | vm_mod_fused_nn_dense_add | 0.021 | 12.19 | 56 | | 2 | vm_mod_fused_nn_max_pool2d | 0.004 | 13.49 | 32 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.034 ms Trials: 132 Used time : 248 s Next ID: 0 
---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 1776 fail_ct: 0 Time elapsed: 3.69 GA Iter: 0 Max score: 0.7908 Min score: 0.6810 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 0.9305 Min score: 0.8259 #Pop: 16 #M+: 1382 #M-: 47 EvolutionarySearch #s: 16 Time elapsed: 18.05 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 2.63 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.06 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- | 0 | vm_mod_fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 0.010 | 98.69 | 64 | | 1 | vm_mod_fused_nn_dense_add | 0.021 | 12.19 | 56 | | 2 | vm_mod_fused_nn_max_pool2d | 0.004 | 13.49 | 32 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.034 ms Trials: 140 Used time : 273 s Next ID: 1 ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 1005 fail_ct: 632 Time elapsed: 0.76 GA Iter: 0 Max score: 0.8697 Min score: 0.8214 #Pop: 16 #M+: 0 
#M-: 0 GA Iter: 4 Max score: 0.9633 Min score: 0.8459 #Pop: 16 #M+: 1377 #M-: 70 EvolutionarySearch #s: 16 Time elapsed: 3.47 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 2.45 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.05 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- | 0 | vm_mod_fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 0.010 | 98.69 | 64 | | 1 | vm_mod_fused_nn_dense_add | 0.021 | 12.19 | 64 | | 2 | vm_mod_fused_nn_max_pool2d | 0.004 | 13.49 | 32 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.034 ms Trials: 148 Used time : 279 s Next ID: 0 ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 1794 fail_ct: 0 Time elapsed: 3.67 GA Iter: 0 Max score: 0.6588 Min score: 0.6052 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 0.9298 Min score: 0.7892 #Pop: 16 #M+: 1394 #M-: 43 EvolutionarySearch #s: 16 Time elapsed: 17.99 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 
8 programs to measure: ........******** Time elapsed for measurement: 2.63 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.06 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- | 0 | vm_mod_fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 0.010 | 98.69 | 72 | | 1 | vm_mod_fused_nn_dense_add | 0.021 | 12.19 | 64 | | 2 | vm_mod_fused_nn_max_pool2d | 0.004 | 13.49 | 32 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.034 ms Trials: 156 Used time : 304 s Next ID: 1 ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 974 fail_ct: 638 Time elapsed: 0.74 GA Iter: 0 Max score: 0.8673 Min score: 0.8455 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 0.9544 Min score: 0.8627 #Pop: 16 #M+: 1382 #M-: 77 EvolutionarySearch #s: 16 Time elapsed: 3.46 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 2.43 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.05 s | ID | 
Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- | 0 | vm_mod_fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 0.010 | 98.69 | 72 | | 1 | vm_mod_fused_nn_dense_add | 0.021 | 12.19 | 72 | | 2 | vm_mod_fused_nn_max_pool2d | 0.004 | 13.49 | 32 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.034 ms Trials: 164 Used time : 310 s Next ID: 0 ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 1799 fail_ct: 1 Time elapsed: 3.68 GA Iter: 0 Max score: 0.7651 Min score: 0.6197 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 0.9655 Min score: 0.9023 #Pop: 16 #M+: 1372 #M-: 48 EvolutionarySearch #s: 16 Time elapsed: 18.05 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 2.50 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.06 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] 
---------------------------------------------------------------------- | 0 | vm_mod_fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 0.010 | 98.69 | 80 | | 1 | vm_mod_fused_nn_dense_add | 0.021 | 12.19 | 72 | | 2 | vm_mod_fused_nn_max_pool2d | 0.004 | 13.49 | 32 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.034 ms Trials: 172 Used time : 335 s Next ID: 1 ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 1000 fail_ct: 622 Time elapsed: 0.75 GA Iter: 0 Max score: 0.8641 Min score: 0.8334 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 0.8957 Min score: 0.8601 #Pop: 16 #M+: 1377 #M-: 65 EvolutionarySearch #s: 16 Time elapsed: 3.49 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 2.48 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.05 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- | 0 | vm_mod_fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 0.010 | 98.69 | 80 | | 1 | vm_mod_fused_nn_dense_add | 0.021 | 12.19 | 80 | | 2 | vm_mod_fused_nn_max_pool2d | 0.004 | 13.49 | 32 | 
----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.034 ms Trials: 180 Used time : 342 s Next ID: 0 ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 1781 fail_ct: 0 Time elapsed: 3.66 GA Iter: 0 Max score: 0.7655 Min score: 0.6635 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 0.9821 Min score: 0.8913 #Pop: 16 #M+: 1391 #M-: 45 EvolutionarySearch #s: 16 Time elapsed: 18.14 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 2.62 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.06 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- | 0 | vm_mod_fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 0.010 | 99.26 | 88 | | 1 | vm_mod_fused_nn_dense_add | 0.021 | 12.19 | 80 | | 2 | vm_mod_fused_nn_max_pool2d | 0.004 | 13.49 | 32 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.034 ms Trials: 188 Used time : 366 s Next ID: 1 ---------------------------------------------------------------------- ------------------------------ [ Search ] 
---------------------------------------------------------------------- Sample Initial Population #s: 1012 fail_ct: 566 Time elapsed: 0.74 GA Iter: 0 Max score: 0.9146 Min score: 0.8536 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 0.9146 Min score: 0.8863 #Pop: 16 #M+: 1388 #M-: 76 EvolutionarySearch #s: 16 Time elapsed: 3.49 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 2.44 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.05 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- | 0 | vm_mod_fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 0.010 | 99.26 | 88 | | 1 | vm_mod_fused_nn_dense_add | 0.021 | 12.19 | 88 | | 2 | vm_mod_fused_nn_max_pool2d | 0.004 | 13.49 | 32 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.034 ms Trials: 196 Used time : 373 s Next ID: 0 ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 1788 fail_ct: 0 Time elapsed: 3.70 GA Iter: 0 Max score: 0.7399 Min score: 0.6505 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 0.9578 Min score: 0.8916 #Pop: 16 #M+: 1397 #M-: 45 EvolutionarySearch #s: 16 Time 
elapsed: 18.10 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 2.49 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.06 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- | 0 | vm_mod_fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 0.010 | 99.26 | 96 | | 1 | vm_mod_fused_nn_dense_add | 0.021 | 12.19 | 88 | | 2 | vm_mod_fused_nn_max_pool2d | 0.004 | 13.49 | 32 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.034 ms Trials: 204 Used time : 397 s Next ID: 1 ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 1008 fail_ct: 628 Time elapsed: 0.76 GA Iter: 0 Max score: 0.9234 Min score: 0.8479 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 0.9948 Min score: 0.9072 #Pop: 16 #M+: 1376 #M-: 69 EvolutionarySearch #s: 16 Time elapsed: 3.52 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 2.44 s 
---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.05 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- | 0 | vm_mod_fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 0.010 | 99.26 | 96 | | 1 | vm_mod_fused_nn_dense_add | 0.019 | 13.31 | 96 | | 2 | vm_mod_fused_nn_max_pool2d | 0.004 | 13.49 | 32 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.033 ms Trials: 212 Used time : 404 s Next ID: 0 ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 1777 fail_ct: 0 Time elapsed: 3.66 GA Iter: 0 Max score: 0.7967 Min score: 0.6456 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 0.9699 Min score: 0.8964 #Pop: 16 #M+: 1384 #M-: 44 EvolutionarySearch #s: 16 Time elapsed: 18.16 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 2.48 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.06 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | 
----------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- | 0 | vm_mod_fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 0.010 | 99.26 | 104 | | 1 | vm_mod_fused_nn_dense_add | 0.019 | 13.31 | 96 | | 2 | vm_mod_fused_nn_max_pool2d | 0.004 | 13.49 | 32 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.033 ms Trials: 220 Used time : 428 s Next ID: 1 ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 1012 fail_ct: 605 Time elapsed: 0.77 GA Iter: 0 Max score: 0.8123 Min score: 0.7859 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 0.8887 Min score: 0.8289 #Pop: 16 #M+: 1390 #M-: 70 EvolutionarySearch #s: 16 Time elapsed: 3.51 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 2.46 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.06 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- | 0 | 
vm_mod_fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 0.010 | 99.26 | 104 | | 1 | vm_mod_fused_nn_dense_add | 0.019 | 13.40 | 104 | | 2 | vm_mod_fused_nn_max_pool2d | 0.004 | 13.49 | 32 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.033 ms Trials: 228 Used time : 435 s Next ID: 0 ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 1770 fail_ct: 0 Time elapsed: 3.63 GA Iter: 0 Max score: 0.6901 Min score: 0.6150 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 0.9919 Min score: 0.8839 #Pop: 16 #M+: 1390 #M-: 36 EvolutionarySearch #s: 16 Time elapsed: 18.25 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 2.56 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.06 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- | 0 | vm_mod_fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 0.010 | 99.26 | 112 | | 1 | vm_mod_fused_nn_dense_add | 0.019 | 13.40 | 104 | | 2 | vm_mod_fused_nn_max_pool2d | 0.004 | 13.49 | 32 | ----------------------------------------------------------------------------------------------------------------- Estimated 
total latency: 0.033 ms Trials: 236 Used time : 460 s Next ID: 1 ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 1024 fail_ct: 631 Time elapsed: 0.76 GA Iter: 0 Max score: 0.9062 Min score: 0.8772 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 0.9845 Min score: 0.9154 #Pop: 16 #M+: 1387 #M-: 70 EvolutionarySearch #s: 16 Time elapsed: 3.55 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 2.51 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.05 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- | 0 | vm_mod_fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 0.010 | 99.26 | 112 | | 1 | vm_mod_fused_nn_dense_add | 0.019 | 13.40 | 112 | | 2 | vm_mod_fused_nn_max_pool2d | 0.004 | 13.49 | 32 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.033 ms Trials: 244 Used time : 467 s Next ID: 0 ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 1766 fail_ct: 0 Time elapsed: 
3.61 GA Iter: 0 Max score: 0.7807 Min score: 0.6345 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 0.9656 Min score: 0.8971 #Pop: 16 #M+: 1391 #M-: 39 EvolutionarySearch #s: 16 Time elapsed: 18.19 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 2.63 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.07 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- | 0 | vm_mod_fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 0.010 | 99.26 | 120 | | 1 | vm_mod_fused_nn_dense_add | 0.019 | 13.40 | 112 | | 2 | vm_mod_fused_nn_max_pool2d | 0.004 | 13.49 | 32 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.033 ms Trials: 252 Used time : 491 s Next ID: 1 ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 1017 fail_ct: 616 Time elapsed: 0.75 GA Iter: 0 Max score: 0.8916 Min score: 0.8486 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 0.9631 Min score: 0.9069 #Pop: 16 #M+: 1392 #M-: 69 EvolutionarySearch #s: 16 Time elapsed: 3.55 ---------------------------------------------------------------------- ------------------------------ [ Measure ] 
---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 2.47 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.05 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- | 0 | vm_mod_fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 0.010 | 99.26 | 120 | | 1 | vm_mod_fused_nn_dense_add | 0.018 | 13.62 | 120 | | 2 | vm_mod_fused_nn_max_pool2d | 0.004 | 13.49 | 32 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.032 ms Trials: 260 Used time : 498 s Next ID: 0 ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 1789 fail_ct: 0 Time elapsed: 3.65 GA Iter: 0 Max score: 0.7186 Min score: 0.6269 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 0.9358 Min score: 0.8948 #Pop: 16 #M+: 1384 #M-: 35 EvolutionarySearch #s: 16 Time elapsed: 18.34 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 2.57 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] 
---------------------------------------------------------------------- Time elapsed for training: 0.07 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- | 0 | vm_mod_fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 0.010 | 99.37 | 128 | | 1 | vm_mod_fused_nn_dense_add | 0.018 | 13.62 | 120 | | 2 | vm_mod_fused_nn_max_pool2d | 0.004 | 13.49 | 32 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.032 ms Trials: 268 Used time : 523 s Next ID: 1 ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 1003 fail_ct: 607 Time elapsed: 0.75 GA Iter: 0 Max score: 0.9033 Min score: 0.8858 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 0.9796 Min score: 0.9338 #Pop: 16 #M+: 1384 #M-: 67 EvolutionarySearch #s: 16 Time elapsed: 3.57 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 2.60 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.06 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- 
---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- | 0 | vm_mod_fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 0.010 | 99.37 | 128 | | 1 | vm_mod_fused_nn_dense_add | 0.018 | 13.81 | 128 | | 2 | vm_mod_fused_nn_max_pool2d | 0.004 | 13.49 | 32 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.032 ms Trials: 276 Used time : 530 s Next ID: 0 ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 1788 fail_ct: 0 Time elapsed: 3.66 GA Iter: 0 Max score: 0.7131 Min score: 0.6271 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 1.0194 Min score: 0.8859 #Pop: 16 #M+: 1384 #M-: 40 EvolutionarySearch #s: 16 Time elapsed: 18.37 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 2.57 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.07 s | ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials | ----------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- ------------------------------ [ Task Scheduler ] ---------------------------------------------------------------------- | 0 | vm_mod_fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 0.010 | 99.37 | 136 | | 1 | vm_mod_fused_nn_dense_add | 0.018 | 
13.81 | 128 | | 2 | vm_mod_fused_nn_max_pool2d | 0.004 | 13.49 | 32 | ----------------------------------------------------------------------------------------------------------------- Estimated total latency: 0.032 ms Trials: 284 Used time : 555 s Next ID: 1 ---------------------------------------------------------------------- ------------------------------ [ Search ] ---------------------------------------------------------------------- Sample Initial Population #s: 1021 fail_ct: 622 Time elapsed: 0.76 GA Iter: 0 Max score: 0.9536 Min score: 0.8689 #Pop: 16 #M+: 0 #M-: 0 GA Iter: 4 Max score: 0.9951 Min score: 0.9339 #Pop: 16 #M+: 1377 #M-: 63 EvolutionarySearch #s: 16 Time elapsed: 3.59 ---------------------------------------------------------------------- ------------------------------ [ Measure ] ---------------------------------------------------------------------- Get 8 programs to measure: ........******** Time elapsed for measurement: 2.52 s ---------------------------------------------------------------------- ------------------------------ [ Train cost model ] ---------------------------------------------------------------------- Time elapsed for training: 0.06 s
Before the optimized model can be used, it must be compiled with the tuning
history that was saved to the file log_file.
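if is_x86():
    # Apply the best schedules found during tuning (read from log_file)
    with auto_scheduler.ApplyHistoryBest(log_file):
        # Tell the Relay backend to use the Auto-scheduler schedules
        with tvm.transform.PassContext(
            opt_level=opt_level, config={"relay.backend.use_auto_scheduler": True},
        ):
            lib = relay.build(mod, target=target, params=params)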
At this point we can measure the inference time with the function timeit_inference,
check the quality of the optimized model with the function get_accuracy,
and compare the classification accuracy with the reference value that was
obtained after training the model.
if is_x86():
    autoscheduler_cnn_predict, autoscheduler_cnn_times = timeit_inference(mod, lib, images)
    autoscheduler_cnn_accuracy = get_accuracy(labels, autoscheduler_cnn_predict)
    # The accuracy must match the reference value recorded after training
    assert np.allclose(metric['cnn'], autoscheduler_cnn_accuracy, rtol=1e-5)
    autoscheduler_cnn_time = np.median(autoscheduler_cnn_times)
    print(f'Median inference time after tuning the layers with Auto-scheduler: {autoscheduler_cnn_time:.4f} ms')
Median inference time after tuning the layers with Auto-scheduler: 0.0548 ms
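The check performed in the cell above (accuracy against a reference value, then the median of the measured times) can be illustrated without TVM. This is only a minimal sketch on toy data: the functions accuracy and check_tuned_model are hypothetical stand-ins, not the get_accuracy and timeit_inference helpers defined earlier in this work, and math.isclose is used as a rough analogue of np.allclose (the two tolerance formulas differ slightly).

```python
import math
from statistics import median


def accuracy(labels, predictions):
    """Fraction of positions where the predicted class equals the true label."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def check_tuned_model(labels, predictions, reference_accuracy, times_ms, rel_tol=1e-5):
    """Return (accuracy, median time); fail if accuracy drifts from the reference."""
    acc = accuracy(labels, predictions)
    if not math.isclose(acc, reference_accuracy, rel_tol=rel_tol):
        raise AssertionError(
            f"accuracy {acc} differs from reference {reference_accuracy}"
        )
    return acc, median(times_ms)


# Toy data (hypothetical): 7 of 8 predictions are correct.
labels = [3, 1, 4, 1, 5, 9, 2, 6]
predictions = [3, 1, 4, 1, 5, 9, 2, 0]
times_ms = [0.061, 0.054, 0.055, 0.070, 0.053]

acc, median_ms = check_tuned_model(labels, predictions, 0.875, times_ms)
print(f"accuracy = {acc:.3f}, median time = {median_ms:.4f} ms")
```

The median is used instead of the mean because individual runs can be slowed down by unrelated system activity, and a few such outliers do not shift the median.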
8.4. Applying MetaScheduler
Call the previously developed function get_ms_task, first defining the
directory work_dir in which the tuning results are logged.
In this case the target string already contains the number of threads, so it does not need to be modified.
if is_x86():
    work_dir = "meta_schedule_cnn"
    tasks, task_weights = get_ms_task(mod, target, params, opt_level, work_dir)
2024-11-06 18:25:08 [INFO] Logging directory: meta_schedule_cnn/logs
Task number: 0 Task information: fused_layout_transform
Task number: 1 Task information: fused_nn_contrib_conv2d_NCHWc_add_nn_relu
Task number: 2 Task information: fused_nn_max_pool2d
Task number: 3 Task information: fused_layout_transform_reshape
Task number: 4 Task information: fused_nn_dense_add
Next, start the tuning with MetaScheduler by calling the function tune_ms,
setting the total number of tuning trials to N * len(tasks), where N is the
number of trials per task (n_trial_per_task).
n_trial_per_task = global_trial
if is_x86():
    tune_ms(tasks, task_weights, work_dir, n_trial_per_task * len(tasks))
2024-11-06 18:25:08 [INFO] LocalBuilder: max_workers = 12
2024-11-06 18:25:09 [INFO] LocalRunner: max_workers = 1
2024-11-06 18:25:09 [INFO] [task_scheduler.cc:159] Initializing Task #0: "fused_layout_transform"
2024-11-06 18:25:09 [INFO] [task_scheduler.cc:159] Initializing Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
2024-11-06 18:25:09 [INFO] [task_scheduler.cc:159] Initializing Task #2: "fused_nn_max_pool2d"
2024-11-06 18:25:09 [INFO] [task_scheduler.cc:159] Initializing Task #3: "fused_layout_transform_reshape"
2024-11-06 18:25:09 [INFO] [task_scheduler.cc:159] Initializing Task #4: "fused_nn_dense_add"
[18:25:09] /home/yury/project/tensor_compilers/TVM/tvms/tvm18/src/meta_schedule/schedule_rule/apply_custom_rule.cc:56: Warning: Unknown schedule rule "meta_schedule.pool_max" for target keys "["cpu"]". Checked PackedFuncs: meta_schedule.cpu.meta_schedule.pool_max
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | N/A | N/A | N/A | 0 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | N/A | N/A | N/A | 0 | |
2 | fused_nn_max_pool2d | 50176 | 1 | N/A | N/A | N/A | 0 | |
3 | fused_layout_transform_reshape | 1 | 1 | N/A | N/A | N/A | 0 | |
4 | fused_nn_dense_add | 250890 | 1 | N/A | N/A | N/A | 0 |
2024-11-06 18:25:09 [DEBUG] [task_scheduler.cc:318]
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done
-------------------------------------------------------------------------------------------------------------------------------------------
0 | fused_layout_transform | 1 | 1 | N/A | N/A | N/A | 0 |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | N/A | N/A | N/A | 0 |
2 | fused_nn_max_pool2d | 50176 | 1 | N/A | N/A | N/A | 0 |
3 | fused_layout_transform_reshape | 1 | 1 | N/A | N/A | N/A | 0 |
4 | fused_nn_dense_add | 250890 | 1 | N/A | N/A | N/A | 0 |
-------------------------------------------------------------------------------------------------------------------------------------------
Total trials: 0
Total latency (us): 0
Total trials: 0
Total latency (us): 0
2024-11-06 18:25:09 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #0: "fused_layout_transform"
2024-11-06 18:25:10 [INFO] [task_scheduler.cc:193] Sending 2 sample(s) to builder
2024-11-06 18:25:10 [INFO] [task_scheduler.cc:195] Sending 2 sample(s) to runner
2024-11-06 18:25:11 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
2024-11-06 18:25:17 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:25:21 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:25:23 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #2: "fused_nn_max_pool2d"
2024-11-06 18:25:24 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:25:26 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:25:28 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #3: "fused_layout_transform_reshape"
2024-11-06 18:25:29 [INFO] [task_scheduler.cc:193] Sending 2 sample(s) to builder
2024-11-06 18:25:29 [INFO] [task_scheduler.cc:195] Sending 2 sample(s) to runner
2024-11-06 18:25:30 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #4: "fused_nn_dense_add"
2024-11-06 18:25:32 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:25:33 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:25:36 [DEBUG] XGB iter 0: tr-p-rmse: 0.428956 tr-a-peak@32: 1.000000 tr-rmse: 0.428984 tr-rmse: 0.428984
2024-11-06 18:25:36 [DEBUG] XGB iter 25: tr-p-rmse: 0.013121 tr-a-peak@32: 1.000000 tr-rmse: 0.013146 tr-rmse: 0.013146
2024-11-06 18:25:36 [DEBUG] XGB iter 50: tr-p-rmse: 0.005225 tr-a-peak@32: 1.000000 tr-rmse: 0.005226 tr-rmse: 0.005226
2024-11-06 18:25:36 [DEBUG] XGB iter 75: tr-p-rmse: 0.005215 tr-a-peak@32: 1.000000 tr-rmse: 0.005215 tr-rmse: 0.005215
2024-11-06 18:25:36 [DEBUG] XGB iter 100: tr-p-rmse: 0.005215 tr-a-peak@32: 1.000000 tr-rmse: 0.005215 tr-rmse: 0.005215
2024-11-06 18:25:36 [DEBUG] XGB stopped. Best iteration: [63] tr-p-rmse:0.00522 tr-a-peak@32:1.00000 tr-rmse:0.00522 tr-rmse:0.00522
2024-11-06 18:25:36 [INFO] [task_scheduler.cc:237] [Updated] Task #0: "fused_layout_transform"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | N/A | N/A | N/A | 0 | |
2 | fused_nn_max_pool2d | 50176 | 1 | N/A | N/A | N/A | 0 | |
3 | fused_layout_transform_reshape | 1 | 1 | N/A | N/A | N/A | 0 | |
4 | fused_nn_dense_add | 250890 | 1 | N/A | N/A | N/A | 0 |
2024-11-06 18:25:36 [DEBUG] [task_scheduler.cc:318]
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done
-------------------------------------------------------------------------------------------------------------------------------------------
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | N/A | N/A | N/A | 0 |
2 | fused_nn_max_pool2d | 50176 | 1 | N/A | N/A | N/A | 0 |
3 | fused_layout_transform_reshape | 1 | 1 | N/A | N/A | N/A | 0 |
4 | fused_nn_dense_add | 250890 | 1 | N/A | N/A | N/A | 0 |
-------------------------------------------------------------------------------------------------------------------------------------------
Total trials: 2
Total latency (us): 3.19523
Total trials: 2
Total latency (us): 3.19523
2024-11-06 18:25:36 [DEBUG] XGB iter 0: tr-p-rmse: 0.603430 tr-a-peak@32: 1.000000 tr-rmse: 0.379929 tr-rmse: 0.379929
2024-11-06 18:25:36 [DEBUG] XGB iter 25: tr-p-rmse: 0.035632 tr-a-peak@32: 1.000000 tr-rmse: 0.388200 tr-rmse: 0.388200
2024-11-06 18:25:36 [DEBUG] XGB iter 50: tr-p-rmse: 0.035772 tr-a-peak@32: 1.000000 tr-rmse: 0.388074 tr-rmse: 0.388074
2024-11-06 18:25:36 [DEBUG] XGB stopped. Best iteration: [22] tr-p-rmse:0.03496 tr-a-peak@32:1.00000 tr-rmse:0.38890 tr-rmse:0.38890
2024-11-06 18:25:36 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 73.0532 | 13.7368 | 13.7368 | 8 | |
2 | fused_nn_max_pool2d | 50176 | 1 | N/A | N/A | N/A | 0 | |
3 | fused_layout_transform_reshape | 1 | 1 | N/A | N/A | N/A | 0 | |
4 | fused_nn_dense_add | 250890 | 1 | N/A | N/A | N/A | 0 |
Total trials: 10 Total latency (us): 16.9321
2024-11-06 18:25:36 [DEBUG] XGB iter 0: tr-p-rmse: 0.483316 tr-a-peak@32: 1.000000 tr-rmse: 0.392427 tr-rmse: 0.392427
2024-11-06 18:25:36 [DEBUG] XGB iter 25: tr-p-rmse: 0.034418 tr-a-peak@32: 1.000000 tr-rmse: 0.414158 tr-rmse: 0.414158
2024-11-06 18:25:36 [DEBUG] XGB iter 50: tr-p-rmse: 0.034458 tr-a-peak@32: 1.000000 tr-rmse: 0.414098 tr-rmse: 0.414098
2024-11-06 18:25:36 [DEBUG] XGB stopped. Best iteration: [21] tr-p-rmse:0.03411 tr-a-peak@32:1.00000 tr-rmse:0.41467 tr-rmse:0.41467
2024-11-06 18:25:36 [INFO] [task_scheduler.cc:237] [Updated] Task #2: "fused_nn_max_pool2d"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 73.0532 | 13.7368 | 13.7368 | 8 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 7.1234 | 7.0438 | 7.0438 | 8 | |
3 | fused_layout_transform_reshape | 1 | 1 | N/A | N/A | N/A | 0 | |
4 | fused_nn_dense_add | 250890 | 1 | N/A | N/A | N/A | 0 |
Total trials: 18 Total latency (us): 23.9759
2024-11-06 18:25:36 [INFO] [task_scheduler.cc:237] [Updated] Task #3: "fused_layout_transform_reshape"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 73.0532 | 13.7368 | 13.7368 | 8 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 7.1234 | 7.0438 | 7.0438 | 8 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | |
4 | fused_nn_dense_add | 250890 | 1 | N/A | N/A | N/A | 0 |
Total trials: 20 Total latency (us): 31.1984
2024-11-06 18:25:36 [DEBUG] XGB iter 0: tr-p-rmse: 0.480004 tr-a-peak@32: 1.000000 tr-rmse: 0.408802 tr-rmse: 0.408802
2024-11-06 18:25:36 [DEBUG] XGB iter 25: tr-p-rmse: 0.032664 tr-a-peak@32: 1.000000 tr-rmse: 0.436189 tr-rmse: 0.436189
2024-11-06 18:25:36 [DEBUG] XGB iter 50: tr-p-rmse: 0.032666 tr-a-peak@32: 1.000000 tr-rmse: 0.436185 tr-rmse: 0.436185
2024-11-06 18:25:36 [DEBUG] XGB stopped. Best iteration: [17] tr-p-rmse:0.03239 tr-a-peak@32:1.00000 tr-rmse:0.43671 tr-rmse:0.43671
2024-11-06 18:25:36 [INFO] [task_scheduler.cc:237] [Updated] Task #4: "fused_nn_dense_add"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 73.0532 | 13.7368 | 13.7368 | 8 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 7.1234 | 7.0438 | 7.0438 | 8 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | |
4 | fused_nn_dense_add | 250890 | 1 | 22.7236 | 11.0410 | 11.0410 | 8 |
Total trials: 28 Total latency (us): 42.2394
2024-11-06 18:25:36 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
2024-11-06 18:25:42 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:25:45 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:25:47 [DEBUG] XGB validation: p-rmse: 0.350801 a-peak@32: 0.775240
2024-11-06 18:25:47 [DEBUG] XGB iter 0: tr-p-rmse: 0.493188 tr-a-peak@32: 1.000000 tr-rmse: 0.382766 tr-rmse: 0.382766
2024-11-06 18:25:47 [DEBUG] XGB iter 25: tr-p-rmse: 0.053644 tr-a-peak@32: 1.000000 tr-rmse: 0.426199 tr-rmse: 0.426199
2024-11-06 18:25:48 [DEBUG] XGB iter 50: tr-p-rmse: 0.053647 tr-a-peak@32: 1.000000 tr-rmse: 0.426196 tr-rmse: 0.426196
2024-11-06 18:25:48 [DEBUG] XGB stopped. Best iteration: [17] tr-p-rmse:0.05316 tr-a-peak@32:1.00000 tr-rmse:0.42675 tr-rmse:0.42675
2024-11-06 18:25:48 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
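
The derived columns of the scheduler table can be checked by hand. The sketch below takes the rows of the first table in which all five tasks have measurements (the one reported with a total latency of 42.2394 us) and recomputes the Speed column and the total latency. The helper names `speed_gflops` and `total_latency_us` are hypothetical and are not part of the TVM API; the formulas are an assumption that happens to be consistent with the logged values, not a statement about how TVM computes them internally.

```python
# Rows copied from the scheduler table: (name, FLOP, weight, latency in microseconds).
tasks = [
    ("fused_layout_transform", 1, 1, 3.1952),
    ("fused_nn_contrib_conv2d_NCHWc_add_nn_relu", 1003520, 1, 13.7368),
    ("fused_nn_max_pool2d", 50176, 1, 7.0438),
    ("fused_layout_transform_reshape", 1, 1, 7.2225),
    ("fused_nn_dense_add", 250890, 1, 11.0410),
]


def speed_gflops(flop, latency_us):
    """FLOP per microsecond is MFLOP/s; dividing by 1000 gives GFLOP/s."""
    return flop / latency_us / 1e3


def total_latency_us(rows):
    """Sum of weight * latency over all tasks (assumed meaning of 'Total latency')."""
    return sum(weight * latency for _, _, weight, latency in rows)


for name, flop, weight, latency in tasks:
    print(f"{name:45s} {speed_gflops(flop, latency):8.4f} GFLOPS")
print(f"Total latency (us): {total_latency_us(tasks):.4f}")
```

Small differences in the last digit relative to the log are expected, because the table shows latencies rounded to four decimal places.
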
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 73.0532 | 13.7368 | 13.7368 | 16 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 7.1234 | 7.0438 | 7.0438 | 8 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | |
4 | fused_nn_dense_add | 250890 | 1 | 22.7236 | 11.0410 | 11.0410 | 8 |
Total trials: 36 Total latency (us): 42.2394
2024-11-06 18:25:48 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #4: "fused_nn_dense_add"
2024-11-06 18:25:49 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:25:51 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:25:53 [DEBUG] XGB validation: p-rmse: 0.225171 a-peak@32: 0.951184
2024-11-06 18:25:53 [DEBUG] XGB iter 0: tr-p-rmse: 0.496843 tr-a-peak@32: 1.000000 tr-rmse: 0.385603 tr-rmse: 0.385603
2024-11-06 18:25:53 [DEBUG] XGB iter 25: tr-p-rmse: 0.048279 tr-a-peak@32: 1.000000 tr-rmse: 0.431890 tr-rmse: 0.431890
2024-11-06 18:25:53 [DEBUG] XGB iter 50: tr-p-rmse: 0.048280 tr-a-peak@32: 1.000000 tr-rmse: 0.431889 tr-rmse: 0.431889
2024-11-06 18:25:53 [DEBUG] XGB stopped. Best iteration: [17] tr-p-rmse:0.04810 tr-a-peak@32:1.00000 tr-rmse:0.43212 tr-rmse:0.43212
2024-11-06 18:25:53 [INFO] [task_scheduler.cc:237] [Updated] Task #4: "fused_nn_dense_add"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 73.0532 | 13.7368 | 13.7368 | 16 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 7.1234 | 7.0438 | 7.0438 | 8 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | |
4 | fused_nn_dense_add | 250890 | 1 | 22.7236 | 11.0410 | 11.0410 | 16 |
Total trials: 44 Total latency (us): 42.2394
2024-11-06 18:25:53 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #3: "fused_layout_transform_reshape"
2024-11-06 18:25:54 [INFO] [task_scheduler.cc:193] Sending 0 sample(s) to builder
2024-11-06 18:25:54 [INFO] [task_scheduler.cc:195] Sending 0 sample(s) to runner
2024-11-06 18:25:54 [INFO] [task_scheduler.cc:237] [Updated] Task #3: "fused_layout_transform_reshape"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 73.0532 | 13.7368 | 13.7368 | 16 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 7.1234 | 7.0438 | 7.0438 | 8 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | |
4 | fused_nn_dense_add | 250890 | 1 | 22.7236 | 11.0410 | 11.0410 | 16 |
Total trials: 44 Total latency (us): 42.2394
2024-11-06 18:25:54 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #2: "fused_nn_max_pool2d"
2024-11-06 18:25:55 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:25:57 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:25:59 [DEBUG] XGB validation: p-rmse: 0.077958 a-peak@32: 1.000000
2024-11-06 18:25:59 [INFO] [task_scheduler.cc:237] [Updated] Task #2: "fused_nn_max_pool2d"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 73.0532 | 13.7368 | 13.7368 | 16 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 7.1234 | 7.0438 | 7.0438 | 16 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | |
4 | fused_nn_dense_add | 250890 | 1 | 22.7236 | 11.0410 | 11.0410 | 16 |
Total trials: 52 Total latency (us): 42.2394
2024-11-06 18:25:59 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
2024-11-06 18:26:06 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:26:08 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:26:10 [DEBUG] XGB validation: p-rmse: 0.193769 a-peak@32: 1.000000
2024-11-06 18:26:10 [DEBUG] XGB iter 0: tr-p-rmse: 0.485757 tr-a-peak@32: 1.000000 tr-rmse: 0.376904 tr-rmse: 0.376904
2024-11-06 18:26:10 [DEBUG] XGB iter 25: tr-p-rmse: 0.058399 tr-a-peak@32: 1.000000 tr-rmse: 0.427946 tr-rmse: 0.427946
2024-11-06 18:26:10 [DEBUG] XGB iter 50: tr-p-rmse: 0.058400 tr-a-peak@32: 1.000000 tr-rmse: 0.427944 tr-rmse: 0.427944
2024-11-06 18:26:10 [DEBUG] XGB stopped. Best iteration: [17] tr-p-rmse:0.05822 tr-a-peak@32:1.00000 tr-rmse:0.42820 tr-rmse:0.42820
2024-11-06 18:26:10 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 73.0532 | 13.7368 | 13.7368 | 24 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 7.1234 | 7.0438 | 7.0438 | 16 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | |
4 | fused_nn_dense_add | 250890 | 1 | 22.7236 | 11.0410 | 11.0410 | 16 |
Total trials: 60 Total latency (us): 42.2394
2024-11-06 18:26:10 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #4: "fused_nn_dense_add"
2024-11-06 18:26:11 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:26:13 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:26:15 [DEBUG] XGB validation: p-rmse: 0.162904 a-peak@32: 0.994705
2024-11-06 18:26:15 [INFO] [task_scheduler.cc:237] [Updated] Task #4: "fused_nn_dense_add"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 73.0532 | 13.7368 | 13.7368 | 24 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 7.1234 | 7.0438 | 7.0438 | 16 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | |
4 | fused_nn_dense_add | 250890 | 1 | 22.7236 | 11.0410 | 11.0410 | 24 |
Total trials: 68 Total latency (us): 42.2394
2024-11-06 18:26:15 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
2024-11-06 18:26:21 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:26:25 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:26:27 [DEBUG] XGB validation: p-rmse: 0.143762 a-peak@32: 0.981613
2024-11-06 18:26:27 [DEBUG] XGB iter 0: tr-p-rmse: 0.481905 tr-a-peak@32: 1.000000 tr-rmse: 0.375591 tr-rmse: 0.375591
2024-11-06 18:26:27 [DEBUG] XGB iter 25: tr-p-rmse: 0.053558 tr-a-peak@32: 0.999831 tr-rmse: 0.428806 tr-rmse: 0.428806
2024-11-06 18:26:27 [DEBUG] XGB iter 50: tr-p-rmse: 0.053559 tr-a-peak@32: 0.999831 tr-rmse: 0.428805 tr-rmse: 0.428805
2024-11-06 18:26:27 [DEBUG] XGB stopped. Best iteration: [18] tr-p-rmse:0.05348 tr-a-peak@32:0.99983 tr-rmse:0.42893 tr-rmse:0.42893
2024-11-06 18:26:27 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 73.0532 | 13.7368 | 13.7368 | 32 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 7.1234 | 7.0438 | 7.0438 | 16 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | |
4 | fused_nn_dense_add | 250890 | 1 | 22.7236 | 11.0410 | 11.0410 | 24 |
Total trials: 76 Total latency (us): 42.2394
2024-11-06 18:26:27 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #4: "fused_nn_dense_add"
2024-11-06 18:26:29 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:26:30 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:26:33 [DEBUG] XGB validation: p-rmse: 0.145088 a-peak@32: 0.977988
2024-11-06 18:26:33 [INFO] [task_scheduler.cc:237] [Updated] Task #4: "fused_nn_dense_add"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 73.0532 | 13.7368 | 13.7368 | 32 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 7.1234 | 7.0438 | 7.0438 | 16 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | |
4 | fused_nn_dense_add | 250890 | 1 | 24.4622 | 10.2562 | 10.2562 | 32 |
Total trials: 84 Total latency (us): 41.4546
2024-11-06 18:26:33 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #3: "fused_layout_transform_reshape"
2024-11-06 18:26:33 [INFO] [task_scheduler.cc:193] Sending 0 sample(s) to builder
2024-11-06 18:26:33 [INFO] [task_scheduler.cc:195] Sending 0 sample(s) to runner
2024-11-06 18:26:33 [INFO] [task_scheduler.cc:237] [Updated] Task #3: "fused_layout_transform_reshape"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 73.0532 | 13.7368 | 13.7368 | 32 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 7.1234 | 7.0438 | 7.0438 | 16 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | |
4 | fused_nn_dense_add | 250890 | 1 | 24.4622 | 10.2562 | 10.2562 | 32 |
Total trials: 84 Total latency (us): 41.4546
2024-11-06 18:26:33 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #2: "fused_nn_max_pool2d"
2024-11-06 18:26:34 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:26:37 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:26:39 [DEBUG] XGB validation: p-rmse: 0.656155 a-peak@32: 0.666110
2024-11-06 18:26:39 [DEBUG] XGB iter 0: tr-p-rmse: 0.479655 tr-a-peak@32: 0.994792 tr-rmse: 0.343173 tr-rmse: 0.343173
2024-11-06 18:26:39 [DEBUG] XGB iter 25: tr-p-rmse: 0.051710 tr-a-peak@32: 0.999831 tr-rmse: 0.406589 tr-rmse: 0.406589
2024-11-06 18:26:39 [DEBUG] XGB iter 50: tr-p-rmse: 0.051711 tr-a-peak@32: 0.999831 tr-rmse: 0.406587 tr-rmse: 0.406587
2024-11-06 18:26:39 [DEBUG] XGB stopped. Best iteration: [18] tr-p-rmse:0.05157 tr-a-peak@32:0.99983 tr-rmse:0.40679 tr-rmse:0.40679
2024-11-06 18:26:39 [INFO] [task_scheduler.cc:237] [Updated] Task #2: "fused_nn_max_pool2d"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 73.0532 | 13.7368 | 13.7368 | 32 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 13.4876 | 3.7202 | 3.7202 | 24 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | |
4 | fused_nn_dense_add | 250890 | 1 | 24.4622 | 10.2562 | 10.2562 | 32 |
Total trials: 92 Total latency (us): 38.131
2024-11-06 18:26:39 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
2024-11-06 18:26:45 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:26:48 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:26:50 [DEBUG] XGB validation: p-rmse: 0.161438 a-peak@32: 0.928364
2024-11-06 18:26:50 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 73.0532 | 13.7368 | 13.7368 | 40 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 13.4876 | 3.7202 | 3.7202 | 24 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | |
4 | fused_nn_dense_add | 250890 | 1 | 24.4622 | 10.2562 | 10.2562 | 32 |
Total trials: 100 Total latency (us): 38.131
2024-11-06 18:26:50 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #0: "fused_layout_transform"
2024-11-06 18:26:51 [INFO] [task_scheduler.cc:193] Sending 0 sample(s) to builder
2024-11-06 18:26:51 [INFO] [task_scheduler.cc:195] Sending 0 sample(s) to runner
2024-11-06 18:26:51 [INFO] [task_scheduler.cc:237] [Updated] Task #0: "fused_layout_transform"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 73.0532 | 13.7368 | 13.7368 | 40 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 13.4876 | 3.7202 | 3.7202 | 24 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | |
4 | fused_nn_dense_add | 250890 | 1 | 24.4622 | 10.2562 | 10.2562 | 32 |
Total trials: 100 Total latency (us): 38.131
2024-11-06 18:26:51 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
2024-11-06 18:27:04 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:27:06 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:27:08 [DEBUG] XGB validation: p-rmse: 0.263920 a-peak@32: 0.993032
2024-11-06 18:27:08 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 77.6145 | 12.9295 | 12.9295 | 48 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 13.4876 | 3.7202 | 3.7202 | 24 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | |
4 | fused_nn_dense_add | 250890 | 1 | 24.4622 | 10.2562 | 10.2562 | 32 |
2024-11-06 18:27:08 [DEBUG] [task_scheduler.cc:318] Total trials: 108 Total latency (us): 37.3237
2024-11-06 18:27:08 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #4: "fused_nn_dense_add"
2024-11-06 18:27:12 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:27:24 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:27:26 [DEBUG] XGB validation: p-rmse: 0.176584 a-peak@32: 1.000000
2024-11-06 18:27:26 [DEBUG] XGB iter 0: tr-p-rmse: 0.492402 tr-a-peak@32: 0.998163 tr-rmse: 0.367762 tr-rmse: 0.367762
2024-11-06 18:27:26 [DEBUG] XGB iter 25: tr-p-rmse: 0.049030 tr-a-peak@32: 0.999831 tr-rmse: 0.430966 tr-rmse: 0.430966
2024-11-06 18:27:26 [DEBUG] XGB iter 50: tr-p-rmse: 0.049030 tr-a-peak@32: 0.999831 tr-rmse: 0.430965 tr-rmse: 0.430965
2024-11-06 18:27:26 [DEBUG] XGB stopped. Best iteration: [18] tr-p-rmse:0.04894 tr-a-peak@32:0.99983 tr-rmse:0.43111 tr-rmse:0.43111
2024-11-06 18:27:26 [INFO] [task_scheduler.cc:237] [Updated] Task #4: "fused_nn_dense_add"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 77.6145 | 12.9295 | 12.9295 | 48 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 13.4876 | 3.7202 | 3.7202 | 24 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | |
4 | fused_nn_dense_add | 250890 | 1 | 27.6800 | 9.0639 | 9.0639 | 40 |
2024-11-06 18:27:26 [DEBUG] [task_scheduler.cc:318] Total trials: 116 Total latency (us): 36.1314
2024-11-06 18:27:26 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #3: "fused_layout_transform_reshape"
2024-11-06 18:27:28 [INFO] [task_scheduler.cc:193] Sending 0 sample(s) to builder
2024-11-06 18:27:28 [INFO] [task_scheduler.cc:195] Sending 0 sample(s) to runner
2024-11-06 18:27:28 [INFO] [task_scheduler.cc:237] [Updated] Task #3: "fused_layout_transform_reshape"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 77.6145 | 12.9295 | 12.9295 | 48 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 13.4876 | 3.7202 | 3.7202 | 24 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | |
4 | fused_nn_dense_add | 250890 | 1 | 27.6800 | 9.0639 | 9.0639 | 40 |
2024-11-06 18:27:28 [DEBUG] [task_scheduler.cc:318] Total trials: 116 Total latency (us): 36.1314
2024-11-06 18:27:28 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
2024-11-06 18:27:41 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:27:43 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:27:45 [DEBUG] XGB validation: p-rmse: 0.295029 a-peak@32: 0.865659
2024-11-06 18:27:45 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 78.9262 | 12.7147 | 12.7147 | 56 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 13.4876 | 3.7202 | 3.7202 | 24 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | |
4 | fused_nn_dense_add | 250890 | 1 | 27.6800 | 9.0639 | 9.0639 | 40 |
2024-11-06 18:27:45 [DEBUG] [task_scheduler.cc:318] Total trials: 124 Total latency (us): 35.9165
2024-11-06 18:27:45 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #4: "fused_nn_dense_add"
2024-11-06 18:27:49 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:27:50 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:27:52 [DEBUG] XGB validation: p-rmse: 0.047716 a-peak@32: 1.000000
2024-11-06 18:27:52 [INFO] [task_scheduler.cc:237] [Updated] Task #4: "fused_nn_dense_add"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 78.9262 | 12.7147 | 12.7147 | 56 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 13.4876 | 3.7202 | 3.7202 | 24 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | |
4 | fused_nn_dense_add | 250890 | 1 | 27.6800 | 9.0639 | 9.0639 | 48 |
2024-11-06 18:27:52 [DEBUG] [task_scheduler.cc:318] Total trials: 132 Total latency (us): 35.9165
2024-11-06 18:27:52 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
2024-11-06 18:28:05 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:28:07 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:28:09 [DEBUG] XGB validation: p-rmse: 0.379888 a-peak@32: 0.995790
2024-11-06 18:28:09 [DEBUG] XGB iter 0: tr-p-rmse: 0.458952 tr-a-peak@32: 0.995886 tr-rmse: 0.405198 tr-rmse: 0.405198
2024-11-06 18:28:09 [DEBUG] XGB iter 25: tr-p-rmse: 0.089273 tr-a-peak@32: 0.999562 tr-rmse: 0.477386 tr-rmse: 0.477386
2024-11-06 18:28:09 [DEBUG] XGB iter 50: tr-p-rmse: 0.089273 tr-a-peak@32: 0.999562 tr-rmse: 0.477385 tr-rmse: 0.477385
2024-11-06 18:28:10 [DEBUG] XGB stopped. Best iteration: [19] tr-p-rmse:0.08919 tr-a-peak@32:0.99956 tr-rmse:0.47753 tr-rmse:0.47753
2024-11-06 18:28:10 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 78.9262 | 12.7147 | 12.7147 | 64 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 13.4876 | 3.7202 | 3.7202 | 24 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | |
4 | fused_nn_dense_add | 250890 | 1 | 27.6800 | 9.0639 | 9.0639 | 48 |
2024-11-06 18:28:10 [DEBUG] [task_scheduler.cc:318] Total trials: 140 Total latency (us): 35.9165
2024-11-06 18:28:10 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #3: "fused_layout_transform_reshape"
2024-11-06 18:28:11 [INFO] [task_scheduler.cc:193] Sending 0 sample(s) to builder
2024-11-06 18:28:11 [INFO] [task_scheduler.cc:195] Sending 0 sample(s) to runner
2024-11-06 18:28:11 [INFO] [task_scheduler.cc:237] [Updated] Task #3: "fused_layout_transform_reshape"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 78.9262 | 12.7147 | 12.7147 | 64 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 13.4876 | 3.7202 | 3.7202 | 24 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | |
4 | fused_nn_dense_add | 250890 | 1 | 27.6800 | 9.0639 | 9.0639 | 48 |
2024-11-06 18:28:11 [DEBUG] [task_scheduler.cc:318] Total trials: 140 Total latency (us): 35.9165
2024-11-06 18:28:11 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #4: "fused_nn_dense_add"
2024-11-06 18:28:15 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:28:27 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:28:29 [DEBUG] XGB validation: p-rmse: 0.151383 a-peak@32: 0.975659
2024-11-06 18:28:29 [INFO] [task_scheduler.cc:237] [Updated] Task #4: "fused_nn_dense_add"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 78.9262 | 12.7147 | 12.7147 | 64 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 13.4876 | 3.7202 | 3.7202 | 24 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | |
4 | fused_nn_dense_add | 250890 | 1 | 29.2317 | 8.5828 | 8.5828 | 56 |
2024-11-06 18:28:29 [DEBUG] [task_scheduler.cc:318] Total trials: 148 Total latency (us): 35.4354
2024-11-06 18:28:29 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
2024-11-06 18:28:42 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:28:44 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:28:46 [DEBUG] XGB validation: p-rmse: 0.282471 a-peak@32: 0.996347
2024-11-06 18:28:46 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 81.9600 | 12.2440 | 12.2440 | 72 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 13.4876 | 3.7202 | 3.7202 | 24 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | |
4 | fused_nn_dense_add | 250890 | 1 | 29.2317 | 8.5828 | 8.5828 | 56 |
2024-11-06 18:28:46 [DEBUG] [task_scheduler.cc:318] Total trials: 156 Total latency (us): 34.9648
2024-11-06 18:28:46 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #0: "fused_layout_transform"
2024-11-06 18:28:48 [INFO] [task_scheduler.cc:193] Sending 0 sample(s) to builder
2024-11-06 18:28:48 [INFO] [task_scheduler.cc:195] Sending 0 sample(s) to runner
2024-11-06 18:28:48 [INFO] [task_scheduler.cc:237] [Updated] Task #0: "fused_layout_transform"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 81.9600 | 12.2440 | 12.2440 | 72 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 13.4876 | 3.7202 | 3.7202 | 24 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | |
4 | fused_nn_dense_add | 250890 | 1 | 29.2317 | 8.5828 | 8.5828 | 56 |
2024-11-06 18:28:48 [DEBUG] [task_scheduler.cc:318] Total trials: 156 Total latency (us): 34.9648
2024-11-06 18:28:48 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #3: "fused_layout_transform_reshape"
2024-11-06 18:28:50 [INFO] [task_scheduler.cc:260] Task #3 has finished. Remaining task(s): 4
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 81.9600 | 12.2440 | 12.2440 | 72 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 13.4876 | 3.7202 | 3.7202 | 24 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2317 | 8.5828 | 8.5828 | 56 |
2024-11-06 18:28:50 [DEBUG] [task_scheduler.cc:318] Total trials: 156 Total latency (us): 34.9648
2024-11-06 18:28:50 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
2024-11-06 18:29:03 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:29:05 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:29:07 [DEBUG] XGB validation: p-rmse: 0.503644 a-peak@32: 0.826634
2024-11-06 18:29:07 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 81.9600 | 12.2440 | 12.2440 | 80 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 13.4876 | 3.7202 | 3.7202 | 24 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2317 | 8.5828 | 8.5828 | 56 |
2024-11-06 18:29:07 [DEBUG] [task_scheduler.cc:318] Total trials: 164 Total latency (us): 34.9648
2024-11-06 18:29:07 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #4: "fused_nn_dense_add"
2024-11-06 18:29:10 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:29:14 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:29:16 [DEBUG] XGB validation: p-rmse: 0.085098 a-peak@32: 0.992122
2024-11-06 18:29:16 [DEBUG] XGB iter 0: tr-p-rmse: 0.450609 tr-a-peak@32: 0.988297 tr-rmse: 0.418118 tr-rmse: 0.418118
2024-11-06 18:29:16 [DEBUG] XGB iter 25: tr-p-rmse: 0.114067 tr-a-peak@32: 1.000000 tr-rmse: 0.493538 tr-rmse: 0.493538
2024-11-06 18:29:16 [DEBUG] XGB iter 50: tr-p-rmse: 0.114067 tr-a-peak@32: 1.000000 tr-rmse: 0.493538 tr-rmse: 0.493538
2024-11-06 18:29:16 [DEBUG] XGB stopped. Best iteration: [16] tr-p-rmse:0.11400 tr-a-peak@32:1.00000 tr-rmse:0.49367 tr-rmse:0.49367
2024-11-06 18:29:16 [INFO] [task_scheduler.cc:237] [Updated] Task #4: "fused_nn_dense_add"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 81.9600 | 12.2440 | 12.2440 | 80 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 13.4876 | 3.7202 | 3.7202 | 24 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2317 | 8.5828 | 8.5828 | 64 |
2024-11-06 18:29:16 [DEBUG] [task_scheduler.cc:318] Total trials: 172 Total latency (us): 34.9648
2024-11-06 18:29:16 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
2024-11-06 18:29:29 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:29:31 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:29:33 [DEBUG] XGB validation: p-rmse: 0.347296 a-peak@32: 1.000000
2024-11-06 18:29:33 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 81.9600 | 12.2440 | 12.2440 | 88 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 13.4876 | 3.7202 | 3.7202 | 24 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2317 | 8.5828 | 8.5828 | 64 |
2024-11-06 18:29:33 [DEBUG] [task_scheduler.cc:318] Total trials: 180 Total latency (us): 34.9648
2024-11-06 18:29:33 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #2: "fused_nn_max_pool2d"
2024-11-06 18:29:36 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:29:38 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:29:40 [DEBUG] XGB validation: p-rmse: 0.077047 a-peak@32: 0.987611
2024-11-06 18:29:40 [INFO] [task_scheduler.cc:237] [Updated] Task #2: "fused_nn_max_pool2d"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 81.9600 | 12.2440 | 12.2440 | 88 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 32 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2317 | 8.5828 | 8.5828 | 64 |
2024-11-06 18:29:40 [DEBUG] [task_scheduler.cc:318] Total trials: 188 Total latency (us): 34.7811
2024-11-06 18:29:40 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #2: "fused_nn_max_pool2d"
2024-11-06 18:29:43 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:29:45 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:29:48 [DEBUG] XGB validation: p-rmse: 0.044324 a-peak@32: 1.000000
2024-11-06 18:29:48 [INFO] [task_scheduler.cc:237] [Updated] Task #2: "fused_nn_max_pool2d"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 81.9600 | 12.2440 | 12.2440 | 88 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 40 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2317 | 8.5828 | 8.5828 | 64 |
2024-11-06 18:29:48 [DEBUG] [task_scheduler.cc:318] Total trials: 196 Total latency (us): 34.7811
2024-11-06 18:29:48 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
2024-11-06 18:30:01 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:30:03 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:30:05 [DEBUG] XGB validation: p-rmse: 0.372697 a-peak@32: 0.836836
2024-11-06 18:30:05 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 81.9600 | 12.2440 | 12.2440 | 96 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 40 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2317 | 8.5828 | 8.5828 | 64 |
Total trials: 204 Total latency (us): 34.7811 2024-11-06 18:30:05 [DEBUG] [task_scheduler.cc:318] ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done ------------------------------------------------------------------------------------------------------------------------------------------- 0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | 1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 81.9600 | 12.2440 | 12.2440 | 96 | 2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 40 | 3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y 4 | fused_nn_dense_add | 250890 | 1 | 29.2317 | 8.5828 | 8.5828 | 64 | ------------------------------------------------------------------------------------------------------------------------------------------- Total trials: 204 Total latency (us): 34.7811 2024-11-06 18:30:05 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #4: "fused_nn_dense_add" 2024-11-06 18:30:08 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder 2024-11-06 18:30:10 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner 2024-11-06 18:30:12 [DEBUG] XGB validation: p-rmse: 0.039160 a-peak@32: 0.999396 2024-11-06 18:30:12 [DEBUG] XGB iter 0: tr-p-rmse: 0.449608 tr-a-peak@32: 0.964817 tr-rmse: 0.432020 tr-rmse: 0.432020 2024-11-06 18:30:12 [DEBUG] XGB iter 25: tr-p-rmse: 0.132792 tr-a-peak@32: 0.999831 tr-rmse: 0.503330 tr-rmse: 0.503330 2024-11-06 18:30:12 [DEBUG] XGB iter 50: tr-p-rmse: 0.132792 tr-a-peak@32: 0.999831 tr-rmse: 0.503330 tr-rmse: 0.503330 2024-11-06 18:30:12 [DEBUG] XGB stopped. Best iteration: [18] tr-p-rmse:0.13278 tr-a-peak@32:0.99983 tr-rmse:0.50335 tr-rmse:0.50335 2024-11-06 18:30:12 [INFO] [task_scheduler.cc:237] [Updated] Task #4: "fused_nn_dense_add"
2024-11-06 18:30:12 [DEBUG] [task_scheduler.cc:318]
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 81.9600 | 12.2440 | 12.2440 | 96 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 40 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 72 | |
Total trials: 212 Total latency (us): 34.7675
2024-11-06 18:30:12 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #0: "fused_layout_transform"
2024-11-06 18:30:13 [INFO] [task_scheduler.cc:193] Sending 0 sample(s) to builder
2024-11-06 18:30:13 [INFO] [task_scheduler.cc:195] Sending 0 sample(s) to runner
2024-11-06 18:30:13 [INFO] [task_scheduler.cc:237] [Updated] Task #0: "fused_layout_transform"
2024-11-06 18:30:13 [DEBUG] [task_scheduler.cc:318]
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 81.9600 | 12.2440 | 12.2440 | 96 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 40 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 72 | |
Total trials: 212 Total latency (us): 34.7675
2024-11-06 18:30:13 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
2024-11-06 18:30:26 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:30:28 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:30:30 [DEBUG] XGB validation: p-rmse: 0.341009 a-peak@32: 0.836577
2024-11-06 18:30:30 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
2024-11-06 18:30:30 [DEBUG] [task_scheduler.cc:318]
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 81.9600 | 12.2440 | 12.2440 | 104 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 40 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 72 | |
Total trials: 220 Total latency (us): 34.7675
2024-11-06 18:30:30 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #2: "fused_nn_max_pool2d"
2024-11-06 18:30:33 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:30:35 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:30:37 [DEBUG] XGB validation: p-rmse: 0.030518 a-peak@32: 0.988129
2024-11-06 18:30:37 [INFO] [task_scheduler.cc:237] [Updated] Task #2: "fused_nn_max_pool2d"
2024-11-06 18:30:37 [DEBUG] [task_scheduler.cc:318]
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 81.9600 | 12.2440 | 12.2440 | 104 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 48 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 72 | |
Total trials: 228 Total latency (us): 34.7675
2024-11-06 18:30:37 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #4: "fused_nn_dense_add"
2024-11-06 18:30:41 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:30:43 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:30:45 [DEBUG] XGB validation: p-rmse: 0.090661 a-peak@32: 0.950780
2024-11-06 18:30:45 [INFO] [task_scheduler.cc:237] [Updated] Task #4: "fused_nn_dense_add"
2024-11-06 18:30:45 [DEBUG] [task_scheduler.cc:318]
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 81.9600 | 12.2440 | 12.2440 | 104 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 48 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 80 | |
Total trials: 236 Total latency (us): 34.7675
2024-11-06 18:30:45 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
2024-11-06 18:30:58 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:31:00 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:31:02 [DEBUG] XGB validation: p-rmse: 0.418442 a-peak@32: 0.817949
2024-11-06 18:31:02 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
2024-11-06 18:31:02 [DEBUG] [task_scheduler.cc:318]
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 81.9600 | 12.2440 | 12.2440 | 112 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 48 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 80 | |
Total trials: 244 Total latency (us): 34.7675
2024-11-06 18:31:02 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
2024-11-06 18:31:16 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:31:18 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:31:20 [DEBUG] XGB validation: p-rmse: 0.424621 a-peak@32: 0.780059
2024-11-06 18:31:20 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
2024-11-06 18:31:20 [DEBUG] [task_scheduler.cc:318]
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 81.9600 | 12.2440 | 12.2440 | 120 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 48 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 80 | |
Total trials: 252 Total latency (us): 34.7675
2024-11-06 18:31:20 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #4: "fused_nn_dense_add"
2024-11-06 18:31:23 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:31:25 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:31:27 [DEBUG] XGB validation: p-rmse: 0.124876 a-peak@32: 0.993320
2024-11-06 18:31:27 [DEBUG] XGB iter 0: tr-p-rmse: 0.447523 tr-a-peak@32: 0.991634 tr-rmse: 0.436950 tr-rmse: 0.436950
2024-11-06 18:31:27 [DEBUG] XGB iter 25: tr-p-rmse: 0.154782 tr-a-peak@32: 1.000000 tr-rmse: 0.511629 tr-rmse: 0.511629
2024-11-06 18:31:27 [DEBUG] XGB iter 50: tr-p-rmse: 0.154782 tr-a-peak@32: 1.000000 tr-rmse: 0.511629 tr-rmse: 0.511629
2024-11-06 18:31:27 [DEBUG] XGB stopped. Best iteration: [17] tr-p-rmse:0.15475 tr-a-peak@32:1.00000 tr-rmse:0.51168 tr-rmse:0.51168
2024-11-06 18:31:27 [INFO] [task_scheduler.cc:237] [Updated] Task #4: "fused_nn_dense_add"
2024-11-06 18:31:27 [DEBUG] [task_scheduler.cc:318]
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 81.9600 | 12.2440 | 12.2440 | 120 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 48 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 88 | |
Total trials: 260 Total latency (us): 34.7675
2024-11-06 18:31:27 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
2024-11-06 18:31:40 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:31:42 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:31:44 [DEBUG] XGB validation: p-rmse: 0.281354 a-peak@32: 0.977430
2024-11-06 18:31:44 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
2024-11-06 18:31:44 [DEBUG] [task_scheduler.cc:318]
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 81.9600 | 12.2440 | 12.2440 | 128 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 48 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 88 | |
Total trials: 268 Total latency (us): 34.7675
2024-11-06 18:31:44 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #0: "fused_layout_transform"
2024-11-06 18:31:46 [INFO] [task_scheduler.cc:193] Sending 0 sample(s) to builder
2024-11-06 18:31:46 [INFO] [task_scheduler.cc:195] Sending 0 sample(s) to runner
2024-11-06 18:31:46 [INFO] [task_scheduler.cc:237] [Updated] Task #0: "fused_layout_transform"
2024-11-06 18:31:46 [DEBUG] [task_scheduler.cc:318]
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 81.9600 | 12.2440 | 12.2440 | 128 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 48 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 88 | |
Total trials: 268 Total latency (us): 34.7675
2024-11-06 18:31:46 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #4: "fused_nn_dense_add"
2024-11-06 18:31:49 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:31:51 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:31:53 [DEBUG] XGB validation: p-rmse: 0.511726 a-peak@32: 1.000000
2024-11-06 18:31:53 [INFO] [task_scheduler.cc:237] [Updated] Task #4: "fused_nn_dense_add"
2024-11-06 18:31:53 [DEBUG] [task_scheduler.cc:318]
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 81.9600 | 12.2440 | 12.2440 | 128 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 48 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 96 | |
Total trials: 276 Total latency (us): 34.7675
2024-11-06 18:31:53 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
2024-11-06 18:32:06 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:32:08 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:32:11 [DEBUG] XGB validation: p-rmse: 0.277912 a-peak@32: 0.959712
2024-11-06 18:32:11 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
2024-11-06 18:32:11 [DEBUG] [task_scheduler.cc:318]
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 81.9600 | 12.2440 | 12.2440 | 136 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 48 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 96 | |
Total trials: 284 Total latency (us): 34.7675
2024-11-06 18:32:11 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
2024-11-06 18:32:24 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:32:26 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:32:28 [DEBUG] XGB validation: p-rmse: 0.148024 a-peak@32: 0.900631
2024-11-06 18:32:28 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
2024-11-06 18:32:28 [DEBUG] [task_scheduler.cc:318]
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 81.9600 | 12.2440 | 12.2440 | 144 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 48 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 96 | |
Total trials: 292 Total latency (us): 34.7675
2024-11-06 18:32:28 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #4: "fused_nn_dense_add"
2024-11-06 18:32:32 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:32:33 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:32:35 [DEBUG] XGB validation: p-rmse: 0.534080 a-peak@32: 0.547863
2024-11-06 18:32:35 [INFO] [task_scheduler.cc:237] [Updated] Task #4: "fused_nn_dense_add"
2024-11-06 18:32:35 [DEBUG] [task_scheduler.cc:318]
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 81.9600 | 12.2440 | 12.2440 | 144 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 48 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 104 | |
Total trials: 300 Total latency (us): 34.7675
2024-11-06 18:32:35 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
2024-11-06 18:32:48 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:32:50 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:32:53 [DEBUG] XGB validation: p-rmse: 0.287411 a-peak@32: 0.953610
2024-11-06 18:32:53 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
2024-11-06 18:32:53 [DEBUG] [task_scheduler.cc:318]
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 81.9600 | 12.2440 | 12.2440 | 152 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 48 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 104 | |
Total trials: 308 Total latency (us): 34.7675
2024-11-06 18:32:53 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #4: "fused_nn_dense_add"
2024-11-06 18:32:56 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:32:58 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:33:00 [DEBUG] XGB validation: p-rmse: 0.375471 a-peak@32: 0.706074
2024-11-06 18:33:00 [DEBUG] XGB iter 0: tr-p-rmse: 0.462738 tr-a-peak@32: 0.912965 tr-rmse: 0.438690 tr-rmse: 0.438690
2024-11-06 18:33:00 [DEBUG] XGB iter 25: tr-p-rmse: 0.139792 tr-a-peak@32: 1.000000 tr-rmse: 0.512912 tr-rmse: 0.512912
2024-11-06 18:33:00 [DEBUG] XGB iter 50: tr-p-rmse: 0.139792 tr-a-peak@32: 1.000000 tr-rmse: 0.512912 tr-rmse: 0.512912
2024-11-06 18:33:00 [DEBUG] XGB stopped. Best iteration: [17] tr-p-rmse:0.13978 tr-a-peak@32:1.00000 tr-rmse:0.51293 tr-rmse:0.51293
2024-11-06 18:33:00 [INFO] [task_scheduler.cc:237] [Updated] Task #4: "fused_nn_dense_add"
2024-11-06 18:33:00 [DEBUG] [task_scheduler.cc:318]
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 81.9600 | 12.2440 | 12.2440 | 152 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 48 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 112 | |
Total trials: 316 Total latency (us): 34.7675
2024-11-06 18:33:00 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
2024-11-06 18:33:13 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:33:15 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:33:18 [DEBUG] XGB validation: p-rmse: 0.205324 a-peak@32: 0.866268
2024-11-06 18:33:18 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
2024-11-06 18:33:18 [DEBUG] [task_scheduler.cc:318]
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 101.7493 | 9.8627 | 9.8627 | 160 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 48 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 112 | |
Total trials: 324 Total latency (us): 32.3861
2024-11-06 18:33:18 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
2024-11-06 18:33:31 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:33:33 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:33:35 [DEBUG] XGB validation: p-rmse: 0.424072 a-peak@32: 0.884443
2024-11-06 18:33:35 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
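The summary figures in the table above can be cross-checked with simple arithmetic. The sketch below is not part of TVM; it assumes, based only on the numbers in this log, that "Speed (GFLOPS)" is FLOP divided by latency and that "Total latency (us)" is the sum of weight times latency over all tasks. The helper names `speed_gflops` and `total_latency_us` are hypothetical, and the rows are copied from the table with 324 total trials.

```python
# Rows copied from the table above: (name, FLOP, weight, latency in microseconds).
tasks = [
    ("fused_layout_transform", 1, 1, 3.1952),
    ("fused_nn_contrib_conv2d_NCHWc_add_nn_relu", 1003520, 1, 9.8627),
    ("fused_nn_max_pool2d", 50176, 1, 3.5365),
    ("fused_layout_transform_reshape", 1, 1, 7.2225),
    ("fused_nn_dense_add", 250890, 1, 8.5691),
]


def speed_gflops(flop, latency_us):
    """Assumed relation: throughput in GFLOPS = FLOP / latency in seconds / 1e9."""
    return flop / (latency_us * 1e-6) / 1e9


def total_latency_us(rows):
    """Assumed relation: total latency = sum of weight * latency over all tasks."""
    return sum(weight * latency for _name, _flop, weight, latency in rows)


if __name__ == "__main__":
    for name, flop, _weight, latency in tasks:
        print(f"{name}: {speed_gflops(flop, latency):.4f} GFLOPS")
    print(f"Total latency (us): {total_latency_us(tasks):.4f}")
```

Under these assumptions the computed values agree with the logged ones up to rounding of the printed latencies, which suggests that the drop in total latency from 34.7675 to 32.3861 us comes entirely from the faster schedule found for the convolution task.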
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 101.7493 | 9.8627 | 9.8627 | 168 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 48 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 112 |
Total trials: 332 Total latency (us): 32.3861
2024-11-06 18:33:35 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
2024-11-06 18:33:48 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:33:51 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:33:53 [DEBUG] XGB validation: p-rmse: 0.470635 a-peak@32: 0.918360
2024-11-06 18:33:53 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 101.7493 | 9.8627 | 9.8627 | 176 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 48 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 112 |
Total trials: 340 Total latency (us): 32.3861
2024-11-06 18:33:53 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
2024-11-06 18:34:06 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:34:08 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:34:10 [DEBUG] XGB validation: p-rmse: 0.555645 a-peak@32: 0.955161
2024-11-06 18:34:10 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 101.7493 | 9.8627 | 9.8627 | 184 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 48 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 112 |
Total trials: 348 Total latency (us): 32.3861
2024-11-06 18:34:10 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #0: "fused_layout_transform"
2024-11-06 18:34:12 [INFO] [task_scheduler.cc:260] Task #0 has finished. Remaining task(s): 3
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | Y |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 101.7493 | 9.8627 | 9.8627 | 184 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 48 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 112 |
Total trials: 348 Total latency (us): 32.3861
2024-11-06 18:34:12 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #4: "fused_nn_dense_add"
2024-11-06 18:34:15 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:34:17 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:34:19 [DEBUG] XGB validation: p-rmse: 0.104416 a-peak@32: 0.993149
2024-11-06 18:34:19 [INFO] [task_scheduler.cc:237] [Updated] Task #4: "fused_nn_dense_add"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | Y |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 101.7493 | 9.8627 | 9.8627 | 184 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 48 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 120 |
Total trials: 356 Total latency (us): 32.3861
2024-11-06 18:34:19 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #2: "fused_nn_max_pool2d"
2024-11-06 18:34:22 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:34:25 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:34:27 [DEBUG] XGB validation: p-rmse: 0.024494 a-peak@32: 1.000000
2024-11-06 18:34:27 [INFO] [task_scheduler.cc:237] [Updated] Task #2: "fused_nn_max_pool2d"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | Y |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 101.7493 | 9.8627 | 9.8627 | 184 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 56 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 120 |
Total trials: 364 Total latency (us): 32.3861
2024-11-06 18:34:27 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #4: "fused_nn_dense_add"
2024-11-06 18:34:31 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:34:32 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:34:34 [DEBUG] XGB validation: p-rmse: 0.206462 a-peak@32: 0.999547
2024-11-06 18:34:34 [INFO] [task_scheduler.cc:237] [Updated] Task #4: "fused_nn_dense_add"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | Y |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 101.7493 | 9.8627 | 9.8627 | 184 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 56 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 128 |
Total trials: 372 Total latency (us): 32.3861
2024-11-06 18:34:34 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #4: "fused_nn_dense_add"
2024-11-06 18:34:38 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:34:40 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:34:42 [DEBUG] XGB validation: p-rmse: 0.206657 a-peak@32: 0.999329
2024-11-06 18:34:42 [DEBUG] XGB iter 0: tr-p-rmse: 0.465364 tr-a-peak@32: 0.972804 tr-rmse: 0.378529 tr-rmse: 0.378529
2024-11-06 18:34:42 [DEBUG] XGB iter 25: tr-p-rmse: 0.105192 tr-a-peak@32: 1.000000 tr-rmse: 0.459046 tr-rmse: 0.459046
2024-11-06 18:34:42 [DEBUG] XGB iter 50: tr-p-rmse: 0.105192 tr-a-peak@32: 1.000000 tr-rmse: 0.459046 tr-rmse: 0.459046
2024-11-06 18:34:42 [DEBUG] XGB stopped. Best iteration: [22] tr-p-rmse:0.10519 tr-a-peak@32:1.00000 tr-rmse:0.45904 tr-rmse:0.45904
2024-11-06 18:34:42 [INFO] [task_scheduler.cc:237] [Updated] Task #4: "fused_nn_dense_add"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | Y |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 101.7493 | 9.8627 | 9.8627 | 184 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 56 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 136 |
Total trials: 380 Total latency (us): 32.3861
2024-11-06 18:34:42 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #2: "fused_nn_max_pool2d"
2024-11-06 18:34:45 [INFO] [task_scheduler.cc:193] Sending 0 sample(s) to builder
2024-11-06 18:34:45 [INFO] [task_scheduler.cc:195] Sending 0 sample(s) to runner
2024-11-06 18:34:45 [INFO] [task_scheduler.cc:237] [Updated] Task #2: "fused_nn_max_pool2d"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | Y |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 101.7493 | 9.8627 | 9.8627 | 184 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 56 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 136 |
Total trials: 380 Total latency (us): 32.3861
2024-11-06 18:34:45 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #4: "fused_nn_dense_add"
2024-11-06 18:34:48 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:34:50 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:34:52 [DEBUG] XGB validation: p-rmse: 0.097164 a-peak@32: 0.925082
2024-11-06 18:34:52 [INFO] [task_scheduler.cc:237] [Updated] Task #4: "fused_nn_dense_add"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | Y |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 101.7493 | 9.8627 | 9.8627 | 184 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 56 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 144 |
Total trials: 388 Total latency (us): 32.3861
2024-11-06 18:34:52 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #4: "fused_nn_dense_add"
2024-11-06 18:34:56 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:34:57 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:34:59 [DEBUG] XGB validation: p-rmse: 0.112814 a-peak@32: 0.951644
2024-11-06 18:34:59 [INFO] [task_scheduler.cc:237] [Updated] Task #4: "fused_nn_dense_add"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | Y |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 101.7493 | 9.8627 | 9.8627 | 184 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 56 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 152 |
Total trials: 396 Total latency (us): 32.3861
2024-11-06 18:34:59 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #4: "fused_nn_dense_add"
2024-11-06 18:35:03 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:35:15 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:35:17 [DEBUG] XGB validation: p-rmse: 0.058394 a-peak@32: 0.974092
2024-11-06 18:35:17 [INFO] [task_scheduler.cc:237] [Updated] Task #4: "fused_nn_dense_add"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | Y |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 101.7493 | 9.8627 | 9.8627 | 184 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 56 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 160 |
Total trials: 404 Total latency (us): 32.3861
2024-11-06 18:35:17 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #2: "fused_nn_max_pool2d"
2024-11-06 18:35:20 [INFO] [task_scheduler.cc:193] Sending 0 sample(s) to builder
2024-11-06 18:35:20 [INFO] [task_scheduler.cc:195] Sending 0 sample(s) to runner
2024-11-06 18:35:20 [INFO] [task_scheduler.cc:237] [Updated] Task #2: "fused_nn_max_pool2d"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | Y |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 101.7493 | 9.8627 | 9.8627 | 184 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 56 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 160 |
Total trials: 404 Total latency (us): 32.3861
2024-11-06 18:35:20 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
2024-11-06 18:35:33 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:35:35 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:35:38 [DEBUG] XGB validation: p-rmse: 0.210817 a-peak@32: 0.970991
2024-11-06 18:35:38 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | Y |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 108.2094 | 9.2739 | 9.2739 | 192 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 56 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 160 |
Total trials: 412 Total latency (us): 31.7973
2024-11-06 18:35:38 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
2024-11-06 18:35:51 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:35:53 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:35:55 [DEBUG] XGB validation: p-rmse: 0.132945 a-peak@32: 0.889768
2024-11-06 18:35:55 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | Y |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 111.4731 | 9.0023 | 9.0023 | 200 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 56 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 160 |
Total trials: 420 Total latency (us): 31.5258
2024-11-06 18:35:55 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
2024-11-06 18:36:09 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:36:10 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:36:13 [DEBUG] XGB validation: p-rmse: 0.233211 a-peak@32: 0.925215
2024-11-06 18:36:13 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | Y |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 111.4731 | 9.0023 | 9.0023 | 208 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 56 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 160 |
Total trials: 428 Total latency (us): 31.5258
2024-11-06 18:36:13 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #4: "fused_nn_dense_add"
2024-11-06 18:36:16 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:36:18 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:36:20 [DEBUG] XGB validation: p-rmse: 0.071691 a-peak@32: 0.960906
2024-11-06 18:36:20 [INFO] [task_scheduler.cc:237] [Updated] Task #4: "fused_nn_dense_add"
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | Y |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 111.4731 | 9.0023 | 9.0023 | 208 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 56 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 168 |
Total trials: 436 Total latency (us): 31.5258
2024-11-06 18:36:20 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
2024-11-06 18:36:33 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:36:35 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:36:37 [DEBUG] XGB validation: p-rmse: 0.065844 a-peak@32: 0.975867
2024-11-06 18:36:37 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done | |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | Y |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 111.4731 | 9.0023 | 9.0023 | 216 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 56 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 168 |
Total trials: 444 Total latency (us): 31.5258
2024-11-06 18:36:37 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #4: "fused_nn_dense_add"
2024-11-06 18:36:41 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:36:42 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:36:44 [DEBUG] XGB validation: p-rmse: 0.051169 a-peak@32: 0.950947
2024-11-06 18:36:44 [INFO] [task_scheduler.cc:237] [Updated] Task #4: "fused_nn_dense_add"

ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | Y |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 111.4731 | 9.0023 | 9.0023 | 216 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 56 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 176 |
Total trials: 452 Total latency (us): 31.5258
2024-11-06 18:36:44 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #2: "fused_nn_max_pool2d"
2024-11-06 18:36:47 [INFO] [task_scheduler.cc:193] Sending 0 sample(s) to builder
2024-11-06 18:36:47 [INFO] [task_scheduler.cc:195] Sending 0 sample(s) to runner
2024-11-06 18:36:47 [INFO] [task_scheduler.cc:237] [Updated] Task #2: "fused_nn_max_pool2d"

ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | Y |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 111.4731 | 9.0023 | 9.0023 | 216 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 56 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 176 |
Total trials: 452 Total latency (us): 31.5258
2024-11-06 18:36:47 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #4: "fused_nn_dense_add"
2024-11-06 18:36:51 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:36:53 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:36:55 [DEBUG] XGB validation: p-rmse: 0.089216 a-peak@32: 0.829930
2024-11-06 18:36:55 [DEBUG] XGB iter 0: tr-p-rmse: 0.449478 tr-a-peak@32: 0.969132 tr-rmse: 0.372439 tr-rmse: 0.372439
2024-11-06 18:36:55 [DEBUG] XGB iter 25: tr-p-rmse: 0.088929 tr-a-peak@32: 0.999831 tr-rmse: 0.455742 tr-rmse: 0.455742
2024-11-06 18:36:55 [DEBUG] XGB iter 50: tr-p-rmse: 0.088929 tr-a-peak@32: 0.999831 tr-rmse: 0.455743 tr-rmse: 0.455743
2024-11-06 18:36:55 [DEBUG] XGB iter 75: tr-p-rmse: 0.088929 tr-a-peak@32: 0.999831 tr-rmse: 0.455743 tr-rmse: 0.455743
2024-11-06 18:36:55 [DEBUG] XGB stopped. Best iteration: [25] tr-p-rmse:0.08893 tr-a-peak@32:0.99983 tr-rmse:0.45574 tr-rmse:0.45574
2024-11-06 18:36:55 [INFO] [task_scheduler.cc:237] [Updated] Task #4: "fused_nn_dense_add"

ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | Y |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 111.4731 | 9.0023 | 9.0023 | 216 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 56 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 184 |
Total trials: 460 Total latency (us): 31.5258
2024-11-06 18:36:55 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #4: "fused_nn_dense_add"
2024-11-06 18:36:58 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:37:00 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:37:02 [DEBUG] XGB validation: p-rmse: 0.137296 a-peak@32: 0.996112
2024-11-06 18:37:02 [INFO] [task_scheduler.cc:237] [Updated] Task #4: "fused_nn_dense_add"

ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | Y |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 111.4731 | 9.0023 | 9.0023 | 216 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 56 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 192 |
Total trials: 468 Total latency (us): 31.5258
2024-11-06 18:37:02 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #4: "fused_nn_dense_add"
2024-11-06 18:37:06 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:37:08 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:37:10 [DEBUG] XGB validation: p-rmse: 0.131698 a-peak@32: 0.985965
2024-11-06 18:37:10 [INFO] [task_scheduler.cc:237] [Updated] Task #4: "fused_nn_dense_add"

ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | Y |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 111.4731 | 9.0023 | 9.0023 | 216 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 56 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 200 |
Total trials: 476 Total latency (us): 31.5258
2024-11-06 18:37:10 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"
2024-11-06 18:37:23 [INFO] [task_scheduler.cc:193] Sending 8 sample(s) to builder
2024-11-06 18:37:25 [INFO] [task_scheduler.cc:195] Sending 8 sample(s) to runner
2024-11-06 18:37:28 [DEBUG] XGB validation: p-rmse: 0.071621 a-peak@32: 0.996863
2024-11-06 18:37:28 [INFO] [task_scheduler.cc:237] [Updated] Task #1: "fused_nn_contrib_conv2d_NCHWc_add_nn_relu"

ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | Y |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 112.0600 | 8.9552 | 8.9552 | 224 | |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 56 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 200 |
Total trials: 484 Total latency (us): 31.4787
2024-11-06 18:37:28 [INFO] [task_scheduler.cc:260] Task #1 has finished. Remaining task(s): 2

ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | Y |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 112.0600 | 8.9552 | 8.9552 | 224 | Y |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 56 | |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 200 |
Total trials: 484 Total latency (us): 31.4787
2024-11-06 18:37:28 [INFO] [task_scheduler.cc:260] Task #2 has finished. Remaining task(s): 1

ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | Y |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 112.0600 | 8.9552 | 8.9552 | 224 | Y |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 56 | Y |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 200 |
Total trials: 484 Total latency (us): 31.4787
2024-11-06 18:37:28 [INFO] [task_scheduler.cc:260] Task #4 has finished. Remaining task(s): 0

ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done |
---|---|---|---|---|---|---|---|---|
0 | fused_layout_transform | 1 | 1 | 0.0003 | 3.1952 | 3.1952 | 2 | Y |
1 | fused_nn_contrib_conv2d_NCHWc_add_nn_relu | 1003520 | 1 | 112.0600 | 8.9552 | 8.9552 | 224 | Y |
2 | fused_nn_max_pool2d | 50176 | 1 | 14.1879 | 3.5365 | 3.5365 | 56 | Y |
3 | fused_layout_transform_reshape | 1 | 1 | 0.0001 | 7.2225 | 7.2225 | 2 | Y |
4 | fused_nn_dense_add | 250890 | 1 | 29.2783 | 8.5691 | 8.5691 | 200 | Y |
Total trials: 484 Total latency (us): 31.4787
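The columns of the final summary table can be cross-checked with a few lines of arithmetic. The sketch below assumes, judging only from the numbers in the log, that Speed is FLOP divided by Latency and that the total latency is the sum of Weight × Latency over all tasks. The helper names `gflops` and `total_weighted_latency` are illustrative and not part of TVM.

```python
# Rows of the final summary table: (name, FLOP, weight, latency in microseconds).
tasks = [
    ("fused_layout_transform", 1, 1, 3.1952),
    ("fused_nn_contrib_conv2d_NCHWc_add_nn_relu", 1003520, 1, 8.9552),
    ("fused_nn_max_pool2d", 50176, 1, 3.5365),
    ("fused_layout_transform_reshape", 1, 1, 7.2225),
    ("fused_nn_dense_add", 250890, 1, 8.5691),
]


def gflops(flop, latency_us):
    """FLOP per microsecond is MFLOP/s; dividing by 1000 gives GFLOP/s."""
    return flop / latency_us / 1e3


def total_weighted_latency(rows):
    """Sum of weight * latency over all tasks, in microseconds."""
    return sum(weight * latency for _, _, weight, latency in rows)


conv_speed = gflops(1003520, 8.9552)
total_us = total_weighted_latency(tasks)
print(f"conv2d speed: {conv_speed:.2f} GFLOPS")
print(f"total latency: {total_us:.4f} us")
```

Both values agree with the table up to rounding of the printed latencies. The per-task sum covers only the tuned kernels, so the end-to-end inference time measured later can be somewhat larger.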
After tuning, the neural network can be compiled with the tuned schedules applied, using the MetaScheduler interface ms.relay_integration.compile_relay.
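if is_x86():
    # Load the tuning records written by MetaScheduler into work_dir.
    database = ms.database.JSONDatabase(
        f"{work_dir}/database_workload.json",
        f"{work_dir}/database_tuning_record.json",
        allow_missing=False
    )
    # Compile the Relay module using the best schedules found in the database.
    lib = ms.relay_integration.compile_relay(
        database, mod, target, params,
        opt_level=opt_level,
    )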
Finally, we measure the inference time with the timeit_inference function, evaluate model quality with the get_accuracy function, and check the correctness of the optimized model by comparing the resulting accuracy with the reference value.
if is_x86():
    ms_cnn_predict, ms_cnn_times = timeit_inference(mod, lib, images)
    ms_cnn_accuracy = get_accuracy(labels, ms_cnn_predict)
    # The tuned model must reproduce the reference accuracy.
    assert np.allclose(metric['cnn'], ms_cnn_accuracy, rtol=1e-5)
    ms_cnn_time = np.median(ms_cnn_times)
    print(f'Median inference time after layer tuning with MetaScheduler: {ms_cnn_time:.4f} ms')
Median inference time after layer tuning with MetaScheduler: 0.0415 ms
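The helpers timeit_inference and get_accuracy are defined earlier in this work and are not reproduced here. As a rough illustration of the same idea, the following minimal sketch measures a median wall-clock latency and compares an accuracy value with a reference. The names `median_latency_ms` and `accuracy`, the toy labels, and the stand-in workload are all hypothetical and use only the Python standard library.

```python
import math
import statistics
import time


def median_latency_ms(run_once, repeats=50):
    """Call run_once() `repeats` times and return the median time in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)


def accuracy(labels, predictions):
    """Fraction of predictions that match the ground-truth labels."""
    correct = sum(int(l == p) for l, p in zip(labels, predictions))
    return correct / len(labels)


# Toy data standing in for real labels and model output (illustration only).
labels = [3, 1, 4, 1, 5]
predictions = [3, 1, 4, 1, 2]
reference_accuracy = 0.8

tuned_accuracy = accuracy(labels, predictions)
# Same kind of check as np.allclose(..., rtol=1e-5), done with the standard library.
assert math.isclose(reference_accuracy, tuned_accuracy, rel_tol=1e-5)

# A trivial callable stands in for one inference call.
latency = median_latency_ms(lambda: sum(range(1000)))
print(f"accuracy: {tuned_accuracy:.2f}, median latency: {latency:.4f} ms")
```

The median is less sensitive than the mean to occasional slow runs caused by other processes, which is presumably why this work reports it.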
8.5. Analysis of results
To analyze the results of optimizing the neural network with the different methods, we plot the median execution time for each of them.
fig, ax = plt.subplots()
name = ['No layer\ntuning', 'AutoTVM', 'Auto-scheduler', 'MetaScheduler']
times = [default_cnn_time, autotvm_cnn_time, autoscheduler_cnn_time, ms_cnn_time]
bars = ax.bar(name, times, label=name, color=bar_colors)
for bar, n, t in zip(bars, name, times):
    h = bar.get_height()
    # Place the label of the baseline bar at half its height.
    if n == name[0]:
        h = h / 2
    if h != 0:
        ax.text(
            bar.get_x() + bar.get_width() / 2,
            h,
            f'{round(t, 4)} ms',
            ha='center',
            va='bottom',
            fontsize=15,
        )
ax.xaxis.label.set_size(40)
# The plotted values are medians in milliseconds, matching the printed timings.
ax.set_title('Median execution\ntime (ms)', fontsize=18)
plt.grid()
Conclusion: tuning the layers substantially reduces the inference time of the network compared with the untuned build.
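To put a number on this conclusion, each method's speedup over the untuned baseline can be computed from the median times. The sketch below is only an illustration: the function `speedups` is hypothetical, and the timings in `example_times_ms` are placeholders, not measurements from this work. Substitute the actual values of default_cnn_time, autotvm_cnn_time, autoscheduler_cnn_time and ms_cnn_time.

```python
def speedups(times_ms, baseline):
    """Return baseline_time / method_time for every method in times_ms."""
    base = times_ms[baseline]
    return {method: base / t for method, t in times_ms.items()}


# Placeholder timings in milliseconds (NOT measured results).
example_times_ms = {
    "No layer tuning": 4.0,
    "AutoTVM": 2.0,
    "Auto-scheduler": 1.0,
    "MetaScheduler": 0.5,
}

result = speedups(example_times_ms, baseline="No layer tuning")
for method, factor in result.items():
    print(f"{method}: {factor:.1f}x")
```

A factor above 1 means the method is faster than the baseline, and a factor below 1 means tuning made the model slower.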