TensorFlow, TPU, Dataset, TFRecords, Cheat Sheet

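TPU (Tensor Processing Unit)

An accelerator built for deep learning, designed to pair especially well with TensorFlow and Keras.
Available through Cloud TPU; by default a single TPU has 8 cores.

Why use a TPU

A TPU is a device optimized for deep learning, and for matrix multiplication in particular.
Even a single TPU, used well, can greatly speed up training through its 8 cores.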

Illustration: a dense neural network layer as a matrix multiplication, with a batch of eight images processed through the neural network at once. Please run through one line x column multiplication to verify that it is indeed doing a weighted sum of all the pixel values of an image. Convolutional layers can be represented as matrix multiplications too, although it's a bit more complicated (explanation here, in section 1).
Source: https://codelabs.developers.google.com/codelabs/keras-flowers-data#2
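The "line x column" check described in the caption can be sketched in a few lines. This is a minimal illustration, not the codelab's code: the sizes (eight flattened 4x4 images, four neurons) and the random values are assumptions chosen only to keep the arrays small.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 8 images of 16 pixels each, a dense layer with 4 neurons.
BATCH, PIXELS, UNITS = 8, 16, 4

images = rng.random((BATCH, PIXELS))    # one row per flattened image
weights = rng.random((PIXELS, UNITS))   # one column per neuron

# The whole batch passes through the layer as a single matrix multiplication.
outputs = images @ weights              # shape (8, 4)

# One "line x column" product by hand: image 0 against neuron 2.
# It is a weighted sum over all pixel values of that image.
manual = sum(images[0, p] * weights[p, 2] for p in range(PIXELS))

print(outputs.shape)
print(np.isclose(manual, outputs[0, 2]))
```

Bias and activation are left out on purpose so that only the matrix product remains.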

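My personal and subjective reasons for using a TPU

Google's TPU Research Cloud lends TPUs for free even to ordinary individuals (storage and networking are billed separately, but the cost does not even reach the price of Colab Pro).
Korean government agencies such as NIA support GPUs only for companies and research labs, not for individuals.
When competing on Kaggle or Dacon, the more compute you have, the more experiments you can run and the faster you see results (more VRAM allows larger models; faster compute allows more models within a limited time).
The VM spec itself is impressive: 96 CPU cores and 120 GB of RAM help a great deal with preprocessing as well.
I think of it as a Siege Tank: getting it set up is like entering siege mode, and once siege mode is in place it delivers tremendous compute.
Cloud TPU v3-8 is currently offered for free. With Cloud TPU v4-8 recently released, I think the TPU's future growth potential is limitless.
It is optimized for TensorFlow, so it fits my current tech stack.
Only the setup differs slightly: distributed training and inference are also possible on GPUs, so what you learn here still applies when you use GPUs.
People who enjoy difficult games play Dark Souls. I do TPUs instead of Dark Souls.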

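Drawbacks

It is less flexible than a GPU. Without proper padding, the compilation step can actually make it slower than a GPU.
It cannot be used locally, so you have to work in the cloud, which means using Google Cloud Storage and connecting to the VM over SSH.
A model that ran on a GPU sometimes will not run on a TPU (when the model contains code the TPU does not support).
Plain Python functions cannot be used in code that runs on the TPU.
The barrier to entry is high for beginners. Companies do not seem to use it widely yet (few companies show any trace of TPUs on their tech blogs).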
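On the padding point in the list above: TPU programs are compiled for fixed tensor shapes, so one common way to keep shapes stable is to pad every batch to the same length and drop the incomplete final batch. The sketch below assumes toy variable-length sequences; MAX_LEN and BATCH_SIZE are hypothetical values, not settings from this post.

```python
import tensorflow as tf

MAX_LEN = 6      # hypothetical fixed sequence length
BATCH_SIZE = 2   # hypothetical batch size

# Toy variable-length data: element x is the value x repeated x times (lengths 1..5).
dataset = (
    tf.data.Dataset.range(1, 6, output_type=tf.int32)
    .map(lambda x: tf.fill([x], x))
)

# Pad every sequence to MAX_LEN and drop the last incomplete batch so that
# every batch has the fully static shape (BATCH_SIZE, MAX_LEN).
dataset = dataset.padded_batch(
    BATCH_SIZE,
    padded_shapes=[MAX_LEN],
    drop_remainder=True,
)

print(dataset.element_spec)
batches = list(dataset.as_numpy_iterator())
```

With drop_remainder=True the fifth sequence is discarded here; on real data that is usually an acceptable trade for a static batch dimension.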

Dataset

Better performance with the tf.data API | TensorFlow Core (www.tensorflow.org)

No matter how fast the GPU or TPU is, it is useless if the worker serving the data delivers it slowly.
tf.data is the API for building pipelines that remove this bottleneck.
Beyond that, it offers many other features that help when writing a data pipeline.
You can train without a Dataset, but it is essential when optimizing.
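A minimal sketch of such a pipeline is shown here, assuming a synthetic range dataset in place of real files. The preprocess function is a hypothetical stand-in for decoding or augmentation, and the batch size and buffer size are arbitrary.

```python
import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE

def preprocess(x):
    # Hypothetical stand-in for real decoding/augmentation work.
    return tf.cast(x, tf.float32) / 255.0

dataset = (
    tf.data.Dataset.range(1000)
    .map(preprocess, num_parallel_calls=AUTOTUNE)  # preprocess elements in parallel
    .cache()                                       # keep preprocessed elements after the first pass
    .shuffle(1000, seed=0)
    .batch(32, drop_remainder=True)                # static batch size
    .prefetch(AUTOTUNE)                            # prepare the next batch while the current one is consumed
)

first_batch = next(iter(dataset))
num_batches = sum(1 for _ in dataset)
print(first_batch.shape, num_batches)
```

The prefetch step is what overlaps data preparation with the training step, which is the bottleneck the paragraph above describes.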

TFRecord

It can be seen as the pinnacle of the data pipeline.
Its purpose is to store data in a binary format.

Why TFRecord

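A TPU is very fast, so data must be delivered to it just as fast.
Google Cloud Storage can feed data to the TPU and has the advantage of avoiding unnecessary network steps.
Therefore, rather than storing data as thousands of individual files, it is ideal to batch it into a smaller number of files and use tf.data.Dataset to read from multiple files in parallel.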

TFRecord Cheat sheet

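This cheat sheet was taken from here.

# TFRecord file format
# The preferred way to store data in TensorFlow is the protobuf-based TFRecord
# Other serialization formats can be used too, but a TFRecord file can be loaded directly as a dataset as shown below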

filenames = tf.io.gfile.glob(FILENAME_PATTERN)
dataset = tf.data.TFRecordDataset(filenames)
dataset = dataset.map(...) # do the TFRecord decoding here - see below


# Slightly more complex, but the version below gives better performance (reading several files at once)
# It reads N files in parallel and gives up ordering for speed

AUTOTUNE = tf.data.AUTOTUNE
ignore_order = tf.data.Options()
ignore_order.experimental_deterministic = False

filenames = tf.io.gfile.glob(FILENAME_PATTERN)
dataset = tf.data.TFRecordDataset(filenames, num_parallel_reads=AUTOTUNE)
dataset = dataset.with_options(ignore_order)
dataset = dataset.map(...) # do the TFRecord decoding here - see below

The three data types

# Writing byte strings
def _bytestring_feature(list_of_bytestrings):
  return tf.train.Feature(bytes_list=tf.train.BytesList(value=list_of_bytestrings))
  
  
# Writing integers
def _int_feature(list_of_ints): # int64
  return tf.train.Feature(int64_list=tf.train.Int64List(value=list_of_ints))


# Writing floats
def _float_feature(list_of_floats): # float32
  return tf.train.Feature(float_list=tf.train.FloatList(value=list_of_floats))


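# Writing a TFRecord with the helper functions above
# input data in my_img_bytes, my_class, my_height, my_width, my_floats
# (tf.python_io is the TF 1.x name; TF 2.x exposes the writer as tf.io.TFRecordWriter)
with tf.io.TFRecordWriter(filename) as out_file: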
  feature = {
    "image": _bytestring_feature([my_img_bytes]), # one image in the list
    "class": _int_feature([my_class]),            # one class in the list
    "size": _int_feature([my_height, my_width]),  # fixed length (2) list of ints
    "float_data": _float_feature(my_floats)       # variable length  list of floats
  }
  tf_record = tf.train.Example(features=tf.train.Features(feature=feature))
  out_file.write(tf_record.SerializeToString())

Reading TFRecords

def read_tfrecord(data):
  features = {
    # tf.string = byte string (not text string)
    "image": tf.io.FixedLenFeature([], tf.string), # shape [] means scalar, here, a single byte string
    "class": tf.io.FixedLenFeature([], tf.int64),  # shape [] means scalar, i.e. a single item
    "size": tf.io.FixedLenFeature([2], tf.int64),  # two integers
    "float_data": tf.io.VarLenFeature(tf.float32)  # a variable number of floats
  }

  # decode the TFRecord
  tf_record = tf.io.parse_single_example(data, features)

  # FixedLenFeature fields are now ready to use
  sz = tf_record['size']

  # Typical code for decoding compressed images
  image = tf.io.decode_jpeg(tf_record['image'], channels=3)

  # VarLenFeature fields require additional sparse.to_dense decoding
  float_data = tf.sparse.to_dense(tf_record['float_data'])

  return image, sz, float_data

# decoding a tf.data.TFRecordDataset
dataset = dataset.map(read_tfrecord)
# now a dataset of triplets (image, sz, float_data)
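To see the write and read halves working together, here is a self-contained round trip that follows the same feature layout as the cheat sheet. It is only a sketch: the temporary file, the tiny all-black 4x4 JPEG, and the sample values are assumptions made so that the example runs without any external data.

```python
import os
import tempfile

import tensorflow as tf

path = os.path.join(tempfile.mkdtemp(), "demo.tfrecord")  # hypothetical output file

# Hypothetical sample: a tiny all-black 4x4 JPEG, class 3, three floats.
img_bytes = tf.io.encode_jpeg(tf.zeros([4, 4, 3], tf.uint8)).numpy()

feature = {
    "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[img_bytes])),
    "class": tf.train.Feature(int64_list=tf.train.Int64List(value=[3])),
    "size": tf.train.Feature(int64_list=tf.train.Int64List(value=[4, 4])),
    "float_data": tf.train.Feature(float_list=tf.train.FloatList(value=[0.5, 1.5, 2.5])),
}
example = tf.train.Example(features=tf.train.Features(feature=feature))

with tf.io.TFRecordWriter(path) as writer:
    writer.write(example.SerializeToString())

def read_tfrecord(data):
    features = {
        "image": tf.io.FixedLenFeature([], tf.string),
        "class": tf.io.FixedLenFeature([], tf.int64),
        "size": tf.io.FixedLenFeature([2], tf.int64),
        "float_data": tf.io.VarLenFeature(tf.float32),
    }
    rec = tf.io.parse_single_example(data, features)
    image = tf.io.decode_jpeg(rec["image"], channels=3)
    float_data = tf.sparse.to_dense(rec["float_data"])  # VarLenFeature comes back sparse
    return image, rec["class"], rec["size"], float_data

dataset = tf.data.TFRecordDataset(path).map(read_tfrecord)
image, label, size, float_data = next(iter(dataset))
print(image.shape, int(label), size.numpy(), float_data.numpy())
```

The feature names and dtypes in the parsing dictionary must match what was written; a mismatch shows up as a parse error only when the dataset is iterated.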

Code snippets

# reading single data elements
tf.io.FixedLenFeature([], tf.string)   # for one byte string
tf.io.FixedLenFeature([], tf.int64)    # for one int
tf.io.FixedLenFeature([], tf.float32)  # for one float


# reading fixed size lists of elements
tf.io.FixedLenFeature([N], tf.string)   # list of N byte strings
tf.io.FixedLenFeature([N], tf.int64)    # list of N ints
tf.io.FixedLenFeature([N], tf.float32)  # list of N floats


# reading a variable number of data items
tf.io.VarLenFeature(tf.string)   # list of byte strings
tf.io.VarLenFeature(tf.int64)    # list of ints
tf.io.VarLenFeature(tf.float32)  # list of floats


# A VarLenFeature returns a sparse vector and an additional step is required after decoding the TFRecord
dense_data = tf.sparse.to_dense(tf_record['my_var_len_feature'])

Example from a Kaggle notebook

Walkthrough: Building a Dataset of TFRecords (Kaggle notebook, www.kaggle.com)

import tensorflow as tf
import pathlib
dataset_url = "https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz"
data_dir = tf.keras.utils.get_file(
    origin=dataset_url,
    fname='flower_photos',
    untar=True,
    cache_dir="..cache"
)
data_dir = pathlib.Path(data_dir)

#Downloading data from https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz
#228818944/228813984 [==============================] - 5s 0us/step
#228827136/228813984 [==============================] - 5s 0us/step

# PosixPath('/tmp/.keras/datasets/flower_photos')

from functools import partial
IMG_HEIGHT = 512
IMG_WIDTH = 512

load_split = partial(
    tf.keras.preprocessing.image_dataset_from_directory,
    data_dir,
    validation_split=0.2,
    shuffle=True,
    seed=123,
    image_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=1,
)

ds_train = load_split(subset='training')
ds_valid = load_split(subset='validation')
class_names = ds_train.class_names
print(class_names)



#Found 3670 files belonging to 5 classes.
#Using 2936 files for training.

#Found 3670 files belonging to 5 classes.
#Using 734 files for validation.
#['daisy', 'dandelion', 'roses', 'sunflowers', 'tulips']

Unbatch the batched dataset and convert each image to bytes with map

from tensorflow.train import BytesList, FloatList, Int64List
from tensorflow.train import Example, Features, Feature

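def process_image(image, label):
    # image_dataset_from_directory yields float32 pixels in [0, 255], so cast
    # directly; convert_image_dtype would assume floats in [0, 1] and rescale.
    image = tf.cast(image, tf.uint8)
    image = tf.io.encode_jpeg(image)
    return image, label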

ds_train_encoded = (
    ds_train
    .unbatch()
    .map(process_image)
)

ds_valid_encoded = (
    ds_valid
    .unbatch()
    .map(process_image)
)

Convert to tf.train.Example form

def make_example(encoded_image, label):
    image_feature = Feature(
        bytes_list=BytesList(value=[
            encoded_image,
        ]),
    )
    label_feature = Feature(
        int64_list=Int64List(value=[
            label,
        ])
    )

    features = Features(feature={
        'image': image_feature,
        'label': label_feature,
    })
    
    example = Example(features=features)
    
    return example.SerializeToString()

!mkdir -p './kaggle/working/training'

NUM_SHARDS = 32
PATH = './kaggle/working/training/shard_{:02d}.tfrecord'

# A shard can be thought of as one slice of the dataset, written to its own file
# Each shard is then written out as a record file

for shard in range(NUM_SHARDS):
    ds_shard = (
        ds_train_encoded
        .shard(NUM_SHARDS, shard)
        .as_numpy_iterator()
    )
    with tf.io.TFRecordWriter(path=PATH.format(shard)) as f:
        for encoded_image, label in ds_shard:
            example = make_example(encoded_image, label)
            f.write(example)
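The notebook excerpt stops after writing the shards. As a hedged sketch of how shard(n, i) splits a dataset and how the files can be read back, the example below replaces the flower images with the integers 0..9 and writes to a temporary directory; those choices, and the label-only feature, are assumptions made to keep it self-contained.

```python
import os
import tempfile

import tensorflow as tf

NUM_SHARDS = 4
out_dir = tempfile.mkdtemp()                       # hypothetical output directory
pattern = os.path.join(out_dir, "shard_{:02d}.tfrecord")

source = tf.data.Dataset.range(10)                 # stand-in for (image, label) pairs

# shard(n, i) keeps every n-th element starting at index i.
for shard in range(NUM_SHARDS):
    with tf.io.TFRecordWriter(pattern.format(shard)) as f:
        for value in source.shard(NUM_SHARDS, shard).as_numpy_iterator():
            feature = {
                "label": tf.train.Feature(
                    int64_list=tf.train.Int64List(value=[int(value)])
                )
            }
            example = tf.train.Example(features=tf.train.Features(feature=feature))
            f.write(example.SerializeToString())

filenames = sorted(tf.io.gfile.glob(os.path.join(out_dir, "shard_*.tfrecord")))

def parse(record):
    parsed = tf.io.parse_single_example(
        record, {"label": tf.io.FixedLenFeature([], tf.int64)}
    )
    return parsed["label"]

# Read all shards in parallel; element order across files is not guaranteed.
restored = tf.data.TFRecordDataset(
    filenames, num_parallel_reads=tf.data.AUTOTUNE
).map(parse)

labels = sorted(int(x) for x in restored.as_numpy_iterator())
first_shard = [int(x) for x in source.shard(NUM_SHARDS, 0).as_numpy_iterator()]
print(len(filenames), first_shard, labels)
```

Sorting the restored labels is only for the comparison; during training the interleaved order is normally shuffled anyway.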

TFRecord in practice

Building a TFRecord dataset, by a Grandmaster who is a regular on Kaggle

How To Create TFRecords (Kaggle notebook, www.kaggle.com)

Practical usage example

EfficientNetB7 on 100+ flowers (Kaggle notebook, www.kaggle.com)

