Other Services

ML

AI > ML > Deep Learning
AI is the more general discipline
ML is a toolset within the AI discipline
Deep learning is a type of ML that can process unstructured data such as images, speech, video and natural-language text
Many ways to implement ML on GCP
ML Frameworks: such as TensorFlow for advanced users
Cloud AutoML: Bring your own data to train model
Pre-trained Models: services such as NLP API, Vision API, ML, Speech API, Translate API
Three ways to build an ML model:
BigQueryML: SQL based, fast, runs in minutes, not very accurate for production
AutoML: runs in a few hours, does feature engineering, much more accurate
Custom: using Keras, TensorFlow and Python
TensorFlow ML requires
An Estimator object that implements train(), evaluate(), predict() and export_savedmodel() methods
An input function that returns a tuple of features and labels.
- features is a dict mapping each feature column name to a tensor; labels is a tensor holding the label column
A serving input function that prepares user input for prediction; it returns a tuple of features and inputs
- it's similar to the training input function, except that the user input data might have a different format
Train and Evaluate functions
- defined using estimator, TrainSpec and EvalSpec
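The (features, labels) contract of the input function can be sketched without TensorFlow. This is only an illustration of the shape: the rows, the column names and the use of plain lists in place of tensors are assumptions made for the example, not part of the Estimator API.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory rows; a real input function would read files
# and return tensors rather than plain Python lists.
ROWS = [
    {"sq_footage": 1000.0, "type": "house", "price": 500.0},
    {"sq_footage": 800.0, "type": "apt", "price": 300.0},
    {"sq_footage": 1500.0, "type": "house", "price": 700.0},
]
LABEL_COLUMN = "price"  # assumed name of the label column


def train_input_fn(rows=ROWS) -> Tuple[Dict[str, List], List[float]]:
    """Return (features, labels): a dict keyed by feature column, plus the labels."""
    feature_names = [name for name in rows[0] if name != LABEL_COLUMN]
    features = {name: [row[name] for row in rows] for name in feature_names}
    labels = [row[LABEL_COLUMN] for row in rows]
    return features, labels


features, labels = train_input_fn()
```

A serving input function would build the same features dict, but from whatever format the caller sends, and it has no labels to return.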

3 phases: Train, Deploy, Serve
Train with Prepared Dataset
Prepared Dataset must reside in the same bucket as source files.
- 1st optional column identifies TRAIN, VALIDATION, TEST; 80-10-10 is the default
VALIDATION data is used to identify incorrect labels and improve the model
TEST data is used to evaluate the model and remove bias
One pass through all the training data is called an epoch
Custom models are temporary and deleted after some time
cannot be exported
must train new models periodically
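The optional first column of the prepared dataset can be produced with a small script. The sketch below assumes a CSV layout of set name, file URI, label, and uses a hypothetical bucket name; it reproduces the 80-10-10 split explicitly rather than relying on the default.

```python
import csv
import io
import random


def assign_splits(rows, seed=0, train_pct=80, validation_pct=10):
    """Prefix each row with TRAIN, VALIDATION or TEST (80-10-10 by default)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = len(shuffled) * train_pct // 100
    n_validation = len(shuffled) * validation_pct // 100
    tagged = []
    for index, row in enumerate(shuffled):
        if index < n_train:
            split = "TRAIN"
        elif index < n_train + n_validation:
            split = "VALIDATION"
        else:
            split = "TEST"
        tagged.append([split, *row])
    return tagged


def to_csv(tagged_rows):
    """Serialize the tagged rows as CSV text, one example per line."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(tagged_rows)
    return buffer.getvalue()


# Hypothetical image URIs and labels, for illustration only.
rows = [(f"gs://my-bucket/img_{i}.jpg", "cat" if i % 2 else "dog") for i in range(20)]
tagged = assign_splits(rows)
csv_text = to_csv(tagged)
```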
Serve provisions a REST endpoint
accepts model name and payload (data that needs to be classified)
returns displayName (matched labels), classification and score
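On the client side, handling the prediction result mostly means picking the best-scoring label. The sketch below assumes the response is JSON with a payload list whose items carry displayName and a nested classification score, as the notes describe; the sample values and the best_label helper are hypothetical.

```python
import json

# Hypothetical response body, shaped after the fields listed above.
sample_response = json.loads("""
{"payload": [
  {"displayName": "daisy", "classification": {"score": 0.92}},
  {"displayName": "tulip", "classification": {"score": 0.05}}
]}
""")


def best_label(response, min_score=0.5):
    """Return (displayName, score) of the top match, or None if nothing is confident enough."""
    candidates = [
        (item["classification"]["score"], item["displayName"])
        for item in response.get("payload", [])
    ]
    if not candidates:
        return None
    score, name = max(candidates)
    return (name, score) if score >= min_score else None


top = best_label(sample_response)
```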
AutoML Vision: classify images
Rule of thumb (ROT) for training data: low score => add more data; perfect score => add more variety
AutoML NLP
AutoML Tables: for structured data in tabular format

Supports Python or Jinja2 templates
A Python template must
define a GenerateConfig(context) or generate_config(context) method
return a Python dict
names used are from the template and not from configuration
both, template and configuration, can use references
Jinja2 templates can reference environment variables using env and properties using properties
All resources are created in parallel, except those that have dependencies
Use selfLink to ensure dependencies are created first, e.g. reference a network's selfLink in a firewall rule to ensure the VPC is created before the firewall rule
Use gcloud deployment-manager types list | grep instance to find exact type of resource
Use gcloud deployment-manager deployments create dminfra --config=config.yaml --preview to preview
immediately after --preview, gcloud deployment-manager deployments update dminfra doesn't need --config=...
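A minimal Python template that follows the rules above might look like the sketch below. The resource names, the sourceRanges property and the FakeContext class are assumptions for illustration; FakeContext only mimics the env and properties attributes so the template can be tried locally.

```python
def generate_config(context):
    """Entry point called for a Python template; must return a dict."""
    deployment = context.env["deployment"]
    network_name = deployment + "-network"
    firewall_name = deployment + "-allow-ssh"

    resources = [
        {
            "name": network_name,
            "type": "compute.v1.network",
            "properties": {"autoCreateSubnetworks": True},
        },
        {
            "name": firewall_name,
            "type": "compute.v1.firewall",
            "properties": {
                # Referencing selfLink makes the firewall wait for the network.
                "network": "$(ref." + network_name + ".selfLink)",
                "sourceRanges": context.properties["sourceRanges"],
                "allowed": [{"IPProtocol": "tcp", "ports": ["22"]}],
            },
        },
    ]
    return {"resources": resources}


class FakeContext:
    """Local stand-in for the template context (hypothetical, for experimentation)."""

    def __init__(self, deployment, properties):
        self.env = {"deployment": deployment}
        self.properties = properties


config = generate_config(FakeContext("dminfra", {"sourceRanges": ["10.0.0.0/8"]}))
```

Without the reference, both resources would be created in parallel and the firewall rule could fail because its network does not exist yet.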

monitoring, logging, alert and notification, Application Performance Management (APM)
APM includes: application profiler, debugger and cloud trace
can run query against logs (json fields)
Logging
recommended to install logging agent on VM or EC2 instances
Platform, system and application logging
logs retained only for 30 days
Tracing: real-time, latency reporting
App Engine, Google HTTP(S) load balancers and any application using the Stackdriver Trace SDK
Debugging:
ability to inject debug logging at runtime
Error reporting: supported by all services: Kubernetes, App Engine, Compute Engine
multi-cloud: can also monitor AWS
Q: how do you live debug a microservice that bursts logs? A: set up a monitor for a burst of log lines in Stackdriver
App Engine and Kubernetes don't need the logging agent installed, but Compute Engine does
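The idea behind a log-burst monitor can be illustrated with a sliding-window count. This is a plain-Python sketch of the concept, not Stackdriver's API; the window size, the line limit and the sample timestamps are arbitrary assumptions.

```python
from collections import deque


def find_bursts(timestamps, window_seconds=10, max_lines=5):
    """Return the timestamps at which more than max_lines entries fall in the trailing window."""
    window = deque()
    bursts = []
    for ts in sorted(timestamps):
        window.append(ts)
        # Drop entries that have aged out of the trailing window.
        while window[0] <= ts - window_seconds:
            window.popleft()
        if len(window) > max_lines:
            bursts.append(ts)
    return bursts


# Hypothetical log-line timestamps in seconds: a quiet period, then a burst.
quiet = [0, 20, 40, 60]
noisy = [100, 101, 102, 103, 104, 105, 106]
bursts = find_bursts(quiet + noisy)
```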

A workspace can monitor projects other than its hosting project
Anyone with access to a workspace can monitor all projects that are in the workspace
BP: Use separate workspaces if limiting visibility of specific projects is desired
Without the Monitoring agent, you can still monitor CPU utilization, some disk usage, network traffic and uptime information
Monitor domain-specific information by defining a Custom Metric (e.g. number of gamers on a VM)
Chart consists of
Metric: Resource Type, Metric, Filter, Group By, Aggregator
View Options: Threshold, "Compare to Past", Use Log scale
Alerting consists of 1+ conditions and 1+ Notifications
Condition consists of a Metric and Condition (above/below etc), Threshold (value) and For (how long)
Notification Channel is email, SMS, pager, Slack or Pub/Sub (beta)
An incident is created when an alerting policy fails and is resolved automatically when the policy succeeds again
- view past incidents in Events
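How a condition's Threshold and For settings turn into incidents can be sketched as a small simulation. This is an illustration of the logic only, with hypothetical CPU samples and an assumed "above" condition; it does not call any monitoring API.

```python
def evaluate_policy(samples, threshold, for_seconds):
    """Walk (timestamp, value) samples and return incident events.

    An incident opens once the value has stayed above the threshold for
    at least for_seconds, and resolves at the first sample back below it.
    """
    events = []
    breach_start = None
    incident_open = False
    for ts, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = ts
            if not incident_open and ts - breach_start >= for_seconds:
                incident_open = True
                events.append(("opened", ts))
        else:
            breach_start = None
            if incident_open:
                incident_open = False
                events.append(("resolved", ts))
    return events


# Hypothetical CPU utilization samples, one per minute.
cpu = [(0, 0.5), (60, 0.9), (120, 0.95), (180, 0.92), (240, 0.4), (300, 0.91), (360, 0.3)]
events = evaluate_policy(cpu, threshold=0.8, for_seconds=120)
```

The short spike at 300 seconds does not open an incident because it does not last for the full For duration.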
Resource Group allows grouping resources using name, tag, region, resource type etc
a dashboard is automatically created
Uptime checks
Check Type: HTTP, HTTPS, TCP
Resource Type: URL, VM Instance, App Engine, ELB or AWS Instance
Applies To: Single or Group
Check Every: 1/5/10/15 minutes