Other Services
ML
- AI > ML > Deep Learning (each contains the next)
- AI is the more general discipline
- ML is a toolset within the AI discipline
- Deep learning is a type of ML that can process unstructured data such as images, speech, video and natural-language text
- Many ways to implement ML on GCP
- ML frameworks such as TensorFlow: for advanced users
- Cloud AutoML: bring your own data to train a custom model
- Pre-trained models: services such as the Natural Language API, Vision API, Speech API and Translation API
- Three ways to build an ML model:
- BigQuery ML: SQL-based and fast (runs in minutes), but usually not accurate enough for production
- AutoML: runs in a few hours and does feature engineering automatically; much more accurate
- Custom: using Keras, TensorFlow and Python
- A TensorFlow (Estimator API) model requires:
- An Estimator object that implements the train(), evaluate(), predict() and export_savedmodel() methods
- An input function that returns a tuple of features and labels
- features is a dict mapping feature-column names to tensors; labels is a tensor containing the label column
- A serving input function that converts user input at prediction time into features; it returns the features together with the raw input tensors
- It's similar to the training input function, except that user input may arrive in a different format
- A train-and-evaluate step
- defined using the estimator, a TrainSpec and an EvalSpec (passed to tf.estimator.train_and_evaluate)
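The input-function contract above can be sketched in plain Python. This is a minimal illustration, not TensorFlow code: lists stand in for tensors, and the toy columns (sq_feet, city, price) and the JSON request shape are hypothetical. It shows the training input function returning (features, labels) and the serving function turning differently formatted user input into the same feature dict.

```python
import json

# Hypothetical toy dataset: two feature columns and one label column.
# Plain Python lists stand in for TensorFlow tensors.
ROWS = [
    {"sq_feet": 1000.0, "city": "A", "price": 200.0},
    {"sq_feet": 1500.0, "city": "B", "price": 310.0},
]
FEATURE_COLUMNS = ["sq_feet", "city"]
LABEL_COLUMN = "price"


def train_input_fn(rows=ROWS):
    """Return (features, labels): features maps each feature-column name
    to its values; labels holds the values of the label column."""
    features = {col: [row[col] for row in rows] for col in FEATURE_COLUMNS}
    labels = [row[LABEL_COLUMN] for row in rows]
    return features, labels


def serving_input_fn(raw_request):
    """Parse user input (assumed here to be a JSON string with an
    "instances" list) into the same feature dict used for training.
    Returns (features, inputs), mirroring features plus raw input tensors."""
    inputs = json.loads(raw_request)
    features = {col: [inst[col] for inst in inputs["instances"]]
                for col in FEATURE_COLUMNS}
    return features, inputs
```

Training rows and serving requests have different shapes, but both functions produce a dict keyed by the same feature-column names, which is what lets one model serve both paths.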
AutoML
- 3 phases: Train, Deploy, Serve
- Train with a prepared dataset
- The prepared dataset must reside in the same bucket as the source files
- The optional first column assigns each row to TRAIN, VALIDATION or TEST; if it is omitted, the default split is 80/10/10
- VALIDATION data is used to identify incorrect labels and tune the model
- TEST data is held out for final model evaluation, so the evaluation is not biased
- One complete pass through the training data is called an epoch
- Custom models are temporary and deleted after some time
- cannot be exported
- must train new models periodically
- Serve provisions a REST endpoint
- accepts the model name and a payload (the data to be classified)
- returns displayName (matched labels), classification and score
- AutoML Vision: classify images
- Rule of thumb (ROT) for training data: low score => add more data; perfect score => add more variety
- AutoML NLP
- AutoML Tables: for structured data in tabular format
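A prepared dataset is a CSV whose optional first column assigns each row to a split. A minimal sketch, assuming the AutoML Vision-style row layout SET,gs://path,label and a hypothetical bucket name, of generating such rows with the 80-10-10 split described above:

```python
import random

# Hypothetical bucket; the source images live in the same bucket as the CSV.
BUCKET = "gs://my-automl-bucket"


def make_prepared_rows(items, split=(0.8, 0.1, 0.1), seed=0):
    """Return CSV lines "SET,uri,label" for a list of (uri, label) pairs.

    Rows are shuffled, then the first 80% become TRAIN, the next 10%
    VALIDATION and the rest TEST (mirroring the 80-10-10 default).
    """
    rows = list(items)
    random.Random(seed).shuffle(rows)  # deterministic shuffle for a given seed
    n_train = int(len(rows) * split[0])
    n_val = int(len(rows) * split[1])
    lines = []
    for i, (uri, label) in enumerate(rows):
        if i < n_train:
            subset = "TRAIN"
        elif i < n_train + n_val:
            subset = "VALIDATION"
        else:
            subset = "TEST"
        lines.append(f"{subset},{uri},{label}")
    return lines


if __name__ == "__main__":
    images = [(f"{BUCKET}/img{i}.jpg", "cat" if i % 2 else "dog") for i in range(10)]
    for line in make_prepared_rows(images):
        print(line)
```

Labels containing commas would need CSV quoting (e.g. via the csv module); the plain f-string keeps the sketch short.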
Deployment Manager
- Supports Python or Jinja2 templates
- A Python template must
- define a GenerateConfig(context) or generate_config(context) method
- return a Python dict
- resource names come from the template, not from the configuration
- both templates and configurations can use references
- Jinja2 templates reference environment variables via env and template properties via properties
- All resources are created in parallel, except those that have dependencies
- Use a selfLink reference to ensure dependencies are created first, e.g. referring to the network's selfLink in a firewall rule ensures the VPC is created before the firewall rule
- Use gcloud deployment-manager types list | grep instance to find the exact type of a resource
- Use gcloud deployment-manager deployments create dminfra --config=config.yaml --preview to preview a deployment
- Immediately after --preview, gcloud deployment-manager deployments update dminfra doesn't need --config=... (it commits the previewed deployment)
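A minimal Python template following the rules above, as a sketch: it defines GenerateConfig(context), reads the deployment name from context.env, and returns a dict whose firewall rule references the network's selfLink so the network is created first. The resource names, the sourceRanges template property and its default are hypothetical; compute.v1.network and compute.v1.firewall are the Compute Engine resource types.

```python
"""network.py: hypothetical Deployment Manager Python template."""


def GenerateConfig(context):
    """Build the resources for one deployment.

    context.env holds environment values such as "deployment" and "project";
    context.properties holds the properties passed in from the configuration.
    """
    net_name = context.env["deployment"] + "-net"
    # Hypothetical template property, with an assumed default.
    source_ranges = context.properties.get("sourceRanges", ["0.0.0.0/0"])
    return {
        "resources": [
            {
                "name": net_name,
                "type": "compute.v1.network",
                "properties": {"autoCreateSubnetworks": True},
            },
            {
                "name": context.env["deployment"] + "-allow-ssh",
                "type": "compute.v1.firewall",
                "properties": {
                    # Referencing the network's selfLink creates a dependency,
                    # so the network is created before this firewall rule.
                    "network": "$(ref." + net_name + ".selfLink)",
                    "sourceRanges": source_ranges,
                    "allowed": [{"IPProtocol": "tcp", "ports": ["22"]}],
                },
            },
        ]
    }
```

For a quick local check, GenerateConfig can be called with a stand-in object exposing env and properties (e.g. types.SimpleNamespace); in a real deployment, Deployment Manager supplies context.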
Stackdriver
- Monitoring, logging, alerting and notification, Application Performance Management (APM)
- APM includes Cloud Profiler, Cloud Debugger and Cloud Trace
- can run queries against logs, including structured (JSON) fields
- Logging
- recommended to install the logging agent on Compute Engine VMs or AWS EC2 instances
- Platform, system and application logging
- logs are retained for only 30 days
- Tracing: real-time latency reporting
- App Engine, Google HTTP(S) load balancers and any application using the Stackdriver Trace SDK
- Debugging:
- ability to inject debug logging at runtime
- Error Reporting: supported by all compute services: Kubernetes Engine, App Engine, Compute Engine
- Multi-cloud: can also monitor AWS
- Q: How do you live-debug a microservice that emits bursts of logs?
A: set up a monitor in Stackdriver for a burst of log lines
- App Engine and Kubernetes Engine don't need the logging agent installed, but Compute Engine does
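The Q&A answer amounts to alerting when the rate of log lines crosses a threshold. A conceptual sketch of that idea in plain Python (not a Stackdriver API; the threshold and window values are hypothetical), counting log lines in a trailing time window and assuming timestamps arrive in non-decreasing order:

```python
from collections import deque


class BurstDetector:
    """Flag a burst when more than `threshold` log lines arrive within
    the trailing `window_s` seconds. Conceptual sketch only."""

    def __init__(self, threshold, window_s):
        self.threshold = threshold
        self.window_s = window_s
        self._times = deque()  # timestamps of lines still inside the window

    def observe(self, timestamp):
        """Record one log line at `timestamp` (seconds) and return True if
        the number of lines in the trailing window exceeds the threshold."""
        self._times.append(timestamp)
        # Drop lines that have fallen out of the trailing window.
        while self._times and self._times[0] <= timestamp - self.window_s:
            self._times.popleft()
        return len(self._times) > self.threshold
```

For example, with threshold=3 and window_s=10, lines at t=0, 1, 2, 3 trip the detector on the fourth line, and a lone line at t=20 does not.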
Monitoring
- A workspace can monitor projects other than its host project
- Anyone with access to a workspace can monitor all projects in that workspace
- BP: use separate workspaces if limiting visibility of specific projects is desired
- Without the Monitoring agent, you can still monitor CPU utilization, some disk usage metrics, network traffic and uptime information
- Monitor domain-specific information by defining a custom metric (e.g. number of gamers on a VM)
- Chart consists of
- Metric: Resource Type, Metric, Filter, Group By, Aggregator
- View Options: Threshold, "Compare to Past", Use Log scale
- An alerting policy consists of 1+ conditions and 1+ notification channels
- A condition consists of a metric, a comparison (above/below etc.), a threshold (value) and a duration ("For", how long)
- Notification channels: email, SMS, pager, Slack, Pub/Sub (beta)
- An incident is created when an alerting policy's condition is met, and is resolved automatically when the condition is no longer met
- Past incidents can be viewed in Events
- A resource group groups resources by name, tag, region, resource type, etc.
- a dashboard is created automatically for the group
- Uptime checks
- Check Type: HTTP, HTTPS, TCP
- Resource Type: URL, VM Instance, App Engine, AWS ELB or AWS Instance
- Applies To: Single or Group
- Check Every: 1/5/10/15 minutes
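How a threshold condition with a "For" duration opens and resolves an incident can be sketched as below. This is a simplified model assumed for illustration: the real service evaluates aligned time series, whereas here samples are plain (timestamp, value) pairs and the comparison is fixed to "above threshold".

```python
def evaluate_condition(samples, threshold, duration_s):
    """Simulate one "above threshold for duration_s seconds" condition.

    samples: list of (timestamp_s, value) pairs in time order.
    Returns a list of (event, timestamp) tuples: ("open", t) when the value
    has stayed above the threshold for at least duration_s seconds, and
    ("resolve", t) when it first drops back to or below the threshold.
    """
    events = []
    breach_start = None  # when the current run above threshold began
    incident_open = False
    for t, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = t
            if not incident_open and t - breach_start >= duration_s:
                incident_open = True
                events.append(("open", t))
        else:
            breach_start = None  # any dip resets the "For" timer
            if incident_open:
                incident_open = False
                events.append(("resolve", t))
    return events
```

For example, CPU samples of 50, 90, 95, 92, 40 at one-minute intervals with threshold 80 and a 2-minute duration open an incident at the 180 s sample and resolve it at 240 s; a single short spike never opens one.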