Other
SageMaker
Types of SageMaker Pipeline steps
Processing: requires a processor and a Python script; used for data processing and model evaluation, e.g. a PySpark job that does data cleaning
Training: Training a model, requires estimator, training and validation inputs
Tuning: hyperparameter tuning. Associated with a SageMaker experiment, runs multiple training jobs as trials. Requires a HyperparameterTuner and TrainingInput
can get top performing model(s) (max 50) from the job
CreateModel: create a model
RegisterModel: register a Model or PipelineModel with the SageMaker Model Registry
Transform: batch transformation to run inference on entire dataset
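One way to picture how these step types chain together is as a dependency graph. The sketch below is plain Python, not the SageMaker SDK: the step names and the dependencies between them are hypothetical, chosen only to show a typical ordering (process, tune, evaluate, create, register, transform).

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical step names. Each maps to the step type from the notes
# and the set of steps that must finish before it can start.
STEPS = {
    "preprocess": ("Processing", set()),
    "tune": ("Tuning", {"preprocess"}),
    "evaluate": ("Processing", {"tune"}),
    "create_model": ("CreateModel", {"tune"}),
    "register": ("RegisterModel", {"evaluate"}),
    "batch_inference": ("Transform", {"create_model"}),
}


def execution_order(steps):
    """Return one valid run order that respects every dependency."""
    graph = {name: deps for name, (_step_type, deps) in steps.items()}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for name in execution_order(STEPS):
        print(f"{name:16s} {STEPS[name][0]}")
```

In a real pipeline the dependencies usually come from one step consuming another step's outputs; here they are written out by hand to keep the sketch self-contained.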
Kinesis Streaming Data
Streams
Scaling using shards
24h default retention, extendable to 7 days (and up to 365 days with long-term retention)
EC2 consumers can write to S3, RedShift etc
Firehose
No scaling required
No retention period because it’s delivered to destinations such as
S3, Redshift (via S3), Lambda, Elasticsearch
Analytics
Uses SQL to process data in Streams/Firehose
Write data to S3, Elasticsearch, Redshift
BP: Kinesis is for handling massive streaming data, whereas SQS is for inter application communication
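"Scaling using shards" for Streams comes down to simple arithmetic. The sketch below assumes the per-shard limits commonly documented for provisioned streams (1 MB/s or 1,000 records/s in, 2 MB/s out); these numbers are not in the notes and should be checked against the current quotas before sizing anything real.

```python
import math

# Assumed per-shard limits for a provisioned Kinesis data stream.
WRITE_MB_PER_SEC = 1.0
WRITE_RECORDS_PER_SEC = 1000
READ_MB_PER_SEC = 2.0


def shards_needed(in_mb_per_sec, in_records_per_sec, out_mb_per_sec):
    """Smallest shard count that satisfies all three per-shard limits."""
    return max(
        1,  # a stream always has at least one shard
        math.ceil(in_mb_per_sec / WRITE_MB_PER_SEC),
        math.ceil(in_records_per_sec / WRITE_RECORDS_PER_SEC),
        math.ceil(out_mb_per_sec / READ_MB_PER_SEC),
    )
```

For example, 5 MB/s in, 3,000 records/s in and 12 MB/s out is bounded by the read side: `shards_needed(5, 3000, 12)` gives 6.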
SWF Simple Workflow Service
1 year retention
Has 3 types of actors: workflow starters, deciders and activity workers
Can involve non-AWS or manual tasks as part of workflow
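The three SWF actor roles can be sketched as a toy in-memory loop. This is not the SWF API: the activity names and the dictionary used as an "execution" are hypothetical, and the sketch only shows who does what (the starter opens the execution, the decider picks the next task from the history, the worker performs it).

```python
ACTIVITIES = ["verify_order", "charge_card", "ship_order"]  # hypothetical tasks


def starter(order_id):
    """Workflow starter: opens an execution with an empty event history."""
    return {"order_id": order_id, "history": [], "done": False}


def decider(execution):
    """Decider: reads the history and decides which activity runs next."""
    completed = len(execution["history"])
    if completed == len(ACTIVITIES):
        execution["done"] = True
        return None
    return ACTIVITIES[completed]


def activity_worker(execution, activity):
    """Activity worker: performs one task and records its completion."""
    # A real worker could be code on EC2, an on-premises server, or a person.
    execution["history"].append(f"{activity}:completed")


def run(order_id):
    """Drive one execution until the decider reports nothing left to do."""
    execution = starter(order_id)
    while (activity := decider(execution)) is not None:
        activity_worker(execution, activity)
    return execution
```

Keeping the decision logic separate from the workers is what lets manual or non-AWS tasks slot in: a worker only needs to report that its activity completed.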
CloudWatch: Monitor CPU, network, I/O
Available via: console, API, SDK or CLI
CloudFormation: JSON (or YAML) formatted template
AWS Trusted Advisor: Best practices in Cost, Security, Performance and Fault Tolerance
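To make the CloudFormation template format concrete, here is a minimal sketch that builds a JSON template in Python. The logical ID `NotesBucket` and the description text are made up for illustration; the template declares a single S3 bucket and outputs its name.

```python
import json

# Minimal template: one S3 bucket. "NotesBucket" is a hypothetical logical ID.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Minimal example stack with a single S3 bucket",
    "Resources": {
        "NotesBucket": {
            "Type": "AWS::S3::Bucket",
        }
    },
    "Outputs": {
        # Ref on an S3 bucket resolves to the bucket name.
        "BucketName": {"Value": {"Ref": "NotesBucket"}},
    },
}

# Serialized form that could be saved as a .json template file.
template_json = json.dumps(template, indent=2)
```

Only the `Resources` section is mandatory; the other keys are included here to show the usual shape.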
Analytics
Elastic MapReduce (EMR): Apache Hadoop + EC2 + S3
Kinesis: Streaming
Application Services
API Gateway: Frontend to Lambda, Kinesis or HTTP
Traffic management, Access, Monitoring, API versioning
Simple Queue Service (SQS): message queuing
Enterprise Applications
Workspaces: Windows, Mac, Chromebook, iPad, Fire/Android tablets
Ports: TCP 443, TCP+UDP: 4172
Value (2GB), Standard Plus (4GB), Performance Plus (7.5GB)
MS Office, Trend Micro
SWF: Simple WorkFlow
AWS Services Access
Management Console, CLI, SDK, Query API (Using HTTP)
Support: 4 levels (Basic (free), Developer, Business, Enterprise)
AWS DevOps
Code pipeline: CodeCommit, CodeBuild, CodeDeploy, CodePipeline
AWS Infrastructure as code: CloudFormation, OpsWorks (Chef), Config
Active Monitoring: CloudWatch, CloudTrail (API caller info such as IP, time params etc)
PaaS: Elastic Beanstalk (not to be confused with EBS, Elastic Block Store): upload code and Beanstalk does provisioning, scaling, load balancing, health monitoring
Cloud Migration
Unmanaged: rsync, S3 and glacier CLI
Replace Internet: Direct connect, Snowball, S3 Transfer acceleration
Friends of S3: Gateways, Partners, Kinesis Firehose
Gateways: on-premises device that interfaces with SAN or VTL
Partners: Write backups directly to S3
Kinesis Firehose: load streaming data to S3/Redshift/Elasticsearch
Application Discovery Service: identify running on-premises applications, performance profiles, configuration data
AWS SMS, Server Migration Service: migrate VM servers including volumes
AWS DMS: Database Migration Service
Uses replication; allows migrating to a different database engine
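The "Replace Internet" options above make sense once the transfer time over an ordinary link is worked out. The sketch below is back-of-the-envelope arithmetic only; the 80% utilization default is an assumption, not a measured figure.

```python
def transfer_days(terabytes, link_mbps, utilization=0.8):
    """Days needed to push `terabytes` over a link of `link_mbps` megabits/s."""
    bits = terabytes * 1e12 * 8                     # decimal terabytes to bits
    effective_bps = link_mbps * 1e6 * utilization   # usable bits per second
    return bits / effective_bps / 86_400            # 86,400 seconds per day
```

With these assumptions, 100 TB over a 100 Mbps link takes roughly 116 days, which is the kind of number that pushes a migration toward Snowball or Direct Connect.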
AWS Web Architecture
Auto Scaling Group=[Security Group[EC2+CloudWatch] in AZ1 + SecGroup in AZ2]
Can be triggered to expand/shrink
Sec Group = Protocol + Ports + IP ranges
E.g. Internet => only ports 80+443 to web servers
E.g. Corporate => only port 22 to app servers
No access to DB servers
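The "Protocol + Ports + IP ranges" rule shape and the three examples above can be sketched as a small allow-list check. This is an illustration in plain Python, not the EC2 API; the CIDR `203.0.113.0/24` is a placeholder standing in for the corporate network.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    protocol: str
    ports: frozenset
    cidr: str


# Hypothetical ranges: 203.0.113.0/24 stands in for the corporate network.
WEB_SG = [Rule("tcp", frozenset({80, 443}), "0.0.0.0/0")]
APP_SG = [Rule("tcp", frozenset({22}), "203.0.113.0/24")]
DB_SG = []  # no inbound rules for outside sources, so everything is denied


def allows(rules, protocol, port, source_ip):
    """Security groups are allow-lists: traffic passes only if a rule matches."""
    src = ip_address(source_ip)
    return any(
        r.protocol == protocol and port in r.ports and src in ip_network(r.cidr)
        for r in rules
    )
```

For instance, `allows(WEB_SG, "tcp", 443, "198.51.100.7")` is true, while the same source is refused on port 22 of the app tier and on every port of the database tier.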
Mobile
Cognito: Sign-In
Sync data using Cognito and S3
SNS: Push notification
KMS
AWS Pricing
Compute, Storage, Data Transfer
Simple Monthly Calculator
TCO Calculator: compare on-premise vs AWS
Billing and Cost Management Console: View, pay bills
Security and Compliance
Infrastructure:
Facility: video surveillance, two factor
Decommissioning: magnetic storage degaussed and physically destroyed
Compliance: HIPAA, ISO 27001, NIST etc
Network
Boundary devices, ACL
Secure Access points: API endpoint, HTTPS access
Transmission protection: TLS, VPC, IPSec VPN
Amazon Corporate is segregated from AWS
Fault Tolerant
Network monitoring and protection, prevents
DDoS, Man in the Middle, IP Spoofing, packet sniffing, port scanning
Compute: shared instances are isolated by Xen hypervisor, AWS firewall, signed API calls
Database security:
RDS: DB security groups, permissions
DynamoDB: requests must be signed with HMAC-SHA256
Storage: KMS (Key Management Service), S3 encryption client, S3 server side encryption
IAM: Least privilege, explicit permissions, temporary access, multi-factor
CloudTrail: API call tracking
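The HMAC-SHA256 requirement noted for DynamoDB refers to request signing. As a rough sketch, the key-derivation chain used by AWS Signature Version 4 can be written with the standard library alone. Building the canonical request and the string to sign is omitted, and the credential values in any call are placeholders; in practice the SDKs do all of this automatically.

```python
import hashlib
import hmac


def _sign(key: bytes, msg: str) -> bytes:
    """One HMAC-SHA256 step of the derivation chain."""
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def signing_key(secret_key: str, date: str, region: str, service: str) -> bytes:
    """Derive a Signature Version 4 signing key (date is YYYYMMDD)."""
    k_date = _sign(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = _sign(k_date, region)
    k_service = _sign(k_region, service)
    return _sign(k_service, "aws4_request")


def signature(key: bytes, string_to_sign: str) -> str:
    """Hex signature over an already-built string to sign."""
    return hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
```

Because the date, region and service are folded into the key, a signature made for one day or one service cannot be replayed against another.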