SRE Engineer
New York City
Contracted
Experienced
No DevOps or Cloud Engineer resumes
SRE Experts Only / W2 / GC & USC only
JOB DUTIES/RESPONSIBILITIES:
The successful candidate will:
· Be involved in application support, application server administration, and technical troubleshooting of infrastructure and user incidents
· Incorporate Site Reliability Engineering practices into the day-to-day role by developing automated solutions to long-standing problems to ensure minimal downtime and reduce toil
· Apply experience with web architecture implementation, including performance, availability, scalability, and disaster recovery planning
· Apply experience with monitoring and alerting tools: configure application monitors using industry-standard monitoring tools and develop customized monitoring solutions
· Revisit SRE metrics and confirm them against firm and department goals
· Identify areas for improvement including automation, toil reduction, resiliency and observability across the platforms and help build up the knowledge and documentation for the team
· Partner with other teams in Morgan Stanley, such as enterprise infrastructure, networking, security, storage, database, and data center, to roll out application platforms successfully as per the design.
· Produce reusable infrastructure design patterns and periodically review / refresh them.
· Support vendor / vendor technology onboarding following Morgan Stanley best practices and the security blueprint.
· Apply technical skills to automate daily support functions, improve system stability, support hygiene initiatives and deliver innovation that creates efficiency and consistency.
· Be available for occasional weekend and on-call work on a rotation basis.
Required Skills
· Strong infrastructure knowledge in Linux / Unix, Databases, Storage and Networking technologies.
· Hands-on experience with containers and container orchestration platforms (OpenShift / Kubernetes)
· Experience with scripting in Python and Shell
· Hands-on experience with web servers (Apache / Nginx), application integration, configuration, and troubleshooting.
· Clear understanding of load balancers, web proxies, and storage platforms such as NAS / SAN, from an implementation perspective only.
· Familiarity with basic security practices to ensure secure hosting solutions, including single sign-on (SSO) and standard encryption protocols.
· Prior experience managing large web-based n-tier applications in secure environments in the cloud
· Strong knowledge of SRE principles, with a grasp of the tools and approaches used to apply them
· Strong infrastructure knowledge in Storage, Networking and Databases
· Experience in troubleshooting application issues and managing incidents
· Exposure to tools like Prometheus, Grafana, and the OpenTelemetry framework
· Excellent verbal and written communication skills.
Desired / Nice to have skills
· Exposure to and experience with data pipeline technologies such as Kafka, Redis, and Airflow
· Exposure to Big Data platforms like Hadoop / Cloudera and the ELK Stack
· Capacity planning and performance tuning exercises
· Identity management protocols such as OIDC / OAuth and SAML, and LDAP integration
· Cloud Application and infrastructure knowledge is a plus.
· Experience in Cloud / Distributed computing technology or certification is a plus
Experience
· 7 to 12 years in a similar hands-on role as an application / middleware specialist.
· Prior experience working in a global financial organization is an advantage.
If interested, please share your updated resume: [email protected]
Thanks