https://spark.apache.org/docs/latest/index.html

 

- pandas API on Spark for pandas workloads

 

- Downloads are pre-packaged for a handful of popular Hadoop versions

 

- Spark runs on both Windows and UNIX-like systems, and it should run on any platform that runs a supported version of Java

 

- it is necessary for applications to use the same version of Scala that Spark was compiled for

For example, when using Scala 2.13, use Spark compiled for 2.13

 

- use this class in the top-level Spark directory.

 

- with this approach, each application is given a maximum amount of resources it can use

and holds onto them for its whole duration.

 

- Resource allocation can be configured as follows, based on the cluster type.

 

- At a high level, Spark should relinquish executors when they are no longer used and acquire executors when they are needed.

 

- We need a set of heuristics to determine when to remove and request executors.

 

- By default, Spark's scheduler runs jobs in FIFO fashion.

 

- If the jobs at the head of the queue don't need to use the whole cluster, 

later jobs can start to run right away, but if the jobs at the head of the queue are large,

then later jobs may be delayed significantly.

 

- Under fair sharing, Spark assigns tasks between jobs in a "round robin" fashion,

so that all jobs get a roughly equal share of cluster resources.

 

- This feature is disabled by default and available on all coarse-grained cluster managers.

 

- Without any intervention, newly submitted jobs go into a default pool

 

- This is done as follows

 

- This setting is per-thread to make it easy to have a thread run multiple jobs on behalf of the same user.

 

- If you would like to clear the pool that a thread is associated with, simply call this.

 

- jobs run in FIFO order.

 

- each user's queries will run in order instead of later queries taking resources from that user's earlier ones.

 

- At a high level, every Spark application consists of a driver program that runs the user's main function and executes various parallel operations on a cluster.

 

- ...the cluster that can be operated on in parallel.

 

- This guide shows each of these features in each of Spark's supported languages.

 

- it's easiest to follow along with if you launch Spark's interactive shell.

 

 

 

 

 

 

 

 

 

 

 

 

 

- A node is not only a value but also a pointer; both of these together make up the node.

 

- We do it by just having the next value of the A node be the B node.

 

- the same is true of the C node.

 

- if you look at how we're going to have to traverse this, we are going to have to start at head.

 

- that's what we are going to do down here with this print statement.

 

- the syntax is a little bit different than if you were going to use dictionaries.
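To picture the linked-list sentences above, here is a minimal sketch in Python. The class name `Node`, the helper `traverse`, and the values "A", "B", "C" are assumptions made for this note; the original lecture's code is not shown in these notes and may look different.

```python
class Node:
    """One element of a singly linked list: a value plus a pointer to the next node."""

    def __init__(self, value):
        self.value = value
        self.next = None  # pointer to the following node; None marks the end


def traverse(head):
    """Start at head and follow the next pointers, collecting each value in order."""
    values = []
    current = head
    while current is not None:
        values.append(current.value)
        current = current.next
    return values


# Build three nodes and link them: A -> B -> C
a = Node("A")
b = Node("B")
c = Node("C")
a.next = b  # the next value of the A node is the B node
b.next = c  # the same is true for linking B to C

head = a
print(traverse(head))  # prints ['A', 'B', 'C']
```

With objects we write `a.next`; if the nodes were dictionaries we would write something like `a["next"]` instead, which is probably the syntax difference the sentence refers to.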

 

 

 

 

- Our managers deal with all kinds of clients every day. So I can say that we maintain the highest level of service.

 

- In my work I follow the best practices to maintain clean and easy-to-understand Python code.

 

- In my previous project I carried out the responsibilities of both the Project Manager and the Team Leader.

 

carry out : to do something, to perform

both A and B

 

- As a QA specialist I worked with a test environment where I tested many aspects of the platform

  to ensure that it works as desired.

 

- A cloud architect oversees application architecture

  and deploys it in cloud environments like public cloud, private cloud and hybrid cloud.

 

to oversee : to watch over and control something to make sure that the work is good or satisfactory, to supervise

 

- I took a course where I learned how to design and write programs that are easy to maintain.

 

to design : to create, draw, or construct something

 

- I will set up all the necessary equipment in my home office to work remotely on this project

 

- I'm an IT Technician, so I install and configure different software on all computers in the office.

 

to install : to put a new program or piece of software into a computer

to configure : to change the settings of software on a computer

 

- As a Jr Software Engineer, I assist and participate in the research, design, development and testing of software and tools.

 

to assist : to help someone or something

 

- I am a web designer, so I know how to provide the best UX for your website visitors.

 

- Project managers usually estimate new projects by analogy, using previous projects and past experience.

 

to estimate : to give a general idea of the cost of work or the time you need to do the work

analogy : a comparison of two things based on their being alike in some way

 

- Sometimes I need to google my questions, for example "how to execute the code inside a function in JS".

 

- Working on my project I improved my time management and organizational skills.

 

- press the F2 key on your keyboard

 

- The screen resolution is 1366x768

 

- I prefer to work with a desktop.

 

- Workstation PCs have multiple processor cores.

 

- Some tablets have a long battery life

 

- The volume on my speakers won't turn up.

 

- My printer broke down, so I printed out these documents at work.

 

- 'RAM' is uncountable, so it is only possible to say 'RAM is' or 'RAM was', not 'RAMs' or 'RAMs are'.

 

- with a cable : wired mouse, wired connection(Ethernet)

- without a cable : wireless mouse, wireless connection(WiFi)

 

- ISP stands for Internet Service Provider.

 

- so many folders on my desktop

 

- start or shut down a computer.

- turn on or turn off a computer

 

- to crash / to freeze up : when a computer suddenly stops working

 

- to look up a word or address : to find something.

  We can use the 'nslookup' command in a terminal to query a DNS server.

 

- It will take about two hours to key in all this data.

 

to key in : to enter info into a computer

 

- a shortcut key : a key or key combination that runs a command quickly.

 

Use Ctrl + L shortcut to see the last saved version.

 

- 'perform' is used a lot more than I thought. For example, the server performs instructions written in code.

 

- use only the numbers in the given array.

 

- I assigned the number 33 to the age variable.

 

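- fraction : a number written as one whole number over another

- numerator : the top number of a fraction

- denominator : the bottom number of a fraction

  ex, 1/3 : one third, 2/3 : two thirds, 1/2 : a half (not 'one second'), 1/4 : a quarter, 3/4 : three quarters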

 

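- decimal : a number with a fractional part written after a point

- decimal point : the dot between the whole part and the fractional part

- floating point : a way of storing numbers in which the position of the point is not fixed

  ex, 1.23 : one point two three, 15.1 : fifteen point one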

 

'double' also has a decimal point, but 'float' and 'double' are different types.

Let's look up how they differ.
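As a rough sketch of that difference: in languages such as Java, a float is stored in 32 bits and a double in 64 bits, so a double keeps more digits. Python's own float is normally a 64-bit double, so the helper below (`to_float32`, a name made up for this note) uses the standard `struct` module to round a value to 32 bits for comparison.

```python
import struct


def to_float32(x):
    """Round a Python float (normally a 64-bit double) to 32-bit float precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


FLOAT_BYTES = struct.calcsize("<f")   # size of a 32-bit float in bytes
DOUBLE_BYTES = struct.calcsize("<d")  # size of a 64-bit double in bytes

as_double = 0.1
as_float = to_float32(0.1)

print(FLOAT_BYTES, DOUBLE_BYTES)  # prints 4 8
print(as_double == as_float)      # prints False: 0.1 is rounded differently at 32 bits
print(to_float32(0.5) == 0.5)     # prints True: 0.5 is exact in both formats
```

So both types have a decimal point, but a double stores the value with roughly twice as many bits as a float.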

 

- I created an array of strings.

 

- To debug is to investigate the program and fix bugs.

 

- A comment is text written alongside code that is ignored by the computer.

It is used for writing extra info about your code to help you understand it later.

So we can say, 'leave comments in your code.'

 

- To 'comment out' is to turn a piece of code into a comment with the help of special characters,

such as //, #, --, etc.

You can comment out some lines to see how the program works without them.

 

- A constant is a variable that never changes its value.

For example: val a = 1 (Scala or Kotlin), final int a = 1 (Java)

We can say "In Java, a constant is declared using the final keyword."

"The PI constant has the value of 3.14."

 

- If you try to divide a number by zero, your program will crash.

A program crashes when it stops running because of an error.
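A small sketch of this in Python, where dividing by zero raises a `ZeroDivisionError`; if nothing catches the error, the program stops. The function name `safe_divide` is just an example name for this note.

```python
def safe_divide(numerator, denominator):
    """Divide two numbers, returning None instead of crashing on a zero denominator."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        # Without this handler the error would stop the program.
        return None


print(safe_divide(6, 3))  # prints 2.0
print(safe_divide(1, 0))  # prints None
```

We can say: "The program didn't crash because I handled the error."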

 

- An 'executable' is a program which is ready to be run. 

Short for executable file, executable program

A common filename extension .exe means that it is an executable file.

It sounds like 'eg-ZEK-yoo-tuh-bl'.

 

- To declare in programming means to say that something exists,

usually a variable, a function, or a class.

I've only declared a function, but I haven't written it yet.

 

- To implement means to write and complete something in code,

for example, to implement a function or a class

I declared a function and implemented it. It works well! 

 

- To instantiate means to create an object from a class.

I instantiated another object of the Student class.

It sounds like 'in-STAN-shee-ayt'.
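The three verbs (declare, implement, instantiate) can be seen together in one small Python sketch. The `Student` class comes from the example sentence; its methods and the names "Mina" and "Pete" are made up here. Python has no separate declaration syntax, so a method that only raises `NotImplementedError` stands in for "declared but not written yet".

```python
class Student:
    """The class is the template; each object created from it is an instance."""

    def __init__(self, name):
        self.name = name

    def greet(self):
        # Implemented: the body is written and it works.
        return f"Hi, I'm {self.name}"

    def grade(self):
        # Only declared: the name exists, but the body is not written yet.
        raise NotImplementedError("grade() is declared but not implemented")


first = Student("Mina")    # instantiate one object of the Student class
another = Student("Pete")  # instantiate another object of the same class

print(another.greet())  # prints Hi, I'm Pete
```

Calling `another.grade()` would raise an error, which matches "I've only declared a function, but I haven't written it yet."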

 

- A loop is a piece of code that runs itself many times.

It can also be used as a verb - to loop or to iterate

I used a "for" loop to run this code for every value in the array.

I iterate through every element in the list.
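A tiny sketch of a "for" loop in Python; the list `values` and the doubling step are just example choices for this note.

```python
values = [3, 1, 4]

doubled = []
for value in values:           # the loop body runs once for every element
    doubled.append(value * 2)

print(doubled)  # prints [6, 2, 8]
```

Here we can say: "I looped over the list and doubled every value."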

 

- He read some data values from another source over the internet.

 

- Syntax is the grammatical rules of a programming language.

Syntax determines if code is written correctly or not.

 

- Look for typing mistakes if you get a syntax error.

 

 

Let's learn what various symbols are called.

[ ~ ] : Tilde

[ ` ] : Backtick, Grave accent

[ ! ] : Exclamation mark

[ ? ] : Question mark

[ @ ] : At symbol

[ # ] : Number or Hash

[ $ ] : Dollar sign

[ ^ ] : Caret

[ & ] : Ampersand

[ * ] : Asterisk

[ () ] : Brackets, Parentheses

[ ( ] : Open bracket, Left bracket, Open Parenthesis, Left Parenthesis

[ ) ] : Close bracket, Right bracket, Close Parenthesis, Right Parenthesis

[ {} ] : Curly braces

[ { ] : Open curly brace, Left curly brace
[ } ] : Close curly brace, Right curly brace
[ [] ] : Square brackets
[ [ ] : Open square bracket, Left square bracket
[ ] ] : Close square bracket, Right square bracket
[ _ ] : Underscore, Horizontal bar

[ - ] : Dash, Hyphen

[ = ] : Equals

[ | ] : Vertical bar, 'Or'
[ / ] : Forward slash
[ \ ] : Back slash
[ : ] : Colon

[ ; ] : Semicolon

[ " ] : Quote, Double quote

[ ' ] : Apostrophe, Single quote

[ < ] : Less than

[ > ] : More than, Greater than

[ . ] : Dot, Period

 

- top brass : top managers in the company

The top brass from the USA want to see how we work here.

 

- hamster wheel : a series of company meetings

I thought this would be a productive day, but we ended up with a hamster wheel of pointless meetings.

 

- seagull : a manager who asks too many questions and gives advice too often

Warning, the seagull is coming! I wonder what he will say this time...

 

- blamestorming : when the team members try to find who is responsible for a certain problem.

Guys, let's stop this blamestorming and think how we can solve this problem!

 

- space out : to stare at your screen to pretend you are working.

I need to stop spacing out and get down to work...!

 

 

 

- Product-based companies : Companies that work on their own products and sell them to end users.

- Service-based companies : Companies that provide different types of IT services to business clients.

- IT Consulting companies : Companies that deal with the implementation of ready-made software.

 

- Outsourcing (outside-resource-using) : when a company hires another company to do a certain job

e.g. software development, software support etc.

ex ) The company outsourced web-development to us.

 

- B2C : Business to consumer - company sells directly to individual clients.

- B2B : Business to business - company provides services or products to other businesses.

 

- SME : Small and Medium-sized Enterprises

- corporation, MNC(Multinational Corporation) : a big company that operates in two or more countries.

  the opposite of start-up

- social enterprise : a business that tries to reach certain social goals apart from making profit.

  They usually care about the environment.

 

 

 

- be based in : = be located at/in

ex) The company is based in San Francisco.

ex) The company is located at IQ Business Center.

 

- to specialize in : your company's field

ex) Our company specializes in Data Engineering.

 

- to develop, to deliver, to offer : to provide

ex) We offer digital consulting services.

ex) Our company delivers full-cycle software development services.

 

- target

ex) Teenagers are the target audience for our app!

ex) Our app targets college-aged adults.

 

- subsidiary, daughter company : a company that is owned or controlled by another larger company.

ex) After a merger in 2019, our company became a subsidiary of EDB group.

 

- SDLC : Software Development Life Cycle, the process of creating software.

It can consist of several stages such as 'Planning', 'Designing', 'Development', 'Testing', 'Deployment', and 'Maintenance'.

 

- Phase 1. Requirements collection (Planning)

  - Business requirements are gathered and documented.

  - Major stakeholders give their input (stakeholder : people or groups who have an interest in or are affected by a decision, project, or organization.)

  - The project scope is outlined; budget, resources, deadlines, potential risks,

  and quality assurance requirements are defined.

  (Project scope : all aspects of a project, including all activities, resources, etc

  to outline : to describe something in a general way without giving too many details)

  - People involved : Business analyst, Subject matter expert, Major stakeholders, PM

 

- Phase 2. Design

  - Software development requirements are translated into design.

  - The entire system and its elements need to be designed (including high-level design and low-level design)

  (high-level design (HLD) : the system's architectural design. general picture.

  low-level design (LLD) : the design of its components; a detailed description of all components, configs, and processes)

  - This stage includes the design of user interfaces, system interfaces, networks, network requirements, and databases.

  - Operation, training, and maintenance plans are drawn up so that developers know what they need to do throughout every stage of the cycle.

  (to draw up : to prepare a draft of something)

 

- Phase 3. Development

  - Using the design document, software developers write code for all the components.

  - Program code is built per the design document specifications.

  (per : according to

  specification / technical specification (tech spec) = a document that explains what a product will do and how you will achieve these goals)

  - Every developer has to stick to the agreed blueprint.

  (to stick to something : to keep doing a particular thing and not change to anything else, to follow the specification

  blueprint : a detailed plan of how to do something)

  - Developers utilize different tools, for example compilers, debuggers, and interpreters.

  - The tasks are divided among the team members according to their area of specialization (front-end, back-end, DB administration etc)

  - it's the most time-consuming phase. (time-consuming : using or taking up a lot of time)

  - The result of this phase is a working software product.

 

 

- Phase 4. Testing

  - The goal(or objective) is to ensure the software meets requirements.

  - This is where the Quality Assurance (QA) team steps in to test the software.

  (to step in : to become involved, start doing something on the project)

  - All the modules of the software are brought together into a special testing environment and tested for errors and interoperability.

  (bring together : assemble, collect, compile,

  interoperability : an ability of one system or application to interact with another system or application)

  - Software developers fix any bugs that come up during this stage. Then QA specialists test the software or its components again.

  - All the defects are tracked, fixed, and retested.

  - There are different kinds of testing: Functional testing, Performance testing, Unit testing, Integration testing, Regression testing etc.

  - QC(Quality Control) is a set of activities designed to evaluate the quality of a component or system.

 

 

- Phase 5. Deployment

  - The product is deployed in the production environment.

  (to deploy : to make a software system available for use)

  - If the customer wishes, UAT (User Acceptance Testing) is done before deployment.

    For UAT, a replica of the production environment is created and the customer company does the testing!

  (replica : an exact copy)

  - Once they check that the product works as expected, they give a sign off to go live.

  (go live : the point at which code moves from the test env to the prod env, therefore becomes available for end users)

  - The customer may also come up with changes or enhancements to the software behavior.

    These changes are called change requests.

  - After they are done, the product is released to the market or deployed in the company's production environment.

  (release : the distribution of the final version of an application)

 

 

- Phase 6. Maintenance (support)

  - During this, the system is assessed to ensure it doesn't become obsolete(out-of-date, old-fashioned).

  (to assess : to check and decide about the quality of something,

  assessment : the process of checking and considering all the information about something; making a judgement,

  obsolete : that is not in use anymore and has to be replaced by something newer and better)

  - If any issue comes up and needs to be fixed, or any enhancement needs to be done, developers take care of that.

  - This is also where changes can be made to the initial software.

 

 

 

If you learn how to code in Java, you can choose from hundreds of jobs on the market.

Apps for Android OS are built on Java.

Almost all of the apps you use on your Android phone run on Java.

Around 80% of the world's largest websites use back-end web apps built with Java (with the help of Java).

 

A framework is a collection of languages, libraries, and utilities designed to help developers build applications.

Spring is a web application framework with clear and elegant syntax.

A utility is a small program that adds to the capabilities provided by the OS.

Syntax is the set of rules that define the structure of a language.

 

On the one hand, Django ensures rapid development, fast processing, and scalability,

whereas on the other hand, it is monolithic in nature and is not suitable for smaller projects.

Monolithic means composed all in one piece.

Scalability is the property of a system to handle a growing amount of work by adding resources to the system.

Processing is the manipulation of data by a computer, e.g. conversion of raw data into machine-readable form.

 

APIs are interfaces that help your applications connect to different tools.

Those tools make up your extended tech stack.

API stands for Application Programming Interface.

 

This category includes servers, content distribution networks, routing and caching services that let your applications send and receive requests, run smoothly, and scale capacity as needed.

Routing is the process of selecting a path for traffic in a network or across multiple networks.

Request-response is one of the basic methods that computers use to communicate with each other in a network. The first computer sends a request for some data and the second responds to the request.

 

This layer of the stack consists of relational and non-relational databases, data warehouses, and data pipelines that allow you to store and query all of your real-time and historical data.

Query is a request for data from a database table or combination of tables. This data may be generated as results returned by SQL.

 

BI tools bring together data gathered from multiple parts of the company and the market, and are designed to help track company performance and make higher-level business decisions.

To track is to record the progress or development of something over a period of time.

 

A full-stack developer can work well with a variety of languages as well as frameworks and can quickly learn something new.

Full-stack developers usually have skills in a lot of different niches, from databases etc.

 

Maintainability : The software should stay stable when changes are made. It's easy to maintain the code and add amendments.

Compatibility : the software is compatible with several components.

Reliability : it's defined as the capability of the software to perform under specific conditions for a specified duration.

 

 

 

 

 

 

 

 

 

 

If you move your mouse over the picture, you can see the hint!

 

 

Pete, you cannot work like this! You need a dedicated work space (WFH setup)

 

If you don't have a stable Internet connection, it may take a long time to load some pages.

 

Internet outage : no internet connection, Internet is down

  ex) I had an internet outage during a meeting yesterday

Power outage : no electricity

 

BYOD : bring your own device

COBO : company owned, business only

COPE : company owned, personally enabled

 

My employer provided me with a laptop and all the software I need was pre-loaded.

 

distributed team : members of a team work from different locations.

  ex) Our company is headquartered in San Francisco with a distributed team across 5 countries.

hybrid team : some members are fully remote, others may come to the office.

all-remote company : A company that doesn't have offices at all and all employees work from home.

 

to work flextime : Flexible, you can change the time you start and finish work.

to maintain regular hours : to start and finish at the same time.

  ex) I prefer to maintain regular hours of work, otherwise it's very hard to get things done!

 

You have to track the time you spend working in order to be paid overtime.

I need to log hours at the end of every week.

 

Conference call : a telephone call in which people in different places can ALL talk to each other.

  (=to be in a call, to have a call)

 

Sorry, I'm in a meeting.

I was having a meeting so I missed the delivery man.

We are having a conference call with the client where we'll discuss possible solutions to this issue.

 

back-to-back meetings : meetings without a break

  ex) I finished my first meeting at 10am and the second one started at 10am without a break.

  ex) I had five back-to-back meetings today, I am exhausted.

 

to reschedule

to move a meeting up : to start earlier

to move a meeting back : to start later

  ex) The meeting was rescheduled for Thursday.

  ex) Sorry, I have my English class at 9am. Let's move our meeting back an hour.

 

I will stop sharing my screen now, and we can go into detail after the break.

 

Could I jump in for a second? (used to interrupt politely)

Let me clarify, what is the deadline for this task? (clarify = explain, elaborate. talk in more detail about something)

 

Tina, you are on mute. Please unmute yourself and repeat what you were saying.

I can hear some background noises. If you are not speaking, please put yourself on mute!

I think there is a delay, that's why Peter answers late.

I got kicked out of the meeting. (I got disconnected)

Sorry, I have to jump to another meeting.

Could you speak up?

Speak closer to the mic, you are too quiet.

 

When I first joined the company, I was constantly overworking. This soon led to burnout.

 

Agenda : list of objectives, topics to discuss in a meeting.

  ex) When you create an online meeting, please always put the agenda in the invitation as well.

  ex) We have a number of important matters on the agenda.

 

 

Apologies : announcing that some people are absent

  Usually those people ask beforehand for their apologies to be given at a meeting that they cannot attend.

  apologies for absence.

  ex) Hi Jane, I won't be able to join the weekly team meeting today as I have a client meeting. Please give my apologies.

  ex) I have received apologies for the absence of Peter. He is on sick leave.

 

Chairperson/ chair : the person who leads a meeting

  ex) As chair, I want to take a moment to thank everybody for participating and sharing your thoughts and ideas.

 

Minutes : a written documentation/record of what was said at a meeting. can be detailed or just in the form of bullet points.

  ex) First of all, let's quickly review the meeting minutes from last week and see if we have any open issues.

  ex) Let's go over the minutes from our last meeting.

 

Designate : assign, ask someone to do something

  ex) Does anybody volunteer to take the minutes or shall I designate someone?

 

Formality : a procedure that has to be followed due to a rule

  ex) I will schedule a weekly meeting and take care of all the formalities, so that the team can concentrate on their work.

 

Objectives : goals to accomplish or topics to discuss at the meeting (usually as points in the agenda)

   ex) I'm happy that we covered all the objectives today within the designated time.

 

Show of hands : raised hands to express an opinion in a vote

  ex) Let's decide if we need a short break with a show of hands. Please raise your hand to show whether you are for or against it.

 

When a participant gets the invitation, they respond to it.

Accept : "Yes, I will attend!"

Tentatively accept : "I don't know yet"

Decline : "No, I won't attend"

 

When a participant receives an invitation, they can also forward the invitation to a colleague.

Forward : to send the original invitation to someone else.

 

In an online meeting there can sometimes be a so-called 'lobby'.

When a person joins a meeting, they first get into the lobby.

This means that they wait to be let into the actual meeting room.

The organizer of the meeting needs to admit participants to let them into the meeting room.

 

Before we move on, I think we need to look at how we can ensure that it will not happen during the next sprint.

Let's move on to the status of our ongoing projects.

 

Jane, would you like to kick off? (= start) Would you like to introduce the first item on today's agenda?

 

I'd like to hand over to Tomas(who is gonna tell....)

  'hand over' means that you ask the person to speak about something, to introduce a topic, to give an opinion on something.

Okay, Martin. Over to you.

  'over to you' means you give them control of the discussion.

 

Tomas could you please comment on it? (You want Tomas to say or to add something on the topic that you are discussing)

 

In summary, we're going to do the following. We've decided on the following.

This is what we've agreed on. We will meet in a week and synchronize on the progress.

 

The meeting is adjourned. Thank you all for attending.

I guess that's all for today. Thanks for coming!

That's it for today. Have a nice rest of the day, everyone!

 

Some people think that rapid development of AI is dangerous.

In ML, computer systems utilize complex data to recognize patterns and make appropriate decisions.

 

Most IoT devices are Wi-Fi enabled, but Bluetooth can also be used to transfer data to nearby devices.

 

Big data enables you to gather data from social media, web visits, call logs, and other sources to improve customer experience.

Big data can be stored in the cloud, on premises, or both.

Traditional data is measured in megabytes, gigabytes and terabytes, but big data is stored in petabytes and zettabytes.

Big data is used in different industries to identify patterns and trends, answer questions, gain insights into customers' preferences, and tackle problems.

 

 

A Kafka topic is identified by its name,

and a Kafka topic supports any kind of message format.

The sequence of messages is called a data stream.

Topics are split into partitions.

Each message within a partition gets an incremental ID, called an offset.

Kafka topics are immutable. Once data is written to a partition, it cannot be changed.

Data is kept only for a limited time. ('is kept' here means 'is retained')

Order is guaranteed only within a partition (not across partitions). ('across' here means that order is guaranteed only inside a given partition, not between different partitions)
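To make the words topic, partition and offset easier to remember, here is a toy in-memory model in Python. It is not the real Kafka client API: `ToyTopic`, its methods, the topic name "orders" and the messages are all made up for this note, and retention (deleting old data) is left out.

```python
class ToyTopic:
    """A pretend topic: a name plus a list of append-only partitions."""

    def __init__(self, name, num_partitions):
        self.name = name  # a topic is identified by its name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, message):
        """Add a message to the end of one partition and return its offset."""
        log = self.partitions[partition]
        log.append(message)  # messages are only appended, never changed
        return len(log) - 1  # offsets count up from 0 inside each partition

    def read(self, partition):
        """Return a copy of the messages of one partition, in offset order."""
        return list(self.partitions[partition])


topic = ToyTopic("orders", num_partitions=2)
first = topic.append(0, "order-1")   # offset 0 in partition 0
second = topic.append(1, "order-2")  # offset 0 in partition 1
third = topic.append(0, "order-3")   # offset 1 in partition 0

print(first, second, third)  # prints 0 0 1
print(topic.read(0))         # prints ['order-1', 'order-3']
```

Each partition has its own offsets and keeps its own order, which is why order is only guaranteed within a partition and not across partitions.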

 

Each consumer within a group reads data from exclusive partitions. ('exclusive' here means that each consumer reads data from its own partitions)

 

Each broker is identified by its ID. ('each' makes the subject singular)

In these examples, we choose to number brokers starting at 100

 

Over time, the Kafka clients and CLI have been migrated to leverage the brokers as a connection endpoint instead of ZooKeeper.
