Apache Software Foundation,
I’m often asked the question, “What’s next for open source technology?”
My typical response ranges from “I don’t know” to “the
Over the past year, we’ve seen open source technology make strong inroads
into the mainstream of enterprise technology. Who would have thought
that my work on Hadoop ten years ago would impact
so many industries, from manufacturing to telecom to finance? They
have all taken hold of the powers of the open source ecosystem not only
to improve the customer experience, become more innovative and grow the
bottom line, but also to support work toward
the greater good of society through genomic research, precision
medicine and programs to stop human trafficking, to name just a few.
I’ve listed five tips for folks who are curious about how to begin
working with open source and what to expect from the ever-changing
change: this is the first lesson anyone who is new to open source
technology needs to learn and one of open source’s biggest
differentiators from traditional software. The nature of
open source is fluid and flexible with new projects regularly being
invented for specific use cases. This dynamic cycle propels products to
get better faster. So, in order for companies to reap the full benefits
of open source, they must be open to this change.
The Spark vs. MapReduce debate is a perfect illustration of why this
matters. It’s true that folks are building fewer new applications based on MapReduce
and instead are using Spark as their default data-processing engine.
MapReduce is gradually being replaced as the underlying
engine in tools like Hive and Pig, but that doesn’t make MapReduce
obsolete. It will continue to work well for existing applications for
many years, and, for certain large-scale batch loads, may remain the
superior tool. This trend follows the natural evolution
of open source technology: MapReduce was the 1.0 engine for the
open-source data ecosystem, Spark is its 2.0 engine, and someday there
will be a 3.0 that will make Spark the legacy engine.
Rather than architecting and deploying point solutions, we now have
general-purpose data platforms with many tools that can be combined
flexibly for search, streaming, machine learning and more.
Together these aspects require not just a different set of skills but a
cultural shift around management style and organizational structure.
For this reason, it’s important to gain high-level support within an
organization and introduce data management as
an important boardroom-level discussion. I’d also recommend starting
with one specific use case and gradually building a new culture around
a few new applications, rather than replacing everything all at once,
to help everyone acclimate.
As more enterprise organizations and industries embrace the cloud, they
should consider open-source software that’s not only becoming more
scalable and secure, but which can also help them avoid cloud vendor
lock-in. By building on an open-source platform, organizations can
employ cloud-vendor arbitrage to keep costs down, use different clouds
in different regions, or use a combination of cloud-based
and on-premises systems. In fact, open-source platforms have also
proven technically superior and will likely gain more ground in 2017.
It’s difficult for a single vendor to compete against a large number of
institutions collaborating in open source. In addition,
open-source data systems now lead in performance and flexibility, and
they’re improving more rapidly.
Job hunters in the fields of IT, programming and data science shouldn’t
fixate on mastering individual technologies, but focus instead on
understanding the best use of each of the components
of the open source data ecosystem and how they can be connected to
solve problems. This high-level architectural understanding is the most
valuable skill to companies innovating in technology, because as new
technologies arrive, it’s crucial to understand
how they fit in, what they might replace and what they might enable.
The skills gap in big data will remain relatively constant in the next
year, but this shouldn’t deter people from adopting Hadoop and other
open-source technologies. As most of us know, when
new technologies are created and vie for users, they are known by few.
Only once a particular type of software is a mature standard part of the
canon do we begin to have a substantial number of folks skilled in its
use — but even then the skills gap can persist.
It will disappear only when we stop seeing big improvements to the
stack, which I doubt we want. In short, the skills gap is one of the
primary factors gating the rate of platform change, but it’s also a sign
that innovation is at hand.
The open source ecosystem and its implementation in meaningful projects
will continue to expand over the coming years. As an impetus for
collaboration, it brings together today’s brightest minds
to move software development forward at a pace not possible ten years
ago. If you have an idea for improving existing technologies or want to
rally behind a notion for breaking the status quo, this is the place. I
encourage everyone interested to get involved
and open source veterans to keep committing to the cause.
See the Apache Software Foundation for more information on joining the ASF community.