Tim Smith: Big Data

Big data is an elusive concept. It represents an amount of digital information that is uncomfortable to store, transport, or analyze. Big data is so voluminous that it overwhelms the technologies of the day and challenges us to create the next generation of data storage tools and techniques. So, big data isn't new. In fact, physicists at CERN have been wrangling with the challenge of their ever-expanding big data for decades.

Fifty years ago, CERN's data could be stored in a single computer. OK, so it wasn't your usual computer; this was a mainframe computer that filled an entire building. To analyze the data, physicists from around the world traveled to CERN to connect to the enormous machine.

In the 1970s, our ever-growing big data was distributed across different sets of computers, which mushroomed at CERN. Each set was joined together in dedicated, homegrown networks. But physicists collaborated without regard for the boundaries between sets, and hence needed to access data on all of these. So, we bridged the independent networks together in our own CERNET.

In the 1980s, islands of similar networks speaking different dialects sprang up all over Europe and the States, making remote access possible but torturous. To make it easy for our physicists across the world to access the ever-expanding big data stored at CERN without traveling, the networks needed to be speaking the same language. We adopted the fledgling internetworking standard from the States, followed by the rest of Europe, and we established the principal link at CERN between Europe and the States in 1989, and the truly global internet took off!

Physicists could then easily access the terabytes of big data remotely from around the world, generate results, and write papers in their home institutes. Then, they wanted to share their findings with all their colleagues. To make this information sharing easy, we created the web in the early 1990s.
Physicists no longer needed to know where the information was stored in order to find it and access it on the web, an idea which caught on across the world and has transformed the way we communicate in our daily lives.

During the early 2000s, the continued growth of our big data outstripped our capability to analyze it at CERN, despite having buildings full of computers. We had to start distributing the petabytes of data to our collaborating partners in order to employ local computing and storage at hundreds of different institutes. In order to orchestrate these interconnected resources with their diverse technologies, we developed a computing grid, enabling the seamless sharing of computing resources around the globe. This relies on trust relationships and mutual exchange. But this grid model could not be transferred out of our community so easily, since not everyone has resources to share, nor could companies be expected to have the same level of trust. Instead, an alternative, more business-like approach for accessing on-demand resources has been flourishing recently, called cloud computing, which other communities are now exploiting to analyze their big data.

It might seem paradoxical for a place like CERN, a lab focused on the study of the unimaginably small building blocks of matter, to be the source of something as big as big data. But the way we study the fundamental particles, as well as the forces by which they interact, involves creating them fleetingly, colliding protons in our accelerators and capturing a trace of them as they zoom off near light speed. To see those traces, our detector, with 150 million sensors, acts like a really massive 3-D camera, taking a picture of each collision event, and that's up to 14 million times per second. That makes a lot of data.

But if big data has been around for so long, why do we suddenly keep hearing about it now?
Well, as the old metaphor explains, the whole is greater than the sum of its parts, and it is no longer just science that is exploiting this. The fact that we can derive more knowledge by joining related information together and spotting correlations can inform and enrich numerous aspects of everyday life, either in real time, such as traffic or financial conditions, in short-term evolutions, such as medical or meteorological, or in predictive situations, such as business, crime, or disease trends.

Virtually every field is turning to gathering big data, with mobile sensor networks spanning the globe, cameras on the ground and in the air, archives storing information published on the web, and loggers capturing the activities of Internet citizens the world over. The challenge is on to invent new tools and techniques to mine these vast stores, to inform decision making, to improve medical diagnosis, and otherwise to answer needs and desires of tomorrow's society in ways that are unimagined today.
