All engineers make decisions in the face of uncertainty. If the information and knowledge relevant to a design were known exactly and completely, no factor of safety (or equivalent) would be needed. Engineers have learned to manage risks cleverly with experience alone: for example, an engineer can select an allowable bearing pressure from the soil type alone. This is design by prescriptive measures. Today we seek to understand a site better through ground investigation and to predict a site-specific response (say, bearing pressure) using a physical model. This is design by calculation, and it demands more data, in particular site-specific data of the right kind, matched to the input side of the model. Site-specific data are likewise indispensable for design verification by load tests or observations. One may conclude that engineers are clever in managing risks despite imperfect knowledge and the limited information at hand, and there is little doubt we have been successful, because failures are rare.
Nonetheless, it is intriguing to ponder whether these strategies were tailored to work in a data-poor environment. Some believe that digital technologies will produce big data of the order of zettabytes (i.e., 10²¹ bytes, or roughly the number of sand grains on all the beaches on the planet) by 2020. If so, it is timely to reflect on whether strategies that are effective in the presence of limited data could impede the digitalization of geotechnical engineering (let us call this Geo 4.0) when we find ourselves immersed, literally overnight, in an overwhelmingly data-rich environment. In an era where data is recognized as the “new oil” in many industries, I submit that we need to be even more creative in seeking insights from data to make better decisions.
In this lecture, I will review available soil/rock and load test databases and argue that we are data-rich if we compile generic data from multiple sites. A generic soil database is one type of Big Indirect Data (BID), which refers to any data that are potentially useful but not directly applicable to the decision at hand. The decision that matters most to an engineer is often related to one site; understanding the ground in one locale is a distinct feature of geotechnical practice. At present, we are data-poor as far as the characterization of a single site is concerned. Site-specific data pose challenges beyond being merely “not enough” and “uncertain”: we now understand that at least seven attributes, rather than two, define our data. A convenient acronym for these attributes is “MUSIC-X”: Multivariate, Uncertain and Unique, Sparse, Incomplete, and potentially Corrupted, with “X” denoting the spatial/temporal dimension.
One important question is how to enhance sparse site-specific data using BID. Every sensible geotechnical engineer already does this by referring to data from sites with comparable geology, but remember that zettabytes are coming our way. Sole reliance on engineering judgment restricts the scope to sites the engineer has worked on over his/her practicing lifetime, and experience cannot be readily transferred. The Geo 4.0 approach is to develop a clever data-driven algorithm that shortlists “similar” sites from BID for the engineer to refine further based on experience, and that “learns” from this feedback to become even more discriminating in the future. It may be possible to digitize experience as well. This is called the “site challenge” in the literature. We are only beginning to learn how to address this site challenge under the realistic MUSIC-X setting. Classical statistics cannot accommodate MUSIC-X data and has no learning capacity; this gap between theory and practice has impeded the adoption of probabilistic methods in the past. It is important to realise that there are many more powerful algorithms beyond classical statistics. The role of an engineer is not to code these algorithms, but to deploy them creatively in practice to serve design goals beyond safety, serviceability, and economy, such as sustainability and resilience, which are closer to societal needs.
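To make the shortlisting idea concrete, the sketch below ranks generic database sites by a normalised distance to a target site, skipping missing entries so that sparse and incomplete records (the “S” and “I” in MUSIC-X) can still be compared. The site names, properties (plasticity index PI, liquid limit LL, natural water content wn, undrained strength ratio su_ratio), and scale values are all hypothetical, chosen only for illustration; a real algorithm would need to handle uncertainty, correlation, and spatial structure as well.

```python
import math

# Hypothetical generic database: each record holds index-property values;
# None marks a missing entry (the "Incomplete" attribute of MUSIC-X).
GENERIC_SITES = {
    "Site A": {"PI": 35.0, "LL": 60.0, "wn": 40.0, "su_ratio": 0.25},
    "Site B": {"PI": 12.0, "LL": 30.0, "wn": 18.0, "su_ratio": None},
    "Site C": {"PI": 40.0, "LL": 70.0, "wn": 45.0, "su_ratio": 0.28},
}

# Illustrative typical spread of each property, used to normalise the
# distance so that no single property dominates the comparison.
SCALE = {"PI": 20.0, "LL": 25.0, "wn": 15.0, "su_ratio": 0.1}

def similarity(target, candidate):
    """Normalised distance over the properties both records share.

    Missing entries are skipped, so sparse records can still be compared;
    returns None if the two records share no properties at all.
    """
    terms = []
    for key, scale in SCALE.items():
        a, b = target.get(key), candidate.get(key)
        if a is not None and b is not None:
            terms.append(((a - b) / scale) ** 2)
    if not terms:
        return None
    return math.sqrt(sum(terms) / len(terms))

def shortlist(target, database, top_n=2):
    """Rank database sites by closeness to the target site."""
    scored = [(name, similarity(target, rec)) for name, rec in database.items()]
    scored = [(name, s) for name, s in scored if s is not None]
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_n]

# A new site with one property not yet measured.
new_site = {"PI": 38.0, "LL": 65.0, "wn": 42.0, "su_ratio": None}
for name, score in shortlist(new_site, GENERIC_SITES):
    print(f"{name}: distance = {score:.2f}")
```

The engineer would then review the shortlist, accept or reject candidates based on geological judgment, and that feedback is what a learning algorithm could exploit to become more discriminating over time.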
All engineering decisions are ultimately black and white, be it choosing the dimensions of a structure, the time interval between maintenance works, or the issuance of an evacuation notice, notwithstanding the imperfect nature of our data, methods, and understanding of reality. There is great potential in developing even more clever solutions to this age-old decision-making problem. I venture to suggest “Seven Es” to guide the development of algorithms that are of value to practice, promote data exchange, are robust, maintain alignment with current knowledge and experience, and engage engineering judgment in a meaningful way: (1) Essence: data is the essence, so algorithms must be data-centric; (2) Economic value: focus on monetizing data; (3) Exchange: the industry is more likely to share and exchange data if client confidentiality can be guaranteed; (4) Extremes: identification of outliers, and robustness of algorithms against outliers, become fundamental issues when it is no longer practical for engineers to pre-process BID; (5) Errors: it is not sufficient to report the most likely or average outcome; an engineer needs both the bias and the precision of the outcome to manage risks and make an informed decision; (6) Extrapolation: watch out for over-fitting and caution users when extrapolation occurs; and (7) Explanation: it is judicious to establish a degree of connection with the existing body of knowledge and experience. Correlation is not the same as causality. Engineers cannot “understand” outcomes delivered purely by a black-box algorithm and cannot meaningfully “agree” or “disagree” with such outcomes.
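Point (5) can be illustrated with a minimal sketch. A common way to report both bias and precision is through the model factor M = measured/predicted compiled over a load-test database: the mean of M measures bias (how conservative the model is on average) and its coefficient of variation measures precision. The capacity values below are invented purely for illustration.

```python
import statistics

# Hypothetical load-test records: measured vs. predicted pile capacity (kN).
measured  = [1250.0, 980.0, 1570.0, 1120.0, 1430.0]
predicted = [1100.0, 1050.0, 1400.0, 1000.0, 1500.0]

# Model factor M = measured / predicted for each load test.
model_factors = [m / p for m, p in zip(measured, predicted)]

# Mean bias: > 1 means the model under-predicts capacity on average.
bias = statistics.mean(model_factors)

# Coefficient of variation: the scatter of M, i.e. the model's precision.
cov = statistics.stdev(model_factors) / bias

print(f"mean bias = {bias:.2f}, COV = {cov:.2f}")
```

Reporting the pair (bias, COV) rather than a single “best estimate” is what allows an engineer to manage risk, for example by checking that a design retains an adequate margin at an unfavourable quantile of M.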
Geotechnical engineers have been ingenious in managing limited data. Surely, we can be even more ingenious in the coming Geo 4.0 era.